An entropic associative memory

Natural memories are associative, declarative and distributed, and memory retrieval is a constructive operation. In addition, cues of objects that are not contained in the memory are rejected directly. Symbolic computing memories resemble natural memories in their declarative character, and information can be stored and recovered explicitly; however, they are reproductive rather than constructive, and lack the associative and distributed properties. Sub-symbolic memories developed within the connectionist or artificial neural networks paradigm are associative and distributed, but lack the declarative property, the capability of rejecting objects that are not included in the memory, and memory retrieval is also reproductive. In this paper we present a memory model that sustains the five properties of natural memories. We use Relational-Indeterminate Computing to model associative memory registers that hold distributed representations of individual objects. This mode of computing has an intrinsic computing entropy which measures the indeterminacy of representations. This parameter determines the operational characteristics of the memory. Associative registers are embedded in an architecture that maps concrete images expressed in modality specific buffers into abstract representations and vice versa. The framework has been used to model a visual memory holding the representations of hand-written digits. The system has been tested with a set of memory recognition and retrieval experiments with complete and severely occluded images. The results show that there is a range of entropy values, not too low and not too high, in which associative memory registers have a satisfactory performance. The experiments were implemented in a simulation using a standard computer with a GPU, but a parallel architecture may be built where the memory operations would take a very reduced number of computing steps.


Associative Memory
Natural memories of humans and other animals with a developed enough neural system are associative [1].An image, a word or an odor can start a chain of remembrances on the basis of their meanings or contents.Natural memories contrast strongly with standard computer memories in that the latter consists of place-holders -containing strings of symbols that are interpreted as representations-that are accessed through their addresses.Computational models of associative memories have been extremely difficult to create within the symbolic paradigm, and although there have been important attempts using semantic networks since very early [2], and production systems more recently [3], practical symbolic associative memories are still lacking.This limitation was one of the original motivations for the parallel distributed processing program, including connectionist systems and neural networks [4], which questioned explicitly the capability of Turing Machines to properly address associative memories, among other high level cognitive functions. 4The subject has been one main subject matter within artificial neural networks and there have been very influential proposals, such as Hopfield's model [5] or the Bidirectional Associative Memory [6].There have been also attempts to represent semantic networks through neural networks [7].However, neural networks cannot hold symbolic or structured information, and associative memories in this paradigm are rather transfer functions mapping inputs into outputs for classification and prediction among other similar tasks.In a recent model presented by Graves et al. [8,9] the output of the neural network is a vector that is read and written on external tables that can be accessed by location and content, and some symbolic procedures can be learned, but information cannot be stored and/or recovered from a key or a cue, and the contention that such systems are not proper declarative memories still holds [10].
In this paper we address such limitation and present an associative memory mechanism constituted by a number of associative memory registers that hold distributed representations of objects, and yet these can be registered, recognized and retrieveed on the basis of a cue, and the recovered objects can be expressed declaratively and interpreted as symbolic information.

Relational-Indeterminate Computing
The present associative memory systems is defined with a novel mode of computing that is referred to here as Relational-Indeterminate Computing (RIC) [11,12].The basic object of computing in this mode is the mathematical relation, such that an object in the domain may be related to several objects in the codomain.The specification is presented by Pineda [11] as follows: 5Let the sets A = {a 1 , ..., a n } and V = {v 1 , ..., n m }, of cardinalities n and m, be the domain and the codomain of a finite relation r : A → V .The objects in the domain and codomain are referred to here as the arguments and the values respectively.For purposes of notation, for any relation r we define a function R : A × V → {0, 1} -the relation in lower case and the function in upper case letters-such that R(a i , v j ) = 1 or true if the argument a i is related to the value v j in r, and R(a i , v j ) = 0 or false otherwise.
In this formalism, evaluating a relation is construed as selecting randomly one among the values associated to the given argument.In the same way that "f (a i ) = v j " is interpreted as stating that the value of the function f for the argument a i is v j , "r(a i ) = v j " states that the value of the relation r for the argument a i is an object v j that is selected randomly -with an appropriate distribution-among the values for which R(a i , v j ) is true.
RIC has three basic operations: abstraction, containment and reduction.Let r f and r a be two arbitrary relations from A to V , and f a be a function with the same domain and codomain.The operations are defined as follows: for all a i ∈ A and v j ∈ V (i.e., material implication), and false otherwise.
all a i , where the random distribution is centered around f a , as elaborated below.If η(f a , r f ) does not hold, β(f a , r f ) is undefined -i.e., f v (a i ) is undefined-for all a i .Abstraction is a construction operation that produces the union of two relations.A function is a relation and can be an input to the abstraction operation.
Any relation can be constructed out of the incremental abstraction of an appropriate set of functions.The construction can be pictured graphically by overlapping the graphical representation of the included functions on an empty table, such that the columns correspond to the arguments, the rows to the values and the functional relation is depicted by a mark in the intersecting cells.
The containment operation verifies whether all the values associated to an argument a i in r a are associated to the same argument in r f for all the arguments, such that r a ⊆ r f .The containment relation is false only in case The set of functions that are contained in a relation, which is referred to here as the constituent functions, may be larger than the set used in its construction.
The constituent functions are the combinations that can be formed by taking one value among the ones that the relation assigns to an argument, for all the arguments.The table format allows to perform the abstraction operation by direct manipulation and the containment test by inspection.The construction consists on forming a function by taking a value corresponding to a marked cell of each column, for all values and for all columns.The containment is carried on by verifying whether the table representing the function is contained within the table representing the relation by testing all the corresponding cells through material implication.
For this, the abstraction operation and the containment test are productive.This is analogous to the generalization power of standard supervised machine learning algorithms that recognize not only the objects included in the training set but also other objects that are similar enough to the objects in such set.
Reduction is the functional application operation.If the argument function f a is contained in the relation r f , reduction generates a new function such that its value for each of its arguments is selected from the values in the relation r f for the same argument.In the basic case, the selection function is the identity function -i.e., β(f a , r f ) = f a .However, β is a constructive operation such that the argument function f a is the cue for another function recovered from r f , such that v j is selected from {v j |(a i , v j ) ∈ r f } using and appropriate random distribution function centered around f a (a i ).If f a is not contained in r f the value of such functional application operation is not defined.
Relations have an associated entropy, which is defined here as the average indeterminacy of the relation r.Let µ i be the number of values assigned to the argument a i in r; let ν i = 1/µ i and n the number of arguments in the domain.In case r is partial, we define ν i = 1 for all a i not included in r.The computational entropy e(r) -or the entropy of a relation-is defined here as: A function is a relation that has at most one value for any of its arguments, and its entropy is zero.Partial functions do not have a value for all the arguments, but this is fully determined and the entropy of partial functions is also zero.

Table Computing
The implementation of RIC in table format is referred to as , where j and k may be equal.This corresponds to the standard assignment operator of imperative programming languages.The machine also includes the conditional operator if relating a condition pred to the operations op 1 and op 2 -i.e., if pred then op 1 else op 2 , where op 2 is optional.The initialization of a register R such that all its cells are set to 0 or to 1 is denoted R ← 0 or R ← 1 respectively, and R ← f denotes that a function f : A → V is input into the register R. The system also includes the operators λ, η and β for computing the corresponding operations.These are all the operations in table computing.
as follows: The recognition of an object o ∈ O k -or of class K-represented by the function f , is performed by the algorithm Memory Recognize as follows: The retrieve or recovery of an object is performed through the reduction operation.This is a constructive operation that recovers the content of the memory in the basis of the cue.The procedure is as follows: The standard input configuration of this machine specifies that the auxiliary register contains the function to be registered or recognized, or the cue of a retrieve, at the initial state of the corresponding memory operation.This condition is enforced at step (1) of the three memory procedures.The standard output specifies the content of the auxiliary register when the computation is ended.It is 0 when the memory register operation is ended and when the cue is rejected in a retrieve operation, and 1 if recognition is successful; otherwise the object to be recognized is rejected and if the retrieve operation is successful it contains the function recovered out of the cue.The interpretation conventions state that the content of the associative register is interpreted as an abstract concept, and the content of the input auxiliary register is interpreted as a concrete concept -the concept of an individual object.

Architecture
Table Computing was used to implement a novel associative memory system.
The conceptual architecture is illustrated in Figure 1.The main components are: and m ≥ 0 with their corresponding Auxiliary Registers (Aux.Reg.) each having a Control I/O unit.
• A central control unit sending the operation to be performed to all AMRs, and receiving the final status of the operation (i.e., whether it was successful or not) and the entropy of each AMR (not shown in the diagram).
• A bus with n tracks, each representing a characteristic or feature, with its corresponding value: a binary number from 0 to 2 m − 1 at a particular time.
• An input processing unit constituted by: -A input modal pixel buffer with the concrete representation of images produced by the observations made by the computing agent directly.
For instance, the input buffer can contain k pixels with 256 gray levels represented by integers from 0 to 255.
-An analysis module mapping concrete representations to abstract modality-independent representations constituted by n characteristics with their corresponding real values.
-A quantizing and sampling module mapping the real values of the n characteristics into 2 m levels, represented by binary digits of length m, which are written on the corresponding track of the bus.
• An output processing unit constituted by: -A Digital/Real conversion module that maps binary numbers in a track of the bus to real numbers, for the n tracks.
-A synthesis module mapping abstract modality-independent representations constituted by n characteristics with their corresponding values into concrete representations with k pixels with their values.
-An output modal buffer with the concrete representation of images produced by the synthesis module.The contents of the output buffer are rendered by an appropriate device, and constitute the actions of the system.
The bus holds the representations of functions with domain A = {a 1 , ..., a n } and range V = {v 1 , ..., v n } where 0 ≤ v j ≤ 2 m − 1 that represent the objects that are stored, recognized or recovered from the AMRs by the corresponding operations.Objects are placed on the bus through an input protocol; and recovered objects from the AMR are rendered through an output protocol, as follows: • Input Protocol : 1. Sense the object o i and place it in the input modal buffer; 2. Produce its abstract representation f i through the analysis module; The core operations of the memory register, memory recognition and memory retrieve algorithms are carried on directly on the AMRs and their corresponding auxiliary registers in two or three computing steps -i.e., the operations λ, η and β in addition to the corresponding assignment operations.The The memory operations use the input and output protocols, and are performed as follows: • M emory Register(f i , R k ): The register AM R k is set on, and the remaining AMRs are set off; the input protocol is performed; the M emory Register operation is performed.
• M emory Recognize(f i , R k ): All AMRs are set on; the input protocol is performed; the M emory Recognize operation is performed; all AMRs send its status and entropy to the control unit; if no AMR's recognition operation is successful the object is rejected.
• M emory Retrieve(f i , R k ): All AMRs are set on; the input protocol is performed; the M emory Retrieve operation is performed; all AMRs send its status and entropy to the control unit; all AMRs but the selected one are set off; the output protocol is executed; the recovered object is placed on the output buffer.

Analysis and Synthesis
The Analysis module maps the concrete information that is sensed from the environment and placed in the input buffer -where the characteristics stand for external signals-into the abstract representation that characterizes the membership of the object within a class.Both concrete and abstract representations are expressed as functions but while in the former case the arguments have a spatial interpretation -such as the pixels of an image-in the latter the functions stand for modality-independent information.
The analysis module in the present architecture is constituted by a neural network with three convolutional layers [14].The training phase was configured by adding a classifier, which was a fully connected neural network (FCNN) with two layers.The analysis module was trained in a standard supervised manner with back-propagation, as illustrated in Figure 2.
Once the analysis module has been trained the FCNN is removed and the objects in the input buffer can be mapped into their corresponding representation as a set of abstract features through a bottom-up "analysis operation".The information is feed into the AMRs through the output of the convolutional layer.
The purpose of the analysis module is to map concrete images into abstract representations, but not to perform classification.The diagram shows the case in which the input has 784 inputs, corresponding to a pixel buffer of size 28 × 28, each taking one out of 256 gray levels, while its output is a function with 64 arguments with real values.
The objects recovered from the AMRs are mapped into the corresponding concrete representations and placed in the output buffer by the synthesis module.This consists of a transposed convolutional network that computes the inverse function of the input convolutional network.The two neural networks together conform what is known as a convolutional autoencoder [15,16].
If the function produced by the analysis module is input directly into the synthesis module, the image in the output buffer should be the same as the one originally placed in the input buffer.However, the transposed convulational network was trained independently and the recovered image is slightly different.

A Visual Memory for Hand Written Digits
The associative memory system was simulated through the construction of a visual memory for storing distributed representations of hand written digits from "0" to "9".The system was built and tested using the MNIST database. 6n this resource each digit is defined as a 28 x 28 pixel array with 256 gray levels.
There are 70,000 instances available.The instances of the ten digit types are mostly balanced.same practically for m ≤ 4. The reason of this decrease is that when the size of the AMR is small, there is a large number of false positives, and several AMRs different from the right one may accept the object; however, one register must be selected for the memory retrieve operation, and there is no information to decide which one.This decision was made using the AMR with the minimal entropy, although this choice does not improve over a random choice using a normal distribution.
The average number of accepting AMRs for each instance per AMR size is shown in Figure 3 (c).As can be seen this number goes from 10 for AMRs with one row to 1 for AMRs with 8 and 16 rows, where the precision is very high because every AMR recognizes only one instance in average.This effect is further illustrated in Figure 3 (d).

Experiment 2
In this experiment each associative register holds the representation of two different digits: "0" and "1", "2" and "3", "4" and "5", "6" and "7" and "8" and that it is possible to create associative memories holding overlapped distributed representations of more than one individual object, that have a satisfactory performance.

Experiment 3
The purpose of this experiment was to investigate the performance of an AMR with satisfactory operational characteristics in relation to its entropy or information content.Experiment 1 shows that AMRs with sizes 64 × 32 and 64 × 64 satisfy this requirement.As their performance are practically the same, the smallest one was chosen for a basic economy criteria.
The AMRs were filled up with varying proportions of the RemCorpus -1 %, 2 %, 4 %, 8 %, 16 %, 32 %, 64 % and 100 %-as shown in Figure 5.The entropy increases according to the amount of remembered data, as expected.Precision is very high for very low entropy values and it decreases slightly when the entropy is increased, although it remains very high when the whole of RemCorpus is considered.Recall, on its part, is very low for very low levels of entropy but increases very rapidly when the AMR is filled up with more data.

Experiment 4
The final experiment consists on assessing the similarity of the objects retrieved from the memory out of a cue.In the basic scenario, if the cue matches perfectly the recovered object, the image in the output buffer should be the same as the image placed in the input.However, memory retrieve is a constructive operation that renders a novel object which may be somewhat different than the cue.The object is constructed by the β operation, as described above.In the present experiment a random triangular distribution is used for selecting the values of the arguments of the retrieved object out of the potential values of the AMR for the corresponding arguments.
The hypothesis is that the increase of memory recall goes in hand with higher entropy, but the space of indeterminacy of the AMR impacts negatively in the resemblance or similarity between the cue and the retrieved object.Figure The results of this experiment are shown in Figure 6.The first row contains the cue for the retrieval operation for the ten digits; the second is the decoded image, which corresponds to the one resulting from synthesizing the output of the analysis module directly.This is equivalent to specify β as the identity function -although such choice would remove the constructive aspect of the memory retrieve operation, and memory recognition and memory retrieve would amount to the same operation.The decoded image is very similar to the cue, but it is not an exact copy.The synthesis module should compute the inverse function of the one computed by the analysis one but the convolutional and the transposed networks are trained independently, and this is only an approximation.
The remaining images, from top to bottom, correspond to the retrieved objects for the nine levels of the RemCorpus that are considered (the codified image corresponds to e = 0).The rows for the ten digits suggest that the highest similarity is achieved when the entropy is very low.
The overall behavior of the system suggests that the AMRs are very tolerant for memory recognition, but very restrictive for retrieving objects.This is consistent to the general intuition that recognizing objects in memory is much easier than retrieving them.

Experimental Setting
The programming for all the experiments was carried out in Python 3.8 on the Anaconda distribution.The neural networks were implemented with TensorFlow 2.3.0, and most of the graphs produced using Matplotlib.The experiments were run on an Alienware Aurora R5 with an Intel Core i7-6700 Processor, 16 GBytes of RAM and NVIDIA GeForce GTX 1080 graphics card.
The images shown in Figure 6 were selected one column per experimental run, while the criteria for selection was, when possible, to have some resemblance to the corresponding digit in the first (1%) and last (100%) stage.
Regarding the neural networks, the classifier and the decoder were trained separately.Firstly, the classifier was trained (convolutional section plus fully connected section) using MNIST data.Secondly, the fully connected layers were removed from the classifier and the features for all images in MNIST were generated and stored using the trained convolutional part only.Thirdly, the decoder (transposed convolutional) was trained using the features as input data and the original images as labels.Finally, the trained decoder was used to generate the images from the features produced by the memory retrieve algorithm in Experiment 4.

Discussion
In this paper a memory system that is associative and distributed but declarative is presented.Individual instances of represented objects are characterized at three different levels: i) as concrete modality-specific representations in the input and output buffers -which can be sensed or rendered directly-or, alternatively, as functions from pixels to values; ii) as abstract modality-independent representations in a space of characteristics, which are functions standing in a one-to-one relation to their corresponding concrete representations in the first level; and iii) as distributed representations holding the disjunctive abstraction of a number, possibly large, of instance objects expressed in the second level.
The first level can be considered as declarative and symbolic; the second is input and output processes in terms of the indeterminacy of the computing objects.The analysis and synthesis modules compute a function whose domain and range are sets of functions, and these processes are fully determined: provide the same value for the same argument always.Hence, these are zero entropy computations.However, the distributed representations stored in the memory registers have a degree of indeterminacy, which is measured by the computing entropy.
The entropy is a parameter of the system performance, as can be seen in the four experiments.First, it measures the operational range of the associative registers, as shown in experiments 1 and 2. If the entropy is too low precision and recall are low overall, but if it is too high, recall is also diminished.However, there is an entropy level in which both precision and recall are pretty satisfactory.The experiments showed that memory registers with sizes of 64 × 32 and 64 × 64, with entropy of 3.1 and 3.8 respectively, have satisfactory operational characteristics.The smaller register was chosen for the further experiments due to basic economy considerations.
The experiment 2 shows that a single associative memory register can hold the distributed representation of more than one object.The cost is that the entropy is increased, and larger registers with a large amount of information are required for the construction of operational memories.However, this functionality is essential, both for the construction of higher abstractions, and possibly for the definition of composite concepts.This is also left for further work.The point to stress here is that the measure involved is again the entropy.
The experiment 3 addressed the question of what is the amount of information and the level of entropy that is required for effective memory recognition and retrieval, given an operational memory register.The results show that recognition precision is very high regardless the amount of information that is feed into the memory register.Hence, whenever something is accepted one can be pretty sure that it belongs to the corresponding class.However, recognition recall is very low for low levels of entropy but becomes very high even with a moderate amount of information.Again, for the present domain, the register is useful for entropies of around 3.4.However, if the information is increased with noise, the entropy has a very large value, and although recognition recall will not decrease, the information is confused and recognition precision will lower significantly.Hence, there is again a range of entropy, not too low and not too high, in which the amount of information is rich, and the memory is effective.
The experiment 4 asked the question of how similar are the objects recovered from the memory retrieval operation to the cue or key used as the retrieval descriptor.The results show that high similarity is only achieved for very low levels of entropy.In the basic case, when the entropy is zero, the retrieved object is the same as the cue, and memory recognition and memory retrieval are not distinguished.This corresponds to Ramdom Access Memories (RAM) of standard digital computers, where the content of a RAM register is "copied" but not really extracted or recovered in a memory read operation.
Natural memories are constructive in the sense that the memory retrieve operation renders a genuine novel object.This is the reason to define the β operator using a random distribution.Whenever the cue is accepted the retrieval operation selects an object whose representation is within the relation's constituent functions.The retrieved object may or may not have been registered explicitly, but it is the product of a construction operation always.
However, the similarity experiment showed that high resemblance between the cue and the recovered object is only achieved when the entropy has very low values.If the entropy is zero, the retrieved object is a "photographic copy" -Figure 6 shows some distortions, but these are due to the disparity between the analysis and synthesis modules.The reconstructions resemble well the cue for entropy around 2 -using only 1% or 2% of the test corpus, but then on the similarity is quite random.This result suggests that although there is flexibility in memory recognition, memory retrieval is quite constrained, and hence a much harder operation.The entropy range also suggests that retrieval goes from "photographic" to "recovered objects" to "imaged objects" to noise.Once again, operational memories have an entropy range in which the entropy is not too low and not too high.This pattern seems to be very general and is referred to as The Entropy Trade-off [11].
The study of memory mechanisms for storing the representations of individual objects is central to cognition and computer applications, such as information systems and robotics.The sense data may be presented to cognition in a natural format in which spatial and temporal information may be accessed directly, but may be stored as a highly abstract modality-independent representation.Such representations may be retrieveed directly by perception in the production of interpretations, by thought in decision making and planning, and by the motricity or motor module when motor abilities are deployed.
Associative memory mechanisms should support long term memory, both episodic and semantic, and may be accessed on demand by working memory.
Such devices may be used in the construction of episodic memories and composite concepts that may be stored in associative memory registers themselves, or in higher level structures that rely on basic memory units.Associative memory models may be essential for the construction of lexicons, encyclopedic memories, and modality-specific memories, such as faces, prototypical shapes or voices, both in cognitive studies and applications.
The present investigation addressed the design and construction of the full associative memory system for a simple domain, and the current result can be seen as a proof of concept.We left the investigation of larger and more realistic domains for further work.
The present investigation addressed also the case in which images where complete objects; however, this is only the basic condition, as interpretation is often performed in noisy environments and with incomplete information.For instance, the objects may be partially covered and/or seen from different perspectives.The cues or descriptors generated in such conditions would be much poorer information, memory recognition would be harder, and the objects recovered from memory would involve larger reconstructions than the present case.
The investigation of the associative memory system for incomplete information is also left for further research.

Let K be a
class, O k a set of objects of class K, and F O a set of functions with n arguments and m values, such that each function f i ∈ F O represents a concrete instance o i ∈ O k in terms of n features, each associated to one of m possible discrete values.The function f i may be partial -i.e., some features may have no value.Let R k be an associative memory register and R k−i/o an auxiliary input and output register, both of size n × m.The distributed representation of the objects O k is created by the algorithm Memory Register

3 . 1 .
Produce its corresponding quantized representation and write it on the bus; 4. Write the value of all n arguments diagrammatically into the corresponding row of the auxiliary register R k−i/o ; • Output Protocol : Write the content of the auxiliary register R k−i/o on the bus as binary numbers with m digits for all n arguments; 2. Transform the digital values of the n characteristics into real values; 3. Generate the concrete representation of the object and place it on the output buffer through the synthesis module.

Figure 2 :
Figure 2: Training the Analysis Module

Figure 6 :
Figure 6: Similarity between the cue and the recovered digits as a function of the entropy

Table Computing [ 12
].The representation consists of a set of tables with n columns and m rows, where each table is an Associative Register that contains a relation between a set of arguments A = {a 1 , ..., a n } and a set of values domain {a 1 , ..., a 64 } where each argument a i is mapped to a real value v i -i.e.f i (a i ) = v j .The values are quantized in 2 m levels.The tables or associative registers have sizes of 64 × 2 m .The parameter m determines the granularity of the table.The experiments one and two were performed with granularities