Simulation platform for pattern recognition based on reservoir computing with memristor networks

Memristive systems and devices are potentially available for implementing reservoir computing (RC) systems applied to pattern recognition. However, the computational ability of memristive RC systems depends on intertwined factors such as system architectures and physical properties of memristive elements, which complicates identifying the key factor for system performance. Here we develop a simulation platform for RC with memristor device networks, which enables testing different system designs for performance improvement. Numerical simulations show that the memristor-network-based RC systems can yield high computational performance comparable to that of state-of-the-art methods in three time series classification tasks. We demonstrate that the excellent and robust computation under device-to-device variability can be achieved by appropriately setting network structures, nonlinearity of memristors, and pre/post-processing, which increases the potential for reliable computation with unreliable component devices. Our results contribute to an establishment of a design guide for memristive reservoirs toward the realization of energy-efficient machine learning hardware.


Introduction
Machine learning (ML) has been becoming a vital technology in many industries for promoting artificial intelligence (AI) during the last decade.For further penetration and expansion of practical applications based on ML methods, it is often demanded to enhance their computational efficiency by reducing computational time and resources while maintaining desired performance.Reservoir computing (RC) is one of the ML frameworks that can meet such a demand [1][2][3][4] .An RC system is normally composed of an unadaptable dynamic reservoir for transforming input time series data (or sequential data) into a high-dimensional feature space and a trainable readout for performing a pattern analysis with a simple learning algorithm.The reservoir needs to be well designed for achieving high computational performance in exchange for the simplicity and speediness of the learning process in the readout.For reservoirs constructed with recurrent neural networks [5][6][7] , there are some practical design guides which are mainly helpful in software computation 8,9 .
In the exploration of hardware implementation of the adaptability-free reservoirs, various physical reservoirs have been developed based on electronics, photonics, spintronics, mechanics, material engineering, robotics, and biology 10 .A unified viewpoint can be obtained by categorizing the reservoir architectures into the network type [11][12][13][14] , the single nonlinear node plus a delayed-feedback type [15][16][17][18] , and the excitable continuous medium type [19][20][21] .However, it is still challenging to derive a design guide for each type of physical reservoir.This is partly because of a difficulty in comprehensively understanding how the computational performance of physical RC systems depends on possible influential factors, such as the system architecture, the physical characteristics of system components, and the signal processing method.
In this study, we tackle the above-mentioned issue by focusing on memristive reservoirs.Memristive systems and devices (or memristors) 22,23 are suitable for constructing physical reservoirs, because their dynamics can show inherent nonlinearity and their resistance, called memristance, can be timevarying based on the history of an applied voltage signal.The nonlinearity and the input history-dependent reaction of memristors are favorable in solving linearly inseparable problems with time series data 3 .Previous studies have demonstrated the potential of memristive reservoirs for temporal pattern recognition, which can be mainly divided into two types: memristor networks 14,[24][25][26] and memristor arrays [27][28][29][30] .Memristor networks use complex dynamic behavior of interacting memristors as a reservoir state, whereas memristor arrays leverage a set of nonlinear responses of independent memristors with optional delayed feedback loops.Our target in this study is the network-type memristive reservoirs which are more difficult to design and control compared with the array-type ones.We aim to develop a systematic approach for examining the effects of system components, such as the network architecture and the nonlinear characteristics of memristors, on the computational performance of the memristor-network-based RC systems toward establishing a practical design guide and facilitating their hardware implementation.
We mathematically formulate a general system of memristor networks and develop a simulation platform for performing temporal pattern classification in the RC framework.

arXiv:2112.00248v2 [cs.ET] 19 Jun 2022
In temporal classification tasks, the dataset is given as a set of multiple time series data and the corresponding class labels.Our purpose is to construct a pattern classifier having a high generalization ability by using the memristor-networkbased RC system, which can well predict the true class label even for unknown time series data after learning.With our simulation platform, we can examine the effects of various system factors on the computational performance of the memristive RC systems.Numerical experiments are conducted to evaluate the classification performance of the RC systems composed of simple memristor models under different system conditions for three temporal pattern classification tasks: the waveform classification 31 , the electrocardiogram (ECG) classification 32 , and the spoken digit recognition 33 .In the waveform classification task with sine and triangular waves, a perfect classification is achieved at the best conditions.In the ECG classification task with normal and abnormal patterns, the classification accuracy reaches a maximum of 86%.In the spoken digit recognition task with the TI-46 Word corpus, the best classification accuracy is 97.3%.These classification accuracies are comparable to those obtained by state-of-the-art ML methods.The results suggest that the RC systems with memristor networks are very promising as a building block of next-generation ML and AI hardware.

RC systems with memristor networks
A physical RC system with a memristor network is illustrated in Fig. 1a, which consists of a preprocessing part, an input part, a reservoir part, and a readout part.First, a given time series data is converted to a voltage signal in a preprocessing step, such as appropriate scaling and masking, depending on the type of data.Then, this voltage signal is fed into the voltage source of the memristor-network-based reservoir.A dynamic response of the reservoir to the input signal is obtained as time evolutions of electric currents flowing through the individual memristors.In the readout part, the current signals are converted to a matrix through a collection of reservoir states and an optional postprocessing step, and then, used to produce a system output.Only the output weight matrix W out is trainable, which is optimized by linear regression so as to minimize the error between the system output and the target output.We limit the pre/post-processing methods to linear matrix operations, in order to shed light on the nonlinear transformation effect of the memristive reservoir.
We consider a general memristor network consisting of N m memristors, N n circuit nodes, and N i voltage sources (see Fig. 1a with N i = 1).By regarding the circuit nodes as vertices and the memristor branches as edges, the connectivity of the memristors can be represented as a directional graph, described by an incidence matrix E m ∈ R N n ×N m .The connectivity of voltage sources can be similarly described by an incidence matrix E i ∈ R N n ×N i (see Methods section).
In this study, we assume that the individual memristors in the reservoir are described by the linear drift model 34 .When an electric voltage is applied to a metal-oxide memristor, the oxygen vacancies (i.e.dopants) drift in the device as charge carriers, and shift the boundary between a doped region with a low resistance and an undoped region with a high resistance.This is simply represented by the linear drift model composed of a series of a low resistance state (LRS) with resistance R ON and a high resistance state (HRS) with resistance R OFF ( R ON ) as illustrated in Fig. 1b (see Methods section for details).With a variation of the length of the LRS, the total memristance changes in time.The polarity of the memristor, related to its directionality, is determined depending on whether the drift of the dopants expands or contracts the LRS 35 .For a sinusoidal input voltage, the linear drift model can exhibit a pinched hysteresis loop in the I-V curve as shown in Fig. 1c.A variation in the ON/OFF resistance ratio, r = R OFF /R ON , changes the nonlinearity of the I-V characteristics.When micro/nano-scale memristor devices are fabricated, a device-to-device variation in their electrical properties is inevitable 27,[36][37][38] .Therefore, R ON and R OFF are assumed to follow a normal distribution of mean RON and ROFF , respectively (see Methods section).Figure 1d shows a normal distribution of R ON with mean RON = 100 Ω and standard deviation σ RON for different values of σ controlling the degree of variability.
Figure 1e illustrates examples of four different types of network structures tested in our numerical simulations, including a ring with unidirectional polarity (Ring-UP), a ring with random polarity (Ring-RP), a random network with unidirectional polarity (Rand-UP), and a random network with random polarity (Rand-RP).These networks are considered to clarify the effect of randomness in network connectivity and polarity.For instance, the difference between Ring-UP and Ring-RP lies in the polarities of the memristors indicated by the arrows.To avoid a disconnected graph, we construct networks of the Rand-UP and Rand-RP types by adding long-range memristor branches to those of the Ring-UP and Rand-RP types, respectively.
We explicitly formulated circuit equations of a general memristor network based on the new modified nodal analysis method 39 as follows (see Methods section for derivation): where t represents continuous time, Φ Φ Φ n (t) ∈ R N n is a vector of nodal magnetic fluxes, W (•) ∈ R N m ×N m is a diagonal matrix of memductances (memory conductances) of the memristors as a function of fluxes, j i (t) ∈ R N i is a vector of currents at the voltage sources, v n (t) ∈ R N n is a vector of nodal voltages, v i (t) ∈ R N i is a vector of voltages at the voltage sources.This set of equations with respect to the state variables (v n (t), Φ Φ Φ n (t), and j i ) are categorized into a differential-algebraic system of equations (DAEs), where differential and algebraic equations are mixed.In general, DAEs are more difficult to solve than ordinary differential equations (ODEs), because the Jacobian matrix (i.e. the first-order derivative) used for numerical integration becomes singular 40 .By setting E m , E i , v i , and W depending on the reservoir network structure and the memristor models, concrete system equations can be obtained from Eqs. ( 1)-(3) (see Methods section).In our simulation platform on MATLAB 41 , these equations are derived by symbolic computation and solved by using the DAE solver.The dynamic state of the reservoir is represented by time courses of vector j m (t) ∈ R N m of electric currents flowing through the memristors.The reservoir states are used to produce the system output in the readout processing (see Methods section).

Waveform classification
The waveform classification task has often been used as a benchmark task for evaluating the computational performance of physical RC systems 29,31,[42][43][44] .We consider a two-class classification problem with 100 sine and 100 triangular waves having the same amplitude and different frequencies.The frequency of each data was randomly generated from a uniform distribution in a certain range (see Methods section).Some data were used for training and the other data were for testing.These waveform data were converted to voltage signals and then fed into the memristor-network-based reservoir.The parameter conditions used for numerical experiments are listed in Table 1.
Figure 2 demonstrates the responses of a memristor network of the Rand-UP type to sine and triangular input voltage 3/14 Figure 3 shows the results of the waveform classification task.We tested the four network structures shown in Fig. 1b: Ring-UP with (magenta triangles), Ring-RP (green diamonds), Rand-UP (blue squares), and Rand-RP (red circles).We evaluated the classification accuracies for 10 random network realizations based on the 10-fold cross validation.Figures 3ad plot the average value over the 100 trials together with the error bar indicating the standard error.Figure 3a shows the classification accuracies when the number of training data, N train , is varied.It is remarkable that the networks of the Rand-UP and Rand-RP types achieve the 100% accuracy in-dependently of N train due to the rich variety in the reservoir states (see Fig. 2 and Supplementary Fig. 3).The performance of the Ring-UP and Ring-RP types are inferior to that of the Rand-UP and Rand-RP types, but much higher than the baseline accuracy obtained by a linear classifier.This indicates the benefit of the nonlinear signal transformation by the physical reservoir.Figure 3b shows that the classification performance is not sensitive to the change in the variability parameter σ , suggesting a variability-tolerant property of the memristive RC system.Figure 3c demonstrates that the classification accuracy monotonically increases with the number of signals, N x (≤ N m ), used for the readout.In other words, a higherdimensional reservoir state yields a better accuracy.The accuracy reaches the perfect level when N x ≥ 14. Figure 3d shows that the average ON/OFF resistance ratio r related to the nonlinearity of the I-V characteristics is a key factor influencing the classification performance, which is maximized at the intermediate values around r = 5 × 10 2 -10 3 .Figures 3e  and f show the confusion matrices when r = 50 and r = 10

ECG classification
An electrocardiogram (ECG) is an electric signal associated with heartbeats, which is often used for health checks and cardiac disease detections.In a normal state, the ECG signal shows a repetition of typical waveforms called the QRS complex 45 due to depolarization and repolarization of the membrane potentials of the cardiac muscle cells.Irregular ECG signals often correspond to abnormal cardiac behavior caused by diseases such as ischemia and myocardial infarction.An ECG-based heartbeat classification task is aimed at separating abnormal ECG signals from normal ones 32,46 .We used the ECG200 dataset formatted in the UCR Time Series Classification Archive 47 , which contains a total of 200 samples of ECG segment data, including 100 samples for training and the other 100 samples for testing.
Figure 4a shows a preprocessing step for each ECG data of length L data (= 96).The one-dimensional vector representing the original data was transformed into a 2D masked data by using a random mask of size S mask (= 50), each element of which is −1 or 1.This randomization process corresponds to a multiplication of input data by random input weights in echo state networks 3,5 .The masked data was separated into L data column-wise sequences, each of which was converted to a voltage signal and then fed into the memristor-networkbased reservoir in the order of the column index sequentially.After the injection of each sequence, we reset the reservoir state.Figure 4b demonstrates a dynamic response of the reservoir to an input time series.The N m (= 20) current signals were transformed into the sequences of length L out (= 50) for constructing a state collection matrix X in two possible ways as shown in Fig. 4c.In Case (i), the transposed matrices of size N m × L out were stacked vertically for all the inputs to yield a state collection matrix X ∈ R N m L data ×L out .In Case (ii), the N m sequences were concatenated into a one-dimensional sequence and those for all the inputs were further concatenated to form a state collection matrix (vector) X ∈ R N m L data L out ×1 .
Figure 5 shows the results of the ECG classification task.Each plot corresponds to the average accuracy over 10 random network realizations and the error bar indicates the standard error.Figure 5a plots the classification accuracies for the four types of reservoir structures combined with the postprocessing in Case (i) (monochrome marks) and Case (ii) (colored marks) when r is varied.The baseline accuracy (gray crosses) obtained by a linear classifier is 64%, meaning that all the testing data were classified as normal patterns.We can see that the performance with the postprocessing in Case (ii) are much better than those in Case (i).If the postprocessing method in Case (ii) is employed, the classification performance is kept at a high level against the variation of r, indicating the robustness against the device-to-device variation as shown in Fig. 5b.The effect of the number of signals used for the readout processing, N x (≤ N m ), on the performance is in Fig. 5c.By increasing N x , the performance is significantly improved in Case (i) but only slightly in Case (ii) (see Supplementary Fig. 6).We observe that the maximum input voltage V max significantly influences the classification performance in Case (ii) as shown in Fig. 5d.The classification accuracy reaches 86% when the network is the Rand-RP type and V max = V.This performance is comparable to that of the other ML methods, ranging between 77% and 92% .We note that the number of trained weights is 1920 in Case (i) and 96000 in Case (ii).The difference in the number of trainable weights is a potential cause of the gap in the classification performance.These results indicate that the classification performance can largely depend on the postprocessing method even when the same reservoir states are used.It is a future issue to fully understand the influence of the postprocessing on the performance, which is linked with a tradeoff between classification accuracy and computational time for learning.

Spoken digit recognition
The isolated spoken digit recognition has been widely used as a benchmark task for testing the classification ability of RC systems 10,33,43 .The sound dataset from the NIST TI-46 Word corpus 50 contains 500 sound waveform data corresponding to 10 utterances of 10 digits ("zero" to "nine"), spoken by 5 different speakers (ID: 1,2,5,6,7) 51 .The aim of this 10-class classification problem is to correctly predict the true digit from a sound signal.
Each sound data was converted to a cochleagram by using the Lyon's passive ear model after an elimination of silence periods 48 as shown in Fig. 6a (see Methods section for details).The data length L data ranges from 48 to 102.The cochleagram was transformed into a masked data with a random binary mask of size S mask × N f .Each of the N f column-wise sequences is given to the reservoir after an appropriate scaling.An example of the dynamic behavior of the reservoir is shown in Fig. 6b.The N m (=20) currents from the reservoir are transformed into the sequences of length L out (=100) by sampling.They are concatenated into a one-dimensional vector of size N m L out as shown in Fig. 6c.These column vectors are collected for all the inputs to construct a state collection matrix X ∈ R N m L out ×L data .
Figure 7 shows the results of the spoken digit recognition task.Each plot is the average classification accuracy over 10 random network realizations, evaluated based on the 10fold cross validation, and the error bar indicates the standard error.Figure 7a demonstrates the classification accuracies when the number of training data, N train , is varied.The results obtained by the memristive RC systems with S mask = 100 are indicated by the colored marks for the four different structure types, i.e.Ring-UP, Ring-RP, Rand-UP, and Rand-RP.For comparison, the performance of linear classifiers (LCs) applied to the masked data are plotted for different mask size S mask = 10, 25, 50, 100.The classification accuracies for all the tested methods are improved by increasing N train .The performance of the LCs increases with the mask size, but peaks out at around S mask = 100 (see Supplementary Fig. 7).The memristive RC systems produce better accuracies than the LC in the case of S mask = 100, which highlights the benefit brought about by the nonlinear transformation by the memris-tive reservoir 52 .The network structures of the Rand-UP and Rand-RP types yield better accuracies than those of the Ring-UP and Ring-RP types, owing to the diversity in the reservoir dynamics.Figure 7b shows that a smaller average ON/OFF ratio r, corresponding to a stronger nonlinearity of I-V characteristics on average, leads to better classification performance for the Rand-UP and Rand-RP types when N train = 450.The best accuracy reaches 97.3% when the ON/OFF ratio is r = 50 and the network type is Rand-RP, which is comparable to that achieved by other physical RC systems 10 .For a specific separation between the training and testing datasets among the 10 cases for the cross validation, we obtained 99.8% accuracy on average over 10 different memristor networks.This is the first report that the memristor networks can yield such high performance in this task.Figure 7c indicates that the classification performance is kept at a high level irrespective of a change in the variability level σ of the memristors.Figure 7d demonstrates the results when N x signals out of N m signals were used for the readout process, indicating that a higher dimensional reservoir state contributes more to increasing the classification accuracy and N x ∼ 10 is enough for obtaining 8/14 the maximum accuracy.Our analysis on the confusion matrix shown in Fig. 7e reveals that a majority of misclassifications occur when the sound data of digit 2 is incorrectly classified as digit 1 or 8.By overcoming this issue through modifications in the signal processing parts, the performance could be further improved.

Discussion
In this study, we have provided the explicit mathematical formula of the general memristor networks and developed the simulation platform for performing temporal pattern classifications with the memristor-network-based RC systems.The platform enables a considerable search of various system conditions and an identification of the key system component for improving classification performance.The results on the three classification tasks indicate that the randomness of the network connectivity (i.e.Rand-UP and Rand-RP) is favorable for generating diverse nonlinear responses to the input signal and achieving the excellent classification performance compared to the other two (i.e.Ring-UP and Ring-RP).The results have also lead to a new finding that the ON/OFF resistance ratio, controlling the nonlinearity of memristors, can have a large impact on the classification accuracy.Although the best ON/OFF ratios are different depending on the task (see Figs. 3d, 5a, and 7b), all of them (r ∼ 50 -500) are within the feasible range reported for real memristor devices 53 .
The memristor network model that we formulated is quite general, and therefore our simulation platform is easily extendable.In this study, we have used at most 20 memristors in the reservoir to test many different system designs while saving the simulation time.Tackling a more complex pattern recognition task with a larger number of memristors is one of the future works.For this purpose, it is an option to inject multiple different input signals into multiple nodes in the memristive reservoir.Any network connectivity of the memristors, including the four types investigated in this study, can be examined conveniently by setting the incidence matrices.We have assumed the ideal linear drift model for the individual memristors.It can be replaced with any other model by deriving or approximating its memductance as a function of the magnetic flux 39 .In the tested cases, the deviceto-device variability following a normal distribution does not cause a degradation in the classification accuracy, implying a reliable computation with heterogeneous units.As far as we checked, the network topology seems to play a decisive role in the input transformation by the reservoir rather than the device-to-device variability.If the shapes and the ranges of the distributions of R ON and R OFF are estimated through measurements with specific devices, they can be easily incorporated into the memristor network model and the simulation platform.We have leveraged appropriate pre/post-processing methods for each task to get high classification accuracies comparable to those obtained by the other ML methods.It would be possible to further improve the classification accuracy by adding more advanced operations in the readout part, but our signal conversion methods limited to linear matrix operations are reasonable for maintaining the merit of low learning cost.
The memristive reservoir of the network type is an attractive option for hardware implementation, because the number of possible network structures, capable of producing different dynamic responses to input signals, can be drastically increased by scaling up the network size.This structural diversity is the major advantage of the network-type reservoir over the arraytype and single-node-type reservoirs 10 .Since only a part of system components are controllable in real memristive devices and materials, it is a significant issue to find better system settings under practical constraints 54 .From a scientific perspective, our final goal is to comprehensively understand the relationship between dynamical properties and information processing capacity of memristor networks.The dynamical properties can be investigated through spectral analysis 21 and nonlinear time series analysis 55 .The features related to information processing can be evaluated by computing relevant measures such as memory capacity 56 , kernel quality 57 , and other capacity scores 58 .Our simulation platform can contribute to both these purposes.A target for the future is to integrate numerical and experimental approaches for establishing a design principle of memristive reservoirs, thereby accelerating the development of energy-efficient RC-based ML hardware.

Memristor networks and incidence matrices
The physical reservoir in this study is a general memristor network consisting of N m memristors, N n circuit nodes, and N i input voltage sources (see Fig. 1a).By regarding the circuit nodes as vertices v n for n = 1, . . ., N n and the memristor branches as edges e m for m = 1, . . ., N m , a network structure of memristors can be represented as a directional graph with an incidence matrix E m ∈ R N n ×N m .If a memristor branch e m (m = 1, . . ., N m ) connects a starting node v k and an ending node v l , then the incidence matrix is defined as follows: In a similar way, an incidence matrix E i ∈ R N n ×N i for the positions of the voltage sources can be defined.If a voltage source i connects a starting node k and an ending node l, then

Single memristor model
The linear drift memristor model 34 was adopted for our simulations.It is assumed that a memristor device consists of a doped region (or LRS) with low resistance R ON and an undoped region (or HRS) with high resistance R OFF (see Fig. 1c).

9/14
The memristance M at time t is written as follows: where D and w(t) represent the lengths of the memristor device and the doped region, respectively.When a voltage signal v(t) is applied to the memristor, the current j(t) flowing through the memristor and the time evolution of the internal variable w(t) are described as follows: where µ v represents the average ion mobility.Since the memristance can be written as M(q) = dΦ(q)/dq using the charge q and the magnetic flux Φ, the memductance W (q) = dq(Φ)/dΦ is represented as follows 59 : where M(w(0)) is the memristance at the initial condition.
The constant parameter a is obtained as follows: where r is the ON/OFF resistance ratio.

Variability of memristors
The device-to-device variation can be represented as parameter distributions in the single memristor model.In our study, the resistance R ON of the LRS was generated from a normal distribution with mean RON and standard deviation σ Ron .The resistance R OFF of the HRS was generated from a normal distribution with mean ROFF (= r RON ) and standard deviation 2σ ROFF .It was assumed that R OFF has larger variability than R on 38 .The average ON/OFF resistance ratio r controls the nonlinearity of the I-V characteristic (see Fig. 1d).The variability parameter σ controls the degree of device-to-device variation (see Fig. 1e).The initial condition w(0) was generated from a normal distribution with mean D/10 and standard deviation σ D/10.

Formulation of memristor networks
A general memristor network can be formulated as Eqs.( 1)-(3) by using the new modified nodal analysis 39 .First, the Kirchhoff's current law gives a set of N m differential equations as follows: where j m (t) ∈ R N m is the vector of currents flowing on the N m memristors and j i (t) ∈ R N i is the vector of currents flowing on the N i voltage sources at time t.The current vector can be expressed as follows: where q m ∈ R N m and Φ Φ Φ n ∈ R N n denote the vectors of the charges and fluxes, respectively.By substituting Eq. ( 12) into Eq.( 11), Eq. ( 1) is derived.Using the first-order linearization, Eq. ( 1) can be rewritten as follows: where W m ∈ R N m ×N m denotes the diagonal matrix whose elements are the memductances of the memristors.For the linear drift model, the diagonal elements are given by Eq. ( 9).Second, the Faraday's law gives Eq. ( 2) which is a set of N n differential equations.Third, the Kirchhoff's voltage law gives Eq. ( 3) which is a set of N i algebraic equations.As a result, the network of the linear drift memristor models is formulated as a set of differential-algebraic equations (DAEs) consisting of Eq. ( 13), Eq. ( 2), and Eq. ( 3).The DAE was numerically solved with the DAE solver ode15i in the software package MATLAB 41 .For a technical reason, a null signal in time period [0, ∆t relax ] for relaxation was added to the main input voltage signal in time period [∆t relax , ∆t relax + ∆t main ] in our simulations.The main input signal with a maximum absolute voltage V max was generated from an input time series of length L in after scaling and interpolation.The parameter values are listed in Table 1.

Readout
The vector j m (t) of N m currents is measured from the memristor network in time period [∆t relax , ∆t relax + ∆t main ].All the N m currents (or partial N x currents) are used to construct a state collection matrix X for the readout through a task-specific postprocessing (see Methods section for specific tasks).Once the state collection matrix X k corresponding to the k-th input time series data is obtained for k = 1, . . ., N train , the overall state collection matrix is obtained as follows: Correspondingly, the overall teacher collection matrix is set as follows: where D k is the teacher matrix for the k-th input data.The column count of D k is the same as that of X k and its row count is the same as the number of classes, N c , in the classification task.If the k-th input data has label c k ∈ {1, . . ., N c }, then each column of D k is given by a one-hot vector [0, . . ., 1, . . ., 0] ∈ R N c where only the c k -th element is 1 and the others are 0.
In the training (i.e.learning) phase, the error between the system output Y train = W out X train and the target output D train is minimized by a linear regression.An optimal solution is obtained as follows: where D † train represents the pseudoinverse matrix of D train .In the testing (i.e.inference) phase, the classification ability of the trained system is evaluated for N test unknown time series data in the testing dataset.For the testing dataset, X test and 10/14

Waveform classification
The sine and triangular waveform data were generated with the following equations: s sin (t) = sin(2π f t) for t ∈ [0, 1], s tri (t) = sawtooth(2π f t + π/2, 1/2) for t ∈ [0, 1], where the "sawtooth" is a function in MATLAB and the frequency f was randomly drawn from the uniform distribution in In the postprocessing step, the current signals were converted to sequences of length L out via sampling, and then to positive-valued sequences by taking their absolute values.The state collection matrix X ∈ R N m ×L out is constructed by concatenating those sequences for each input time series.The 10-fold cross validation was used to evaluate the classification accuracy.

ECG classification
This task was performed with the ECG200 dataset from the UCR Timeseries Classification Archive 47 , which was originally formatted by Olszewski 60

Spoken digit recognition
This task was performed with the NIST TI-46 Word corpus collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardioid microphone at 12.5kHz sample rate with 12-bit quantization 50 .From each time series data, the main sound signal of length L data were extracted by removing the silence part with the VOICEBOX, a Speech Processing Toolbox for MATLAB 61 .The length L data differs depending on the data, ranging from 48 to 102.As shown in Fig. 6a, each sound signal was transformed into a cochleagram based on the Lyon's passive ear model 48 implemented with the Auditory Toolbox in MATLAB 62 .The cochleagram is represented as a matrix P ∈ R N f ×L data where N f is the number of frequency channels.It was converted to a masked data P mask = QP ∈ R S mask ×L data , where Q ∈ R S mask ×N f represents a binary mask of elements 0 and 1.We set N f = 78.Each of the L data column-wise sequences of length L in (= S mask ) was converted to a voltage signal by scaling and interpolation, and then fed into the memristive reservoir.

Figure 2 . 2 ,Figure 3 .Figure 4 .
Figure 2. Dynamical behavior of a memristor-network-based reservoir driven by waveform signals.The network structure is the Rand-UP type, the average ON/OFF resistance ratio is r = 50, and the degree of variability is σ = 0.2. a A sine wave input.b The nodal voltages.c The currents on the memristor branches.d The I-V curves.e The same as d, but the absolute current on a semi-log plot.f The currents plotted against the nodal voltages.g-l The same as a-f, but for a triangular input.

4 , 2 ,Figure 5 .
Figure 5. Results of the ECG classification task.a The classification accuracies for a variation of the average ON/OFF ratio r = ROFF / RON .b The classification accuracies plotted against the variability parameter σ .The meanings of the marks are the same as those in a. c The classification accuracies when N x signals out of N m (= 20) current signals were used for the readout.d The dependence of the accuracy on the maximum input voltage V max .

Figure 6 .
Figure 6.Signal processing for the spoken digit recognition task.a The preprocessing method.A sound data is transformed into a cochleagram of size N f (= 78) × L data (= 48) by using the Lyon's passive ear model 48 and then multiplied by a random binary mask of size S mask (= 100) × N f .The masked data is separated into L data column-wise sequences.b An example of the dynamic response of the memristive reservoir of the Rand-UP type when r = 50 and σ = 0.2.c The postprocessing method.The N m current signals are transformed into the sequences of length L out (= 100) for each input, which are concatenated into one-dimensional vector of size N m L out .The vectors for all the L data inputs are concatenated to make the state collection matrix.

2 rFigure 7 .
Figure 7. Results of the spoken digit recognition task.a The classification accuracies when the number of training data N train is varied.The results with the linear classifier (LC) include the four cases with S mask = 10, 25, 50, 100.The results with the memristive RC systems include the four cases with Ring-UP, Ring-RP, Rand-UP, and Rand-RP, where S mask = 100.b The classification accuracies plotted against the average ON/OFF ratio r. c The classification accuracies plotted against the variability parameter σ .d The classification accuracies when N x signals out of N m (= 20) current signals were used for the readout.e The confusion matrix for testing data.

[ f 0 ( 1 −
δ ), f 0 (1 + δ )].The parameter values were set at f 0 = 5 and δ = 0.4.The whole dataset includes 100 sine and 100 triangular waveform data, all of which have length L data (= 100).It was separated into N train training data including randomly chosen N train /2 data from each class and the remaining N test (= 200 − N train ) testing data.
. The dataset contains a total of 200 ECG signal data having class labels corresponding to normal and abnormal heartbeats.The dataset consists of N train (= 100) training data and N test (= 100) testing data, all of which have length L data (= 96).The training dataset includes 31 abnormal and 69 normal data.The testing dataset includes 36 abnormal and 64 normal data.

2/14
Memristor-network-based RC system.a System architecture composed of preprocessing, input, reservoir, and readout parts.An input time series data is preprocessed and then fed into the memristor network through the voltage source.The memristor network consists of N m memristors, N n circuit nodes, and N i voltage sources.The current signals are measured as a reservoir state and processed in the readout part.The output weight matrix W out is optimized by a linear regression in the training process.b The linear drift model of a memristor 34 , which is equivalent to a series of a low resistance state (LRS) with resistance R ON and a high resistance state (HRS) with resistance R OFF ( R ON ).The ratio between the lengths of the LRS and HRS changes in time depending on an applied voltage.c Current-voltage (I-V ) characteristics of a single memristor model in response to a sinusoidal voltage signal for different values of r = R OFF /R ON .d Normal distributions of R ON with mean RON = 100 Ω and standard deviation σ RON for different values of σ controlling the degree of the device-to-device variation.
e Examples of four types of network structures, including a ring with unidirectional polarity (Ring-UP), a ring with random polarity (Ring-RP), a random network with unidirectional polarity (Rand-UP), and a random network with random polarity (Rand-RP).The memristors indicated by the arrows in the Ring-UP and Ring-RP types have opposite polarities.The networks of the Rand-UP and Rand-RP types are generated by randomly adding non-local memristor branches to those of the Ring-UP and Ring-RP types, respectively.

Table 1 .
List of system parameters and their values.D test are composed in a similar way to Eq. (14) and Eq.(15), respectively.The system outputs for the testing data are computed asY test = [Y 1 , . ..,Y k , . ..,Y test ] = Ŵ out X test .From each column of Y k , the row index that gives the largest value is recorded.The predicted class ĉk is determined as the most frequent value in the recorded row indices for all the columns of Y k .By comparing the true class c k and the predicted class ĉk for all the N test testing data, the classification accuracy is computed.