Introduction

Machine learning (ML) has been becoming a vital technology in many industries for promoting artificial intelligence (AI) during the last decade. For further penetration and expansion of practical applications based on ML methods, it is often demanded to enhance their computational efficiency by reducing computational time and resources while maintaining desired performance. Reservoir computing (RC) is one of the ML frameworks that can meet such a demand1,2,3,4. An RC system is normally composed of an unadaptable dynamic reservoir for transforming input time series data (or sequential data) into a high-dimensional feature space and a trainable readout for performing a pattern analysis with a simple learning algorithm. The reservoir needs to be well designed for achieving high computational performance in exchange for the simplicity and speediness of the learning process in the readout. For reservoirs constructed with recurrent neural networks5,6,7, there are some practical design guides which are mainly helpful in software computation8,9.

In the exploration of hardware implementation of the adaptability-free reservoirs, various physical reservoirs have been developed based on electronics, photonics, spintronics, mechanics, material engineering, robotics, and biology10. A unified viewpoint can be obtained by categorizing the reservoir architectures into the network type11,12,13,14, the single nonlinear node plus a delayed-feedback type15,16,17,18, and the excitable continuous medium type19,20,21. However, it is still challenging to derive a design guide for each type of physical reservoir. This is partly because of a difficulty in comprehensively understanding how the computational performance of physical RC systems depends on possible influential factors, such as the system architecture, the physical characteristics of system components, and the signal processing method.

In this study, we tackle the above-mentioned issue by focusing on memristive reservoirs. Memristive systems and devices (or memristors)22,23 are suitable for constructing physical reservoirs, because their dynamics can show inherent nonlinearity and their resistance, called memristance, can be time-varying based on the history of an applied voltage signal. The nonlinearity and the input history-dependent reaction of memristors are favorable in solving linearly inseparable problems with time series data3. Previous studies have demonstrated the potential of memristive reservoirs for temporal pattern recognition, which can be mainly divided into two types: memristor networks14,24,25,26 and memristor arrays27,28,29,30. Memristor networks use complex dynamic behavior of interacting memristors as a reservoir state, whereas memristor arrays leverage a set of nonlinear responses of independent memristors with optional delayed feedback loops. Our target in this study is the network-type memristive reservoirs which are more difficult to design and control compared with the array-type ones. We aim to develop a systematic approach for examining the effects of system components, such as the network architecture and the nonlinear characteristics of memristors, on the computational performance of the memristor-network-based RC systems toward establishing a practical design guide and facilitating their hardware implementation.

We mathematically formulate a general system of memristor networks and develop a simulation platform for performing temporal pattern classification in the RC framework. In temporal classification tasks, the dataset is given as a set of multiple time series data and the corresponding class labels. Our purpose is to construct a pattern classifier having a high generalization ability by using the memristor-network-based RC systems, which can well predict the true class label even for unknown time series data after learning. With our simulation platform, we can examine the effects of various system factors on the computational performance of the memristive RC systems. Numerical experiments are conducted to evaluate the classification performance of the RC systems composed of simple memristor models under different system conditions for three temporal pattern classification tasks: the waveform classification31, the electrocardiogram (ECG) classification32, and the spoken digit recognition33. In the waveform classification task with sine and triangular waves, a perfect classification is achieved at the best conditions. In the ECG classification task with normal and abnormal patterns, the classification accuracy reaches a maximum of 86%. In the spoken digit recognition task with the TI-46 Word corpus, the best classification accuracy is 97.3%. These classification accuracies are comparable to those obtained by state-of-the-art ML methods. The results suggest that the RC systems with memristor networks are very promising as a building block of next-generation ML and AI hardware.

Results

RC systems with memristor networks

A physical RC system with a memristor network is illustrated in Fig. 1a, which consists of a preprocessing part, an input part, a reservoir part, and a readout part. First, a given time series data is converted to a voltage signal in a preprocessing step, such as appropriate scaling and masking, depending on the type of data. Then, this voltage signal is fed into the voltage source of the memristor-network-based reservoir. A dynamic response of the reservoir to the input signal is obtained as time evolutions of electric currents flowing through the individual memristors. In the readout part, the current signals are converted to a matrix through a collection of reservoir states and an optional postprocessing step, and then, used to produce a system output. Only the output weight matrix \(W^{\mathrm{out}}\) is trainable, which is optimized by linear regression so as to minimize the error between the system output and the target output. We limit the pre/post-processing methods to matrix operations, in order to shed light on the nonlinear transformation effect of the memristive reservoir.

Figure 1
figure 1

Memristor-network-based RC system. (a) System architecture composed of preprocessing, input, reservoir, and readout parts. An input time series data is preprocessed and then fed into the memristor network through the voltage source. The memristor network consists of \(N_{\mathrm{m}}\) memristors, \(N_{\mathrm{n}}\) circuit nodes, and \(N_{\mathrm{i}}\) voltage sources. The current signals are measured as a reservoir state and processed in the readout part. The output weight matrix \(W^{\mathrm{out}}\) is optimized by linear regression in the training process. (b) The linear drift model of a memristor34, which is equivalent to a series of a low resistance state (LRS) with resistance \(R_{\mathrm{ON}}\) and a high resistance state (HRS) with resistance \(R_{\mathrm{OFF}} (\gg R_{\mathrm{ON}})\). The ratio between the lengths of the LRS and HRS changes in time depending on an applied voltage. (c) Current–voltage (IV) characteristics of a single memristor model in response to a sinusoidal voltage signal for different values of \(r=R_{\mathrm{OFF}}/R_{\mathrm{ON}}\). (d) Normal distributions of \(R_{\mathrm{ON}}\) with mean \(\bar{R}_{\mathrm{ON}} = 100\,\Omega\) and standard deviation \(\sigma \bar{R}_{\mathrm{ON}}\) for different values of \(\sigma\) controlling the degree of the device-to-device variation. (e) Examples of four types of network structures, including a ring with unidirectional polarity (Ring-UP), a ring with random polarity (Ring-RP), a random network with unidirectional polarity (Rand-UP), and a random network with random polarity (Rand-RP). The memristors indicated by the arrows in the Ring-UP and Ring-RP types have opposite polarities. The networks of the Rand-UP and Rand-RP types are generated by randomly adding non-local memristor branches to those of the Ring-UP and Ring-RP types, respectively.

We consider a general memristor network consisting of \(N_{\mathrm{m}}\) memristors, \(N_{\mathrm{n}}\) circuit nodes, and \(N_{\mathrm{i}}\) voltage sources (see Fig. 1a with Nn=8, Nm=11, and \(N_{\mathrm{i}}=1\)). By regarding the circuit nodes as vertices and the memristor branches as edges, the connectivity of the memristors can be represented as a directional graph, described by an incidence matrix \(E_{\mathrm{m}} \in {\mathbb {R}}^{N_{\mathrm{n}}\times N_{\mathrm{m}}}\). The connectivity of voltage sources can be similarly described by an incidence matrix \(E_{\mathrm{i}} \in {\mathbb {R}}^{N_{\mathrm{n}}\times N_{\mathrm{i}}}\) (see “Methods” section).

In this study, we assume that the individual memristors in the reservoir are described by the linear drift model34. When an electric voltage is applied to a metal-oxide memristor, the oxygen vacancies (i.e. dopants) drift in the device as charge carriers, and shift the boundary between a doped region with a low resistance and an undoped region with a high resistance. This is simply represented by the linear drift model composed of a series of a low resistance state (LRS) with resistance \(R_{\mathrm{ON}}\) and a high resistance state (HRS) with resistance \(R_{\mathrm{OFF}}\) (\(\gg R_{\mathrm{ON}}\)) as illustrated in Fig. 1b (see “Methods” section for details). With a variation of the length of the LRS, the total memristance changes in time. The polarity of the memristor, related to its directionality, is determined depending on whether the drift of the dopants expands or contracts the LRS35. For a sinusoidal input voltage, the linear drift model can exhibit a pinched hysteresis loop in the I-V curve as shown in Fig. 1c. A variation in the ON/OFF resistance ratio, \(r=R_{\mathrm{OFF}}/R_{\mathrm{ON}}\), changes the nonlinearity of the I-V characteristics. When micro/nano-scale memristor devices are fabricated, a device-to-device variation in their electrical properties is inevitable27,36,37,38. Therefore, \(R_{\mathrm{ON}}\) and \(R_{\mathrm{OFF}}\) are assumed to follow a normal distribution with mean \(\bar{R}_{\mathrm{ON}}\) and \(\bar{R}_{\mathrm{OFF}}\), respectively (see “Methods” section). Figure 1d shows a normal distribution of \(R_{\mathrm{ON}}\) with mean \(\bar{R}_{\mathrm{ON}}=100\,\Omega\) and standard deviation \(\sigma \bar{R}_{\mathrm{ON}}\) for different values of \(\sigma\) controlling the degree of variability.

Figure 1e illustrates examples of four different types of network structures tested in our numerical simulations, including a ring with unidirectional polarity (Ring-UP), a ring with random polarity (Ring-RP), a random network with unidirectional polarity (Rand-UP), and a random network with random polarity (Rand-RP). These networks are considered in order to clarify the effect of randomness in network connectivity and polarity. For instance, the difference between Ring-UP and Ring-RP lies in the polarities of the memristors indicated by the arrows. To avoid a disconnected graph, we construct networks of the Rand-UP and Rand-RP types by adding long-range memristor branches to those of the Ring-UP and Rand-RP types, respectively.

Figure 2
figure 2

Dynamical behavior of a memristor-network-based reservoir driven by waveform signals. The network structure is the Rand-UP type, the average ON/OFF resistance ratio is \(\bar{r}=50\), and the degree of variability is \(\sigma =0.2\). (a) A sine wave input. (b) The nodal voltages. (c) The currents on the memristor branches. (d) The IV curves. (e) The same as (d), but the absolute current on a semi-log plot. (f) The currents plotted against the nodal voltages. (g)–(l) The same as (a)–(f), but for a triangular wave input.

We explicitly formulated circuit equations of a general memristor network based on the new modified nodal analysis method39 as follows (see “Methods” section for derivation):

$$\begin{aligned} E_{\mathrm{m}} W(E_{\mathrm{m}}^\top \varvec{\Phi }_{\mathrm{n}}(t))E_{\mathrm{m}}^\top \frac{\mathrm{d} \varvec{\Phi }_{\mathrm{n}}(t)}{\mathrm{d}t} + E_{\mathrm{i}} \mathbf {j}_{\mathrm{i}}(t)= & {} \mathbf {0}, \end{aligned}$$
(1)
$$\begin{aligned} \frac{\mathrm{d} \varvec{\Phi }_{\mathrm{n}}(t)}{\mathrm{d}t} -\mathbf {v}_{\mathrm{n}}(t)= & {} \mathbf {0}, \end{aligned}$$
(2)
$$\begin{aligned} E_{\mathrm{i}}^\top \mathbf {v}_{\mathrm{n}}(t) - \mathbf {v}_{\mathrm{i}}(t)= & {} \mathbf {0}, \end{aligned}$$
(3)

where t represents continuous time, \(\varvec{\Phi }_{\mathrm{n}}(t) \in {\mathbb {R}}^{N_{\mathrm{n}}}\) is a vector of nodal magnetic fluxes, \(W(\cdot ) \in {\mathbb {R}}^{N_{\mathrm{m}}\times N_{\mathrm{m}}}\) is a diagonal matrix of memductances (memory conductances) of the memristors as a function of fluxes, \(\mathbf {j}_{\mathrm{i}}(t) \in {\mathbb {R}}^{N_{\mathrm{i}}}\) is a vector of currents at the voltage sources, \(\mathbf {v}_{\mathrm{n}}(t) \in {\mathbb {R}}^{N_{\mathrm{n}}}\) is a vector of nodal voltages, \(\mathbf {v}_{\mathrm{i}}(t) \in {\mathbb {R}}^{N_{\mathrm{i}}}\) is a vector of voltages at the voltage sources. This set of equations with respect to the state variables (\(\mathbf {v}_{\mathrm{n}}(t)\), \(\varvec{\Phi }_{\mathrm{n}}(t)\), and \(\mathbf {j}_{\mathrm{i}}\)) are categorized into a differential-algebraic system of equations (DAEs), where differential and algebraic equations are mixed. In general, DAEs are more difficult to solve than ordinary differential equations (ODEs), because the Jacobian matrix (i.e. the first-order derivative) used for numerical integration becomes singular40. By setting \(E_{\mathrm{m}}\), \(E_{\mathrm{i}}\), \(\mathbf {v}_{\mathrm{i}}\), and W depending on the reservoir network structure and the memristor models, concrete system equations can be obtained from Eqs. (1)–(3) (see “Methods” section). In our simulation platform on MATLAB41, these equations are derived by symbolic computation and solved by using the DAE solver. The dynamic state of the reservoir is represented by time courses of vector \(\mathbf {j}_{\mathrm{m}}(t) \in {\mathbb {R}}^{N_{\mathrm{m}}}\) of electric currents flowing through the memristors. The reservoir states are used to produce the system output in the readout processing (see “Methods” section).

Figure 3
figure 3

Results of the waveform classification task. (a) The classification accuracies for the four different network types when \(N_{\mathrm{train}}\) is varied. (b) The classification accuracies plotted against the variability parameter \(\sigma\). (c) The classification accuracies when \(N_{\mathrm{x}}\) signals out of \(N_{\mathrm{m}} (=20)\) current signals were used for the readout. (d) The classification accuracies plotted against the average ON/OFF resistance ratio \(\bar{r}\). (e) The confusion matrices for \(\bar{r}=50\). (f) The confusion matrices for \(\bar{r}=10^4\).

Waveform classification

The waveform classification task has often been used as a benchmark task for evaluating computational performance of physical RC systems29,31,42,43,44. We consider a two-class classification problem with 100 sine and 100 triangular waves having the same amplitude and different frequencies. The frequency of each waveform was randomly generated from a uniform distribution in a certain range (see “Methods” section). Some data were used for training and the other data were for testing. These waveform data were converted to voltage signals and then fed into the memristor-network-based reservoir. The parameter conditions used for numerical experiments are listed in Table 1.

Figure 4
figure 4

Signal processing for the ECG classification task. (a) The preprocessing method. An original ECG data is transformed with a binary mask of \(+1\) and \(-1\) into a masked data. Each of the \(L_{\mathrm{data}}\) (=96) column-wise sequences is fed into the memristive reservoir. (b) A dynamic response of the memristive reservoir of the Rand-UP type when \(\bar{r}=50\) and \(\sigma =0.2\). (c) The postprocessing method. The \(N_{\mathrm{m}}\) (= 20) sequences of length \(L_{\mathrm{out}}\) (= 50) are obtained from the reservoir for each input sequence. In Case (i), the reservoir outputs of size \(L_{\mathrm{out}} \times N_{\mathrm{m}}\) are rotated and stacked for all the reservoir inputs to form a state collection matrix. In Case (ii), the reservoir outputs are transformed into a one-dimensional sequence and concatenated for all the reservoir inputs to form a state collection matrix.

Figure 2 demonstrates the responses of a memristor network of the Rand-UP type to sine and triangular input voltage signals when \(\bar{r}=\bar{R}_{\mathrm{OFF}}/\bar{R}_{\mathrm{ON}}=50\) and \(\sigma =0.2\). Figure 2a–c show the sine wave input, the nodal voltages, and the currents on the memristor branches, respectively. Due to the random connectivity of the memristors and the device-to-device variation, the individual memristors exhibit different nonlinear IV characteristics as shown in Fig. 2d,e. Figure 2f shows the relationship between the input voltage signal and the branch currents, indicating the input-output transformation realized by the reservoir. The corresponding figures for a triangular wave input are shown in Fig. 2g–l (see Supplementary Figs. 15 for other system conditions). The \(N_{\mathrm{m}}\) current signals were converted to sequences of length \(L_{\mathrm{out}}=100\) by sampling and then used to form a state collection matrix X for the readout processing (see “Methods” section).

Figure 5
figure 5

Results of the ECG classification task. (a) The classification accuracies for a variation of the average ON/OFF ratio \(\bar{r}=\bar{R}_{\mathrm{OFF}}/\bar{R}_{\mathrm{ON}}\). (b) The classification accuracies plotted against the variability parameter \(\sigma\). The meanings of the marks are the same as those in (a). (c) The classification accuracies when \(N_x\) signals out of \(N_m (=20)\) current signals were used for the readout. (d) The dependence of the accuracy on the maximum input voltage \(V_{\mathrm{max}}\).

Figure 3 shows the results of the waveform classification task. We tested the four network structures shown in Fig. 1b: Ring-UP with (magenta triangles), Ring-RP (green diamonds), Rand-UP (blue squares), and Rand-RP (red circles). We evaluated the classification accuracies for 10 random network realizations based on the 10-fold cross validation. Figure 3a–d plot the average value over the 100 trials together with the error bar indicating the standard error. Figure 3a shows the classification accuracies when the number of training data, \(N_{\mathrm{train}}\), is varied. It is remarkable that the networks of the Rand-UP and Rand-RP types achieve the 100% accuracy independently of \(N_{\mathrm{train}}\) due to the rich variety in the reservoir states (see Fig. 2; Supplementary Fig. 3). The performances of the Ring-UP and Ring-RP types are inferior to those of the Rand-UP and Rand-RP types, but much higher than the baseline accuracy obtained by a linear classifier. This indicates the benefit of the nonlinear signal transformation by the physical reservoir. Figure 3b shows that the classification performance is not sensitive to the change in the variability parameter \(\sigma\), suggesting a variability-tolerant property of the memristive RC system. Figure 3c demonstrates that the classification accuracy monotonically increases with the number of signals, \(N_{\mathrm{x}}\) (\(\le N_{\mathrm{m}}\)), used for the readout. In other words, a higher-dimensional reservoir state yields a better accuracy. The accuracy reaches the perfect level when \(N_{\mathrm{x}} \ge 14\). Figure 3d shows that the average ON/OFF resistance ratio \(\bar{r}\) related to the nonlinearity of the IV characteristics is a key factor influencing the classification performance, which is maximized at the intermediate values around \(\bar{r}=5\times 10^2\)\(10^3\). Figure 3e,f show the confusion matrices when \(\bar{r}=50\) and \(\bar{r}=10^4\), respectively. The results imply that the misclassifications are linked to some failures in the training for \(\bar{r}=50\) whereas they are caused by over-training for \(\bar{r}=10^4\).

ECG classification

An electrocardiogram (ECG) is an electric signal associated with heartbeats, which is often used for health checks and cardiac disease detections. In a normal state, the ECG signal shows a repetition of typical waveforms called the QRS complex45 due to depolarization and repolarization of the membrane potentials of the cardiac muscle cells. Irregular ECG signals often correspond to abnormal cardiac behavior caused by diseases such as ischemia and myocardial infarction. An ECG-based heartbeat classification task is aimed at separating abnormal ECG signals from normal ones32,46. We used the ECG200 dataset formatted in the UCR Time Series Classification Archive47, which contains a total of 200 samples of ECG segment data, including 100 samples for training and the other 100 samples for testing.

Figure 4a shows a preprocessing step for each ECG data of length \(L_{\mathrm{data}} (=96)\). The one-dimensional vector representing the original data was transformed into a 2D masked data by using a random mask of size \(S_{\mathrm{mask}} (=50)\), each element of which is \(-1\) or 1. This randomization process corresponds to a multiplication of input data by random input weights in echo state networks3,5. The masked data was separated into \(L_{\mathrm{data}}\) column-wise sequences, each of which was converted to a voltage signal and then fed into the memristor-network-based reservoir in the order of the column index sequentially. After the injection of each sequence, we reset the reservoir state. Figure 4b demonstrates a dynamic response of the reservoir to an input time series. The \(N_{\mathrm{m}} (=20)\) current signals were transformed into the sequences of length \(L_{\mathrm{out}} (=50)\) for constructing a state collection matrix X in two possible ways as shown in Fig. 4c. In Case (i), the transposed matrices of size \(N_{\mathrm{m}}\times L_{\mathrm{out}}\) were stacked vertically for all the inputs to yield a state collection matrix \(X \in {\mathbb {R}}^{N_{\mathrm{m}}L_{\mathrm{data}} \times L_{\mathrm{out}}}\). In Case (ii), the \(N_{\mathrm{m}}\) sequences were concatenated into a one-dimensional sequence and those for all the inputs were further concatenated to form a state collection matrix (vector) \(X\in {\mathbb {R}}^{N_{\mathrm{m}}L_{\mathrm{data}}L_{\mathrm{out}} \times 1}\). The parameter conditions for numerical experiments are listed in Table 1.

Figure 6
figure 6

Signal processing for the spoken digit recognition task. (a) The preprocessing method. A sound data is transformed into a cochleagram of size \(N_{\mathrm{f}} (=78) \times L_{\mathrm{data}} (=48)\) by using the Lyon’s passive ear model48 and then multiplied by a random binary mask of size \(S_{\mathrm{mask}} (=100) \times N_{\mathrm{f}}\). The masked data is separated into \(L_{\mathrm{data}}\) column-wise sequences. (b) An example of the dynamic response of the memristive reservoir of the Rand-UP type when \(\bar{r}=50\) and \(\sigma =0.2\). (c) The postprocessing method. The \(N_{\mathrm{m}}\) current signals are transformed into the sequences of length \(L_{\mathrm{out}} (=100)\) for each input, which are concatenated into one-dimensional vector of size \(N_{\mathrm{m}}L_{\mathrm{out}}\). The vectors for all the \(L_{\mathrm{data}}\) inputs are concatenated to make the state collection matrix.

Figure 5 shows the results of the ECG classification task. Each plot corresponds to the average accuracy over 10 random network realizations and the error bar indicates the standard error. Figure 5a plots the classification accuracies for the four types of reservoir structures combined with the postprocessing in Case (i) (monochrome marks) and Case (ii) (colored marks) when \(\bar{r}\) is varied. The baseline accuracy (gray crosses) obtained by a linear classifier is 64%, meaning that all the testing data were classified as normal patterns. We can see that the performance with the postprocessing in Case (ii) are much better than those in Case (i). If the postprocessing method in Case (ii) is employed, the classification performance is kept at a high level against the variation of \(\bar{r}\), indicating the robustness against the device-to-device variation as shown in Fig. 5b. The effect of the number of signals used for the readout processing, \(N_{\mathrm{x}} (\le N_{\mathrm{m}})\), on the performance is shown in Fig. 5c. By increasing \(N_{\mathrm{x}}\), the performance is significantly improved in Case (i) but only slightly in Case (ii) (see Supplementary Fig. 6). We observe that the maximum input voltage \(V_{\mathrm{max}}\) significantly influences the classification performance in Case (ii) as shown in Fig. 5d. The classification accuracy reaches 86% when the network structure is the Rand-RP type and \(V_{\mathrm{max}}=0.02\) V. This performance is comparable to that of the other ML methods, ranging between 77 and 92%49. We note that the number of trained weights is 1920 in Case (i) and 96,000 in Case (ii). The difference in the number of trainable weights is a potential cause of the gap in the classification performance. These results indicate that the classification performance can largely depend on the postprocessing method even when the same reservoir states are used. It is a future issue to fully understand the influence of the postprocessing on the performance, which is linked with a tradeoff between classification accuracy and computational time for learning.

Figure 7
figure 7

Results of the spoken digit recognition task. (a) The classification accuracies when the number of training data \(N_{\mathrm{train}}\) is varied. The results with the linear classifier (LC) include the four cases with \(S_{\mathrm{mask}}=10, 25, 50, 100\). The results with the memristive RC systems include the four cases with Ring-UP, Ring-RP, Rand-UP, and Rand-RP, where \(S_{\mathrm{mask}}=100\). (b) The classification accuracies plotted against the average ON/OFF ratio \(\bar{r}\). (c) The classification accuracies plotted against the variability parameter \(\sigma\). (d) The classification accuracies when \(N_{\mathrm{x}}\) signals out of \(N_{\mathrm{m}} (=20)\) current signals were used for the readout. (e) The confusion matrix for testing data.

Spoken digit recognition

The isolated spoken digit recognition has been widely used as a benchmark task for testing the classification ability of RC systems10,33,43. The sound dataset from the NIST TI-46 Word corpus50 contains 500 sound waveform data corresponding to 10 utterances of 10 digits (“zero” to “nine”), spoken by 5 different speakers (ID: 1,2,5,6,7)51. The aim of this 10-class classification problem is to correctly predict the true digit from a sound signal.

Each sound data was converted to a cochleagram by using the Lyon’s passive ear model after an elimination of silence periods48 as shown in Fig. 6a (see “Methods” section for details). The data length \(L_{\mathrm{data}}\) ranges from 48 to 102. The cochleagram was transformed into a masked data with a random binary mask of size \(S_{\mathrm{mask}} \times N_{\mathrm{f}}\). Each of the \(N_{\mathrm{f}}\) column-wise sequences is given to the reservoir after an appropriate scaling. An example of the dynamic behavior of the reservoir is shown in Fig. 6b. The \(N_{\mathrm{m}}\) (=20) currents from the reservoir are transformed into the sequences of length \(L_{\mathrm{out}}\) (=100) by sampling. They are concatenated into a one-dimensional vector of size \(N_{\mathrm{m}}L_{\mathrm{out}}\) as shown in Fig. 6c. These column vectors are collected for all the inputs to construct a state collection matrix \(X \in {\mathbb {R}}^{N_{\mathrm{m}}L_{\mathrm{out}} \times L_{\mathrm{data}}}\). The parameter conditions for numerical experiments are listed in Table 1.

Figure 7 shows the results of the spoken digit recognition task. Each plot is the average classification accuracy over 10 random network realizations, evaluated based on the 10-fold cross validation, and the error bar indicates the standard error. Figure 7a demonstrates the classification accuracies when the number of training data, \(N_{\mathrm{train}}\), is varied. The results obtained by the memristive RC systems with \(S_{\mathrm{mask}}=100\) are indicated by the colored marks for the four different structure types, i.e. Ring-UP, Ring-RP, Rand-UP, and Rand-RP. For comparison, the performance of linear classifiers (LCs) applied to the masked data are plotted for different mask size \(S_{\mathrm{mask}}=10, 25, 50, 100\). The classification accuracies for all the tested methods are improved by increasing \(N_{\mathrm{train}}\). The performance of the LCs increases with the mask size, but peaks out at around \(S_{\mathrm{mask}}=100\) (see Supplementary Fig. 7). The memristive RC systems produce better accuracies than the LC in the case of \(S_{\mathrm{mask}}=100\), which highlights the benefit of the nonlinear transformation by the memristive reservoir52. The network structures of the Rand-UP and Rand-RP types yield better accuracies than those of the Ring-UP and Ring-RP types, owing to the diversity in the reservoir dynamics. Figure 7b shows that a smaller average ON/OFF ratio \(\bar{r}\), corresponding to a stronger nonlinearity of I-V characteristics on average, leads to better classification performance for the Rand-UP and Rand-RP types when \(N_{\mathrm{train}}=450\). The best accuracy reaches 97.3% when the ON/OFF ratio is \(\bar{r}=50\) and the network type is Rand-RP, which is comparable to that achieved by other physical RC systems10. For a specific separation between the training and testing datasets among the 10 cases for the cross validation, we obtained 99.8% accuracy on average over 10 different memristor networks. This is the first report that the memristor networks can yield such high performance in this task. Figure 7c indicates that the classification performance is kept at a high level irrespective of a change in the variability level \(\sigma\) of the memristors. Figure 7d demonstrates the results when \(N_{\mathrm{x}}\) signals out of \(N_{\mathrm{m}}\) signals were used for the readout process, indicating that a higher dimensional reservoir state contributes more to increasing the classification accuracy and \(N_{\mathrm{x}}\sim 10\) is enough for obtaining the maximum accuracy. Our analysis on the confusion matrix shown in Fig. 7e reveals that a majority of misclassifications occur when the sound data of digit 2 is incorrectly classified as digit 1 or 8. By overcoming this issue through modifications in the signal processing parts, the performance could be further improved.

Discussion

In this study, we have provided the explicit mathematical formula of the general memristor networks and developed the simulation platform for performing temporal pattern classifications with the memristor-network-based RC systems. The platform enables a considerable search of various system conditions and an identification of the key system component for improving classification performance. The results on the three classification tasks indicate that the randomness of the network connectivity (i.e. Rand-UP and Rand-RP) is favorable for generating diverse nonlinear responses to the input signal and achieving the excellent classification performance compared to the other two (i.e. Ring-UP and Ring-RP). The results have also lead to a new finding that the ON/OFF resistance ratio, controlling the nonlinearity of memristors, can have a large impact on the classification accuracy. Although the best ON/OFF ratios are different depending on the task (see Figs. 3d, 5a, and 7b), all of them (\(\bar{r}\sim 50\) to 500) are within the feasible range reported for real memristor devices53.

The memristor network model that we formulated is quite general, and therefore our simulation platform is easily extendable. In this study, we have used at most 20 memristors in the reservoir to test many different system designs while saving the simulation time. Tackling a more complex pattern recognition task with a larger number of memristors is one of the future works. For this purpose, it is an option to inject multiple different input signals into multiple nodes in the memristive reservoir. Any network connectivity of the memristors, including the four types investigated in this study, can be examined conveniently by setting the incidence matrices. We have assumed the ideal linear drift model for the individual memristors. It can be replaced with any other model by deriving or approximating its memductance as a function of the magnetic flux39. In the tested cases, the device-to-device variability following a normal distribution does not cause a degradation in the classification accuracy, implying a reliable computation with heterogeneous units. As far as we checked, the network topology seems to play a decisive role in the input transformation by the reservoir rather than the device-to-device variability. If the shapes and the ranges of the distributions of \(R_{\mathrm{ON}}\) and \(R_{\mathrm{OFF}}\) are estimated through measurements with specific devices, they can be easily incorporated into the memristor network model and the simulation platform. We have leveraged appropriate pre/post-processing methods for each task to get high classification accuracies comparable to those obtained by the other ML methods. It would be possible to further improve the classification accuracy by adding more advanced operations in the readout part, but our signal conversion methods limited to matrix operations are reasonable for maintaining the merit of low learning cost.

The memristive reservoir of the network type is an attractive option for hardware implementation, because the number of possible network structures producing different dynamic responses to input signals can be drastically increased by scaling up the network size. This structural diversity is the major advantage of the network-type reservoir over the array-type and single-node-type reservoirs10. Since only a part of system components are controllable in real memristive devices and materials, it is a significant issue to find better system settings under practical constraints54. From a scientific perspective, our final goal is to comprehensively understand the relationship between dynamical properties and information processing capacity of memristor networks. The dynamical properties can be investigated through spectral analysis21 and nonlinear time series analysis55. The features related to information processing can be evaluated by computing relevant measures such as memory capacity56, kernel quality57, and other capacity scores58. Our simulation platform can contribute to both these purposes. A target for the future is to integrate numerical and experimental approaches for establishing a design principle of memristive reservoirs, thereby accelerating the development of energy-efficient RC-based ML hardware.

Methods

Memristor networks and incidence matrices

The physical reservoir in this study is a general memristor network consisting of \(N_{\mathrm{m}}\) memristors, \(N_{\mathrm{n}}\) circuit nodes, and \(N_{\mathrm{i}}\) input voltage sources (see Fig. 1a). By regarding the circuit nodes as vertices \(v_n\) for \(n=1,\ldots ,N_{\mathrm{n}}\) and the memristor branches as edges \(e_m\) for \(m=1,\ldots ,N_{\mathrm{m}}\), a network structure of memristors can be represented as a directional graph with an incidence matrix \(E_{\mathrm{m}} \in {\mathbb {R}}^{N_{\mathrm{n}}\times N_{\mathrm{m}}}\). If a memristor branch \(e_m\) (\(m=1,\ldots ,N_{\mathrm{m}}\)) connects a starting node \(v_k\) and an ending node \(v_l\), then the incidence matrix is defined as follows:

$$\begin{aligned} E_{\mathrm{m}}(n,m) =\left\{ \begin{array}{cl} -1 &{} (n=k) \\ 1 &{} (n=l) \\ 0 &{} (n \ne k, l) \end{array}\right. . \end{aligned}$$
(4)

In a similar way, an incidence matrix \(E_{\mathrm{i}} \in {\mathbb {R}}^{N_{\mathrm{n}}\times N_{\mathrm{i}}}\) for the positions of the voltage sources can be defined. If a voltage source i connects a starting node k and an ending node l, then

$$\begin{aligned} E_{\mathrm{i}}(n,i) =\left\{ \begin{array}{cl} -1 &{} (n=k) \\ 1 &{} (n=l) \\ 0 &{} (n \ne k, l) \end{array}\right. . \end{aligned}$$
(5)

Single memristor model

The linear drift memristor model34 was adopted for our simulations. It is assumed that a memristor device consists of a doped region (or LRS) with low resistance \(R_{\mathrm{ON}}\) and an undoped region (or HRS) with high resistance \(R_{\mathrm{OFF}}\) (see Fig. 1c). The memristance M at time t is written as follows:

$$\begin{aligned} M(w(t))= & {} \left( R_{\mathrm{ON}}\frac{w(t)}{D} +R_{\mathrm{OFF}} \left( 1-\frac{w(t)}{D} \right) \right) , \end{aligned}$$
(6)

where D and w(t) represent the lengths of the memristor device and the doped region, respectively. When a voltage signal v(t) is applied to the memristor, the current j(t) flowing through the memristor and the time evolution of the internal variable w(t) are described as follows:

$$\begin{aligned} j(t)= & {} v(t)/M(w(t)), \end{aligned}$$
(7)
$$\begin{aligned} \frac{\mathrm{d}w(t)}{\mathrm{d}t}= & {} \mu _v \frac{R_{\mathrm{ON}}}{D}j(t), \end{aligned}$$
(8)

where \(\mu _v\) represents the average ion mobility. Since the memristance can be written as \(M(q)=\mathrm{d}\Phi (q)/\mathrm{d}q\) using the charge q and the magnetic flux \(\Phi\), the memductance \(W(q)=\mathrm{d}q(\Phi )/\mathrm{d}\Phi\) is represented as follows59:

$$\begin{aligned} W(\Phi ) = (M(w(0))^2-2a\Phi )^{-1/2}, \end{aligned}$$
(9)

where M(w(0)) is the memristance at the initial condition. The constant parameter a is obtained as follows:

$$\begin{aligned} a = \frac{\mu _vR_{\mathrm{ON}}(R_{\mathrm{OFF}} -R_{\mathrm{ON}})}{D^2}=\mu _v R_{\mathrm{ON}} (r-1)/D^2. \end{aligned}$$
(10)

where r is the ON/OFF resistance ratio.

Table 1 List of system parameters and their values.

Variability of memristors

The device-to-device variation can be represented as parameter distributions in the single memristor model. In our study, the resistance \(R_{\mathrm{ON}}\) of the LRS was generated from a normal distribution with mean \(\bar{R}_{\mathrm{ON}}\) and standard deviation \(\sigma \bar{R}_{\mathrm{on}}\). The resistance \(R_{\mathrm{OFF}}\) of the HRS was generated from a normal distribution with mean \(\bar{R}_{\mathrm{OFF}} (=\bar{r}\bar{R}_{\mathrm{ON}})\) and standard deviation \(2\sigma \bar{R}_{\mathrm{OFF}}\). It was assumed that \(R_{\mathrm{OFF}}\) has larger variability than \(R_{\mathrm{on}}\)38. The average ON/OFF resistance ratio \(\bar{r}\) controls the nonlinearity of the I-V characteristic (see Fig. 1d). The variability parameter \(\sigma\) controls the degree of device-to-device variation (see Fig. 1e). The initial condition w(0) was generated from a normal distribution with mean D/10 and standard deviation \(\sigma D/10\).

Formulation of memristor networks

A general memristor network can be formulated as Eqs. (1)–(3) by using the new modified nodal analysis39. First, the Kirchhoff’s current law gives a set of \(N_{\mathrm{m}}\) equations as follows:

$$\begin{aligned} E_{\mathrm{m}}\mathbf {j}_{\mathrm{m}}(t) + E_{\mathrm{i}} \mathbf {j}_{\mathrm{i}}(t) = \mathbf {0}, \end{aligned}$$
(11)

where \(\mathbf {j}_{\mathrm{m}}(t) \in {\mathbb {R}}^{N_{\mathrm{m}}}\) is the vector of currents flowing on the \(N_{\mathrm{m}}\) memristors and \(\mathbf {j}_{\mathrm{i}}(t) \in {\mathbb {R}}^{N_{\mathrm{i}}}\) is the vector of currents flowing on the \(N_{\mathrm{i}}\) voltage sources at time t. The current vector can be expressed as follows:

$$\begin{aligned} \mathbf {j}_{\mathrm{m}} = \frac{\mathrm{d}}{\mathrm{d}t} \mathbf {q}_{\mathrm{m}}(E_{\mathrm{m}}^{\top } \varvec{\Phi }_{\mathrm{n}}(t)), \end{aligned}$$
(12)

where \(\mathbf {q}_{\mathrm{m}} \in {\mathbb {R}}^{N_{\mathrm{m}}}\) and \(\varvec{\Phi }_{\mathrm{n}} \in {\mathbb {R}}^{N_{\mathrm{n}}}\) denote the vectors of the charges and fluxes, respectively. By substituting Eq. (12) into Eq. (11), Eq. (1) is derived. Using the first-order linearization, Eq. (1) can be rewritten as follows:

$$\begin{aligned} E_{\mathrm{m}} W_{\mathrm{m}} (E_{\mathrm{m}}^\top \varvec{\Phi }_{\mathrm{n}}(t)) E_{\mathrm{m}}^\top \frac{\mathrm{d} \varvec{\Phi }_{\mathrm{n}}(t)}{\mathrm{d}t} +E_{\mathrm{i}} \mathbf {j}_{\mathrm{i}}(t) = \mathbf {0}, \end{aligned}$$
(13)

where \(W_{\mathrm{m}} \in {\mathbb {R}}^{N_{\mathrm{m}}\times N_{\mathrm{m}}}\) denotes the diagonal matrix whose elements are the memductances of the memristors. For the linear drift model, the diagonal elements are given by Eq. (9). Second, the Faraday’s law gives Eq. (2) which is a set of \(N_{\mathrm{n}}\) differential equations. Third, the Kirchhoff’s voltage law gives Eq. (3) which is a set of \(N_{\mathrm{i}}\) algebraic equations. As a result, the network of the linear drift memristor models is formulated as a set of differential-algebraic equations (DAEs) consisting of Eqs. (13), (2), and (3). The DAE was numerically solved with the DAE solver ode15i in the software package MATLAB41. For a technical reason, a null signal in time period \([0,\Delta t_{\mathrm{relax}}]\) for relaxation was added to the main input voltage signal in time period \([\Delta t_{\mathrm{relax}}, \Delta t_{\mathrm{relax}}+\Delta t_{\mathrm{main}}]\) in our simulations. The main input signal with a maximum absolute voltage \(V_{\mathrm{max}}\) was generated from an input time series of length \(L_{\mathrm{in}}\) after scaling and interpolation. The parameter values are listed in Table 1.

Readout

The vector \(\mathbf {j}_{\mathrm{m}}(t)\) of \(N_{\mathrm{m}}\) currents is measured from the memristor network in time period \([\Delta t_{\mathrm{relax}}, \Delta t_{\mathrm{relax}}+\Delta t_{\mathrm{main}}]\). All the \(N_{\mathrm{m}}\) currents (or partial \(N_{\mathrm{x}}\) currents) are used to construct a state collection matrix X for the readout through a task-specific postprocessing (see “Methods” section for specific tasks). Once the state collection matrix \(X_k\) corresponding to the k-th input time series data is obtained for \(k=1,\ldots ,N_{\mathrm{train}}\), the overall state collection matrix is obtained as follows:

$$\begin{aligned} X_{\mathrm{train}} = [X_1, \ldots , X_k, \ldots , X_{N_{\mathrm{train}}}]. \end{aligned}$$
(14)

Correspondingly, the overall teacher collection matrix is set as follows:

$$\begin{aligned} D_{\mathrm{train}} = [D_1, \ldots , D_k, \ldots , D_{N_{\mathrm{train}}}], \end{aligned}$$
(15)

where \(D_k\) is the teacher matrix for the k-th input data. The column count of \(D_k\) is the same as that of \(X_k\) and its row count is the same as the number of classes, \(N_{\mathrm{c}}\), in the classification task. If the k-th input data has label \(c_k \in \{1,\ldots ,N_{\mathrm{c}}\}\), then each column of \(D_k\) is given by a one-hot vector \([0,\ldots ,1,\ldots ,0]^\top \in {\mathbb {R}}^{N_{\mathrm{c}}}\) where only the \(c_k\)-th element is 1 and the others are 0.

In the training (i.e. learning) phase, the error between the system output \(Y_{\mathrm{train}} = W^{\mathrm{out}} X_{\mathrm{train}}\) and the target output \(D_{\mathrm{train}}\) is minimized by linear regression. An optimal solution is obtained as follows:

$$\begin{aligned} \hat{W}^{\mathrm{out}} = D_{\mathrm{train}}^\dagger X_{\mathrm{train}}, \end{aligned}$$
(16)

where \(D_{\mathrm{train}}^\dagger\) represents the pseudoinverse matrix of \(D_{\mathrm{train}}\).

In the testing (i.e. inference) phase, the classification ability of the trained system is evaluated for \(N_{\mathrm{test}}\) unknown time series data in the testing dataset. For the testing dataset, \(X_{\mathrm{test}}\) and \(D_{\mathrm{test}}\) are composed in a similar way to Eqs. (14) and (15), respectively. The system outputs for the testing data are computed as \(Y_{\mathrm{test}} = [Y_1,\ldots ,Y_k,\ldots , Y_{\mathrm{test}}]=\hat{W}^{\mathrm{out}} X_{\mathrm{test}}\). From each column of \(Y_k\), the row index that gives the largest value is recorded. The predicted class \(\hat{c}_k\) is determined as the most frequent value in the recorded row indices for all the columns of \(Y_k\). By comparing the true class \(c_k\) and the predicted class \(\hat{c}_k\) for all the \(N_{\mathrm{test}}\) testing data, the classification accuracy is computed.

Waveform classification

The sine and triangular waveform data were generated with the following equations:

$$\begin{aligned} s_{\mathrm{sin}}(t)= & {} \sin (2\pi f t) \quad \text{ for }\,t\in [0,1],\\ s_{\mathrm{tri}}(t)= & {} \text{ sawtooth }(2\pi f t+ \pi /2, 1/2) \quad \text{ for }\,t\in [0,1], \end{aligned}$$

where the “sawtooth” is a function in MATLAB and the frequency f was randomly drawn from the uniform distribution in \([f_0(1-\delta ), f_0(1+\delta )]\). The parameter values were set at \(f_0=5\) and \(\delta =0.4\). The whole dataset includes 100 sine and 100 triangular waveform data, all of which have length \(L_{\mathrm{data}} (=100)\). It was separated into \(N_{\mathrm{train}}\) training data including randomly chosen \(N_{\mathrm{train}}/2\) data from each class and the remaining \(N_{\mathrm{test}} (=200-N_{\mathrm{train}})\) testing data.

In the postprocessing step, the current signals were converted to sequences of length \(L_{\mathrm{out}}\) via sampling, and then to positive-valued sequences by taking their absolute values. The state collection matrix \(X \in {\mathbb {R}}^{N_{\mathrm{m}}\times L_{\mathrm{out}}}\) is constructed by concatenating those sequences for each input time series. The 10-fold cross validation was used to evaluate the classification accuracy.

ECG classification

This task was performed with the ECG200 dataset from the UCR Timeseries Classification Archive47, which was originally formatted by Olszewski60. The dataset contains a total of 200 ECG signal data having class labels corresponding to normal and abnormal heartbeats. The dataset consists of \(N_{\mathrm{train}} (=100)\) training data and \(N_{\mathrm{test}} (=100)\) testing data, all of which have length \(L_{\mathrm{data}} (=96)\). The training dataset includes 31 abnormal and 69 normal data. The testing dataset includes 36 abnormal and 64 normal data.

Spoken digit recognition

This task was performed with the NIST TI-46 Word corpus collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardioid microphone at 12.5 kHz sample rate with 12-bit quantization50. From each time series data, the main sound signal of length \(L_{\mathrm{data}}\) were extracted by removing the silence part with the VOICEBOX, a Speech Processing Toolbox for MATLAB61. The length \(L_{\mathrm{data}}\) differs depending on the data, ranging from 48 to 102. As shown in Fig. 6a, each sound signal was transformed into a cochleagram based on the Lyon’s passive ear model48 implemented with the Auditory Toolbox in MATLAB62. The cochleagram is represented as a matrix \(P \in {\mathbb {R}}^{N_{\mathrm{f}} \times L_{\mathrm{data}}}\) where \(N_{\mathrm{f}}\) is the number of frequency channels. It was converted to a masked data \(P_{\mathrm{mask}}=QP \in {\mathbb {R}}^{S_{\mathrm{mask}} \times L_{\mathrm{data}}}\), where \(Q \in {\mathbb {R}}^{S_{\mathrm{mask}} \times N_{\mathrm{f}}}\) represents a binary mask of elements 0 and 1. We set \(N_{\mathrm{f}}=78\). Each of the \(L_{\mathrm{data}}\) column-wise sequences of length \(L_{\mathrm{in}} (=S_{\mathrm{mask}})\) was converted to a voltage signal by scaling and interpolation, and then fed into the memristive reservoir.