Data Clustering using Memristor Networks

Memristors have emerged as a promising candidate for critical applications such as non-volatile memory as well as non-von Neumann computing architectures based on neuromorphic and machine learning systems. In this study, we demonstrate that memristors can be used to perform principal component analysis (PCA), an important technique for machine learning and data feature learning. The conductance changes of memristors in response to voltage pulses are studied and modeled with an internal state variable to trace the analog behavior of the device. Unsupervised, online learning is achieved in a memristor crossbar using Sanger's learning rule, a derivative of Hebb's rule, to obtain the principal components. The evolution of the weights during training is investigated over learning epochs as a function of the training parameters. The effects of device non-uniformity on the PCA network performance are further analyzed. We show that the memristor-based PCA network is capable of linearly separating distinct classes from sensory data with a high classification success rate of 97.6%, even in the presence of large device variations.


Device Modeling
Figure S1. Energy barrier of the ion hopping process.
The growth rate of the state variable, w, is determined by an ion hopping process over an energy barrier, as shown in Figure S1. It can be written as [S1-S3]:

dw/dt = d·f·[exp(-q(U - α1·V)/kT) - exp(-q(U + α2·V)/kT)]·f(w,V)    (S1)

where d is half of the average hopping distance of the ions, f is the attempt frequency, q is the charge of an electron, U is the activation potential energy, k is Boltzmann's constant, T is the temperature in Kelvin, α1 and α2 are barrier lowering coefficients [S1], and f(w,V) is a window function used to account for the non-linear response to the applied voltage and to bound w within [0, 1] [S2]. The window function used in this paper is shown in equation (S2):

f(w,V) = u(V)·u(1 - w) + u(-V)·u(w)    (S2)

where u() is the Heaviside step function. By plugging equation (S2) into equation (S1), the rate equation of w can be rewritten in closed form as equation (S3).

The current through the device, described by equation (2) of the main text, consists of tunneling-dominated conduction through the filament region and Schottky-dominated conduction elsewhere. The tunneling current can be calculated by assuming an MIM structure with an insulator thin enough that direct tunneling is observed. Using the expressions for a square barrier [S4-S6], the current can be derived as:

I_tunneling = (A·q / 2πh·d²)·(I+ - I-)    (S4)

with

I+ = (φ0 - qV/2)·exp(-α·d·√(φ0 - qV/2))    (S5)
I- = (φ0 + qV/2)·exp(-α·d·√(φ0 + qV/2))    (S6)

where A is the filament area, m is the effective electron mass, h is Planck's constant, φ0 is the barrier height at zero bias, d is the tunneling distance, and α = 4π√(2m)/h.
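The pulse-driven evolution of the state variable described above can be sketched numerically. The snippet below is a minimal sketch, not the paper's fitted model: all parameter values are illustrative assumptions, and the window is implemented as a hard Heaviside bound on w.

```python
import math

# Illustrative constants (NOT the fitted values of Table S1)
KT_Q = 0.0259              # kT/q at room temperature (V)
D_HOP, FREQ = 0.25e-9, 1e13  # half hopping distance (m), attempt frequency (Hz)
U, A1, A2 = 0.5, 0.1, 0.1    # barrier height (V) and lowering coefficients (assumed)

def heaviside(x):
    return 1.0 if x > 0 else 0.0

def window(w, v):
    # f(w,V): allows growth toward w=1 under positive bias and
    # decay toward w=0 under negative bias, hard-bounded at both ends
    return heaviside(v) * heaviside(1.0 - w) + heaviside(-v) * heaviside(w)

def dw_dt(w, v):
    # Forward and reverse hopping over the field-lowered/raised barrier
    fwd = math.exp(-(U - A1 * v) / KT_Q)
    rev = math.exp(-(U + A2 * v) / KT_Q)
    return D_HOP * FREQ * (fwd - rev) * window(w, v)

def apply_pulse(w, v, width=1e-6, steps=100):
    # Simple forward-Euler integration over one voltage pulse;
    # the per-pulse change is small, mimicking gradual analog tuning
    dt = width / steps
    for _ in range(steps):
        w = min(1.0, max(0.0, w + dw_dt(w, v) * dt))
    return w
```

A potentiating (positive) pulse nudges w up and a depressing (negative) pulse nudges it down, while the Heaviside window pins w once it reaches either bound.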
At low bias (qV ≪ φ0), equations (S5) and (S6) can be simplified by expanding the square roots, √(φ0 ∓ qV/2) ≈ √φ0 ∓ qV/(4√φ0), and keeping the exponential voltage dependence:

I+ ≈ φ0·exp(-α·d·√φ0)·exp(δV)    (S7)
I- ≈ φ0·exp(-α·d·√φ0)·exp(-δV)    (S8)

where δ = α·d·q/(4√φ0). By plugging equations (S7) and (S8) into equation (S4), the tunneling current reduces to:

I_tunneling ≈ γ·sinh(δV)    (S9)

where γ = (A·q·φ0 / πh·d²)·exp(-α·d·√φ0). The Schottky junction current is given by equation (S10):

I_Schottky = (qADn/L)·(exp(qV/ηkT) - 1)    (S10)
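As a sanity check on the low-bias simplification, the snippet below compares the full square-barrier difference term with its sinh approximation. The barrier height and the lumped exponent α·d are assumed values chosen for illustration; at low bias the sinh form tracks the exponential voltage dependence, with the neglected algebraic prefactor contributing only a modest correction.

```python
import math

PHI0 = 1.0   # barrier height at zero bias (eV, assumed)
AD = 20.0    # lumped exponent alpha*d (dimensionless, assumed)

def term(v, sign):
    # One square-barrier term: (phi0 + sign*v/2) * exp(-alpha*d*sqrt(phi0 + sign*v/2))
    x = PHI0 + sign * v / 2.0
    return x * math.exp(-AD * math.sqrt(x))

def full(v):
    # Difference of the two barrier terms, as in the full tunneling expression
    return term(v, -1) - term(v, +1)

def sinh_approx(v):
    # Low-bias simplification: gamma' * sinh(delta * v)
    delta = AD / (4.0 * math.sqrt(PHI0))
    gamma = 2.0 * PHI0 * math.exp(-AD * math.sqrt(PHI0))
    return gamma * math.sinh(delta * v)
```

For these constants the two expressions agree to within roughly 10% at low bias, and both are odd in V, as expected for a symmetric tunneling barrier.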
where q is the charge of an electron, A is the area of the device, D is the diffusion coefficient of electrons, n is the electron concentration, L is the diffusion length of electrons, V is the applied voltage, η is an ideality factor, k is Boltzmann's constant and T is the temperature in Kelvin.
The total current includes the parallel contributions from the filament region (with relative area w) and the Schottky region (with relative area 1-w). Combining equations (S9) and (S10), one obtains:

I = w·γ·sinh(δV) + (1 - w)·α·(exp(βV) - 1)    (S11)

where γ and δ correspond to the parameters defined in (S9), and α = qADn/L and β = q/ηkT follow from (S10).

Figure S2 shows the analog conductance changes of the 18 memristor devices forming the network. Device-to-device variations are clearly observed, which cause conductance discrepancies after the potentiation/depression pulses. To verify the effect of device variations on the network performance, the relevant device parameters were assumed to vary following Gaussian distributions (Table S1), and the exact value of each parameter for a given device was chosen randomly using a Monte Carlo method during the simulations. Figure 7b shows the average value and standard deviation calculated using this approach, which are consistent with the experimentally observed variations. The model with the random device variations was then applied to the network analysis and led to Figure 7d. The parameters used in the simulation are listed in Table S1.

Table S1. Device parameters used in the simulation.
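The Monte Carlo treatment of device-to-device variation can be sketched as follows. The nominal parameter values and the 10% Gaussian spread below are illustrative stand-ins, not the values of Table S1; each simulated device draws its own parameter set once and then uses it for all current evaluations.

```python
import math
import random

random.seed(0)

# Nominal total-current parameters (illustrative, NOT Table S1 values)
NOMINAL = {"gamma": 1e-4, "delta": 2.0, "alpha": 1e-6, "beta": 4.0}
REL_SIGMA = 0.10  # assumed 10% relative standard deviation

def sample_device():
    # Each device's parameters are drawn from Gaussian distributions
    return {k: random.gauss(v, REL_SIGMA * v) for k, v in NOMINAL.items()}

def current(params, w, v):
    # Total current: filament (tunneling, relative area w) in parallel
    # with the Schottky region (relative area 1 - w)
    i_tunnel = params["gamma"] * math.sinh(params["delta"] * v)
    i_schottky = params["alpha"] * (math.exp(params["beta"] * v) - 1.0)
    return w * i_tunnel + (1.0 - w) * i_schottky

devices = [sample_device() for _ in range(18)]  # 18 devices, as in Figure S2
currents = [current(p, w=0.5, v=0.3) for p in devices]
mean = sum(currents) / len(currents)
```

Repeating the draw for every device in the array reproduces the kind of spread seen in Figure S2, and the same per-device parameter sets can then be reused throughout a network-level simulation.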

Weight normalization with Sanger's Rule
Sanger's rule originates from Hebb's rule [S7]. For simplicity, assuming only one output, y = Σj wj·xj, Hebb's rule can be written as shown in the equation below.
w_i(t+1) = w_i(t) + η·y·x_i    (S12)

If we require the weights to be normalized, to prevent the unbounded growth of the weights under Hebb's rule, the update rule is modified as:

w_i(t+1) = (w_i(t) + η·y·x_i) / [Σj (w_j(t) + η·y·x_j)²]^(1/2),  j = 1, ..., m    (S13)

where m is the number of inputs. Because the update rate η is normally very small (η ≪ 1), through a Taylor expansion in η and keeping only the leading term, (S13) becomes:
w_i(t+1) = w_i(t) + η·y·(x_i - y·w_i(t))    (S14)

Equation (S14) is Sanger's rule with only one output; the case of more than one output can be derived similarly. In other words, by implementing Sanger's rule (S14), we are effectively implementing rule (S13), i.e. Hebbian learning with normalized weights, for the small update rates used in the experiments.
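As an illustration, the single-output rule (S14) can be simulated directly. The snippet below is a minimal sketch with illustrative constants: it trains one weight vector on synthetic correlated 2-D data and converges to a unit-norm vector along the first principal component, demonstrating the implicit normalization discussed above.

```python
import random

random.seed(1)

def sample():
    # Synthetic correlated 2-D input: dominant variance along (1, 1)
    s = random.gauss(0.0, 1.0)   # strong common component
    n = random.gauss(0.0, 0.1)   # weak differential component
    return [s + n, s - n]

w = [random.uniform(-0.5, 0.5) for _ in range(2)]  # random initial weights
eta = 0.01                                         # small update rate (eta << 1)

for _ in range(5000):
    x = sample()
    y = sum(wi * xi for wi, xi in zip(w, x))       # single output y = w . x
    # Rule (S14): w_i += eta * y * (x_i - y * w_i)
    w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]

norm = sum(wi * wi for wi in w) ** 0.5
```

After training, the weight vector has approximately unit norm without any explicit normalization step, and it aligns with the dominant data direction (1, 1)/√2, i.e. each component approaches ±0.707.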