Toward a generalized Bienenstock-Cooper-Munro rule for spatiotemporal learning via triplet-STDP in memristive devices

The close replication of synaptic functions is an important objective for achieving a highly realistic memristor-based cognitive computation. The emulation of neurobiological learning rules may allow the development of neuromorphic systems that continuously learn without supervision. In this work, the Bienenstock-Cooper-Munro learning rule, as a typical case of spike-rate-dependent plasticity, is mimicked using a generalized triplet-spike-timing-dependent plasticity scheme in a WO3−x memristive synapse. It demonstrates both presynaptic and postsynaptic activities and remedies the absence of the enhanced depression effect in the depression region, allowing a better description of the biological counterpart. The threshold sliding effect of Bienenstock-Cooper-Munro rule is realized using a history-dependent property of the second-order memristor. Rate-based orientation selectivity is demonstrated in a simulated feedforward memristive network with this generalized Bienenstock-Cooper-Munro framework. These findings provide a feasible approach for mimicking Bienenstock-Cooper-Munro learning rules in memristors, and support the applications of spatiotemporal coding and learning using memristive networks.

B rain-inspired neuromorphic computing systems are attracting strong interest because of their massive parallelism, high energy efficiency, good error tolerance, and good ability to implement cognitive functions [1][2][3][4][5][6] . Hardware implementations of neuromorphic computing can take advantage of novel nanodevices that emulate the biological synapses with inherent learning functions [7][8][9][10][11][12][13] . The two-terminal memristor is widely recognized as a promising technology with which to mimic the biological synapse because of its functional resemblance to the biological counterpart [14][15][16][17][18][19] . The biorealistic realization of synaptic plasticity in the memristor is considered to be an important step toward realizing an artificial synapse with high accuracy. There have been many efforts to demonstrate basic synaptic learning functions using single and paired spikes, for example, long-term/ short-term plasticity, spike-timing-dependent plasticity (STDP), and paired-pulse facilitation (PPF)/depression [20][21][22][23][24][25][26][27] . In fact, the stimulation mode of a spike train that contains plentiful spikes is a more general case than the single spike or paired spikes, and is produced by a neuron receiving multiple spikes from other connected neurons 28 . The information contained in a spike train allows specific advanced plasticity within a synapse that is referred to as spike-rate-dependent plasticity (SRDP) 29 .
The Bienenstock-Cooper-Munro (BCM) learning rule is an important type of SRDP beyond the Hebbian learning rule and describes history-dependent synaptic modification. In the BCM framework, the high/low spike rate of a train can result in the potentiation/depression of the synaptic weight depending on whether the spike rate is higher than a threshold (θ) [30][31][32][33] . For the memristor-based artificial synapse, several groups have demonstrated BCM rules using rate-based presynaptic spikes, which have led to advances in the field [34][35][36] . These results show that the absolute change in the synaptic weight (i.e., the conductance change of the memristor, |ΔG c |) has a monotonic dependence on the spike rates in both the depression region (ΔG c < 0) and potentiation region (ΔG c > 0). However, such a monotonic change is different from the original BCM rule in neurobiology; that is, there should exist non-monotonic behavior (i.e., an enhanced depression effect (EDE)) in the depression region [30][31][32][33]37,38 . Additionally, previous memristor studies lack the following essential features: first, the lack of a multiplicative term between presynaptic and postsynaptic activities, and second the short-term modification [34][35][36] . This also marks a significant inconsistency with the biological BCM learning.
According to a theoretical model of Pfister et al., it is expected that the use of triplet-STDP, instead of common rate-based presynaptic spikes, allows this issue to be solved 39,40 . Furthermore, the BCM rule can be generalized by the long-term triplet-STDP, thereby allowing higher-order spatiotemporal recognition in the visual cortex, for example, rate-based orientation selectivity 39 . Triplet-STDP means that a third spike, either presynaptic or postsynaptic, is introduced into the standard pair-STDP 33,[40][41][42] . Importantly, in addition to the paired term contribution in pair spikes, a previous spike (presynaptic or postsynaptic) also causes the contribution of a triplet term in the triplet-STDP 33,41,42 . The relationship between the paired term and triplet term contributions provides the multiplicative correlations between presynaptic and postsynaptic activities, which is an essential requisite for BCM learning. There are two types of triplet-STDP in neuroscience: the first-spike-dominating model and last-spike-dominating model proposed by Froemke et al. and Wang et al.,respectively 41,42 . Progress has been made in emulating these two types of triplet-STDP using first-order and second-order memristors 36,[43][44][45] . However, the generalization from triplet-STDP to the BCM learning rule has not yet been experimentally demonstrated in memristors. Additionally, high-order spatiotemporal recognition that relies on generalized BCM learning rules has rarely been reported.
The present work presents the demonstration of generalized BCM learning rules using the last-spike-dominating triplet-STDP in a WO 3−x -based second-order memristor. The second-order memristor has physical behavior similar to Ca 2+ dynamics in the bioneural network, which allows the emulation of rate-based plasticity naturally 16,34,46 . The EDE, which was typically missing in previous studies, is achieved using a long-term triplet-STDP scheme. Our experimental results are highly consistent with the mathematical model of the BCM framework in a biological system. Additionally, rate-based orientation selectivity is demonstrated on the basis of such a generalized triplet-STDP-based BCM learning rule, showing its strong potential in high-order spatiotemporal recognition.

Results
Motivation and WO 3−x second-order memristor. Figure 1a depicts the motivation for a biologically plausible triplet-STDPbased BCM learning rule in memristive hardware. Generally, neurons in a neurophysiological system interact by exchanging spike trains, that is, sequences of pre-and postsynaptic spikes. Because of the diversity of the spike trains, the synaptic modification is not only determined by the timing between paired spike, as in the case of common STDP, but also affected by the details of the spike pattern, such as the spike train rate 23,29,47 . A relatively simple model of pattern-dependent plasticity is triplet-STDP 41,42 . Typical BCM learning rules can then be generalized based on triplet-STDP, which also allows rate-based orientation selectivity for high-order spatiotemporal functions.
To demonstrate the aforementioned synaptic functions, we considered the WO 3−x -based second-order memristors illustrated in Fig. 1b, which consist of a Pt/WO 3−x /W sandwich structure prepared in crossbar arrays using a sputtering deposition technique. Figure 1c shows the I-V curve of the WO 3−x -based memristor, gradually increases and decreases with positive-and negative-bias sweeps on the top Pt electrode. The conductance G c can be regarded as the synaptic weight. The asymmetric I-V curve of Fig. 1c also indicates the existence of Schottky barrier at Pt/ WO 3−x interface. This memristive behavior is similar to the nonlinear transmission of a biological synapse, thereby allowing for a continuous adjustment of synaptic weight. Analogous to biological spikes, a series of positive and negative pulses elicits the consecutive potentiation and depression of our memristive synapse, as shown in Fig. 1d. Additionally, the potentiation state can spontaneously decay to a middle state, as shown in Fig. 1e, which is a clear indicator of the second-order memristor (Supplementary Fig. 1 and Note 1) 16,17,34 . Such behavior is equivalent to a transition from short-term plasticity to long-term plasticity by repeated stimulation 21 . The memristor dynamics are caused by oxygen-ion diffusion that resembles biological Ca 2+ dynamics 15,16,21,24 . Du et al. previously developed a WO x memristive device with a Pd/WO x /W structure where the switching mechanism was attributed to the modification of a relative area of the conducting channel 34 . In our work, instead, the Schottky barrier of the Pt/WO 3−x interface and its modulation induced by the drift and diffusion of oxygen ions can account for the memristive mechanism (Supplementary Fig. 1 and Note 1). The different device fabrication methods and electrodes may be responsible for the different memristive behaviors. A similar Schottky barrier based memristive mechanism was reported in our previous work and other literature 22,36,44,48 .
Synaptic adaptation function emulated by rate-based postsynaptic spikes. The synaptic adaptation function was mimicked using a second-order memristive effect in the WO 3−x device. Figure 2a shows the current response of the memristive device under a single presynaptic spike [2 V, 10 ms]. The figure shows that the presynaptic spike triggers an abrupt increase in current followed by a decay to the initial state within 400 ms, which is similar to the behavior of the excitatory postsynaptic current (EPSC) of the biological synapse 21,24 . The similarity with the biological EPSC behavior is caused by oxygen-ion drift and diffusion in our memristor paralleling the influx and extrusion of Ca 2+ through the synaptic cell 24,49 . It is noted in Fig. 2a that there is not an abrupt decrease in current observed, when the stimulation pulse of 2 V is changed to the monitoring pulse of 0.2 V. One of the possible reasons is the electric double-layer (EDL) capacitance, which is discussed in a recent paper by Yang et al. 50 . In a biological system, the Ca 2+ dynamics allow correlation between paired spikes, where the residual Ca 2+ induced by the first spike enhances the overall Ca 2+ concentration generated by the second spike, thus resulting in PPF. The PPF function is critical for a synapse to make correlations between the temporal spike pair. Based on the similarity between oxygen-ion dynamics and Ca 2+ dynamics, we can demonstrate PPF, as shown in Fig. 2b. When the second stimulation comes before the first EPSC disappears completely, their overlap can effectively suppress the diffusion of oxygen ions. This promotes the more effective All the data of G c were collected 1 s after the stimulation using a reading pulse [+0.2 V, 50 ms], which represents the long-term plasticity. e Spontaneous decay of conductance G c after the potentiation process. This decay process may be related to the diffusion of oxygen ions, which is an indicator of a second-order memristor. Here, the device was operated from an initial conductance G i of 0.1 µS. accumulation of oxygen ions on the Pt/WO 3−x interface, which leads to a larger conductance change. Thus, the peak value of EPSC induced by the second spike P2 is clearly higher than that of the first-spike P1, and a longer interval between presynaptic spikes gradually decreases the facilitation of the second spike ( Supplementary Fig. 2). In fact, the EPSC evoked by a single spike results in a temporary effect (i.e., short-term plasticity), whereas the PPF with sufficiently large spikes results in a permanent effect (i.e., conversion from short-term plasticity to long-term plasticity). The PPF effect can also be extended to the SRDP using spike trains instead of spike pairs, as shown in Fig. 2c. In this case, the increase in EPSC amplitude depends on not only the presynaptic spike number but also the spike rate. Spikes with a higher rate (i.e., a shorter interval) result in a much larger EPSC amplitude, similar to the case of SRDP in the biological synapse [51][52][53] . History-dependent plasticity is an important behavior for the synaptic adaptation function. To verify the feasibility of historydependent plasticity for our memristor, we applied a pulse [−2V, 10 ms] to the bottom electrode as the postsynaptic spike in Fig. 2d, including four distinct phases. In the first phase, a postsynaptic spike train at a relatively high frequency (50 Hz) increased the conductance in the memristor. In the second phase, the lower frequency (10 Hz) of the spike train decreased the conductance (i.e., synaptic depression). A spike train of 5 Hz in the third phase and a final spike train of 10 Hz applied in the fourth phase increased the conductance again (i.e., synaptic potentiation). It is interesting to note that the second and fourth phases induced opposing conductance changes even though the same spike trains of 10 Hz were used. In this experiment, the conductance state after stimulation at 50 and 5 Hz can be regarded as the experienced conductance (G 0 ); that is, different 'experiences' or 'histories.' The postsynaptic spike train of 10 Hz can thus induce depression or potentiation depending on the experienced G 0 activated by higher (50 Hz) or lower (5 Hz) frequencies, thereby indicating the realization of historydependent plasticity.
The postsynaptic spike-rate-based synaptic adaptation function was further implemented in WO 3−x memristors by changing the learning experience, hence G 0 , as illustrated in Fig. 2e. Spiking at relatively low and high frequencies generally results in depression and potentiation, respectively, in line with the results in Fig. 2d. Additionally, the value of G 0 dictates the threshold rate θ, which shows a formalistic similarity with the sliding threshold effect of the BCM rule in [30][31][32][33] . For this reason, previous studies demonstrated the BCM learning by the implementation of results similar to Fig. 2e [34][35][36] . Strictly speaking, however, such synaptic adaptation function is not fully equivalent to the biological BCM rule, given three fundamental differences between them: (i) The observed synaptic adaptation genrally consists of a temporary short-term modification of the synaptic weight, while BCM learning should refer to long-term potentiation/depression (LTP/ LTD) according to the literature [30][31][32][33] . (ii) In BCM learning, the synaptic weight is a function of both the presynaptic and Herein, the experienced G 0 was activated to three levels starting from a fixed initial conductance of G i = 0.1µS. For the case of G 0 = 0.5 µS and 1.0 µS, the stimulation was conducted using a pulse amplitude of −2V and pulse width of 10 ms with postsynaptic spike rates of 20 Hz and 50 Hz, respectively. The peak of temporary conductance potentiation (G peak ) was collected to calculate ΔG c = G peak − G 0 , which represents a type of short-term plasticity. f Biological BCM curve. The vertical and horizontal axes are the weight modification φ(c) and postsynaptic firing rate c, respectively. According to the BCM function of φ BCM , the depression/ potentiation of φ(c) occurs as the firing rate is lower/higher than modification threshold θ m . In particular, the parameter θ m is adaptive to the experienced activity: the synapse which experienced a period of inactivity would follow the blue curve with smaller θ m , whereas the synapse that experienced a period of enhanced activity would follow the red curve. Reproduced with permission 32 . Copyright 2012, Nature Pub. Group. The red shaded area indicates an EDE of | ΔG c | in the depression region, which was usually absent in previous memristor-based BCM studies.
postsynaptic neuron activities and also requires a multiplicative relationship between them. On the other hand, only postsynaptic spikes were used in the synaptic adaptation function implemented in our work and previous studies [34][35][36] . (iii) Note that conductance change ΔG c is a monotonic function of the spike rate in both the depression and potentiation regions, which is obviously different from the biological BCM model of Fig. 2f 32 .
In the depression region, the depression effect should first enhance (i.e., |ΔG c | increases) at a relatively low spike rate and then weaken as the spike rate increases; that is, the EDE in the depression region (the red shaded area in Fig. 2f) was missing in the existing implementation of the BCM rule using rate-based postsynaptic spikes (Fig. 2e) [34][35][36] . The absence of the EDE can be understood in that there are only two competitive factors that determine synaptic change ΔG c : the spontaneous forgetting effect of the experienced G 0 and the potentiation effect induced by the presynaptic spike train [34][35][36] . There is no other rate-based depression effect that can assist the forgetting effect to induce the EDE. It is essential to remedy this EDE region in memristors to closely approximate the biological synapse; however, there is a lack of related studies.
Triplet-STDP and generalized BCM learning rule. Different from previous implementations using common rate-based postsynaptic spikes, in theory, the BCM learning rule can be closely replicated using a generalized model based on triplet-STDP 39,40 . This generalized triplet-STDP-based BCM rule is experimentally demonstrated in our memristive devices. The long-term paired-STDP was first implemented on an experimental basis and as a comparison for triplet-STDP. For the paired-STDP, the time delay between paired spikes (i.e., Δt = t postt pre ) determines whether the LTP at Δt > 0 (i.e., the presynaptic spike is earlier than the postsynaptic spike) or LTD at Δt < 0 occurs, which was also demonstrated in our WO 3−x -based memristors (Supplementary Fig. 3 and Note 2). For the long-term triplet-STDP, the spike train can be assumed to be the combination of two spike pairing events, and the synaptic modification can thus be regarded as the integration of LTP and LTD processes induced by these two events 42 . Additionally, the triplet term induced by the previous spike to the paired spikes also needs to be taken into account. Figure 3a illustrates two typical triplets with 'post-prepost' and 'pre-post-pre' sequences. For the 'post-pre-post' triplet, the LTD process is induced by the first-spike pairing ('post-pre', Δt 1 < 0), which is followed by an LTP process induced by the second spike pairing ('pre-post', Δt 2 > 0). For the 'pre-post-pre' triplet, the LTP process is activated before the LTD process. Hereafter, the pair (Δt 1 , Δt 2 ) is used to denote the spike timing in triplets. This means that the 'post-pre-post' triplet always has the timing of Δt 1 < 0 and Δt 2 > 0, whereas the 'pre-post-pre' triplet has the timing of Δt 1 > 0 and Δt 2 < 0. In the triplet-STDP scheme, each spike applied on the memristor consists of a pair of pulses [V + /V − = 2 V/−2 V, 50 ms] as illustrated in Fig. 3a and in our previous work 21,22 . Refer to Supplementary Fig. 3 for details of the spike design; a delay time of 60 s was introduced to ensure the readout of long-term conductance change before and after the spikes 21 .
The synaptic modification of triplet-STDP was examined with the symmetrical spike timing of the LTP and LTD processes (i.e., |Δt 1 | = |Δt 2 |), as shown in Fig. 3b. For the 'post-pre-post' triplet, the obvious potentiation of ΔG c was observed in the case of G 0 = 0.1 µS (see the red circle in Fig. 3b), whereas the potentiation exponentially decreased with increasing spike timing Δt. For the 'pre-post-pre' triplet, there was no notable change in ΔG c after the stimulation in the case of G 0 = 0.1 µS, which indicates that the LTP process was canceled by the following LTD process. In fact, the asymmetrical result of these two triplet sequences cannot be understood by simply considering the competition of LTP and LTD in the paired-STDP model 39,40 . In that interpretation, the LTP and LTD processes should counteract each other in both triplets regardless of their sequence. Interestingly, this result presented a demonstration of the last-spike-dominating triplet-STDP, and a similar finding was reported in a neurobiological study published by Wang et al. 42 . Furthermore, the effect of the long-term learning experience (i.e., different G 0 measured in long-term plasticity) on triplet-STDP can be observed in Fig. 3b. Our previous work indicates that the stimulation from an intermediate state with long-term learning experience can induce a larger ΔG c 21 . The physical mechanism may be related to the metastable local structure (e.g., unstable interstitial oxygen ions) with a lower energy barrier for further defect migration 21 . For the case G 0 = 0.5 µS, the potentiation was reduced overall, even to the extent that the depression appeared for a relatively long spike timing in the 'post-pre-post' triplet, whereas the depression strengthened for higher G 0 in the 'prepost-pre' triplet. In particular, when G 0 increased to 3.0 µS, ΔG c in the 'post-pre-post' triplet was no longer a monotonic function of the spike interval (Δt 1 = Δt 2 ). The potentiation of ΔG c happened for a relatively short interval time (Δt < 8 ms), while a longer interval time induced the depression of ΔG c , as illustrated by the green-diamond data in Fig. 3b. This behavior is similar to the biological BCM learning rule when converting the interval time into the frequency 39,40 . This provides an experimental foundation with which to remedy the absence of the EDE region in the BCM learning rule, which will be demonstrated later. As discussed previously, there are only two competitive factors in determining ΔG c in the BCM rule implemented using common rate-based postsynaptic spikes, which leads to the absence of the EDE region. By contrast, an additional depression effect of the LTD process is introduced in triplet-STDP. Together with the forgetting effect of experienced G 0 and the potentiation effect of the LTP process, there is a third competitive factor in triplet-STDP. With the help of the forgetting effect of high G 0 , this third factor, that is, the EDE of the LTD process ( Supplementary Fig. 4), could make up for the absence of the EDE region, as illustrated in Fig. 2f. Meanwhile, the asymmetrical spike-timing intervals played a critical role in the competition of LTP and LTD processes, as shown in Fig. 3c. In the 'post-pre-post' case, ΔG c transformed from potentiation to depression as the interval of the LTP process (Δt 2 ) increased from 10 to 120 ms whereas the interval of the LTD process (Δt 1 = −70 ms) was kept constant. Similarly, in the 'prepost-pre' case, the depression of ΔG c weakened with increasing |Δt 2 | and while keeping Δt 1 = 70 ms constant. Figure 3d, e summarize the data of the last-spike-dominating triplet-STDP with varying Δt 1 and Δt 2 using a colored background to show ΔG c . In addition to the mentioned 'post-pre-post' and 'pre-post-pre' cases, four other patterns (i.e., two pre-spikes (postspikes) occur before and after a single post-spike (pre-spike)) were included, as shown in quadrants I and III of Fig. 3d, e. Generally, the potentiation effect was observed in quadrant I because the prespike was always applied before the post-spike regardless of the spike number, similar to the LTP process of paired-STDP. By contrast, the depression effect was dominant in quadrant III, similar to the LTD process of paired-STDP. For quadrants II and IV, the synaptic weight G c can be switched from depression to potentiation by adjusting the timing of Δt 1 and Δt 2 , which is in accordance with the results of Fig. 3b, c. The results shown in Fig. 3d, e also indicate the possible dependence of ΔG c on the interval time between two postsynaptic spikes Δt o = t' post − t post and that between two presynaptic spikes Δt r = t' pre − t pre , where t post , t' post , t pre , and t' pre are the times of the two postsynaptic spikes and two presynaptic spikes. In fact, the dependence of ΔG c NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15158-3 ARTICLE NATURE COMMUNICATIONS | (2020) 11:1510 | https://doi.org/10.1038/s41467-020-15158-3 | www.nature.com/naturecommunications on both the presynaptic spike rate ρ x = 1/Δt r and postsynaptic spike rate ρ y = 1/Δt o can be expected, which is also an essential feature in triplet-STDP 29,39-42 . The dependence of LTP/LTD on both the presynaptic and postsynaptic spike frequency was experimentally measured, as shown in Fig. 3f. The ρ x and ρ y share the same value for the spike frequency in this measurement.
Data indicate that the LTP increases as the spike frequency increasing from 1 Hz to 10 Hz, while the LTD decreases at increasing the spike frequency. Such dependence on both the presynaptic and postsynaptic spike frequency is consistent with the biological triplet-STDP while it fails to appear in paired-STDP 29,39-42 .  Figure 3d, e can guide us in designing a rational triplet-STDP scheme (e.g., choosing a proper relationship between Δt 1 and Δt 2 ) to fully implement the BCM learning rule. A typical example is shown in Fig. 3g, where the diagonal line of quadrant II (i.e., the 'post-pre-post' triplet with |Δt 1 | = |Δt 2 |) is chosen, the devices with high G 0 (i.e., 0.5, 3.0, and 7.0 µS) were used, and the spike timing sum of |Δt 1 | and |Δt 2 | was treated as the frequency of postsynaptic spikes using ρ y = 1/(|Δt 1 | + |Δt 2 |). Figure 3f clearly shows the transition of ΔG c from depression to potentiation with the spike rate increasing to a threshold value. In particular, the EDE region in the BCM, which was typically absent in previous implementations using the common rate-based postsynaptic spikes [34][35][36] , was demonstrated for our memristive synapse. Furthermore, the threshold sliding effect with different learning experiences is shown in Fig. 3g, in which the increase in G 0 from 0.5 to 7.0 µS resulted in a rise of the threshold frequency from 40 to 90 Hz. The forgetting effect thus strengthened with learning experience G 0 increasing to larger values. Our results are therefore in good agreement with the BCM curve of Fig. 2f, thereby indicating the close reproduction of the BCM learning rule. In fact, if using asymmetrical spike timing with a fixed difference (e.g., |Δt 1 | − |Δt 2 | = 20 ms) in quadrant II, the triplet-STDP-based BCM rule can also be demonstrated ( Supplementary  Fig. 5). It is believed that the aforementioned triplet-STDP methods can also be used to obtain the generalized BCM learning rule in other second-order memristors. It is necessary to mention that there are still certain differences between the memristive implementation and the real biological BCM rule. For instance, in the biological BCM rule there is no synaptic weight change for ρ y = 0 and for any value of ρ x , which is instead not the case in our memristive implementation (Fig. 3g). This is mainly because the equivalent pre-/postsynaptic pulses were designed to be equal to simplify the circuit operation, which may be different from the real biological synapse. Anyway, the realization of generalized triplet-STDP-based BCM rule with a long-term nature in memristors has potential applications in high-order spatiotemporal pattern recognition.
Simulation of spatiotemporal selectivity. The realization of high-order spatiotemporal functions in memristive devices is a critical step in building a neuromorphic system that mimics the biological brain. Sun et al. and Wang et al. demonstrated the localization of sound by establishing correlations between the sound azimuth and spike timing 54,55 . Spatiotemporal selectivity (e.g., rate-based orientation selectivity) is another type of classical spatiotemporal function in the visual cortex; however, it has not yet been addressed in memristors. According to the studies of Pfister et al. 39,40 , rate-based orientation selectivity can be expected, following the results of long-term triplet-STDP-based BCM learning rules. A mathematical simulation of the experimental data of the BCM rule is necessary to build the neural network for spatiotemporal selectivity. The characteristics of ρ x -and ρ ydependence (Fig. 3f, g) provided the experimental basis for relating triplet-STDP to the BCM learning rule. The mathematical model of BCM learning can be expressed by refs. 23,39,40,56,57 : where ρ x , ρ y are the rates of presynaptic spikes and postsynaptic spikes, respectively and θ is the threshold rate. Depression of the synaptic weight (ΔG c < 0) occurs if φ(ρ y < θ, θ) < 0, whereas potentiation of the synaptic weight (ΔG c > 0) occurs if φ(ρ y > θ, θ) > 0. Finally. there is no synaptic change for φ(0, θ) = 0. Furthermore, by relating triplet-STDP to the BCM rule, the change in synaptic weight can be expressed as 39,40,56,57 where A þ 2 and A À 2 are the amplitude parameters of the paired term contribution for potentiation and depression, where a presynaptic spike triggered before and after a postsynaptic spike can induce LTP and LTD following the classical paired-STDP. The interval time between presynaptic spike and postsynaptic spike is not substantially longer than τ + and τ − . As proposed in the literature 39,40,56 , the presence of a previous postsynaptic spike causes the potentiation contribution of triplet term A 3 + in addition to the paired term in the 'post-pre-post' triplet. The interval between these two postsynaptic spikes should be in a time window of τ y . Similarly, for the depression contribution of the triplet term, A À 3 and τ x are the amplitude parameter and interval window of two postsynaptic spikes, respectively. All these parameters were extracted from our experiments on paired and triplet spikes (Supplementary Fig. 6 and Note 3). Following an approach reported in the literature 40 , the minimal triplet rule was considered by setting A þ 2 = 0 µS, A À 2 = 0.02 µS, A þ 3 = 0.96 µS, A 3 − = 0 µS, τ + = 38 ms, τ − = 30 ms, τ x =16 ms, and τ y = 0 ms as parameters in the present study (Supplementary Table 1). Additionally, the sliding threshold effect with different experienced G 0 is a typical history-dependent property of the BCM learning rule, as mentioned previously. To simulate the sliding threshold effect, threshold θ differentiating potentiation and depression can be expressed as 39,40,54 where coefficient ρ 0 and index p were set to 10 Hz and 2, respectively, for the calculations. Following the literature 39,40,56 , the dependence of A þ 2 and A À 2 on the mean firing rate 〈ρ y 〉 was introduced by setting A þ 2 → A þ 2 〈ρ y p 〉/ρ 0 p and A À 2 → A À 2 〈ρ y p 〉/ ρ 0 p . ρ y p and ρ 0 p denote the expectation of the p th power of ρ y Fig. 3 Demonstration of triplet-STDP and its related BCM learning rules in Pt/WO 3−x /W memristors. a Schematic of the typical 'post-pre-post' and 'pre-post-pre' triplets, which can be simplified as a superposition of the LTP and LTD processes. Each pre-or postsynaptic spike comprises two pulses with amplitude V + /V − = 2 V/−2 V and duration 50 ms. Taking the former as an example, the LTD process activated in the first 'post-pre' pair with spike timing of Δt 1 < 0 is followed by the LTP process induced by the second 'pre-post' pair with spike timing of Δt 2 > 0. b Synaptic modification of triplet-STDP in the 'post-pre-post' and 'pre-post-pre' sequences using symmetrical spike timing |Δt 1 | = |Δt 2 |, with three levels of the initial G 0 considered as the learning experiences (i.e., 0.1, 0.5, and 3.0 µS). c Triplet-STDP with asymmetrical spike timing. Red column: Δt 1 = −70 ms, Δt 2 is from 10 to 120 ms in the 'post-prepost' sequence; Blue column: Δt 1 = +70 ms, Δt 2 is from −10 to −120 ms in the 'pre-post-pre' sequence. d, e Summaries of triplet-STDP results in our experiments, where potentiation or depression with different synaptic weights is obtained using different spike sequences and different timing intervals. The insets show the schematic of 'post-pre-post' and 'pre-post-pre' sequences. Here, both the size of symbols and the background color indicate the magnitude of ΔG c . A relatively high G 0 of 3.0 µS was adopted for the measurements of Fig. 3 (d, e) to highlight the history-dependent characteristics. f The dependence of ΔG c on both the presynaptic spike rate ρ x and postsynaptic spike rate ρ y . The schematic of the operation signal is shown in the inset, in which three pairs of presynaptic spike and postsynaptic spikes were used. g Triplet-STDP-based BCM learning rules with the EDE in the low-frequency region and the threshold sliding effect, which is highly similar to the biological BCM curve. and ρ 0 , respectively. For our experimental data, this dependence may be related to the experienced G 0 . As a result, the calculated curve of the BCM theory well fits the experimental data for each G 0, as illustrated in Fig. 4a, and the threshold sliding effect induced by tuning G 0 is satisfactorily simulated. The calculated correlations of ΔG and the postsynaptic spike rate ρ y allow the accurate simulation of neural networks for spatiotemporal patterns.
To extend the triplet-STDP-based BCM model to the application of spatiotemporal patterns, we adopted a two-layer neuromorphic feedforward network, as schematically shown in Fig. 4b. Rate-based orientation selectivity, as the typical input selectivity of spatiotemporal patterns, was numerically simulated using this network following BCM theory. The first layer of presynaptic neurons generate the presynaptic spikes in correspondence of the presentation of a visual pattern, whereas the second layer of postsynaptic neurons generate postsynaptic spikes upon excitation of the first layer. In our network, 81 presynaptic neurons (in a 9 × 9 array) and four postsynaptic neurons (with specific number n = 1, 2, 3, 4) were used as the input and output layers, respectively. As a result, there are 81 × 4 = 324 connections (i.e., synapses) between the two layers of the network. Generally, synapses are trained using presynaptic spikes from input neurons with spike rate ρ x and a synchronous spike train from the postsynaptic neuron with rate ρ y . At each learning epoch, four patterns, each containing 81 pixels, with different orientation bars (i.e., 0°, 45°, 90°, and 135°) and four random noise patterns were presented to the presynaptic neurons. The presentation of the orientation patterns and noise patterns had equal probability. For the orientation patterns, only the 9 pixels of these four orientation bars had a high spike rate with a Poisson distribution (i.e., 〈ρ x 〉 = 40 Hz , black), whereas the other 72 pixels had a low spike rate with a Poisson distribution (i.e., 〈ρ x 〉 = 10 Hz , white). For the noise inputs, nine pixels were randomly selected with a high spike rate, and the others had a low spike rate. Four ρ n y corresponded to these four postsynaptic neurons. Every ρ n y was updated in real time after each pattern or noise according to synaptic weight G m n in the last pattern or noise using the equation ρ n y ¼ P 81 m¼1 ρ x;m G n m (m = 1, 2, … 81, n = 1, 2, 3, 4) 58,59 . To achieve the selectivity of multiple orientations, the four ρ y n were compared after each pattern or noise submission. Only the specific postsynaptic neuron with the maximal ρ y n could send the fire postsynaptic spikes following the winner-take-all rule, thereby modifying the synaptic weight combined with presynaptic spikes (Supplementary Fig. 7). Correspondingly, the threshold θ n was also updated with ρ y n according to Eq. (3).
All the synapses started from a stochastic initial state with low G 0 . Considering the 1st postsynaptic neuron as example, the G 1 evolution of the extracted synapses only for the four orientations with the learning epoch is shown in Fig. 4c. During the increasing epochs, once a specific orientation was randomly selected by the 1st postsynaptic neuron with strong potentiation, there would be a corresponding higher ρ y 1 > θ 1 . Meanwhile, the relatively low ρ y 1 < θ 1 was introduced for the other three orientations, thereby leading to depression. Figure 4d shows the ρ 1 y evolution of the four orientation bars as a function of the learning epochs. From the results of Fig. 4c, d, G 1 and ρ y 1 of the 45°orientation bar clearly increase after thousands of learning epochs, which suppressed the values of the other three orientations. Eventually, the orientation of 45°was selected for the 1st postsynaptic neuron, as shown in Fig. 4e-I, in which the conductance of the nine pixels for the 45°orientation bar was obviously higher than the others. Simultaneously, although the other three orientations were suppressed in the first postsynaptic neuron, they were selected without supervision by the other three postsynaptic neurons, thereby generating strong potentiation with higher ρ y n (n = 2, 3, 4). As shown in Fig. 4e-II-e-IV, the orientations of 0°, 90°, and 135°were selected by the 2nd, 3rd, and 4th postsynaptic neurons, respectively, which verifies the feasibility of complete rate-based orientation selectivity in this study. The above results suggest the potential application of the spatiotemporal patterns in our memristors.

Discussion
We demonstrated a generalized triplet-STDP-based BCM learning rule using a WO 3−x -based second-order memristor. Compared with the BCM rules realized by common rate-based presynaptic spikes, the EDE region missing in previous studies was found in our experimental data. A typical threshold sliding effect that depended on the learning history was also obtained. Furthermore, rate-based orientation selectivity was demonstrated in a feedforward network based on the generalized BCM framework in our memristors by simulation, which indicated its potential feasibility for high-order spatiotemporal patterns. It is noted that there are still certain limitations to a full implementation of the BCM learning at the synaptic level using memristors. For instance, the device physics and signal design may bring differences from the biological synapse, such as the spike-timing region, LTP/LTD window, and specific biological features. Further studies are still required to solve these above limitations toward a fully bio-mimetic BCM rule. It is believed that our study makes a progress towards the biorealistic mimicking of BCM learning rules in memristive synapses and paves the way for the application of memristors to spatiotemporal patterns in the future.

Methods
Device fabrication. Memristors with a Pt/WO 3−x /W sandwich structure were fabricated on SiO 2 /Si substrates and patterned into a crossbar array with a junction area of 50 × 50 µm 2 using a metal mask, as shown in Fig. 1b. Both the W bottom electrode and WO 3−x film with thicknesses of 100 and 80 nm were subsequently prepared by RF sputtering using a metal W target. The WO 3−x film was grown under a gas pressure of 2 Pa (Ar:O 2 = 3:1) at 200°C. The 80-nm-thick Pt top electrode was fabricated through electron-beam evaporation.
Electrical measurements. Memristive properties were measured using a self-built test system comprising a sourcemeter (2636A, Keithley), arbitrary function generator (3390, Keithley), oscilloscope (TDS 2012B, Tektronix), and probe station (TTPX, Lake Shore). The positive direction of the bias voltage was defined such that the current flowed from the top to the bottom electrode. To measure the EPSC, the memristor was connected with a load resistor R load of 1 MΩ in series, and the voltage drop across the R load was monitored by an oscilloscope. Then, the monitored voltage was converted to the current flowing through the memristor. To implement pair-or triplet-STDP, each pre-or postsynaptic spike applied to the top or bottom electrode was composed of a pair of pulses with amplitude V + /V − = 2 V/-2 V and a width of 50 ms. The initial and final conductance states of the device (G i and G final ) were readout using a small pulse [0.2 V, 50 ms] before and after applying the programmable pulses, and the conductance change was defined as ΔG c = G final − G i . For the experienced G 0 , ΔG c = G final − G 0 . Both the writing and reading of the memristor were performed in pulse mode.

Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.