Fig. 3: Spoken-digit recognition demonstration. | Nature Communications


From: Dynamic memristor-based reservoir computing for high-efficiency temporal signal processing


a Left: typical audio waveform of the digit 9 pronounced by a female speaker. Right: cochlear spectrum (64 channels per frame) of the corresponding audio waveform. The channel values for each frame are transferred to the time domain with a duration of τ. b Time-multiplexing process. In each interval of duration τ, the spectrum signal is multiplied by a mask matrix (64 × M) containing randomly assigned binary values (−1 and 1) to generate the input voltage sequence with a fixed time step δ (δ = 120 μs) equal to 1/M of τ, where M is the mask length. This process is repeated N times with different mask matrices in order to mimic an N-parallel RC system. c During each interval of duration τ, the dynamic memristor response is recorded. The device current is first converted into a voltage through the load resistor and then amplified and collected by the amplifier and ADC. After that, the N memristor responses in each interval of duration τ are combined into the reservoir states for subsequent classification. d Predicted results obtained from the memristor-based RC system versus the correct outputs, where the word error rate is as low as 0.4%. The two parameters M and N of the RC system are set to 10 and 40, respectively. The color bar represents the normalized probability of each predicted result for each correct output. e Word error rate as a function of the mask length M, where the total reservoir size (M × N) remains constant at 400. Similar to the waveform classification task, the average word error rate reaches its lowest value when M = 10. The error bars represent the variation between devices.
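The time-multiplexing step in panel b can be illustrated with a minimal sketch. It covers only the masking arithmetic: each 64-channel frame is multiplied by a random ±1 mask of shape 64 × M to give M input values per frame, and the procedure is repeated with N different masks. The function names `make_mask` and `multiplex`, the random seed, and the placeholder spectrum are assumptions made for illustration and are not taken from the paper. The memristor response and the classifier are not modeled.

```python
import random

N_CHANNELS = 64    # cochlear channels per frame (from the caption)
M = 10             # mask length (value used in panel d)
N = 40             # number of masks, i.e. parallel reservoirs (panel d)
DELTA_S = 120e-6   # time step delta, in seconds
TAU_S = M * DELTA_S  # duration of one frame in the time domain (tau = M * delta)


def make_mask(n_channels, mask_length, rng):
    """Return an n_channels x mask_length matrix of random -1/+1 values."""
    return [[rng.choice((-1, 1)) for _ in range(mask_length)]
            for _ in range(n_channels)]


def multiplex(spectrum, mask):
    """Map each frame to mask_length input values (frame times mask).

    The result is one flat sequence ordered frame by frame, so every
    frame occupies mask_length consecutive time steps.
    """
    mask_length = len(mask[0])
    sequence = []
    for frame in spectrum:
        for m in range(mask_length):
            sequence.append(sum(x * row[m] for x, row in zip(frame, mask)))
    return sequence


rng = random.Random(0)  # arbitrary seed, for repeatability only
# Placeholder spectrum: 5 frames of random values, NOT real cochlear data.
spectrum = [[rng.random() for _ in range(N_CHANNELS)] for _ in range(5)]

masks = [make_mask(N_CHANNELS, M, rng) for _ in range(N)]
inputs = [multiplex(spectrum, mask) for mask in masks]

print(len(inputs), len(inputs[0]))  # 40 50
print(M * N)                        # 400
```

With M = 10 and N = 40, each frame yields M × N = 400 masked input values across the N sequences, which matches the total reservoir size quoted in panel e. In the real system these values would be applied to the device as voltages, and the recorded responses, not the inputs themselves, would form the reservoir states.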
