Figure 1 | Scientific Reports

Figure 1

From: Role of non-linear data processing on speech recognition task in the framework of reservoir computing

Principle of spoken digit recognition. (a) Audio waveform corresponding to the digit 1 pronounced by speaker 1. (b) Filtering into frequency channels for acoustic feature extraction. The signal during each time interval \(\tau \) is decomposed into \({N}_{f}\) frequency channels. The cochlear model filters each point of the audio waveform into 78 frequency channels (versus 13 for the MFCC model and 65 for the spectrogram model). The frequency channels are concatenated over intervals of duration \(\tau \) to form the filtered input. (c) The filtered input is either injected into the neural network or used directly to construct the output (No neural network). The neural network is composed of \(N\) interconnected neurons. (d) For each digit, the response of the neural network (or, without the network, of the filtered input itself) is constructed from a linear combination of the neuron states \({V}_{\theta \tau ,\sigma }\). There is one classifier per digit, 10 classifiers in total.
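Panels (c) and (d) can be illustrated with a minimal sketch of a reservoir-style readout. The caption only states that each of the 10 classifier outputs is a linear combination of neuron states (or of the filtered input when no network is used). Everything else below is an assumption for illustration: the tanh echo state network standing in for the \(N\) interconnected neurons, ridge regression on one-hot targets as the training rule, and the decision rule (average each classifier output over time, then take the largest). The function names `reservoir_states`, `train_readout` and `classify` are hypothetical.

```python
import numpy as np

N_DIGITS = 10  # one linear classifier per spoken digit


def reservoir_states(u, n_neurons=50, spectral_radius=0.9, seed=0):
    """Hypothetical stand-in for the neural network: a tanh echo state network.

    u: filtered input, shape (T, N_f), one row per time interval tau.
    Returns the neuron states V, shape (T, n_neurons).
    """
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_neurons, u.shape[1]))
    w = rng.uniform(-1.0, 1.0, size=(n_neurons, n_neurons))
    # Rescale the recurrent weights to the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    v = np.zeros(n_neurons)
    states = np.empty((u.shape[0], n_neurons))
    for t, u_t in enumerate(u):
        v = np.tanh(w_in @ u_t + w @ v)  # interconnected neurons driven by the input
        states[t] = v
    return states


def train_readout(state_seqs, labels, ridge=1e-3):
    """Fit the readout weights W, shape (N_DIGITS, N + 1), by ridge regression.

    Every time step of an utterance gets the one-hot target of its digit.
    A constant column of ones provides a bias term for each classifier.
    """
    x = np.vstack([np.hstack([s, np.ones((len(s), 1))]) for s in state_seqs])
    y = np.vstack([np.tile(np.eye(N_DIGITS)[d], (len(s), 1))
                   for s, d in zip(state_seqs, labels)])
    a = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(a, x.T @ y).T


def classify(states, w_out):
    """Each classifier output is a linear combination of the states at each step;
    outputs are averaged over time and the digit with the largest average wins."""
    x = np.hstack([states, np.ones((len(states), 1))])
    outputs = x @ w_out.T  # shape (T, N_DIGITS)
    return int(np.argmax(outputs.mean(axis=0)))
```

For the "No neural network" case in panel (c), the filtered input arrays are passed straight to `train_readout` and `classify`. For the network case, each utterance is first mapped through `reservoir_states`. The same linear readout is used in both cases, which matches the comparison the figure sets up.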

Back to article page