Introduction

Among all human senses, hearing has resisted an accurate physical description the longest. Recent progress has revealed that hearing phenomena previously believed to originate in the CNS are consequences of the nonlinear physical properties of the cochlea1. Here, continuing this work, we describe which physical principles generate the biophysical and psychoacoustic hearing information along the auditory pathway up to the auditory nerve.

From a physics point of view, the transduction of external sound towards the CNS involves three components: the hearing sensor (cochlea), the attached inner hair cells (IHC) and the auditory nerve neurons (ANN) (Fig. 1). For consistency, we present below exclusively data from our software implementation of the compound device, though our hardware implementation yields essentially indistinguishable results. Our Hopf cochlea1,2,3 serves as the hearing sensor. The auditory input signal first passes through a Hilbert transform to obtain the dimensionality required to drive the Hopf systems, which act as nonlinear amplifiers. The Hopf cochlea faithfully reflects mammalian sound processing (and beyond4): weak input signals are strongly enhanced and strong ones compressed, by means of high-gain active nonlinear amplification. Phenomena emerging from this nonlinear behavior, such as the combination-tone and two-tone suppression laws, provide important tests of the validity of the approach. Our Hopf cochlea has an intrinsically mesoscopic design: the frequency axis is discretized into a set of sections, each modeling the nonlinear amplification process along a region of the basilar membrane. The discretization is flexible; here, one section covers approximately a quarter octave. Each section combines the passive hydrodynamic behavior with an active Hopf amplifier. The active part implements the Hopf normal form5

Figure 1

Peripheral auditory signal pathway of an AM sound with fcar = 850 Hz and fmod = 200 Hz.

Responses evoked at a place corresponding to fc = 880 Hz. Top row: Physical signal, bottom row: Fourier spectrum representation. Stages: Cochlear BM motion, inner hair cells (VIC) (both continuous signals), ANN spike trains (of two characteristic classes, see below).

Here, the vectors of the input F(t) and output z are complex variables (j is the imaginary unit) and fc = ωc/2π is the characteristic frequency of the section. μ is the tunable parameter that defines each section's distance from the Hopf bifurcation point at μ = 0. Each section is composed of a Hopf amplifier followed by a section-specific 6th-order Butterworth (low pass) filter modeling the viscous fluid losses. For the results presented below, we use the parameters as in Ref. 1; we will display the responses of the frequency channels fc = 1760 Hz and fc = 440 Hz. The responses of this cochlea are in perfect agreement with biophysical measurements, for both amplitude and phase of the propagating signal (Supplementary Information of Ref. 1).
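As an illustration of the compressive amplification described above, the following minimal sketch integrates a single driven Hopf section. Since the normal form itself is not reproduced in this text, the sketch assumes the commonly used driven form dz/dt = ωc[(μ + j)z − |z|²z − F(t)] with a pure-tone drive F(t) = F0 exp(jωt). The function name steady_amplitude, the dimensionless time τ = ωc·t, the step size and the integration length are illustrative choices; the Butterworth filter and the coupling between sections of the full cochlea are omitted.

```python
# Sketch of one driven Hopf section, written in a frame rotating with the drive.
# ASSUMPTION: dz/dt = w_c * [(mu + j) z - |z|^2 z - F(t)],  F(t) = F0 * exp(j w t).
# With z = a * exp(j w t) and tau = w_c * t the envelope a obeys
#   da/dtau = (mu + j (1 - w / w_c)) a - |a|^2 a - F0.

def steady_amplitude(f0, mu=0.0, ratio=1.0, dtau=0.1, tau_end=2000.0):
    """Integrate the envelope equation with RK4 and return |a| at tau_end.

    f0    : drive amplitude (dimensionless, illustrative units)
    mu    : distance from the bifurcation point (mu <= 0: no self-oscillation)
    ratio : drive frequency divided by the characteristic frequency
    """
    lin = complex(mu, 1.0 - ratio)

    def rhs(a):
        return lin * a - (abs(a) ** 2) * a - f0

    a = 0j
    for _ in range(int(tau_end / dtau)):
        k1 = rhs(a)
        k2 = rhs(a + 0.5 * dtau * k1)
        k3 = rhs(a + 0.5 * dtau * k2)
        k4 = rhs(a + dtau * k3)
        a += dtau * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return abs(a)


# On resonance, weak inputs receive a much larger gain than strong ones.
for f0 in (1e-3, 1e-2, 1e-1):
    amp = steady_amplitude(f0)
    print(f"F0 = {f0:.0e}   |z| = {amp:.4f}   gain = {amp / f0:.1f}")
```

Under these assumptions, at μ = 0 and on resonance the steady amplitude approaches F0^(1/3), so that a 40 dB increase of the input raises the output by only about 13 dB. For detuned input at μ = 0 the transient decays very slowly, so a slightly negative μ or a longer integration time is advisable there.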

We have complemented this cochlea with inner hair cells (IHC). The cochlear membrane state VCo(t) is relayed linearly to displacements u of the IHC cilia according to u(t) = 20 · 10^−9 · VCo(t), which affect the IHC voltage VIC according to the standard IHC model6 (for the equations see the Inner hair cell (IHC) section of the Methods or the original article; we use the model's standard in vivo parameters). The dynamical role of the IHC is to half-wave rectify and slightly compress the signal: on top of a frequency-dependent DC component, the output now has a slightly low-pass filtered AC component6.

The IHC signal feeds into the ANN. Biological ANN show widely divergent response properties. At first sight, their extreme noisiness seems to work against their ability to convey precise hearing, which crucially depends on precise timing and frequency. Our study will, however, reveal that the opposite is true and that noise has a beneficial effect. Biological ANN fall into two main classes7,8,9 (cf. the Classes of auditory nerves (ANN) section of the Methods): high-spontaneous ANN fire at a high rate even in the absence of input, whereas ANN of the other class require substantial input for firing, with a tendency to phase-lock onto the signal with a substantial degree of jitter. On a finer level, this second class is often divided into medium- and low-spontaneous ANN, which mainly differ in their distance to the firing threshold and in their maximal spike rate7,9,10. Physiology is commonly held responsible for these differences: relative to the second ANN class, the first ANN class forms synapses only on distinct IHC sides11 and preferentially projects to distinct locations in the cochlear nucleus12. The two subclasses differ prominently in axon diameter. Every IHC connects to all ANN classes; a single ANN is contacted by only one of roughly 20 densely packed single synapses on an IHC tail7. On both sides of the synaptic cleft, the interaction is mediated by voltage-dependent ion channels.

In our approach, the transmission from IHC to ANN is condensed into a time-sampled ANN input I(tn). This input is complemented by a strong noise contribution that is strongly correlated in time, to reflect the nature of the neurotransmitter release. Accordingly, we chose I(tn) to have the form

where the constant A has the effect of a firing threshold and B scales the IHC voltage to the evoked ANN current. ξ is exponentially correlated synaptic noise of intensity σ, independent for each transmission channel (we use the algorithm of Ref. 13, with a correlation time constant τσ [ms]). Our paradigm would work equally well with white noise, though at a synapse this would be less plausible. With this form of I(tn), noise can trigger spontaneous ANN firing at low firing thresholds even in the absence of (other) input. The correlation time of the noise was determined by matching our approach to biological data14 (cf. the ANN synaptic noise correlation time section of the Methods). Following the conjecture15 that the distinct postsynaptic potentials (sub-threshold for low-spontaneous and super-threshold for high-spontaneous ANN) are the consequence of the different biological wiring, we use A = 0 for the high-spontaneous class and A = −0.2 for the low- and medium-spontaneous classes, and ensure that low- and medium-spontaneous ANN need, in addition to the continuous part of I(t), a noise contribution ξ(t) to cross the spiking threshold. For appropriate parameter values, the membrane potential xn of Rulkov's spike-afterhyperpolarization neuron model16

reproduces the characteristic biological spike trains of the different ANN classes in a manner indistinguishable from biology. In this model, yn is a slow hyperpolarizing current, whereas the constant yrs defines the resting potential. The variable In represents the external driving current. A spike is generated every time xn attains its maximum value. Spike frequency and spike strength are controlled by the parameters γhp and ghp. For constant input current, the nonlinear function xn+1 = f(…) generates (jittered) limit-cycle behavior. We use the parameter values α = 3.8, yrs = −2.9, bhp = 0.5, ghp = 0.1 and be = 0.1 (Ref. 16) and modify the original timescale by a factor of ten. This yields a sampling rate of 20 kHz, maintained throughout the compound system to account for very fast spiking ANN, and generates an almost linear I-f curve16. The typical responses of the three ANN classes (cf. Ref. 10 and the ANN rate versus level curves section of the Methods) are reproduced in Fig. 2 by stimulating the map with a single tone at fc for varying input intensity, at one of the three standard parameter sets of Table 1 (black lines). The colored lines in the figure demonstrate that all biologically observed profiles can be generated by the model by sweeping the parameters across intervals around the standard values, without ever running into non-physiological responses.

Table 1 Parameter values of the three ANN classes. Values in brackets correspond to the parameter variations in Fig. 2 exhibited by colored lines
Figure 2

ANN response classes.

Upper panel: Black lines from the standard parameter values of Table 1. Colored lines are from the bracketed values of parameter A (red), B (green), σ (purple) and τσ (orange). Spontaneous rates: B = 0 (blue). Lower panel: Corresponding ANN spike rates (fc = 1760 Hz). Cochlear information is relayed into ANN spike rates that cover different dynamic ranges but preserve the essentials of the cochlear signal (here on a linear spike rate scale; in Fig. 3, last row, on a logarithmic scale).

Results

Compound model performance

Across the different stages of the compound system, the cochlear information is essentially preserved. In Fig. 3, the outputs of the Hopf cochlea (top panel), of the IHC (second and third panels) and of the ANN (lowest panel) are shown for two frequency channels. For the experiments, a single tone with fixed amplitude was fed into the Hopf cochlea, with the input frequency swept from 0.2fc to 1.5fc. To cover an input range from −60 dB1V up to −10 dB1V, the experiment was repeated in steps of 10 dB. At the Hopf cochlea, the amplitude of the single-tone oscillation was measured; at the IHC, the amplitudes of both the AC and the DC components were measured. At the level of the ANN, the amplitude of the neuronal firing was measured in terms of spike rates. These measurements show that all essential features of the mammalian cochlea are faithfully reproduced. The most prominent and easily verifiable ones are the strong amplification of faint sounds, the compressive nonlinearity of exponent one-third, the left-shift of the response peaks upon an increase of the input amplitude, and the characteristic broadening, particularly towards low frequencies, for low input amplitudes1,3,5. IHC low-pass filtering (cf. Fig. 3, fc = 1760 Hz) is accompanied by strong input sound compression (cf. fc = 440 Hz)6. Upon feeding the IHC signal into the ANN, the spikes recover the original quality of the cochlear response (Fig. 3, last row, for high-spontaneous ANN). High-spontaneous neurons saturate more quickly for loud sounds, whereas low-spontaneous neurons only respond above an input intensity of about −30 dB; the classes thereby cover different dynamic ranges. On the linear spike rate scale, each class faithfully transmits the essentials of the Hopf cochlea output (Fig. 2, last panel vs. Fig. 3, first panel), but each on a dynamic range of its own. On the typical dynamic ranges transmitted, all three neuron classes fully retain the cochlear information (Fig. 2).

Iso-intensity tuning curves, an often-used alternative characterization of auditory response, are very similar for BM cochlea motion and for the different ANN (cf. the Cochlea and ANN tuning curves section of the Methods). Moreover, they agree with the biological data9,17 that serve as the guideline for a faithful transduction from cochlea to CNS17.

Figure 3

Output amplitudes (logarithmic scale) as a function of input frequency (linear scale).

Lines represent constant cochlear input intensities, from −60 dB1V = 1 mV (lowest) to −10 dB1V (uppermost line), in steps of 10 dB. The characteristic cochlear information is preserved across the different stages of transcription.

Suprathreshold stochastic resonance

We demonstrate the extent to which noise plays a distinguished role in achieving this performance by means of a pitch-shift experiment18,19. An AM sound with fcar = 850 Hz and fmod = 200 Hz fed into the cochlea (at an amplitude of −17 dB1V) corresponds to a pitch-shift experiment with f0 = 200 Hz, k = 4 and δf = 50 Hz, generating a perceived pitch fp = f0 + δf/k = 212.5 Hz, equivalent to a period of 4.7 ms (Refs. 1, 18, 19, 20, 21). Fig. 5 shows measurements taken at fc = 880 Hz, in the regime where the perceived pitch (measured as the first most prominent peak of the ISI distribution) is known to follow de Boer's first pitch shift rule1. In the absence of noise, high-spontaneous neurons would quickly lock onto the signal, i.e., onto the modulation frequency (200 Hz, 5 ms). Only upon the addition of noise does a distribution with a main peak at the perceived pitch fp emerge (Fig. 4). Sets of low- and medium-spontaneous neurons (which cannot directly encode fp in their instantaneous frequencies), when driven by identical signals but independent noise, generate an almost regular spike pattern, with a clear instantaneous-frequency peak at the locus of the perceived pitch fp (the “volley principle” of auditory nerve coding). Fig. 4 demonstrates that this encoding of the perceived pitch in ANN spike rates requires a nonzero amount of noise. More details of how this is achieved, and of how well the required noise coincides with that observed in biological measurements, can be found in the ANN stochastic resonance details section of the Methods. Clearly, our simple median-based measure p(σ) takes account neither of more global properties of the distribution nor of how the pitch is finally extracted from the ANN (which may be the origin of the minor mismatch between the optimal noise in Table 1 and that in Fig. 4), but otherwise our observations are very stable and consistent.
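The pitch read-out used here, the location of the dominant peak of the ISI distribution, can be sketched as follows. This is a minimal illustration rather than the analysis code behind the figures: the function name isi_histogram_peak, the bin width and the synthetic spike train (Gaussian-jittered intervals around 4.7 ms) are assumptions made for the example, and the sketch simply takes the globally most populated bin instead of the first most prominent peak.

```python
import random


def isi_histogram_peak(spike_times_ms, bin_ms=0.1):
    """Return the centre (in ms) of the most populated inter-spike-interval bin."""
    if len(spike_times_ms) < 2:
        raise ValueError("need at least two spikes to form an interval")
    counts = {}
    for earlier, later in zip(spike_times_ms, spike_times_ms[1:]):
        index = int((later - earlier) // bin_ms)   # histogram bin of this ISI
        counts[index] = counts.get(index, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_ms


# Hypothetical spike train: intervals scattered around 4.7 ms (illustration only,
# not output of the compound model).
rng = random.Random(1)
t, spikes = 0.0, []
for _ in range(5000):
    t += rng.gauss(4.7, 0.15)
    spikes.append(t)

peak_ms = isi_histogram_peak(spikes)
pitch_hz = 1000.0 / peak_ms   # ISI peak converted to a pitch estimate
print(f"ISI peak at {peak_ms:.2f} ms, pitch estimate {pitch_hz:.0f} Hz")
```

The bin width limits the pitch resolution: at 0.1 ms bins and intervals near 4.7 ms, neighbouring bins differ by roughly 4 to 5 Hz.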

Figure 4

Suprathreshold stochastic resonance of high/low-medium spontaneous ANN (upper/lower panel).

(a) Spike trains of one/four neuron(s), (b) instantaneous spiking frequency distribution at the indicated noise level, (c) probability p for the instantaneous frequency to coincide with frequency of the perceived pitch fp, for variable noise levels σ.

Figure 5

Perceived pitch from the peak in the ISI-histogram of high-spontaneous ANN (full dots).

(a) AM-sounds at fixed fcar = 800 + δf (fmod = 200 Hz, noise level σ = 0.07, dotted line fp = 200 + δf/4 Hz, the first pitch shift effect). Results scale correctly with δf. (b) Two-frequency stimulation f1 and f2 = f1 + 200 Hz (input amplitude A0 = −25 dB1V, scaling factor B = 3.0, cochlea output at fc = 622 Hz). The first pitch-shift law fp = (f1 + f2)/(2k + 1) at k = 4 (dotted line) is coherently violated (second pitch shift effect), by psychoacoustic data (crosses22) as well as by our measured data (full dots). Inset: ISI-histogram for f1 = 900 Hz showing the perceived pitches for k = 4 (left peak, for the rightmost cross) and for k = 5 (right peak, cross at fp ≈ 178 Hz, out of display).

As a final test, the biologically faithful transduction of the cochlear signal to the CNS is exhibited in the reproduction of psychoacoustic pitch phenomena. First, three-frequency stimulation of our auditory system is shown to give rise to a pitch sensation that indeed follows de Boer's first pitch shift rule, for all detunings δf (Fig. 5a). Second, and even more importantly, Smoorenburg's psychoacoustic data for two-frequency stimulation22 are perfectly recovered by performing the corresponding pitch-shift experiment in the artificial system (Fig. 5b, de Boer's second pitch shift effect).
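The first pitch-shift rule that serves as the reference in both experiments amounts to simple arithmetic, as the following short sketch shows. The function names are illustrative; the formulas are those quoted in the text and in the caption of Fig. 5.

```python
def first_rule_am(f0_hz, k, delta_f_hz):
    """First pitch-shift rule for an AM sound with carrier k * f0 + delta_f."""
    return f0_hz + delta_f_hz / k


def first_rule_two_tone(f1_hz, f2_hz, k):
    """First pitch-shift rule for two-frequency stimulation."""
    return (f1_hz + f2_hz) / (2 * k + 1)


# AM sound of the text: f_car = 850 Hz, f_mod = 200 Hz, i.e. f0 = 200 Hz, k = 4, delta_f = 50 Hz.
fp_am = first_rule_am(200.0, 4, 50.0)
print(f"AM sound: fp = {fp_am} Hz, period = {1000.0 / fp_am:.2f} ms")

# Two-tone stimulus of the Fig. 5b inset: f1 = 900 Hz, f2 = f1 + 200 Hz.
for k in (4, 5):
    print(f"two tones, k = {k}: fp = {first_rule_two_tone(900.0, 1100.0, k):.1f} Hz")
```

These values are only the prediction of the first rule. The second pitch shift effect is precisely the systematic deviation of the psychoacoustic and model data from this prediction.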

Discussion

Ever more details are known about the parts that make up the peripheral hearing system. How these parts functionally work together, however, has remained a challenge. Our full model of the peripheral hearing system is based on the principles of nonlinear physics and includes in a detailed manner the facts known from biology. Our model is in a sense minimal: the designs of the cochlea, the inner hair cells and the auditory nerve neurons are extremely simple. Yet our model not only reproduces all salient biological measurements to great accuracy, it also emphasizes the important role of synaptic noise in the transmission of salient hearing features, from the continuous basilar membrane motion to the discrete spiking world of the CNS. We demonstrated on the basis of physics that all nonlinear features measured at the auditory nerve can indeed be traced back to the active amplification process within the cochlea, a conclusion reached previously on the basis of physiology23. A novel observation is that suprathreshold stochastic resonance seems to be necessary to enable the correct transition from IHC to ANN.

Our work could be seen as a next step following the modeling of Ref. 24, where Hopf elements at their bifurcation points, with no biological interaction among them and with hair cells reduced to abstract threshold oscillators, reproduced the first pitch shift effect. From our more detailed modeling, we observe here the natural emergence of the second pitch shift effect across the peripheral auditory pathway from the cochlea to the auditory nuclei. The straightforward reproduction of the cochlea-generated psychoacoustic second pitch shift by our peripheral hearing model corroborates that pitch sensation has its origins in the cochlear nonlinearities and that the peripheral auditory system takes surprising care to pass on the cochlear nonlinearities to the CNS. We also provide an important example of, and argument for, the omnipresence of noise in the nervous system. Audition is a particularly intriguing place for such an observation, as the mammalian hearing system is famous for its high temporal precision and reliability. In this sense, our approach opens the perspective of a novel construction paradigm for high-precision information processing based on noisy elements that circumvents bottlenecks encountered by current technology, particularly in chip design. Beyond this, and beyond offering new insights in hearing research, our model can serve as a template for faithfully transducing continuous into discrete-time systems, exceeding conventional high-frequency sampling methods in efficacy and robustness. Owing to its simple biological blueprint, the model was also easy to realize in hardware, which yielded virtually coincident results.

Work supported by the Swiss National Science Foundation (Grants 200020-132881, 200021-122276 to R.S.).

Methods

Peripheral auditory system implementation details

Here, we provide more details on the implementation of the inner hair cells (IHC), on our auditory nerve neuron (ANN) model and on the nature of the stochastic resonance exhibited in the main manuscript.

A. Inner hair cell (IHC)

We relayed cochlear membrane states VCo(t) linearly to IHC cilia displacements, using u(t) = 20 · 10^−9 · VCo(t). Cilia displacements cause the IHC voltage VIC to change according to the IHC model developed originally by Eustaquio-Martin and Lopez-Poveda6:

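where $g_m(u(t)) = G_M \left[1 + \exp\!\left(\frac{u_0 - u(t)}{s_0}\right)\left(1 + \exp\!\left(\frac{u_1 - u(t)}{s_1}\right)\right)\right]^{-1}$, $g_{\infty,f/s}(V_{IC}) = G_m \left[1 + \exp\!\left(\frac{V_{1,f/s} - V_{IC}}{S_{1,f/s}}\right)\left(1 + \exp\!\left(\frac{V_{2,f/s} - V_{IC}}{S_{2,f/s}}\right)\right)\right]^{-1}$ and where $g_l$ is the constant leak conductance. Unchanged in vivo parameters of the model were used.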
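The rectifying character of the transduction conductance can be made explicit with a small numerical sketch of the second-order Boltzmann function above, read as g(u) = G_M / [1 + exp((u0 − u)/s0)(1 + exp((u1 − u)/s1))]. The parameter values below are placeholders chosen for illustration only; they are not the in vivo parameters of the model of Ref. 6, and the function name is hypothetical.

```python
import math


def boltzmann_conductance(u, g_max, u0, s0, u1, s1):
    """Second-order Boltzmann function of the cilia displacement u (in metres)."""
    return g_max / (1.0 + math.exp((u0 - u) / s0) * (1.0 + math.exp((u1 - u) / s1)))


# ASSUMPTION: placeholder parameters, not the published in vivo values.
params = dict(g_max=1.0, u0=20e-9, s0=10e-9, u1=20e-9, s1=5e-9)

g_rest = boltzmann_conductance(0.0, **params)
g_plus = boltzmann_conductance(+20e-9, **params)    # deflection in the excitatory direction
g_minus = boltzmann_conductance(-20e-9, **params)   # equal deflection in the opposite direction

# Equal and opposite deflections change the conductance very unequally.
print(f"rest {g_rest:.4f}, increase {g_plus - g_rest:.4f}, decrease {g_rest - g_minus:.4f}")
```

With these placeholder values, the conductance increase for a positive deflection is far larger than the decrease for the opposite deflection, which is the asymmetry underlying the half-wave rectification attributed to the IHC.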

B. Classes of auditory nerves (ANN)

The spontaneous spike rates of auditory nerve neurons exhibit a distinct bimodal distribution8, see Fig. 6. Taking into account coherent morphological, physiological and functional viewpoints, ANN are conventionally divided into a low, a medium and a high spontaneous spike class. Our modeling closely follows the distinction between these classes.

Figure 6

Bimodal spontaneous firing histogram from Ref. 8, leading to the distinction of the three classes of ANN: (a) low-, (b) medium-, (c) high-spontaneous ANN.

The emergent main classification comprises the high-spontaneous and the concatenated medium/low-spontaneous class.

ANN synaptic noise correlation time

Owing to the biological mechanism of synaptic transmission, synaptic noise is correlated in time. A finite correlation time is particularly evident for high-spontaneous ANN, which spike intensively even in the absence of input. To find the biologically justified correlation time, we compared model-generated ISI distributions from high-spontaneous ANN to corresponding animal data14, see Fig. 7. An exact match of the exponential decay is found for a correlation time τ = 3 ms. This value represents a typical synaptic time scale (see e.g. Ref. 25). Smaller/larger correlation times lead to a faster/slower decay of the Poisson-like distributions.
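For orientation, exponentially correlated noise with a given correlation time can be generated by a one-step recursion, as in the following minimal sketch. It uses the exact discrete-time update of an Ornstein-Uhlenbeck process and is not claimed to be identical to the algorithm of Ref. 13. It further assumes that the noise intensity σ denotes the stationary standard deviation and that the ANN input combines threshold, scaled IHC voltage and noise additively; both are assumptions, since the defining equation is not reproduced in this text. All function names are illustrative.

```python
import math
import random


def correlated_noise(n, sigma, tau_ms, dt_ms, seed=0):
    """Gaussian noise with autocorrelation exp(-|lag| / tau_ms), sampled every dt_ms.

    ASSUMPTION: sigma is interpreted as the stationary standard deviation.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt_ms / tau_ms)
    kick = sigma * math.sqrt(1.0 - decay * decay)   # keeps the variance at sigma**2
    xi = rng.gauss(0.0, sigma)                      # start in the stationary state
    samples = []
    for _ in range(n):
        samples.append(xi)
        xi = decay * xi + kick * rng.gauss(0.0, 1.0)
    return samples


def ann_input(v_ic, a, b, noise):
    """ASSUMED additive form: threshold a, scaled IHC voltage b * v, plus noise."""
    return [a + b * v + x for v, x in zip(v_ic, noise)]


# 20 kHz sampling (dt = 0.05 ms), tau = 3 ms, sigma = 0.078 as quoted for Fig. 7.
xi = correlated_noise(200000, sigma=0.078, tau_ms=3.0, dt_ms=0.05, seed=1)
lag1 = sum(p * q for p, q in zip(xi, xi[1:])) / sum(x * x for x in xi)
print(f"lag-1 correlation {lag1:.4f}, expected {math.exp(-0.05 / 3.0):.4f}")
```

Drawing one independent noise sequence per transmission channel (different seeds) reproduces the independence between channels required in the main text.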

Figure 7

High-spontaneous ANN ISI distribution, for zero input: (a) model (σ = 0.078, spike-rate = 76 spikes/s, τ = 3 ms), (b) animal data from Ref. 14 (spike-rate = 76 spikes/s).

Lines indicate identical slopes.

ANN rate versus level curves

Our modeling results (Fig. 2) almost indistinguishably reproduce the biological ‘rate vs. level’ data (Fig. 8, from Ref. 10).

Figure 8

Biological ‘rate vs. level’ data by Ref. 10 (Fig. 2 top row).

C. Cochlea and ANN tuning curves

Auditory response is often characterized by ‘iso-intensity tuning curves’, for both cochlear oscillation amplitudes and auditory nerve spike rates, which were found to have a very similar characteristic form9,17. In our setting, too, the emergent tuning curves of the cochlear oscillations and those of auditory nerve neuron spiking display the same qualitative shape, which moreover coincides with the measured biological data (Fig. 9).

Figure 9

(a) Iso-intensity tuning curves from the cochlea (black lines, where the input amplitudes lead to a cochlear oscillation of −10 dB1V) and from the compound peripheral auditory system model (red lines, where numbers 1–3 indicate the ANN class from which the measurements were taken). Input amplitudes led to (1) 10 spikes/s (low-spontaneous class), (2) 10 spikes/s (medium-spontaneous class), (3) 10 spikes/s additional to the spontaneous spike rate (high-spontaneous class). (b) Biological data10, from high-spontaneous ANN. Measurements at the cochlea have been reported to yield coincident response profiles17.

D. ANN stochastic resonance details

Intriguingly, our modeling, which is based on carefully chosen parameter values, shows that correct pitch transduction requires a nonzero amount of noise. The working mechanism is illustrated by ISI histograms from high- and medium-spontaneous ANN (Fig. 10).

Figure 10

Perceived pitch fp from the ISI-distribution peak, at different noise levels σ = 0, 0.03, 0.08.

(a) High-spontaneous ANN completely lock to the modulation frequency fmod = 200 Hz (ISI peak around 5 ms) in the absence of noise. Upon increasing the noise towards the biological level, the perceived pitch fp is implemented and, for higher than biological noise, fp is lost again. (b) Medium-spontaneous ANN: Here, fp is implemented soon after statistical firing is enabled (around σ = 0.02). Upon higher noise levels, fp is gradually lost.

For vanishing noise level σ → 0, the limit-cycle high-spontaneous ANN lock to the modulation frequency (ISI close to 5 ms) and thereby cannot transduce the perceived pitch (1/fp = 4.7 ms). Turning the noise on leads to a broadened and shifted distribution, until, from σ = 0.03, the peak of the distribution is in the vicinity of 4.7 ms. Increasing the noise level further flattens the ISI distribution (i.e. decreases the signal-to-noise ratio) and eventually shifts the distribution peak beyond the perceived pitch. Note that changing the coupling strength alone (B = 1 in Eq.(2)) would fail to yield the correct pitch (e.g., B = 0.5 yields peaks from the interval (6, 8.5) ms). At ‘biological’ parameters, single medium-spontaneous ANN are unable to reproduce the perceived pitch. Therefore, a ‘volley’ setting comprising n = 4 neurons is considered. Quite soon after noise enables statistical spiking, the correct pitch frequency fp is transduced, until, at too high noise levels, fp is gradually lost. Low-spontaneous ANN behave qualitatively identically to medium-spontaneous ANN.