Temporal asymmetries in auditory coding and perception reflect multi-layered nonlinearities

Sound recognition relies not only on spectral cues, but also on temporal cues, as demonstrated by the profound impact of time reversals on perception of common sounds. To address the coding principles underlying such auditory asymmetries, we recorded a large sample of auditory cortex neurons using two-photon calcium imaging in awake mice, while playing sounds ramping up or down in intensity. We observed clear asymmetries in cortical population responses, including stronger cortical activity for up-ramping sounds, which matches perceptual saliency assessments in mice and previous measures in humans. Analysis of cortical activity patterns revealed that auditory cortex implements a map of spatially clustered neuronal ensembles, detecting specific combinations of spectral and intensity modulation features. Comparing different models, we show that cortical responses result from multi-layered nonlinearities, which, contrary to standard receptive field models of auditory cortex function, build divergent representations of sounds with similar spectral content, but different temporal structure.

c. Raw signals from five individual neurons (in blue) and corresponding local neuropil signals (in red) extracted from the same ROIs. The neuropil-corrected signals (yellow) are obtained by subtracting from the raw signal a scaled version (by 0.7) of the local neuropil signals. For some neurons (e.g. bottom left), all the signal present in the raw data is removed, while for other simultaneously imaged neurons (e.g. bottom right), the signals are little affected by the correction. Scale bar 5 s.

a. Simulated GCaMP6s fluorescence (black line) resulting from the spike train shown below (red bars). The GCaMP6s signal resulting from a single spike is here modeled as a double exponential with a unitary calcium increase of 11.3%, a rise time of 70 ms and an exponential decay of 1.87 s, as described in mouse visual cortex¹ (specifically, Supplementary). The blue line corresponds to the simulated signal superposed with white noise. The magnified signal in the inset highlights the temporal delay of the fluorescence peak relative to the spikes, due to the 70 ms rise time.
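A minimal Python sketch of the panel a simulation, together with a simple linear deconvolution in the spirit of panel b, may help make the procedure concrete. It rests on stated assumptions: the double exponential is taken as the difference between a 1.87 s decay and a 70 ms rise exponential, scaled to an 11.3% peak; the sampling step `DT`, noise level and smoothing width are hypothetical; and the deconvolution simply inverts an instantaneous-rise, single-exponential (2 s) model. The algorithm described in Methods may differ, and all function names are illustrative.

```python
import numpy as np

# Amplitude, rise and decay follow the caption; DT is an assumption.
DT = 0.01          # s, sampling step (hypothetical)
A = 0.113          # peak dF/F of a single-spike response (11.3%)
TAU_RISE = 0.07    # s, treated here as a rise time constant (assumption)
TAU_DECAY = 1.87   # s, decay used in the simulation
TAU_DECONV = 2.0   # s, decay assumed by the deconvolution


def gcamp_kernel(dt=DT, length_s=15.0):
    """Double-exponential single-spike response, scaled to a peak of A."""
    t = np.arange(0.0, length_s, dt)
    k = np.exp(-t / TAU_DECAY) - np.exp(-t / TAU_RISE)
    return A * k / k.max()


def simulate_fluorescence(spikes, noise_sd=0.02, seed=0):
    """Spike counts per bin -> (clean, noisy) fluorescence traces."""
    rng = np.random.default_rng(seed)
    clean = np.convolve(spikes, gcamp_kernel())[: len(spikes)]
    noisy = clean + rng.normal(0.0, noise_sd, size=len(spikes))
    return clean, noisy


def deconvolve(f, tau=TAU_DECONV, dt=DT, amplitude=A):
    """Invert f[n] = g * f[n-1] + amplitude * r[n], with g = exp(-dt / tau).

    The rise time is ignored entirely, as in the caption.
    """
    g = np.exp(-dt / tau)
    r = np.empty_like(f, dtype=float)
    r[0] = f[0] / amplitude
    r[1:] = (f[1:] - g * f[:-1]) / amplitude
    return r


def gaussian_smooth(x, sigma_s=0.2, dt=DT):
    """Convolve with a unit-area Gaussian; output has the length of x
    (assumes x is longer than the kernel)."""
    half = int(round(4 * sigma_s / dt))
    w = np.exp(-0.5 * (np.arange(-half, half + 1) * dt / sigma_s) ** 2)
    return np.convolve(x, w / w.sum(), mode="same")


# Usage: about 0.5 Hz Poisson spiking for 60 s.
rng = np.random.default_rng(1)
spikes = rng.poisson(0.5 * DT, size=int(round(60 / DT))).astype(float)
_, noisy = simulate_fluorescence(spikes)
rate_true = gaussian_smooth(spikes) / DT                 # spikes/s
rate_from_deconv = gaussian_smooth(deconvolve(noisy)) / DT
print(np.corrcoef(rate_true, rate_from_deconv)[0, 1])
print(np.corrcoef(rate_true, gaussian_smooth(noisy))[0, 1])
```

The first-difference inversion amplifies white noise, which is why the Gaussian smoothing step matters; the printed correlations depend on the random spike train and noise level and are not meant to reproduce the values reported in panel b.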
b. Applying our linear deconvolution algorithm (see Methods) followed by Gaussian smoothing to the noisy fluorescence signal shown in a yields an estimate of the time course of the instantaneous firing rate (green), which matches the smoothed instantaneous rate (red) much better than the smoothed calcium signal does (blue; the scale is hand-adjusted to match the rate signals). The correlation of the smoothed firing rate is much higher with the deconvolved calcium signal (0.91) than with the smoothed raw calcium signal (0.21), even though the deconvolution ignores the slow rise time of GCaMP6s, which results in a slight delay of the rate estimate, as can be seen in the inset. Note that this simulation demonstrates the robustness of the deconvolution to mismatches between the assumed and actual models of the calcium signal, both in rise time (which is absent from the deconvolution model) and in decay time: the decay time used in the simulation is 1.87 s, whereas the deconvolution assumes 2 s (see Methods).

b. Single-cluster homogeneity for the 13 identified clusters (see color code in a) as the radius of analysis is varied. The shaded areas represent the range of homogeneity index values observed in 99% of the cell-identity shufflings (bootstrap). Homogeneity is maximal at small distances and drops to the level of the shuffled data (no statistical dependency) at about 100 µm, depending on the cluster. Note that in the shuffled data the average homogeneity is the same at all distances by construction; however, its variability is higher at small distances because there are far fewer neuron pairs at small distances than at large distances.
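The homogeneity index and its shuffled (bootstrap) null distribution can be sketched as follows, assuming cell centroids in micrometres and one functional-cluster label per neuron. This is an illustrative reading of the captions (mean fraction of neighbours within a radius that share a neuron's label, compared with label permutations), not the authors' code; the function names, the synthetic layout and the 200 shufflings (instead of 100,000) are hypothetical choices to keep the example fast.

```python
import numpy as np


def homogeneity(positions, labels, radius, cluster=None):
    """Mean fraction of a neuron's neighbours (within `radius`) that share
    its cluster label.

    positions: (N, 2) cell centroids in micrometres; labels: (N,) labels.
    If `cluster` is given, average only over neurons of that cluster
    (single-cluster homogeneity). Neurons without neighbours are skipped.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    neighbours = (dist <= radius) & ~np.eye(len(labels), dtype=bool)
    same = labels[:, None] == labels[None, :]
    n_nb = neighbours.sum(axis=1)
    n_same = (neighbours & same).sum(axis=1)
    keep = n_nb > 0
    if cluster is not None:
        keep &= labels == cluster
    return float(np.mean(n_same[keep] / n_nb[keep]))


def shuffled_null(positions, labels, radius, n_shuffles=1000, seed=0,
                  cluster=None):
    """Homogeneity values after permuting labels across cell positions."""
    rng = np.random.default_rng(seed)
    return np.array([
        homogeneity(positions, rng.permutation(labels), radius, cluster)
        for _ in range(n_shuffles)
    ])


# Usage on a synthetic layout: two spatially segregated clusters.
rng = np.random.default_rng(0)
pos = np.vstack([rng.uniform([0, 0], [50, 50], size=(40, 2)),
                 rng.uniform([200, 0], [250, 50], size=(40, 2))])
lab = np.repeat([0, 1], 40)
h_obs = homogeneity(pos, lab, radius=30.0)
null = shuffled_null(pos, lab, radius=30.0, n_shuffles=200)
low, high = np.percentile(null, [0.5, 99.5])  # central 99% of shufflings
print(h_obs, low, high)
```

Permuting labels keeps the cell positions and the cluster sizes fixed, so the null mean is the same at every radius, while small radii involve fewer pairs and thus a wider spread, as noted in the caption.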

c-e. Spatial clustering is present across different mice and recordings. Distributions of a global homogeneity index (mean probability for the neighbors of any given neuron within a 30 µm radius to belong to the same functional cluster, averaged over neurons from all clusters and within single mice) for 100,000 shufflings of the cell identities (bootstrap), compared to the experimental values for different mice.

For any input signal $s(t)$, defined as an integrable function on $\mathbb{R}$, we are interested in transformations $F$ from the space of integrable functions to itself for which the time integral of the output signal is invariant with respect to time reversal, i.e. the transformations that satisfy the property $P_0$:
$$P_0: \quad \int F[s](t)\, dt = \int F[s^-](t)\, dt, \qquad \text{where } s^-(t) = s(-t).$$
We here describe analytical proofs of this property for specific transformations or classes of transformations. Note that in the following, the notation $\int$ is used for $\int_{-\infty}^{+\infty}$.
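Property $P_0$ (equal time integrals of the outputs for $s(t)$ and $s(-t)$) can also be probed numerically on sampled signals. The sketch below is a hypothetical helper, not part of the original analysis: it compares discretized integrals for one input, using a linear exponential filter (which should pass) and, for contrast only, the same filter followed by a threshold nonlinearity (a hypothetical transformation that in general does not satisfy $P_0$). The filter time constant, threshold and sampling step are arbitrary assumptions.

```python
import numpy as np

DT = 1e-3  # s, sampling step (hypothetical)


def satisfies_p0(transform, s, dt=DT, rtol=1e-9):
    """Check P0 numerically for one input: compare the time integrals of
    transform(s) and transform(reversed s)."""
    fwd = np.sum(transform(s)) * dt
    rev = np.sum(transform(s[::-1])) * dt
    return bool(np.isclose(fwd, rev, rtol=rtol, atol=0.0)), fwd, rev


# Hypothetical causal exponential filter (tau = 0.2 s, unit area).
TAU = 0.2
H = (DT / TAU) * np.exp(-np.arange(0.0, 10 * TAU, DT) / TAU)


def linear(s):
    """Convolution with H; 'full' mode keeps the whole response."""
    return np.convolve(s, H)


def thresholded(s, theta=0.5):
    """Contrast case: the same filter followed by a threshold nonlinearity."""
    return np.maximum(linear(s) - theta, 0.0)


t = np.arange(0.0, 3.0, DT)
up = np.where(t < 1.0, t, 0.0)           # 1 s up-ramp, then silence
print(satisfies_p0(linear, up)[0])       # True
print(satisfies_p0(thresholded, up)[0])  # False
```

A single passing input does not prove $P_0$, which is why the analytical proofs are needed; a single failing input, however, is enough to show that a transformation violates it.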

Effect of an arbitrary function applied to the input before the transformation
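Note that if $F$ is a transformation that satisfies $P_0$, this property holds for any integrable input function on $\mathbb{R}$. So for any function $f: \mathbb{R} \to \mathbb{R}$ such that $f \circ s$ is integrable, the transformation $G[s] = F[f \circ s]$ also satisfies $P_0$: since $f$ acts pointwise, $f(s(-t)) = (f \circ s)(-t)$, i.e. applying $f$ commutes with time reversal, so $\int G[s^-](t)\,dt = \int F[(f \circ s)^-](t)\,dt = \int F[f \circ s](t)\,dt = \int G[s](t)\,dt$. In other words, any function (including nonlinear functions) applied pointwise to the input signal before the transformation does not affect the invariance of the output integrals to time reversal.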
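A small numerical illustration of this argument, under assumed choices: the pointwise nonlinearity `f` and the exponential filter `h` below are arbitrary hypothetical examples, and convolution with `h` stands in for a generic transformation satisfying $P_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(500)                    # arbitrary non-negative input samples
h = np.exp(-np.arange(100) / 20.0)     # hypothetical causal filter


def f(x):
    """Arbitrary pointwise nonlinearity applied before the transformation."""
    return np.tanh(3.0 * x) ** 2


def F(x):
    """A transformation satisfying P0: here, full convolution with h."""
    return np.convolve(x, h)


def G(x):
    """Composite transformation G[s] = F[f(s)]."""
    return F(f(x))


# Pointwise f commutes with time reversal ...
print(np.allclose(f(s[::-1]), f(s)[::-1]))       # True
# ... so G inherits P0 from F.
print(np.isclose(G(s).sum(), G(s[::-1]).sum()))  # True
```

The key step is the first check: because `f` uses only the current sample, reversing before or after applying it gives the same array.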

Invariance for a linear transformation
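A general linear transformation of a function $s(t)$ that is invariant by translation (i.e. the transformation does not depend on the absolute time at which it occurs) can be written as a convolution with a filter $h(t)$:
$$F[s](t) = \int h(t - u)\, s(u)\, du.$$
For such a transformation, the integral of the output for the time-reversed signal $s^-(t) = s(-t)$ is:
$$\int F[s^-](t)\, dt = \iint h(t - u)\, s(-u)\, du\, dt.$$
So by setting $t' = t - u$ and then $u' = -u$, one easily obtains the equality of the integrals:
$$\int F[s^-](t)\, dt = \iint h(t')\, s(u')\, du'\, dt' = \left(\int h\right)\left(\int s\right) = \int F[s](t)\, dt.$$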
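The identity $\int F[s] = (\int h)(\int s)$ can be checked on a discretized up-ramp and its time reversal. The filter, time constant and sampling step below are hypothetical; the point is that the two responses have clearly different time courses while their integrals coincide.

```python
import numpy as np

DT = 1e-3                      # s, sampling step (hypothetical)
TAU = 0.1                      # s, filter time constant (hypothetical)
t = np.arange(0.0, 1.0, DT)
h = np.exp(-t / TAU) / TAU     # causal filter h(t) with unit area
up = t.copy()                  # up-ramp s(t)
down = up[::-1]                # time-reversed signal s(-t), shifted by 1 s

# Discretized F[s](t) = ∫ h(t - u) s(u) du, keeping the full output.
r_up = DT * np.convolve(h, up)
r_down = DT * np.convolve(h, down)

print(np.abs(r_up - r_down).max())          # time courses differ
print(r_up.sum() * DT, r_down.sum() * DT)   # integrals agree
print((h.sum() * DT) * (up.sum() * DT))     # (∫h)(∫s)
```

Using the full convolution matters: truncating the output (e.g. `mode="same"`) would cut off part of the response tail and break the equality.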

Case of STRF filters
In the particular case of an STRF filter, the input signal is the spectrogram $\hat{s}(t, f)$ of the signal $s(t)$. The response $r(t)$ of a neuron predicted by its associated spectro-temporal receptive field is computed by first convolving, along time, the spectro-temporal kernel $STRF(t, f)$ with $\hat{s}(t, f)$:
$$\tilde{r}(t, f) = \int STRF(t - u, f)\, \hat{s}(u, f)\, du.$$
This transformation is linear and invariant by time translation; thus, for every frequency $f$, the time integral of $\tilde{r}(t, f)$ is not affected by time reversal. The response $r(t) = \int \tilde{r}(t, f)\, df$ corresponds to the sum of $\tilde{r}(t, f)$ over all frequencies $f$. This integration step is independent of time and is therefore also unaffected by time reversal. Therefore, the time integral of the STRF prediction of a neuron's response is in all cases unaffected by time reversal of the stimulus.
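A discretized sketch of this argument, assuming a spectrogram stored as a (frequency × time) array and an STRF stored as a (frequency × lag) array; the random spectrogram and kernel are hypothetical stand-ins, and `strf_predict` is an illustrative name.

```python
import numpy as np


def strf_predict(spec, strf):
    """Linear STRF prediction.

    spec: (n_freq, n_time) spectrogram; strf: (n_freq, n_lag) kernel.
    Each frequency channel is convolved in time ('full' mode), and the
    channel outputs are summed over frequency.
    """
    per_freq = np.array([np.convolve(spec[k], strf[k])
                         for k in range(spec.shape[0])])
    return per_freq.sum(axis=0)


rng = np.random.default_rng(0)
spec = rng.random((16, 300))               # hypothetical spectrogram
strf = rng.normal(size=(16, 40))           # hypothetical STRF
r_fwd = strf_predict(spec, strf)
r_rev = strf_predict(spec[:, ::-1], strf)  # time-reversed spectrogram
print(r_fwd.sum(), r_rev.sum())            # equal up to rounding
```

Time reversal is applied here directly to the spectrogram columns, matching the treatment of $\hat{s}(t, f)$ as the input signal in the text.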
In addition, for the stimuli used in this study, whose frequency content is invariant over time, the spectrogram can be written as the product of a spectral component and an envelope component, $\hat{s}(t, f) = g(f)\, S(t)$. In this case:
$$r(t) = \iint STRF(t - u, f)\, g(f)\, S(u)\, du\, df = \int \widetilde{STRF}(t - u)\, S(u)\, du, \qquad \text{with} \quad \widetilde{STRF}(t) = \int g(f)\, STRF(t, f)\, df.$$
Thus, for this particular case, the STRF framework reduces to a convolution with a frequency-independent effective kernel $\widetilde{STRF}(t)$, valid for a particular frequency content. This justifies the use of a frequency-independent kernel to fit, within the STRF framework, the neuronal responses to envelope variations of white-noise stimuli.
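The reduction to an effective kernel can be verified on a separable spectrogram. The spectral profile `g`, envelope `S` and STRF below are random hypothetical arrays; the check is that the full STRF prediction equals a single convolution of the envelope with $\widetilde{STRF}(t) = \int g(f)\, STRF(t, f)\, df$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_time, n_lag = 16, 300, 40
g = rng.random(n_freq)                 # time-invariant spectral profile g(f)
S = rng.random(n_time)                 # envelope S(t)
spec = np.outer(g, S)                  # separable spectrogram g(f) S(t)
strf = rng.normal(size=(n_freq, n_lag))

# Full STRF prediction: per-frequency convolution, then sum over frequency.
full = sum(np.convolve(spec[k], strf[k]) for k in range(n_freq))

# Effective kernel: weight each frequency row of the STRF by g(f) and sum.
strf_eff = g @ strf
reduced = np.convolve(S, strf_eff)

print(np.allclose(full, reduced))      # True
```

The equality follows from linearity alone, so it holds for any STRF, but only as long as the spectral profile `g` stays fixed over time.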

Invariance for the synaptic depression model
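The model of synaptic depression is defined by David et al. (2009) as a discrete-time equation for a depression variable $d$:
$$d(t+1) = d(t) + u\, s(t)\, \big(1 - d(t)\big) - \frac{d(t)}{\tau},$$
from which the output signal is obtained as:
$$s_d(t) = s(t)\, \big(1 - d(t)\big).$$
The first equation yields in continuous time:
$$d'(t) = u\, s(t)\, \big(1 - d(t)\big) - \frac{d(t)}{\tau},$$
in which $d'(t)$ is the first derivative of $d(t)$. If we assume that $s(t) = 0$ for $t \le 0$ (i.e. the signal starts at $t = 0$, with $d(0) = 0$), the solution of this first-order linear equation can be written as:
$$d(t) = u \int s(x)\, e^{-\int_x^t \left(1/\tau + u\, s(v)\right) dv}\, \theta(t - x)\, dx,$$
in which $\theta$ is the Heaviside step function. Because $s_d(t) = s(t) - s(t)\, d(t)$, and the integral of $s(t)$ is trivially invariant to time reversal, the output integral is invariant to time reversal if and only if $A_s = \int s(t)\, d(t)\, dt$ is invariant to time reversal. For the forward signal, $A_s$ (normalized by $u$) writes as:
$$A_{s+} = \iint s(t)\, s(x)\, e^{-\int_x^t \left(1/\tau + u\, s(v)\right) dv}\, \theta(t - x)\, dx\, dt.$$
And for the time-reversed signal $s(-t)$ it writes as:
$$A_{s-} = \iint s(-t)\, s(-x)\, e^{-\int_x^t \left(1/\tau + u\, s(-v)\right) dv}\, \theta(t - x)\, dx\, dt.$$
Setting $t' = -t$, $v' = -v$ and $x' = -x$ yields:
$$A_{s-} = \iint s(t')\, s(x')\, e^{-\int_{t'}^{x'} \left(1/\tau + u\, s(v')\right) dv'}\, \theta(x' - t')\, dt'\, dx'.$$
In the expression above, $x'$ and $t'$ are dummy integration variables and play equivalent roles, so they can be exchanged. Hence:
$$A_{s-} = \iint s(x)\, s(t)\, e^{-\int_x^t \left(1/\tau + u\, s(v)\right) dv}\, \theta(t - x)\, dx\, dt = A_{s+},$$
proving that the output integral of the synaptic depression model is invariant to time reversal of the input signal, despite its nonlinearity.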
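A numerical sketch of the depression model applied to an up-ramp and its time reversal. It assumes the discrete update is the Euler form of the continuous equation above, with $d$ the depleted fraction and $\tau$ in samples; the published parameterization of David et al. (2009) may differ, and the values of `u` and `tau` are illustrative, not fitted. In this form the invariance also holds exactly in discrete time, since $\sum_n s[n]\, d[n]$ expands into a sum over sample pairs whose weights depend only on the samples lying between them.

```python
import numpy as np


def depression_output(s, u=0.5, tau=50.0):
    """Discrete depression model: d is the depleted fraction, updated as
    d <- d + u*s*(1 - d) - d/tau (tau in samples), output s*(1 - d).
    The output at each sample uses d from before that sample's update.
    u and tau are illustrative values, not fitted parameters."""
    d = 0.0
    out = np.empty(len(s))
    for n, x in enumerate(s):
        out[n] = x * (1.0 - d)
        d = d + u * x * (1.0 - d) - d / tau
    return out


up = np.linspace(0.0, 1.0, 200)    # up-ramp envelope
down = up[::-1]                    # its time reversal
out_up = depression_output(up)
out_down = depression_output(down)
print(out_up.sum(), out_down.sum())          # equal up to rounding
print(np.allclose(out_up[::-1], out_down))   # False: shapes differ
```

The two outputs are not mirror images of each other (the down-ramp meets an undepleted synapse at its loudest sample), yet their integrals match, so this nonlinearity on its own cannot produce an asymmetry in integrated response.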