Synergistic population coding of natural communication stimuli by hindbrain electrosensory neurons

Understanding how neural populations encode natural stimuli with complex spatiotemporal structure to give rise to perception remains a central problem in neuroscience. Here we investigated population coding of natural communication stimuli by hindbrain neurons within the electrosensory system of weakly electric fish Apteronotus leptorhynchus. Overall, we found that simultaneously recorded neural activities were correlated: signal but not noise correlations were variable depending on the stimulus waveform as well as the distance between neurons. Combining the neural activities using an equal-weight sum gave rise to discrimination performance between different stimulus waveforms that was limited by redundancy introduced by noise correlations. However, using an evolutionary algorithm to assign different weights to individual neurons before combining their activities (i.e., a weighted sum) gave rise to increased discrimination performance by revealing synergistic interactions between neural activities. Our results thus demonstrate that correlations between the neural activities of hindbrain electrosensory neurons can enhance information about the structure of natural communication stimuli that allow for reliable discrimination between different waveforms by downstream brain areas.

Neuropixels probes were used to record extracellular activities of ELL pyramidal cells responding to chirps. Schematics were created using Adobe Illustrator CS6 v 16.0 (www.adobe.com). (a) Left: schematics demonstrating the chirp stimuli used in the experiments and the experimental setup. Right: recorded activities from example channels using Neuropixels probes with spikes of different neurons highlighted in colors. (b) Left: stimulus waveform (top) consisting of a 5 Hz beat with a chirp (vertical red dashed line), raster plot of ON and OFF cells (middle), and the mean and standard deviation (shaded areas) of normalized population PSTHs across different trials (see "Materials and methods") (bottom) for a chirp with 30 Hz excursion frequency at 0° of beat phase. The grey rectangle indicates the 40 ms chirp evaluation time window. Middle: same plots for a chirp with 30 Hz excursion frequency at 180° of beat phase. Right: same plots for a chirp with 60 Hz excursion frequency at 180° of beat phase.
Signal correlations represent similarities between the mean responses of two neurons to a given stimulus (Fig. 2a, left), while noise correlations are instead correlations between the trial-to-trial variabilities of neural responses to repeated presentations of a given stimulus and arise due to shared noisy synaptic input (Fig. 2a, right).
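The two definitions above can be illustrated with a minimal sketch. It assumes each neuron's response is stored as a list of trials, each trial being a list of time-binned firing rates; the function names and the pooling of residuals across trials are illustrative assumptions, not the procedure described in "Materials and methods".

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def trial_mean(trials):
    """Average a list of trials bin by bin."""
    return [sum(col) / len(col) for col in zip(*trials)]

def signal_correlation(trials_a, trials_b):
    """Correlation between the trial-averaged responses of two neurons."""
    return pearson(trial_mean(trials_a), trial_mean(trials_b))

def noise_correlation(trials_a, trials_b):
    """Correlation between trial-to-trial residuals (response minus mean).

    Assumption: residuals from all trials are pooled into one sequence.
    """
    mean_a, mean_b = trial_mean(trials_a), trial_mean(trials_b)
    res_a = [r - m for trial in trials_a for r, m in zip(trial, mean_a)]
    res_b = [r - m for trial in trials_b for r, m in zip(trial, mean_b)]
    return pearson(res_a, res_b)
```

In this toy setting, two neurons with mirrored mean responses but identical trial-to-trial fluctuations would have a negative signal correlation and a positive noise correlation.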
We found that ELL pyramidal cells displayed both signal and noise correlations in their activities in response to chirp stimuli. Specifically, signal correlations of same-type (i.e., pairs containing either ON cells or OFF cells) and opposite-type pairs (i.e., pairs containing both ON and OFF cells) were on average positive and negative respectively (Fig. 2b, compare top and bottom panels). In contrast, noise correlations were similarly distributed around 0 for both same-type and opposite-type pairs (Fig. 2c, compare top and bottom panels). Interestingly, for same-type pairs, signal correlations first decreased and then increased with increasing distance between the probe sites on which both neurons were recorded (Fig. 2b top, from 0 to 550 μm: linear regression, r = −0.74, p = 0.011; from 400 to 1000 μm: linear regression, r = 0.91, p = 4.2 × 10⁻⁵). For opposite-type pairs, the opposite trend was observed in that signal correlation first increased and then decreased with increasing distance (Fig. 2b bottom, from 0 to 550 μm: linear regression, r = 0.66, p = 0.030; from 400 to 1000 μm: linear regression, r = −0.81, p = 8.3 × 10⁻³). However, noise correlations were largely independent of distance for both same-type and opposite-type pairs (Fig. 2c, same-type pairs: linear regression, r = 0.020, p = 0.96; opposite-type pairs: linear regression, r = 0.42, p = 0.10).
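The distance analysis (averaging correlations within 50 μm bins, then regressing over a distance range) can be sketched as follows. The bin width and bin count follow the Figure 2 caption; the function names, the use of bin centers as x-values, and the handling of empty bins are assumptions made for illustration.

```python
import math

def bin_average(distances, values, bin_width=50.0, n_bins=20):
    """Average values whose distance falls in the same bin; skip empty bins.

    Returns a list of (bin_center, mean_value) tuples.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d, v in zip(distances, values):
        idx = int(d // bin_width)
        if 0 <= idx < n_bins:
            sums[idx] += v
            counts[idx] += 1
    return [((i + 0.5) * bin_width, sums[i] / counts[i])
            for i in range(n_bins) if counts[i] > 0]

def pearson_r(points):
    """Correlation coefficient r of a linear regression on (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in points))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in points))
    return cov / (sx * sy)

def segment_r(points, lo, hi):
    """r restricted to points whose x-value lies in [lo, hi]."""
    return pearson_r([(x, y) for x, y in points if lo <= x <= hi])
```

For example, `segment_r(bin_average(dist, sig_corr), 0, 550)` would give the r value for the first distance range, where `dist` and `sig_corr` are hypothetical per-pair lists.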
Next, we looked at whether and, if so, how signal and noise correlations varied as a function of the different chirp stimulus waveforms used in this study. We found that for the population with only ON cells, the distributions of signal and noise correlations were significantly different from one another for different chirps (Fig. 3a left, Friedman's test, p = 4.0 × 10⁻⁴⁴; Fig. 3b left, Friedman's test, p = 0.020). However, for the population with both ON and OFF cells, while the distributions of signal correlation were significantly different (Fig. 3a right, Friedman's test, p = 1.1 × 10⁻¹⁶), noise correlation distributions did not change significantly (Fig. 3b right, Friedman's test, p = 0.17). Furthermore, we noticed that noise and signal correlations were not independent of each other. The signal and noise correlations for both ON-ON pairs and for all pairs are shown in Fig. 3c. Overall, there were positive but weak correlations between signal and noise correlations for both cases (Fig. 3c left, linear regression, r = 0.060, p = 1.3 × 10⁻³; Fig. 3c right, linear regression, r = 0.11, p = 6.0 × 10⁻¹⁵). Thus, our results at this stage show that, while signal correlations were strongly dependent on distance and chirp stimulus waveform, this was generally not the case for noise correlations.
Decoding ELL pyramidal cell activities with equal-weight sum and weighted sum. We next quantified the performance of a classifier at correctly discriminating between neural responses generated by different chirp stimulus waveforms (see "Materials and methods"). In short, neural activities of all neurons were combined in different manners to obtain the population activity. The population activities obtained in response to different chirp waveforms were then compared across different stimulus trials using the van Rossum metric 32. Thus, a given population activity was assigned as being generated by a certain stimulus i if the distance between this activity and the chosen template for stimulus i was lower than all other distances computed using chosen templates for other stimuli (see "Materials and methods"). In practice, the trial-averaged population activities were chosen as templates. The performance of the classifier is represented by a confusion matrix where each entry (i,j) is the probability that a response which was actually generated by stimulus i is classified as generated by stimulus j. As such, the diagonal elements of the confusion matrix give the proportion of correct classifications whereas the off-diagonal elements instead give the proportion of incorrect classifications.
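The template-matching classifier described above can be sketched as follows. This is a minimal illustration, assuming population activities are stored as time-binned lists and that the van Rossum distance is computed after a causal exponential filter with time constant tau; the normalization, the discretization, and all function names are assumptions rather than the exact procedure of "Materials and methods".

```python
import math

def exp_filter(activity, tau, dt=1.0):
    """Causal exponential filter with time constant tau (same units as dt)."""
    decay = math.exp(-dt / tau)
    out, y = [], 0.0
    for x in activity:
        y = y * decay + x
        out.append(y)
    return out

def van_rossum_distance(a, b, tau, dt=1.0):
    """Euclidean distance between the exponentially filtered activities."""
    fa, fb = exp_filter(a, tau, dt), exp_filter(b, tau, dt)
    return math.sqrt(dt / tau * sum((p - q) ** 2 for p, q in zip(fa, fb)))

def classify(response, templates, tau, dt=1.0):
    """Index of the template closest to the response."""
    dists = [van_rossum_distance(response, t, tau, dt) for t in templates]
    return dists.index(min(dists))

def confusion_matrix(responses_by_stim, templates, tau, dt=1.0):
    """Entry [i][j]: fraction of stimulus-i responses classified as stimulus j."""
    n = len(templates)
    matrix = [[0.0] * n for _ in range(n)]
    for i, responses in enumerate(responses_by_stim):
        for r in responses:
            matrix[i][classify(r, templates, tau, dt)] += 1.0 / len(responses)
    return matrix
```

With this layout, the mean of the diagonal entries gives one overall measure of discrimination performance, and small values of `tau` emphasize precise timing while large values emphasize slower rate changes.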
First, we combined the neural activities by performing a linear sum giving the same weight to each neuron (Fig. 4a). To quantify the effects of noise correlations, the performance of the classifier was evaluated on the neural responses as well as neural responses that were randomly shuffled with respect to trial order (see "Materials and methods"). Performances obtained with and without noise correlations were significantly above chance level (with noise correlations, one-sample t-test, p = 3.9 × 10⁻⁵⁰; without noise correlations, one-sample t-test, p = 1.5 × 10⁻⁵⁴). We quantified the effect of timescale of encoding used in the van Rossum metric on the performance. This is important as small timescales put more emphasis on precise spike timing whereas larger timescales instead place more emphasis on slower variations in the firing rate 32. We found that maximal performance was observed using a timescale of ~3 ms (Fig. 4b left), indicating that precise spike timing can be used to reliably discriminate between different chirp stimulus waveforms. The performance when noise correlations were removed was higher than that obtained for the raw data (Fig. 4b right, one-way ANOVA, p = 1.3 × 10⁻⁴), indicating that noise correlations have a detrimental effect on discrimination performance. Next, we analyzed how discrimination performance varied as a function of population size. We separated the entire population into ON cells and OFF cells and increased the population size by adding either ON cells or OFF cells first. We found that when increasing population size by first adding the ON cells, the performance increased when ON cells only were first considered and actually decreased when OFF cells were added to the pool (Fig. 4c). Interestingly, when increasing population size by first adding the OFF cells, the performance started with low values and increased slowly, but later increased drastically when ON cells were added (Fig. 4d).
We found that ON cell populations had much better performance than OFF cell populations (Fig. 4d inset, one-way ANOVA, p = 1.0 × 10⁻⁶⁶). These results are consistent with previous findings that single ON cells respond better to chirps than single OFF cells 33.
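Removing noise correlations by shuffling trial order, followed by an equal-weight sum, can be sketched as follows. The data layout (`responses[n][k]` is neuron n's binned response on trial k to one stimulus) and the function names are assumptions for illustration; the shuffle preserves each neuron's own set of responses while breaking the trial-by-trial pairing across neurons.

```python
import random

def shuffle_trials(responses, seed=None):
    """Permute each neuron's trial order independently.

    responses[n][k] is neuron n's response (a list of bins) on trial k.
    Returns a shuffled copy; the input is left unchanged.
    """
    rng = random.Random(seed)
    shuffled = []
    for neuron_trials in responses:
        order = list(range(len(neuron_trials)))
        rng.shuffle(order)
        shuffled.append([neuron_trials[k] for k in order])
    return shuffled

def equal_weight_sum(responses):
    """Population activity per trial: sum across neurons, bin by bin."""
    n_trials = len(responses[0])
    return [[sum(bins) for bins in zip(*(neuron[k] for neuron in responses))]
            for k in range(n_trials)]
```

Because only the pairing of trials changes, the trial-averaged population activity (the template) is identical before and after shuffling; only the trial-to-trial variability of the summed activity differs.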
Next, we combined the neural activities of all neurons using a weighted sum (i.e., a sum with unequal weights) (Fig. 5a). To find the weights that give rise to the best discrimination performance, we used an evolutionary algorithm (see "Materials and methods"; Fig. 5a). The effect of timescale of encoding on performance was similar to that in the case of the equal-weight sum (Fig. 5b left). Overall, the performance improved significantly when performing a weighted sum as compared to that obtained with equal-weight sum with and without noise correlations ( Fig. 5b right (Fig. 5c), in contrast to the equal-weight case (Fig. 4c); however, a population with only OFF cells still had a poor performance in the weighted case (Fig. 5d). It is important to note that a population consisting of only ON cells displayed much better performance in the weighted case than in the equal-weight case (compare Figs. 5c and 4c). As such, the improvement in performance is not due to considering both ON and OFF cells with opposite weights. Rather, such improvement is largely due to heterogeneities within the ON cell population. Why is there an overall performance increase when using a weighted sum vs. an unweighted sum? Intuitively, increases in performance can occur when the sets of responses elicited by different stimuli become more distant from one another and thus more discriminable. However, increases in performance can also occur if the size of these sets decreases (Fig. 6a). Figure 6b shows three example stimulus waveforms (left top panel) as well as population PSTHs when taking equal-weight (left middle panel) and weighted (left bottom panel) sums. The population activities were more different from each other (see dashed rectangle) when taking weighted sums, partly because a weighted sum with both positive and negative weights can lead to negative population activities while population activities obtained with an equal-weight sum can only be positive by definition.
Quantification of the distance between responses (see "Materials and methods") confirmed that greater values were obtained when considering weighted sums than equal-weight sums (Fig. 6c, one-way ANOVA, p = 4.5 × 10⁻⁴). We next tested whether weighted neural responses were less variable than their equal-weight counterparts. To do so, we quantified the variability in the response using both weighted and equal-weight sums, as well as before and after removing noise correlations (see "Materials and methods"). We found that weighted sums reduced overall variability of neural activities, both with and without noise correlations (Fig. 6d, equal-weight with noise correlations vs weighted with noise correlations, one-way ANOVA, p = 3.9 × 10⁻²⁹; equal-weight without noise correlations vs weighted without noise correlations, one-way ANOVA, p = 3.4 × 10⁻¹⁷). We also noticed that removing noise correlations reduced overall variability in the equal-weight case and increased overall variability in the weighted case (Fig. 6d, equal-weight with noise correlations vs equal-weight without noise correlations, one-way ANOVA, p = 3.1 × 10⁻⁹; weighted with noise correlations vs weighted without noise correlations, one-way ANOVA, p = 0.040).
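The weight search can be sketched as a simple accept-if-better loop that stops after 10 consecutive candidates fail to improve performance, as the caption of the weighted-sum figure describes. Everything else here is an assumption for illustration: the Gaussian mutation, its width `sigma`, the equal-weight starting point, the `max_iter` safety cap, and the `performance` callable, which stands in for the classifier score.

```python
import random

def evolve_weights(n_neurons, performance, sigma=0.1, patience=10,
                   max_iter=10000, seed=None):
    """Search for weights that maximize performance(weights).

    A candidate replaces the current weights only if it scores higher;
    the search ends after `patience` consecutive non-improving candidates.
    """
    rng = random.Random(seed)
    best = [1.0] * n_neurons          # assumed start: the equal-weight sum
    best_score = performance(best)
    stale = 0
    for _ in range(max_iter):         # hypothetical cap to bound run time
        if stale >= patience:
            break
        candidate = [w + rng.gauss(0.0, sigma) for w in best]
        score = performance(candidate)
        if score > best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
    return best, best_score
```

Since the outcome depends on the random candidates, repeated runs with different seeds give a spread of final performances, which is one reason to report a standard deviation across simulations.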

Weighted sums of ELL pyramidal cells activities eliminate redundancy and introduce synergy.
Why is the performance greater for weighted sums before removing noise correlations? Previous theoretical studies have shown that noise correlations can be beneficial to information transmission when their sign is opposite to that of signal correlations 2. In order to study the correlation structures at a population level beyond two neurons, we combined the activities of subsets of neurons. Specifically, we divided our dataset into two subpopulations and considered correlations between the summed (either equal-weight or weighted) activities of both subpopulations 34 (see "Materials and methods"). We found that, for equal-weight sums, signal and noise correlations were both predominantly positive (Fig. 7a, 78.3% of points in upper-right quadrant). However, this was much less the case for weighted sums, as more data points with signal and noise correlations having opposite signs were observed (Fig. 7b, number of points in upper-left quadrant increased from 18.4 to 34.5%, while number of points in upper-right quadrant decreased from 78.3 to 61.0%). These findings thus confirm our hypothesis and explain why removing noise correlations led to higher performance when considering equal-weight sums but instead led to decreased performance when considering weighted sums.
Figure 2. Signal but not noise correlations varied with distance. (a) Schematics showing how signal and noise correlations arise, created using Adobe Illustrator CS6 v 16.0 (www.adobe.com). While signal correlation arises from similarity in mean responses to stimuli (left), noise correlation instead arises from shared noisy synaptic inputs (right). (b) Top: signal correlations of same-type pairs (i.e., pairs of either ON or OFF cells) as a function of distance (blue dots). Distance was discretized into 20 bins (50 microns per bin) and signal correlations for pairs that fall within the same bin were averaged (black dots, error bars indicate standard deviation). Signal correlations first decreased and then increased with distance (from 0 to 550 microns: linear regression, r = −0.74, p = 0.011; from 400 to 1000 microns: linear regression, r = 0.91, p = 4.2 × 10⁻⁵). Bottom: signal correlations of opposite-type pairs (i.e., pairs containing one ON and one OFF cell) as a function of distance (red dots). Signal correlations first increased and then decreased with distance when the data was averaged within bins (from 0 to 550 microns: linear regression, r = 0.66, p = 0.030; from 400 to 1000 microns: linear regression, r = −0.81, p = 8.3 × 10⁻³). We note that qualitatively similar results were obtained when performing a linear regression on the data without averaging (same type: from 0 to 550 microns: linear regression, r = −0.26, p = 4.7 × 10⁻³⁷; from 400 to 1000 microns: linear regression, r = 0.34, p = 2.8 × 10⁻²⁶; opposite type: from 0 to 550 microns: linear regression, r = 0.18, p = 2.2 × 10⁻¹³; from 400 to 1000 microns: linear regression, r = −0.32, p = 2.0 × 10⁻¹⁴). (c) Top: same as (b), but for noise correlations. There was no significant correlation between noise correlations and distance for either same-type pairs or opposite-type pairs (same-type pairs: linear regression, r = 0.020, p = 0.96; opposite-type pairs: linear regression, r = 0.42, p = 0.10). When performing a linear regression on the data without averaging, we found a negligible but significant relationship between noise correlations and distance both for same-type pairs (slope = −1.3 × 10⁻⁵, r = −0.045, p = 0.014) and for opposite-type pairs (slope = 2.5 × 10⁻⁵, r = 0.096, p = 2.2 × 10⁻⁵). However, note that the slopes are extremely small in magnitude in both cases. In panels b and c, correlation coefficient values that were deemed non-significant at the p = 0.05 level using the function "corrcoef" in Matlab are plotted in green.
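The quadrant percentages quoted for Fig. 7 amount to a tally of sign combinations, which can be sketched as follows. The sketch assumes each point is a (signal correlation, noise correlation) pair with signal correlation on the horizontal axis and noise correlation on the vertical axis; that axis assignment and the function names are assumptions, since the figure itself is not reproduced here.

```python
def quadrant_fractions(pairs):
    """Fraction of (signal, noise) correlation pairs in each quadrant.

    Assumed layout: x-axis = signal correlation, y-axis = noise correlation.
    Values exactly at zero are counted as non-negative.
    """
    counts = {"upper_right": 0, "upper_left": 0,
              "lower_left": 0, "lower_right": 0}
    for s, n in pairs:
        if n >= 0:
            key = "upper_right" if s >= 0 else "upper_left"
        else:
            key = "lower_right" if s >= 0 else "lower_left"
        counts[key] += 1
    total = len(pairs)
    return {k: v / total for k, v in counts.items()}

def opposite_sign_fraction(pairs):
    """Fraction of pairs whose signal and noise correlations differ in sign."""
    f = quadrant_fractions(pairs)
    return f["upper_left"] + f["lower_right"]
```

Comparing `opposite_sign_fraction` between equal-weight and weighted sums gives a single number for how much the weighting shifts the population toward the opposite-sign regime that theory predicts to be beneficial.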

Discussion
Summary of results. In this study, we investigated for the first time how ELL pyramidal cell populations encode natural electro-communication stimuli by simultaneously recording the activities of multiple neurons. We first demonstrated that the activities of ELL pyramidal cells were correlated pairwise under chirp stimulation. Specifically, while signal correlations varied as a function of the physical distance between recording probe sites as well as stimulus waveform, noise correlations were instead largely independent of both distance and stimulus waveform. There was furthermore a positive relationship between signal and noise correlations. We next quantified the performance of a classifier at correctly discriminating which stimulus waveform was presented based on the combined neural activities of ELL pyramidal cells. When the activities were combined using an equal-weight sum, we found that ON cells have better discrimination performance than OFF cells with a combined (ON and OFF cells) correct discrimination performance around 75%. Noise correlations were overall detrimental to discrimination performance as their removal increased performance. When instead considering weighted sums and using an evolutionary algorithm to optimize the weights, we found increased performance up to 85%. Interestingly, noise correlations were then beneficial as removing them decreased performance. Further analysis revealed that the improved performance by weighted sum was the result of maximizing distance between trial-averaged responses to different chirp stimuli and minimizing overall variability. By considering correlations between the summed activities of subpopulations, we found that signal and noise correlations tended to have the same sign when considering equal-weight sums, which is detrimental to discrimination. In contrast, signal and noise correlations with opposite signs became relatively more dominant when considering weighted sums, which is beneficial to discrimination. Our results thus show that ELL pyramidal cells display significant correlations in their activities during chirp stimulation that can be either beneficial or detrimental to discriminability depending on how these activities are decoded by downstream brain areas.
Schematics showing how the responses of ELL pyramidal cells were summed with different weights assigned for different neurons. The weights were generated by an evolutionary algorithm. If the weights generated gave a better performance, they replaced the previous weights; if the weights generated did not improve the performance for 10 iterations (performance maximized), the evolutionary algorithm was terminated (see "Materials and methods" for details). (b) Left top: confusion matrices where each entry is the probability of a stimulus i predicted as stimulus j (prediction based on the distance between neural responses quantified by van Rossum metric with timescale τ, see "Materials and methods" for details) for a population of 21 neurons consisting of 16 ON and 5 OFF cells with τ = 1, 3 and 100 ms. Left bottom: discrimination performance as a function of τ. The shaded areas represent standard deviation of performance from different simulations of the evolutionary algorithm (30 in total), and same for (c) and (d). The range of τ values for which performance was higher than 90% of the maximum is 5.3 ms, which is similar to that obtained in the equal-weighted case (Fig. 4b).
Boxplots showing that weighted sums of neural activities had lower response variability than equal-weight sums of neural activities (with noise correlations, one-way ANOVA, p = 3.9 × 10⁻²⁹; without noise correlations, one-way ANOVA, p = 3.4 × 10⁻¹⁷); also, equal-weight sums of neural activities without noise correlations had lower response variability than equal-weight sums with noise correlations (one-way ANOVA, p = 3.1 × 10⁻⁹) while weighted sums of neural activities without noise correlations had higher response variability than weighted sums with noise correlations (one-way ANOVA, p = 0.040).

Origins of signal and noise correlations.
Our results have shown that signal correlation magnitude first decreased with distance then increased. While the decrease can be explained by increasing dissimilarity in the receptive fields of neurons with increasing distance, the increase of signal correlations as the distance further increased is more puzzling. One possible explanation is that descending input from higher brain areas (i.e., feedback) modulates chirp responses to increase signal correlations. Indeed, ELL pyramidal cells receive abundant feedback consisting of both topographic and diffuse sources 35. In particular, diffuse feedback was shown to affect signal correlations in ELL pyramidal cells to beat stimuli 29 and enhance single neuron responses to chirps 36. Such feedback originates from cerebellar granule cells, which makes the ELL a cerebellum-like structure 37. As such feedback originates from afferent input located far away from the cell within the non-classical receptive field 38,39, we hypothesize that this might explain the increase in signal correlations observed for larger distances. Alternatively, the decrease and increase in signal correlations could be due to the fact that the recording probe went across different maps of the ELL, from the lateral segment (LS) into the central lateral segment (CLS), thereby recording from cells in different segments that receive similar feedforward inputs from electroreceptor afferents. Further studies are needed to test these predictions. In contrast, our results showed that noise correlations were invariant as the physical distance between neurons increased. These observations agree with previous findings in the visual cortex that noise correlations do not depend on the contact distance 40. In general, noise correlations can arise from both bottom-up and top-down inputs as well as recurrent connections 41. The amount of common input from electrosensory afferents to ELL pyramidal cells decreases as the distance between neurons increases 27,28.
Thus, if noise correlations were caused by common feedforward input, they would likely decay as distance between neurons increases. Therefore, it is likely that the descending input from cerebellar granule cells mentioned above contributes strongly to shaping noise correlations during chirp stimulation. Indeed, previous studies have shown that feedback can modulate noise correlations in response to beat stimuli 29. A previous study of the cerebellum found that parallel fibers can synchronize neural activities and that correlations did not differ across pairs separated by different distances 42, which is consistent with our hypothesis.

Optimized decoding of ELL pyramidal cells activities.
Our results showed that a weighted sum of neural activities can improve discrimination performance, which was due in part to synergistic effects of noise correlations. These findings agree with previous studies showing that, rather than averaging neuronal responses by weighting them equally, weighting neurons differently can provide more information 43-45. In this case, the weights were generated using an evolutionary algorithm to maximize the discrimination performance of electro-communication stimuli. We note that such "combinatorial codes" can recover much more information about the stimulus and are thus advantageous 46-52. In general, the amount of information extracted by the algorithm is an upper bound. However, it is unclear how such weights can be assigned physiologically. Possible biological implementations of neural decoding with weighted sums have been investigated in previous studies 53,54. For example, a model of population decoding with weights determined by the activity levels of upstream neurons can capture the experimentally observed behaviours 53. In the electrosensory system, midbrain neurons of the torus semicircularis (TS) in general integrate synaptic inputs from both ON- and OFF-type ELL pyramidal cells, although the relative proportion varies greatly across individual neurons 55. While a recent study showing that some midbrain neurons can reliably discriminate between different chirp stimulus waveforms provides support for the hypothesis that TS neurons respond to a weighted sum of ELL inputs 56, further investigation is needed to fully test this hypothesis and, if true, determine how the weights are assigned.
Our results show that ELL pyramidal cell populations can discriminate between chirps occurring at different phases of the beat. This is consistent with previous results showing good discriminability in peripheral electroreceptor afferents 57 as these faithfully follow the detailed time course of the chirp stimulus 24,58 . Our results show that considering correlations between ELL pyramidal neuron activity can improve discriminability in the unequal-weighted case and we note that previous studies have shown other types of synergistic neural codes based on synchrony in both afferents 23,59 and ELL pyramidal cells 60 . While behavioral studies have shown that fish can detect chirps with different attributes 24,58,61 , whether fish can discriminate between different chirp stimulus waveforms remains unknown as the behavioral responses were mostly invariant (i.e., the same) when varying chirp attributes such as amplitude, duration, and the phase of the beat at which the chirp occurs at 24,58 .
It is also important to note that our study focused on natural electrocommunication signals termed "small chirps" that tend to occur on top of low frequency beats 19,62 . There are other types of electrocommunication signals with different characteristics, e.g. "big chirps" that instead tend to occur on top of high frequency beats 19 . Interestingly, recent studies have shown that small chirps can also occur on top of high frequency beats 15 . Moreover, a previous study that considered population coding of both small and big chirps but did not consider the effects of noise correlations has found results qualitatively similar to our own when varying the timescale of encoding 33 . Further studies are needed in order to understand how correlations influence coding of big chirps as well as small chirps occurring on top of higher frequency beats by ELL pyramidal cell populations. Moreover, future studies should consider other behaviorally relevant stimulus classes (e.g., prey). We also note that the stimulation protocol using two electrodes on each side of the animal gives rise to stimulation patterns that are more homogeneous than those typically encountered during social interactions 63 . Future studies should take into account such patterns of stimulation when studying sensory processing by neural populations.

Implications for other systems.
Previous studies have shown that the electrosensory system processes information similarly to other sensory systems (e.g. contrast coding 64, sensory adaptation 65). Sensory processing of natural communication stimuli has been widely studied in other animals (e.g. songbirds 66,67, grasshoppers 68,69, the grassfrog 70). We note that there are also similarities between the electrosensory system and other systems in terms of sensory processing of communication stimuli: for example, the midbrain torus semicircularis in the grassfrog contains neurons that selectively respond to natural mating calls 70, while in the torus semicircularis of the weakly electric fish A. leptorhynchus, neurons that selectively respond to chirps were also found 24. Therefore, we predict that our results are applicable to population coding of communication stimuli in other systems.
Our results further demonstrated the ON-OFF asymmetry of ELL pyramidal cells in terms of chirp discrimination. Previous studies showed symmetry between ON and OFF pyramidal cells in terms of their responses to different chirps (i.e., ON cells increase their firing rates while OFF cells decrease their firing rates in response to increases in stimulus amplitude) 24,64. While the chirp stimuli we delivered contained, in equal measure, phases that ON cells prefer and phases that OFF cells prefer, ON cells still performed much better than OFF cells in discriminating different chirps, which is in agreement with previous studies 25. Since we only used chirps with four different phases, future studies using chirp stimuli at more phases could further confirm an asymmetry in coding of chirp stimuli by ON- and OFF-type cells. ON- and OFF-type cells are found in other sensory modalities (e.g. visual 71,72, auditory 73, olfactory 74). Other types of ON-OFF asymmetries have also been found previously in the visual system 75-78. Our results thus add further evidence supporting the hypothesis that ON-OFF asymmetries are a general property across different sensory modalities.
Methodologically, we used an evolutionary algorithm that runs iteratively to find weights that maximize discrimination performance, as was done recently for midbrain neurons 56. The same algorithm was used previously to optimize model parameters 79. The algorithm takes both spike timing and firing rate into account, and therefore extracts information not only from spike counts but also from the structure of spike trains. This algorithm can be easily adapted to analyze activities of neurons in other systems and help determine the upper bound of information that the spiking activities of neurons can carry. We note that a similar approach was also used to optimize weights to maximize discriminability 45.

Materials and methods
Animals. The South American wave-type weakly electric fish Apteronotus leptorhynchus (N = 2) was used in this study. Animals were purchased from tropical fish suppliers and were housed in groups (2-10) at controlled water temperatures (26-29 °C) and conductivities (300-800 µS cm⁻¹) according to published guidelines 80. All animal procedures were approved by McGill University's animal care committee and were conducted according to the ARRIVE guidelines.
Surgery and recording. Surgical procedures have been described in detail previously 38. Briefly, animals were immobilized by injection of 0.1-0.5 mg of tubocurarine (Sigma) intramuscularly. The animals were then transferred to an experimental tank (30 cm × 30 cm × 10 cm) containing water from the animal's home tank and respirated by a mouth tube providing constant flow of oxygenated water at a flow rate of 10 mL min⁻¹. Before surgery, the animal's head was locally anesthetized with lidocaine ointment (5%; AstraZeneca, Mississauga, ON, Canada). Craniotomy (a ~5 mm² window) was performed to partially expose the hindbrain. Neuropixels probes (Imec inc., Leuven, Belgium) were inserted into the brain along the rostral-caudal axis and at a 45° angle with respect to the sagittal plane at transverse slice T-4 of the brain atlas (see 81) laterally near the praeeminentialis efferent tract (labeled "tP-Cb" on the atlas), and the tip was moved 1500 μm into the brain as measured from the surface. We waited at least one hour after probe insertion before starting recordings to allow brain tissue to settle and to improve recording stability. Accounting for the fact that the first recording site is located 175 μm away from the tip along the probe shaft, as well as the fact that recordings were typically obtained on recording sites ranging between 13 and 97, this gives approximate recording depths between 355 and 1195 μm from the brain surface along the probe shaft, which are within the range reported in a previous study where location within LS was confirmed by histological post-processing 82,83. Thus, based on probe geometry, anatomy 81, and our experience recording from ELL pyramidal cells 58,65,82,84,85, it is likely that most of our recordings were from LS. However, we cannot reject the hypothesis that some of our recordings were from the centrolateral segment.
The distance between recorded units was computed as the physical distance between the recording sites on which the spike shapes of the two units displayed the largest amplitude; this estimate is approximate. However, since a given unit was most often recorded from the nearest neighbours to the primary recording site, the error is at most 40 μm based on probe geometry. We note that this is much smaller than the range of distances over which recordings were obtained.
Stimulation. The electric organ discharge (EOD) of A. leptorhynchus is neurogenic and is therefore not affected by injection of curare. Stimuli consisted of amplitude modulations (AMs) of the animal's own EOD and were produced by triggering a function generator to emit one cycle of a sine wave for each zero crossing of the EOD, as done previously 86. The frequency of the emitted sine wave was set slightly higher (30 Hz) than that of the EOD, which allowed the output of the function generator to be synchronized with the EOD. The emitted sine wave was subsequently multiplied with the desired AM waveform (MT3 multiplier; Tucker Davis Technologies, Alachua, FL, USA), and the resulting signal was isolated from ground (A395 linear stimulus isolator; World Precision Instruments, Sarasota, FL, USA). The isolated signal was then delivered through a pair of chloridized silver wire electrodes located 15 cm away from the animal on each side of the recording tank, perpendicular to the fish's rostro-caudal axis. In this study, a 5 Hz beat frequency and a 14 ms chirp duration were used. Chirps with different attributes were generated by systematically varying the excursion frequency (30, 60, 90 and 120 Hz) and the phase of the underlying beat cycle at which the chirp occurred (0, 90, 180 and 270°). As such, a total of 16 chirps were used (4 different excursion frequencies × 4 different beat phases). Parameter ranges were chosen to contain those observed in previous studies 17,87. To measure the stimulus intensity, a dipole was placed near the animal's skin. Stimulus intensity was adjusted to produce changes in EOD amplitude that were ~20% of the baseline level, as done previously 24,88. Each type of chirp stimulus was presented 40 times (i.e., 40 trials).
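The stimulus set described above can be enumerated as follows. This is only an illustrative sketch: the variable names are hypothetical, and the conversion from beat phase to a time within the beat cycle simply assumes that phase maps linearly onto the 200 ms period of the 5 Hz beat.

```python
from itertools import product

BEAT_HZ = 5                         # beat frequency (from the text)
EXCURSION_HZ = (30, 60, 90, 120)    # chirp excursion frequencies (from the text)
BEAT_PHASE_DEG = (0, 90, 180, 270)  # beat phases at which chirps occur (from the text)
N_TRIALS = 40                       # presentations per chirp (from the text)

beat_period_ms = 1000.0 / BEAT_HZ   # 200 ms for a 5 Hz beat

# One entry per chirp; onset_ms is the time within the beat cycle
# (ASSUMPTION: phase maps linearly onto the beat period).
chirps = [
    {"excursion_hz": f, "phase_deg": p, "onset_ms": p / 360.0 * beat_period_ms}
    for f, p in product(EXCURSION_HZ, BEAT_PHASE_DEG)
]

print(len(chirps), len(chirps) * N_TRIALS)  # 16 640
```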

Data analysis.
Spikes were sorted into individual neurons using Kilosort and manually curated using Phy 2. The spike times were converted into binary sequences $X_i(t)$ sampled at 2 kHz (i.e., 1 if a spike occurred during a given bin of width 0.5 ms and 0 otherwise). Neurons were classified as either ON- or OFF-type based on the spike-triggered average (STA) of a low-pass filtered (0-120 Hz) noise stimulus, as done previously 89. The strength of the neural response was quantified by the STA amplitude (i.e., the distance between the maximum and minimum values) 85. We quantified correlations between neuronal activities using spike count sequences $N_i$ that were obtained from each spike train by counting the number of spikes occurring during 4 successive and non-overlapping 10 ms time windows, the first of which always started 8 ms after the onset of the chirp stimulus in order to account for transmission delays. We then computed the correlation coefficient between pairs of spike count sequences using Pearson's correlation coefficient:
$$ r_{ij} = \frac{\langle N_i N_j \rangle - \langle N_i \rangle \langle N_j \rangle}{\sqrt{\left(\langle N_i^2 \rangle - \langle N_i \rangle^2\right)\left(\langle N_j^2 \rangle - \langle N_j \rangle^2\right)}} \qquad (1) $$
where ⟨…⟩ represents an average over trials (i.e., each presentation of a given chirp stimulus is one trial). To compute signal correlations, spike count sequences were first randomly permuted with respect to the order of trials to obtain shuffled spike counts. Signal correlations were then computed on the shuffled spike counts using Eq. (1) and were averaged over 50 independent realizations of the shuffling procedure. Noise correlations were computed as the correlation coefficient between the spike count residual sequences, which were obtained by averaging over trials and subtracting the mean spike count sequence from the spike counts for each trial 56. Thus, correlations were computed for each individual chirp stimulus.
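The counting and correlation steps described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the array layout (trials × samples) and the choice to shuffle the trial order of only one neuron of the pair are assumptions.

```python
import numpy as np

FS_HZ = 2000    # sampling rate of the binary sequences (from the text)
DELAY_MS = 8    # offset of the first window after chirp onset (from the text)
WINDOW_MS = 10  # width of each counting window (from the text)
N_WINDOWS = 4   # number of successive windows (from the text)


def spike_counts(binary, chirp_onset_idx):
    """binary: (n_trials, n_samples) array of 0/1. Returns (n_trials, N_WINDOWS) counts."""
    start = chirp_onset_idx + DELAY_MS * FS_HZ // 1000
    width = WINDOW_MS * FS_HZ // 1000
    segment = binary[:, start:start + N_WINDOWS * width]
    return segment.reshape(binary.shape[0], N_WINDOWS, width).sum(axis=2)


def pearson(a, b):
    """Pearson correlation between two count arrays, flattened over trials and windows."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else float("nan")


def noise_correlation(counts_i, counts_j):
    """Correlate residuals after removing the trial-averaged count in each window."""
    res_i = counts_i - counts_i.mean(axis=0, keepdims=True)
    res_j = counts_j - counts_j.mean(axis=0, keepdims=True)
    return pearson(res_i, res_j)


def signal_correlation(counts_i, counts_j, n_shuffles=50, rng=None):
    """Average correlation after shuffling trial order (ASSUMPTION: only neuron j is shuffled)."""
    rng = np.random.default_rng() if rng is None else rng
    n_trials = counts_j.shape[0]
    values = [pearson(counts_i, counts_j[rng.permutation(n_trials)])
              for _ in range(n_shuffles)]
    return float(np.mean(values))
```

Shuffling the trial order destroys trial-by-trial covariation while preserving the stimulus-locked time course, which is why the shuffled correlation isolates the signal component.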
For correlations at the population level, we divided the entire population into two subpopulations through partial sums: we summed the binary sequences $X_i(t)$ of 50% of the neurons in the entire population to form the first subpopulation and then the activities of the other 50% of the neurons to form the second subpopulation. Correlations were then computed as described above, and error bands for signal and noise correlations were generated from 300 bootstrap samples of partial sums.
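A minimal sketch of the partial-sum split is given below. The function name and array layout are hypothetical, and the handling of an odd number of neurons (the extra neuron goes to the second subpopulation) is an assumption, since the text only specifies a 50/50 split.

```python
import numpy as np


def split_population(binary, rng):
    """Randomly split neurons into two halves and sum the activity within each half.

    binary: (n_neurons, n_trials, n_samples) array of 0/1 spike sequences.
    Returns two (n_trials, n_samples) arrays of summed activity.
    """
    n_neurons = binary.shape[0]
    order = rng.permutation(n_neurons)
    half = n_neurons // 2  # ASSUMPTION: with odd n, the second half gets one more neuron
    first = binary[order[:half]].sum(axis=0)
    second = binary[order[half:]].sum(axis=0)
    return first, second
```

Repeating the split with a fresh random permutation (e.g., 300 times) would give the distribution from which error bands can be derived.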
The single-neuron PSTHs $R_i(t)$ were calculated by low-pass filtering the binary sequences $X_i(t)$ with a 10 ms boxcar window. The population PSTHs were obtained by summing the single-neuron PSTHs $R_i(t)$ with either equal weights or unequal weights obtained through an evolutionary algorithm (described below):
$$ R_{\mathrm{pop}}(t) = \sum_i w_i R_i(t) $$
where $w_i$ is the weight of neuron i. We note that, as the weights can be negative, the population PSTH obtained using unequal weights can also be negative. For each stimulus, the population PSTH of each trial was then normalized by the maximal value of that trial. The mean and standard deviation of the normalized population PSTHs across trials were then obtained.
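The filtering, summation and normalization steps can be sketched as follows. The function names and array layout are hypothetical, the boxcar is applied with zero padding at the edges, and the sketch assumes that the maximal value of each trial is nonzero.

```python
import numpy as np

FS_HZ = 2000  # sampling rate of the binary sequences (from the text)


def single_neuron_psth(binary, window_ms=10):
    """Low-pass filter 0/1 sequences with a boxcar window along the last (time) axis."""
    n = int(window_ms * FS_HZ / 1000)  # 20 samples for a 10 ms window
    kernel = np.ones(n) / n
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), -1, binary.astype(float)
    )


def population_psth(psths, weights=None):
    """Weighted sum over neurons, then per-trial normalization by the maximal value.

    psths: (n_neurons, n_trials, n_samples). Returns (n_trials, n_samples).
    """
    n_neurons = psths.shape[0]
    if weights is None:
        w = np.full(n_neurons, 1.0 / n_neurons)  # equal-weight sum
    else:
        w = np.asarray(weights, dtype=float)
    pop = np.tensordot(w, psths, axes=(0, 0))
    peak = pop.max(axis=1, keepdims=True)  # ASSUMPTION: peak is nonzero for every trial
    return pop / peak
```

The mean and standard deviation across trials then follow from `pop.mean(axis=0)` and `pop.std(axis=0)` on the returned array.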
To quantify the similarity of the mean population responses to different stimuli, we computed the inter-response distance 24,79 : where x and y are the means of the normalized population PSTHs across trials for two different stimuli, and ⟨…⟩ denotes an average over an evaluation window of 40 ms after chirp onset. For each stimulus, we calculated the inter-response distance to each of the other stimuli individually and then averaged these values to obtain the averaged distance to other stimuli for this stimulus. For boxplots in Fig. 6c, only values within the interquartile range (Q1 = 0.25, Q3 = 0.75) were retained, which excluded stimuli whose averaged distances to other stimuli were either overly high or low and would otherwise hinder comparisons.
To quantify the response variability of the population activities, we averaged the standard deviation of responses across trials over all stimuli: where $\sigma(k_n)$ is the standard deviation of the normalized population PSTHs across trials for each stimulus and n is the number of stimuli. The response variability at each time point was normalized by the maximal value of variability across the entire evaluation time window. For boxplots in Fig. 6d, only values within the interquartile range (Q1 = 0.25, Q3 = 0.75) were retained, which excluded times at which variability values were either overly high or low.
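The variability measure and the interquartile selection described above can be sketched as follows. The function names are hypothetical, the population standard deviation (NumPy's default) is an assumption, and the percentile bounds are taken as inclusive.

```python
import numpy as np


def response_variability(norm_psths):
    """Trial-to-trial variability averaged over stimuli and normalized by its maximum.

    norm_psths: (n_stimuli, n_trials, n_samples) normalized population PSTHs
    restricted to the evaluation window. Returns an array of length n_samples.
    """
    sd_per_stimulus = norm_psths.std(axis=1)  # SD across trials for each stimulus
    variability = sd_per_stimulus.mean(axis=0)  # average over stimuli
    return variability / variability.max()


def within_iqr(values):
    """Keep only the values lying between the 25th and 75th percentiles (inclusive)."""
    values = np.asarray(values)
    q1, q3 = np.percentile(values, [25, 75])
    return values[(values >= q1) & (values <= q3)]
```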
Classifier. We used a classifier to quantify the performance of ELL pyramidal cells at stimulus discrimination. We combined the activities of individual neurons using either weighted or unweighted sums for each chirp stimulus. For each chirp stimulus, the population activity averaged over all trials was chosen as a template. Next, each combined response was assigned to the stimulus whose template was closest to it (i.e., for which the distance between the combined response and the template was minimal). We thus constructed a "confusion matrix" whose element (i,j) gives the probability that a response was assigned as being generated by stimulus j given that it was actually generated by stimulus i 26,89,90. The diagonal elements of this matrix are the probabilities that a stimulus was correctly assigned, whereas non-zero off-diagonal elements indicate misclassification. For each confusion matrix we computed the discrimination performance by averaging over the diagonal elements, as done previously 26,56,89. The discrimination performance can thus vary between 0 (no discrimination) and 1 (perfect discrimination). Note that the chance level for discrimination performance was 0.0625 (that is, 1/16) because we used a total of 16 different chirp stimuli. The distance between combined neural activities was computed using the van Rossum metric 32. First, the combined neural activities were convolved with a decaying exponential kernel with time constant τ:
$$ f(t) = \sum_{i=1}^{M} H(t - t_i)\, e^{-(t - t_i)/\tau} $$
where $t_i$ is the ith spike time, M is the total number of spikes and H(t) is the Heaviside step function (H(x) = 0 if x < 0 and H(x) = 1 if x ≥ 0). The distance was then computed as the Euclidean distance between the convolved combined neural activities $f_{R_j}$ and $f_{R_k}$ : We varied τ between 1 and 100 ms to evaluate the effects of precise spike timing on classification.
When τ is small, the metric takes into account spike timing whereas, when τ is larger, the metric takes into account slower changes in firing rate. If not specified otherwise, τ = 3 ms was used.
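The template-matching procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the exponential kernel is truncated at 10τ, and the distance is the plain (unnormalized) Euclidean norm, which does not affect which template is nearest.

```python
import numpy as np


def van_rossum_filter(x, tau_ms=3.0, fs_hz=2000):
    """Convolve a (possibly weighted) spike sequence with a causal decaying exponential."""
    dt_ms = 1000.0 / fs_hz
    t = np.arange(0.0, 10.0 * tau_ms, dt_ms)  # ASSUMPTION: kernel truncated at 10 tau
    kernel = np.exp(-t / tau_ms)
    return np.convolve(x, kernel)[: len(x)]


def classify(responses, tau_ms=3.0):
    """Nearest-template classification of combined responses.

    responses: (n_stimuli, n_trials, n_samples) combined population activity.
    Returns (confusion_matrix, discrimination_performance).
    """
    n_stim, n_trials, _ = responses.shape
    filtered = np.array(
        [[van_rossum_filter(r, tau_ms) for r in stim] for stim in responses]
    )
    templates = filtered.mean(axis=1)  # one template per stimulus (average over all trials)
    confusion = np.zeros((n_stim, n_stim))
    for i in range(n_stim):
        for trial in filtered[i]:
            distances = np.linalg.norm(templates - trial, axis=1)
            confusion[i, np.argmin(distances)] += 1
    confusion /= n_trials  # rows become assignment probabilities
    return confusion, float(np.trace(confusion) / n_stim)
```

With 16 stimuli, a classifier that assigns responses at random would yield a discrimination performance near 1/16 = 0.0625 under this definition.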
Evolutionary algorithm. In order to determine whether performing a weighted sum of neural responses gave rise to better classification than an equal-weight sum, we trained an evolutionary algorithm (EA) using the population responses on a randomly selected 60% of trials for each chirp stimulus as a training dataset. We then measured the classification accuracy of the trained classifier on the entire dataset. We chose the recording session that contained the greatest number of neurons recorded simultaneously (n = 21).
Specifically, each neuron was assigned a weight $w_i$ that varied between −2 and 2, and the goal was to choose a set of weights that maximized the performance of the classification algorithm described above. The EA is described in detail in previous studies by our group 56,79. Briefly, a set of weight vectors (i.e., "agents") is allowed to evolve by minimizing a fitness function $F_{\mathrm{fit}}$ over a series of iterations (i.e., "generations"). In keeping with the notation used in previous studies 79, we denote $X_k^r(i)$ as parameter i for agent r of generation k. First, the population of K individuals is randomly initialized with weight values that are uniformly distributed with zero mean and restricted to [−2, 2]. For each individual at every generation, a new individual is constructed by "differentiation": the rth new parameter vector $X_{k,\mathrm{trial}}^{r}$ is built by combining three other individuals $X_k^{r_1}$, $X_k^{r_2}$, and $X_k^{r_3}$, where $r_1 \neq r_2 \neq r_3$ : where the differential weight F = 0.5, and the three individuals are chosen based on a probability distribution that is preferentially weighted toward fitter (i.e., lower fitness score) individuals: where λ is a normalization constant such that the sum of probability values is equal to one. Random mutations are then performed as follows: where u is a random variable generated from a uniform distribution U(0,1) and the crossover probability was CR = 0.9. Selection is finally performed to produce the next generation via: In this study, the fitness function for a given individual was defined as: where $DP(X_k^r)$ is the discrimination performance estimated by computing the precision of events (i.e., spikes) of our neuronal population in response to our set of 16 chirp stimuli. The EA was terminated if the change in population discrimination performance was less than 0.0001 in 10 consecutive iterations.
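For orientation, the differentiation, crossover and selection steps can be illustrated with a standard differential evolution scheme. The sketch below is a swapped-in textbook variant and not the algorithm of the cited studies: parent individuals are drawn uniformly rather than with the fitness-weighted probabilities described above, the stopping rule is applied to the mean population fitness, and the function name, population size and generation limit are assumptions. Only F = 0.5, CR = 0.9, the [−2, 2] bounds and the 0.0001/10-iteration tolerance come from the text.

```python
import numpy as np


def evolve_weights(fitness, n_params, pop_size=30, F=0.5, CR=0.9,
                   bounds=(-2.0, 2.0), max_gen=500, tol=1e-4, patience=10, seed=0):
    """Minimize `fitness` over weight vectors with a standard differential evolution loop."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))  # uniform initialization
    fit = np.array([fitness(x) for x in pop])
    stalled = 0
    for _ in range(max_gen):
        new_pop, new_fit = pop.copy(), fit.copy()
        for r in range(pop_size):
            # Differentiation: combine three distinct other individuals
            # (ASSUMPTION: chosen uniformly, not by fitness-weighted probability).
            others = np.delete(np.arange(pop_size), r)
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            # Crossover: take each parameter from the mutant with probability CR,
            # forcing at least one parameter to come from the mutant.
            cross = rng.random(n_params) < CR
            cross[rng.integers(n_params)] = True
            trial = np.where(cross, mutant, pop[r])
            # Selection: keep the trial vector only if it is at least as fit.
            f_trial = fitness(trial)
            if f_trial <= fit[r]:
                new_pop[r], new_fit[r] = trial, f_trial
        change = abs(fit.mean() - new_fit.mean())
        pop, fit = new_pop, new_fit
        stalled = stalled + 1 if change < tol else 0
        if stalled >= patience:
            break
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

In the present setting, `fitness` would be a function that decreases as discrimination performance increases (for example, one minus the performance of the classifier on the training trials, which is itself an assumption about the form of the fitness function), and `n_params` would equal the number of neurons.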
The algorithm was repeated 30 times, and each time a different set of weights was obtained because of different initial conditions and the randomness in generating new individuals and mutations. The weights were normalized so that the sum of the weights of all neurons equals 1. The weights that gave rise to the best performance out of the 30 runs were used for Figs. 6 and 7. As mentioned above, this methodology is the same as that used previously for midbrain neurons 56, which allows for a direct comparison between these previous results and those obtained in the current study for hindbrain neurons. In general, we found significant correlations between weight magnitude and STA amplitude in response to the noise stimulus for ON (r = 0.92, p = 6.4 × 10⁻⁷) and OFF cells (r = 0.95, p = 0.013).