Neural processing of natural sounds

Theunissen, Frédéric E.; Elie, Julie E.

doi:10.1038/nrn3731

Review Article
Published: 20 May 2014

Nature Reviews Neuroscience volume 15, pages 355–366 (2014)

Key Points

Natural sounds include animal vocalizations, environmental sounds such as wind, water and fire noises, and non-vocal sounds made by animals and humans for communication. These natural sounds have characteristic statistical properties that make them perceptually salient and that drive auditory neurons in optimal regimes for information transmission.
Recent advances in statistics and computer science have enabled neurophysiologists to extract the stimulus–response function of complex auditory neurons from responses to natural sounds. These studies have revealed hierarchical processing that leads to the neural detection of progressively more complex natural sound features and have demonstrated the importance of the acoustical and behavioural context for the neural responses.
High-level auditory neurons have been shown to be exquisitely selective for conspecific calls. This fine selectivity could have an important role in species recognition, vocal learning in songbirds and, in the case of bats, the processing of the sounds used in echolocation. Research that investigates how communication sounds are categorized into behaviourally meaningful groups (for example, call types in animals and words in human speech) remains in its infancy.
Animals and humans also excel at separating communication sounds from each other and from background noise. Neurons that detect communication calls in noise have been found, but the neural computations involved in sound source separation and natural auditory scene analysis remain poorly understood overall. Thus, future auditory research will have to focus not only on how natural sounds are processed by the auditory system but also on the computations that enable this processing to occur in natural listening situations.
The complexity of the computations needed in the natural hearing task might require a high-dimensional representation provided by an ensemble of neurons, and the use of natural sounds might be the best solution for understanding the ensemble neural code.
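
As an illustration of what extracting a stimulus–response function from responses to correlated, natural-like sounds can involve, the sketch below fits a spectro-temporal receptive field (STRF) by ridge-regularized linear regression. It is a minimal example under stated assumptions and not the method of any particular study: the stimulus is synthetic smoothed noise standing in for a natural-sound spectrogram, the "neuron" is a simulated linear filter, and the names `lagged_design`, `fit_strf_ridge` and the `ridge` parameter are hypothetical.

```python
import numpy as np

def lagged_design(spec, n_lags):
    """Stack n_lags delayed copies of a (time, freq) spectrogram
    into a (time, n_lags * freq) design matrix."""
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        # Column block `lag` holds the stimulus from `lag` time bins earlier.
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
    return X

def fit_strf_ridge(spec, rate, n_lags, ridge):
    """Estimate an STRF of shape (n_lags, freq) by ridge regression.
    The (X^T X + ridge * I) term corrects for stimulus correlations
    while damping poorly constrained directions."""
    X = lagged_design(spec, n_lags)
    X = X - X.mean(axis=0)
    y = rate - rate.mean()
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    return w.reshape(n_lags, spec.shape[1])

# Synthetic data (assumption): temporally smoothed noise as a stand-in
# for the temporal correlations found in natural sounds.
rng = np.random.default_rng(0)
n_t, n_f, n_lags = 5000, 8, 5
white = rng.standard_normal((n_t, n_f))
kernel = np.ones(5) / 5
spec = np.apply_along_axis(
    lambda col: np.convolve(col, kernel, mode="same"), 0, white
)

# Simulated linear neuron (assumption): excitation at lag 1,
# weaker suppression at lag 2, both in frequency band 3.
true_strf = np.zeros((n_lags, n_f))
true_strf[1, 3] = 1.0
true_strf[2, 3] = -0.5
rate = lagged_design(spec, n_lags) @ true_strf.ravel()
rate = rate + 0.1 * rng.standard_normal(n_t)

est_strf = fit_strf_ridge(spec, rate, n_lags, ridge=1.0)
similarity = np.corrcoef(est_strf.ravel(), true_strf.ravel())[0, 1]
print(f"correlation between estimated and true STRF: {similarity:.3f}")
```

In practice the regularization strength would be chosen by cross-validation on held-out responses, and real neurons are generally not well described by a purely linear filter, so a fit of this kind is only a first-order description.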

Abstract

We might be forced to listen to a high-frequency tone at our audiologist's office or we might enjoy falling asleep with a white-noise machine, but the sounds that really matter to us are the voices of our companions or music from our favourite radio station. The auditory system has evolved to process behaviourally relevant natural sounds. Research has shown not only that our brain is optimized for natural hearing tasks but also that using natural sounds to probe the auditory system is the best way to understand the neural computations that enable us to comprehend speech or appreciate music.

**Figure 1: Historical approaches to auditory neurosciences.**

**Figure 3: Stimulus–response characterization.**

**Figure 4: STRFs at different levels of the auditory system.**

References

Darrigol, O. Number and measure: Hermann von Helmholtz at the crossroads of mathematics, physics, and psychology. Stud. Hist. Philos. Sci. 34, 515–573 (2003).
PubMed Google Scholar
Robles, L. & Ruggero, M. A. Mechanics of the mammalian cochlea. Physiol. Rev. 81, 1305–1352 (2001).
CAS PubMed PubMed Central Google Scholar
Woolley, S. M. & Casseday, J. H. Response properties of single neurons in the zebra finch auditory midbrain: response patterns, frequency coding, intensity coding, and spike latencies. J. Neurophysiol. 91, 136–151 (2004).
PubMed Google Scholar
Eggermont, J. J. Between sound and perception: reviewing the search for a neural code. Hear. Res. 157, 1–42 (2001).
CAS PubMed Google Scholar
Suga, N., O'Neill, W. E. & Manabe, T. Cortical neurons sensitive to combinations of information-bearing elements of biosonar signals in the moustache bat. Science 200, 778–781 (1978).
CAS PubMed Google Scholar
Margoliash, D. & Fortune, E. S. Temporal and harmonic combination-sensitive neurons in the zebra finch's HVc. J. Neurosci. 12, 4309–4326 (1992).
CAS PubMed Google Scholar
Margoliash, D. & Konishi, M. Auditory representation of autogenous song in the song system of white-crowned sparrows. Proc. Natl Acad. Sci. USA 82, 5997–6000 (1985).
CAS PubMed Google Scholar
Nelken, I., Rotman, Y. & Bar Yosef, O. Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397, 154–157 (1999). This study reports a very surprising result: auditory cortical neurons can be more sensitive to the natural auditory context than to the main component of the sound signal.
CAS PubMed Google Scholar
Hsu, A., Woolley, S. M., Fremouw, T. E. & Theunissen, F. E. Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. J. Neurosci. 24, 9201–9211 (2004). Using estimations of mutual information, this study shows that sounds that are otherwise identical in intensity and frequency but have a modulation power spectrum with natural distributions are more efficiently encoded in the avian auditory cortex.
CAS PubMed PubMed Central Google Scholar
Rieke, F., Bodnar, D. A. & Bialek, W. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B 262, 259–265 (1995).
CAS Google Scholar
Hedwig, B. Pulses, patterns and paths: neurobiology of acoustic behaviour in crickets. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 192, 677–689 (2006).
PubMed Google Scholar
Arcadi, A. C., Robert, D. & Boesch, C. Buttress drumming by wild chimpanzees: temporal patterning, phrase integration into loud calls, and preliminary evidence for individual distinctiveness. Primates 39, 505–518 (1998).
Google Scholar
Voss, R. F. & Clarke, J. 1/f noise in music and speech. Nature 258, 317–318 (1975). In this first study on the statistical structure in music and speech sounds, Voss and Clarke show that the amplitude envelope and other features of these natural sounds follow a power law relationship and discuss the significance of this relationship.
Google Scholar
Attias, H. & Schreiner, C. E. in Advances in Neural Information Processing Systems (eds Mozer, M. C., Jordan, M. I. & Petsche, T.) 27–33 (MIT Press, 1997).
Google Scholar
Singh, N. C. & Theunissen, F. E. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411 (2003). This paper introduces the joint temporal and spectral modulation power spectrum of sounds and shows that natural sounds have a characteristic signature in this representation.
PubMed Google Scholar
Chen, J. D., Paliwal, K. K. & Nakamura, S. Cepstrum derived from differentiated power spectrum for robust speech recognition. Speech Comm. 41, 469–484 (2003).
Google Scholar
Cohen, L. in Time–Frequency Analysis Ch. 3 50–52 (Prentice Hall, 1995).
Google Scholar
Bialek, W., Nemenman, I. & Tishby, N. Predictability, complexity, and learning. Neural Comput. 13, 2409–2463 (2001).
CAS PubMed Google Scholar
Elliott, T. M., Hamilton, L. S. & Theunissen, F. E. Acoustic structure of the five perceptual dimensions of timbre in orchestral instrument tones. J. Acoust. Soc. Am. 133, 389–404 (2013).
PubMed PubMed Central Google Scholar
Garcia-Lazaro, J. A., Ahmed, B. & Schnupp, J. W. H. Emergence of tuning to natural stimulus statistics along the central auditory pathway. PLoS ONE 6, e22584 (2011).
CAS PubMed PubMed Central Google Scholar
Srivastava, A., Lee, A. B., Simoncelli, E. P. & Zhu, S. C. On advances in statistical modeling of natural images. J. Math. Imaging Vis. 18, 17–33 (2003).
Google Scholar
Ruderman, D. L. Origins of scaling in natural images. Vis. Res. 37, 3385–3398 (1997).
CAS PubMed Google Scholar
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
CAS PubMed Google Scholar
Rodriguez, F. A., Chen, C., Read, H. L. & Escabi, M. A. Neural modulation tuning characteristics scale to efficiently encode natural sound statistics. J. Neurosci. 30, 15969–15980 (2010). In this study, Rodriguez et al . show that the modulation tuning of neurons in the inferior colliculus balances out the 1/ f relationship found in the modulation power spectrum of natural sounds to transmit modulation information efficiently, with reduced redundancy.
CAS PubMed PubMed Central Google Scholar
Woolley, S. M., Fremouw, T. E., Hsu, A. & Theunissen, F. E. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nature Neurosci. 8, 1371–1379 (2005).
CAS PubMed Google Scholar
Gerhardt, H. C. & Huber, F. (eds) Acoustic Communication in Insects and Anurans: Common Problems and Diverse Solutions Ch. 4 82–128 (University of Chicago Press, 2002).
Google Scholar
Koppl, C., Gleich, O. & Manley, G. A. An auditory fovea in the barn owl cochlea. J. Comp. Physiol. 171, 695–704 (1993).
Google Scholar
Bruns, V. & Schmieszek, E. Cochlear innervation in the greater horseshoe bat: demonstration of an acoustic fovea. Hear. Res. 3, 27–43 (1980).
CAS PubMed Google Scholar
Lewicki, M. S. Efficient coding of natural sounds. Nature Neurosci. 5, 356–363 (2002). Lewicki shows that the particular decomposition of sound into frequency channels found at the level of the auditory nerve results in an efficient representation of a mixture of animal communication sounds and environmental sounds.
CAS PubMed Google Scholar
Moore, R. C., Lee, T. & Theunissen, F. E. Noise-invariant neurons in the avian auditory cortex: hearing the song in noise. PLoS Comput. Biol. 9, e1002942 (2013). This study demonstrates the presence of neurons that robustly encode song in noise in higher levels of the avian auditory system and, inspired by the spectrotemporal filtering of these neurons, proposes an algorithm for real-time noise reduction.
CAS PubMed PubMed Central Google Scholar
McDermott, J. H., Schemitsch, M. & Simoncelli, E. P. Summary statistics in auditory perception. Nature Neurosci. 16, 493–498 (2013). McDermott et al . show that the acoustical statistical structure perceived depends on the length of the sound and that, for longer, complex sounds, only averaged statistics are perceived, leading to percepts of sound texture.
CAS PubMed Google Scholar
Smith, E. C. & Lewicki, M. S. Efficient auditory coding. Nature 439, 978–982 (2006).
CAS PubMed Google Scholar
Marmarelis, P. Z. & Marmarelis, V. Z. Analysis of Physiological Systems: The White Noise Approach (Plenum, 1978).
Google Scholar
Aertsen, A. M. & Johannesma, P. I. The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol. Cybern. 42, 133–143 (1981).
CAS PubMed Google Scholar
Eggermont, J. J., Aertsen, A. M. & Johannesma, P. I. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hear. Res. 10, 167–190 (1983).
CAS PubMed Google Scholar
Eggermont, J. J., Aertsen, A. M. & Johannesma, P. I. Prediction of the responses of auditory neurons in the midbrain of the grass frog based on the spectro-temporal receptive field. Hear. Res. 10, 191–202 (1983).
CAS PubMed Google Scholar
Schafer, M., Rubsamen, R., Dorrscheidt, G. J. & Knipschild, M. Setting complex tasks to single units in the avian auditory forebrain. II. Do we really need natural stimuli to describe neuronal response characteristics? Hear. Res. 57, 231–244 (1992).
CAS PubMed Google Scholar
Andoni, S., Li, N. & Pollak, G. D. Spectrotemporal receptive fields in the inferior colliculus revealing selectivity for spectral motion in conspecific vocalizations. J. Neurosci. 27, 4882–4893 (2007).
CAS PubMed PubMed Central Google Scholar
Ulanovsky, N., Las, L. & Nelken, I. Processing of low-probability sounds by cortical neurons. Nature Neurosci. 6, 391–398 (2003).
CAS PubMed Google Scholar
Asari, H. & Zador, A. M. Long-lasting context dependence constrains neural encoding models in rodent auditory cortex. J. Neurophysiol. 102, 2638–2656 (2009).
PubMed PubMed Central Google Scholar
Woolley, S. M., Gill, P. R. & Theunissen, F. E. Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. J. Neurosci. 26, 2499–2512 (2006).
CAS PubMed Google Scholar
Theunissen, F. E., Sen, K. & Doupe, A. J. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci. 20, 2315–2331 (2000).
CAS PubMed Google Scholar
Theunissen, F. E. et al. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network 12, 289–316 (2001). This methods paper describes the analytical solution for the simplest normalized and regularized linear regression solution for the estimation of STRFs from responses to natural stimuli.
CAS PubMed Google Scholar
Sahani, M. & Linden, J. in Advances in Neural Infomation Processing Systems (eds Becker, S., Thrun, S. & Obermeyer, K.) 301–308 (MIT Press, 2003).
Google Scholar
Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000).
Google Scholar
David, S. V., Mesgarani, N. & Shamma, S. A. Estimating sparse spectro-temporal receptive fields with natural stimuli. Network 18, 191–212 (2007).
PubMed Google Scholar
Christianson, G. B., Sahani, M. & Linden, J. F. The consequences of response nonlinearities for interpretation of spectrotemporal receptive fields. J. Neurosci. 28, 446–455 (2008).
CAS PubMed PubMed Central Google Scholar
Schneider, D. M. & Woolley, S. M. N. Extra-classical tuning predicts stimulus-dependent receptive fields in auditory neurons. J. Neurosci. 31, 11867–11878 (2011).
CAS PubMed PubMed Central Google Scholar
Gill, P., Zhang, J., Woolley, S. M., Fremouw, T. & Theunissen, F. E. Sound representation methods for spectro-temporal receptive field estimation. J. Comput. Neurosci. 21, 5–20 (2006).
PubMed Google Scholar
David, S. V. & Shamma, S. A. Integration over multiple timescales in primary auditory cortex. J. Neurosci. 33, 19154–19166 (2013).
CAS PubMed PubMed Central Google Scholar
Gill, P., Woolley, S. M., Fremouw, T. & Theunissen, F. E. What's that sound? Auditory area CLM encodes stimulus surprise, not intensity or intensity changes. J. Neurophysiol. 99, 2809–2820 (2008).
PubMed Google Scholar
Calabrese, A., Schumacher, J. W., Schneider, D. M., Paninski, L. & Woolley, S. M. N. A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PLoS ONE 6, e16104 (2011).
CAS PubMed PubMed Central Google Scholar
McFarland, J. M., Cui, Y. W. & Butts, D. A. Inferring nonlinear neuronal computation based on physiologically plausible inputs. PloS Comput. Biol. 9, e1003143 (2013).
CAS PubMed PubMed Central Google Scholar
Sharpee, T., Rust, N. C. & Bialek, W. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Comput. 16, 223–250 (2004).
PubMed Google Scholar
Atencio, C. A., Sharpee, T. O. & Schreiner, C. E. Receptive field dimensionality increases from the auditory midbrain to cortex. J. Neurophysiol. 107, 2594–2603 (2012). In this study, Atencio et al . use the maximally informative dimension algorithm to extract multicomponent STRFs and show that the number of components needed to characterize auditory neurons is greater in the cortex than in the thalamus.
PubMed PubMed Central Google Scholar
Depireux, D. A. & Elhilali, M. (eds) Handbook of Modern Techniques in Auditory Cortex (Nova Science, 2014).
Google Scholar
Ahrens, M. B., Linden, J. F. & Sahani, M. Nonlinearities and contextual influences in auditory cortical responses modeled with multilinear spectrotemporal methods. J. Neurosci. 28, 1929–1942 (2008).
CAS PubMed PubMed Central Google Scholar
Woolley, S. M., Gill, P. R., Fremouw, T. & Theunissen, F. E. Functional groups in the avian auditory system. J. Neurosci. 29, 2780–2793 (2009). This study describes the STRFs found in the avian inferior colliculus and avian auditory cortex and shows how they form unique functional groups that extract distinct features of natural sounds that are in turn important for distinct percepts.
CAS PubMed PubMed Central Google Scholar
Amin, N., Gill, P. & Theunissen, F. E. Role of the zebra finch auditory thalamus in generating complex representations for natural sounds. J. Neurophysiol. 104, 784–798 (2010).
PubMed PubMed Central Google Scholar
Nagel, K. I. & Doupe, A. J. Organizing principles of spectro-temporal encoding in the avian primary auditory area field L. Neuron 58, 938–955 (2008).
CAS PubMed PubMed Central Google Scholar
Kim, G. & Doupe, A. Organized representation of spectrotemporal features in songbird auditory forebrain. J. Neurosci. 31, 16977–16990 (2011).
CAS PubMed PubMed Central Google Scholar
Miller, L. M., Escabi, M. A., Read, H. L. & Schreiner, C. E. Functional convergence of response properties in the auditory thalamocortical system. Neuron 32, 151–160 (2001). This study compares the STRFs in the mammalian auditory thalamus with those in the auditory cortex, illustrating the hierarchical processing found in the ascending auditory processing stream.
CAS PubMed Google Scholar
Miller, L. M., Escabi, M. A., Read, H. L. & Schreiner, C. E. Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J. Neurophysiol. 87, 516–527 (2002).
PubMed Google Scholar
Depireux, D. A., Simon, J. Z., Klein, D. J. & Shamma, S. A. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. J. Neurophysiol. 85, 1220–1234 (2001).
CAS PubMed Google Scholar
Chi, T., Ru, P. & Shamma, S. A. Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).
PubMed Google Scholar
Escabi, M. A. & Schreiner, C. E. Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. J. Neurosci. 22, 4114–4131 (2002).
CAS PubMed Google Scholar
Escabi, M. A., Miller, L. M., Read, H. L. & Schreiner, C. E. Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J. Neurosci. 23, 11489–11504 (2003).
CAS PubMed PubMed Central Google Scholar
Carlson, N. L., Ming, V. L. & Deweese, M. R. Sparse codes for speech predict spectrotemporal receptive fields in the inferior colliculus. PLoS Comput. Biol. 8, e1002594 (2012).
CAS PubMed PubMed Central Google Scholar
Fritz, J., Shamma, S., Elhilali, M. & Klein, D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neurosci. 6, 1216–1223 (2003).
CAS PubMed Google Scholar
David, S. V., Fritz, J. B. & Shamma, S. A. Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proc. Natl Acad. Sci. USA 109, 2144–2149 (2012).
CAS PubMed Google Scholar
Rabinowitz, N. C., Willmore, B. D. B., Schnupp, J. W. H. & King, A. J. Spectrotemporal contrast kernels for neurons in primary auditory cortex. J. Neurosci. 32, 11271–11284 (2012).
CAS PubMed PubMed Central Google Scholar
Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex. J. Neurophysiol. 102, 3329–3339 (2009).
PubMed PubMed Central Google Scholar
Amin, N., Gastpar, M. & Theunissen, F. E. Selective and efficient neural coding of communication signals depends on early acoustic and social environment. PLoS ONE 8, e61417 (2013).
CAS PubMed PubMed Central Google Scholar
Mesgarani, N., Slaney, M. & Shamma, S. A. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations. IEEE Trans. Speech Audio Process. 14, 920–930 (2006).
Google Scholar
Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Phoneme representation and classification in primary auditory cortex. J. Acoust. Soc. Am. 123, 899–909 (2008).
PubMed Google Scholar
McDermott, J. H. & Simoncelli, E. P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).
CAS PubMed PubMed Central Google Scholar
Libersat, F., Murray, J. A. & Hoy, R. R. Frequency as a releaser in the courtship song of 2 crickets, Gryllus-bimaculatus (de Geer) and Teleogryllus oceanicus: a neuroethological analysis. J. Comp. Physiol. A 174, 485–494 (1994).
CAS PubMed Google Scholar
Feng, A. S., Hall, J. C. & Gooler, D. M. Neural basis of sound pattern-recognition in anurans. Prog. Neurobiol. 34, 313–329 (1990).
CAS PubMed Google Scholar
Grimsley, J. M., Shanbhag, S. J., Palmer, A. R. & Wallace, M. N. Processing of communication calls in Guinea pig auditory cortex. PLoS ONE 7, e51646 (2012).
CAS PubMed PubMed Central Google Scholar
McCasland, J. S. & Konishi, M. Interactions between auditory and motor activities in an avian song control nucleus. Proc. Natl Acad. Sci. USA 78, 7815–7819 (1981).
CAS PubMed Google Scholar
Doupe, A. J. & Konishi, M. Song-selective auditory circuits in the vocal control system of the zebra finch. Proc. Natl Acad. Sci. USA 88, 11339–11343 (1991).
CAS PubMed Google Scholar
Grace, J. A., Amin, N., Singh, N. C. & Theunissen, F. E. Selectivity for conspecific song in the zebra finch auditory forebrain. J. Neurophysiol. 89, 472–487 (2003).
PubMed Google Scholar
Newman, J. & Wollberg, Z. Multiple coding of species-specific vocalizations in the auditory cortex of squirrel monkeys. Brain Res. 54, 287–304 (1978).
Google Scholar
Huetz, C., Gourevitch, B. & Edeline, J. M. Neural codes in the thalamocortical auditory system: from artificial stimuli to communication sounds. Hear. Res. 271, 147–158 (2011).
PubMed Google Scholar
Ter-Mikaelian, M., Semple, M. N. & Sanes, D. H. Effects of spectral and temporal disruption on cortical encoding of gerbil vocalizations. J. Neurophysiol. 110, 1190–1204 (2013).
PubMed PubMed Central Google Scholar
Suta, D., Popelar, J. & Syka, J. Coding of communication calls in the subcortical and cortical structures of the auditory system. Physiol. Res. 57 (Suppl. 3), 149–159 (2008).
Google Scholar
Wallace, M. N., Grimsley, J. M., Anderson, L. A. & Palmer, A. R. Representation of individual elements of a complex call sequence in primary auditory cortex. Front. Syst. Neurosci. 7, 72 (2013).
PubMed PubMed Central Google Scholar
Theunissen, F. E. & Doupe, A. J. Temporal and spectral sensitivity of complex auditory neurons in the nucleus HVc of male zebra finches. J. Neurosci. 18, 3786–3802 (1998).
CAS PubMed Google Scholar
Lewicki, M. S. & Konishi, M. Mechanisms underlying the sensitivity of songbird forebrain neurons to temporal order. Proc. Natl Acad. Sci. USA 92, 5582–5586 (1995).
CAS PubMed Google Scholar
Marler, P. R. Avian and primate communication: the problem of natural categories. Neurosci. Biobehav. Rev. 6, 87–94 (1982).
CAS PubMed Google Scholar
Walker, K. M. M., Bizley, J. K., King, A. J. & Schnupp, J. W. H. Multiplexed and robust representations of sound features in auditory cortex. J. Neurosci. 31, 14565–14576 (2011).
CAS PubMed PubMed Central Google Scholar
Meliza, C. D. & Margoliash, D. Emergence of selectivity and tolerance in the avian auditory cortex. J. Neurosci. 32, 15158–15168 (2012).
CAS PubMed PubMed Central Google Scholar
George, I., Cousillas, H., Richard, J. P. & Hausberger, M. A potential neural substrate for processing functional classes of complex acoustic signals. PLoS ONE 3, e2203 (2008).
PubMed PubMed Central Google Scholar
Petkov, C. I. et al. A voice region in the monkey brain. Nature Neurosci. 11, 367–374 (2008).
CAS PubMed Google Scholar
Perrodin, C., Kayser, C., Logothetis, N. K. & Petkov, C. I. Voice cells in the primate temporal lobe. Curr. Biol. 21, 1408–1415 (2011).
CAS PubMed PubMed Central Google Scholar
Cohen, Y. E. et al. A functional role for the ventrolateral prefrontal cortex in non-spatial auditory cognition. Proc. Natl Acad. Sci. USA 106, 20045–20050 (2009).
CAS PubMed Google Scholar
Gifford, G. W., MacLean, K. A., Hauser, M. D. & Cohen, Y. E. The neurophysiology of functionally meaningful categories: macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. J. Cogn. Neurosci. 17, 1471–1482 (2005).
PubMed Google Scholar
Cohen, Y. E., Theunissen, F., Russ, B. E. & Gill, P. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. J. Neurophysiol. 97, 1470–1484 (2007).
PubMed Google Scholar
Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nature Neurosci. 12, 718–724 (2009).
CAS PubMed Google Scholar
Marler, P. Bird calls: their potential for behavioral neurobiology. Ann. NY Acad. Sci. 1016, 31–44 (2004).
PubMed Google Scholar
Prather, J. F., Nowicki, S., Anderson, R. C., Peters, S. & Mooney, R. Neural correlates of categorical perception in learned vocal communication. Nature Neurosci. 12, 221–228 (2009).
CAS PubMed Google Scholar
Marler, P. Birdsong: the acquisition of a learned motor skill. Trends Neurosci. 4, 88–94 (1981).
Google Scholar
Seyfarth, R. M. & Cheney, D. L. Vocal development in vervet monkeys. Animal Behav. 34, 1640–1658 (1986).
Google Scholar
Miranda, J. A. & Liu, R. C. Dissecting natural sensory plasticity: hormones and experience in a maternal context. Hear. Res. 252, 21–28 (2009).
CAS PubMed PubMed Central Google Scholar
Menardy, F. et al. Social experience affects neuronal responses to male calls in adult female zebra finches. Eur. J. Neurosci. 35, 1322–1336 (2012).
CAS PubMed Google Scholar
Gentner, T. Q. & Margoliash, D. Neuronal populations and single cells representing learned auditory objects. Nature 424, 669–674 (2003).
CAS PubMed PubMed Central Google Scholar
Woolley, S. M. N., Hauber, M. E. & Theunissen, F. E. Developmental experience alters information coding in auditory midbrain and forebrain neurons. Dev. Neurobiol. 70, 235–252 (2010).
PubMed PubMed Central Google Scholar
Hauber, M. E., Woolley, S. M. N., Cassey, P. & Theunissen, F. E. Experience dependence of neural responses to different classes of male songs in the primary auditory forebrain of female songbirds. Behav. Brain Res. 243, 184–190 (2013).
PubMed PubMed Central Google Scholar
Sanes, D. H. & Bao, S. W. Tuning up the developing auditory CNS. Curr. Opin. Neurobiol. 19, 188–199 (2009).
CAS PubMed PubMed Central Google Scholar
Doupe, A. J. Song- and order-selective neurons in the songbird anterior forebrain and their emergence during vocal development. J. Neurosci. 17, 1147–1167 (1997).
CAS PubMed Google Scholar
Solis, M. M. & Doupe, A. J. Contributions of tutor and bird's own song experience to neural selectivity in the songbird anterior forebrain. J. Neurosci. 19, 4559–4584 (1999).
CAS PubMed Google Scholar
Volman, S. F. Development of neural selectivity for birdsong during vocal learning. J. Neurosci. 13, 4737–4747 (1993).
CAS PubMed Google Scholar
Kover, H., Gill, K., Tseng, Y. T. L. & Bao, S. W. Perceptual and neuronal boundary learned from higher-order stimulus probabilities. J. Neurosci. 33, 3699–3705 (2013).
CAS PubMed PubMed Central Google Scholar
Moss, C. F. & Surlykke, A. Probing the natural scene by echolocation in bats. Front. Behav. Neurosci. 4, e00033 (2010).
Google Scholar
Cheney, D. L. & Seyfarth, R. M. Assessment of meaning and the detection of unreliable signals by vervet monkeys. Animal Behav. 36, 477–486 (1988).
Google Scholar
Elie, J. E. et al. Vocal communication at the nest between mates in wild zebra finches: a private vocal duet? Animal Behav. 80, 597–605 (2010).
Google Scholar
Penna, M., Llusia, D. & Marquez, R. Propagation of natural toad calls in a Mediterranean terrestrial environment. J. Acoust. Soc. Am. 132, 4025–4031 (2012).
PubMed Google Scholar
Aubin, T. & Jouventin, P. in Advances in the Study of Behavior Vol. 31 (eds Slater, P. J. B., Rosenblatt, J. S., Snowdon, C. T. & Roper, T. J.) 243–277 (Academic, 2002).
Google Scholar
Vignal, C., Mathevon, N. & Mottin, S. Audience drives male songbird response to partner's voice. Nature 430, 448–451 (2004).
CAS PubMed Google Scholar
Kawasaki, M., Margoliash, D. & Suga, N. Delay-tuned combination-sensitive neurons in the auditory-cortex of the vocalizing mustached bat. J. Neurophysiol. 59, 623–635 (1988).
CAS PubMed Google Scholar
Suga, N. & Shimozaw, T. Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science 183, 1211–1213 (1974).
CAS PubMed Google Scholar
Eliades, S. J. & Wang, X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453, 1102–1106 (2008).
CAS PubMed Google Scholar
Keller, G. B. & Hahnloser, R. H. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–190 (2009).
CAS PubMed Google Scholar
Miller, C. T. & Wang, X. Q. Sensory–motor interactions modulate a primate vocal behavior: antiphonal calling in common marmosets. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 192, 27–38 (2006).
PubMed Google Scholar
Fortune, E. S., Rodriguez, C., Li, D., Ball, G. F. & Coleman, M. J. Neural mechanisms for the coordination of duet singing in wrens. Science 334, 666–670 (2011).
CAS PubMed Google Scholar
Lewicki, M. S., Olshausen, B. A., Surlykke, A. & Moss, C. F. Scene analysis in the natural environment. Front. Psychol. 5, 199 (2014).
PubMed PubMed Central Google Scholar
Schneider, D. M. & Woolley, S. M. N. Sparse and background-invariant coding of vocalizations in auditory scenes. Neuron 79, 141–152 (2013).
CAS PubMed PubMed Central Google Scholar
Ding, N. & Simon, J. Z. Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J. Neurosci. 33, 5728–5735 (2013).
CAS PubMed PubMed Central Google Scholar
Rabinowitz, N. C., Willmore, B. D. B., King, A. J. & Schnupp, J. W. H. Constructing noise-invariant representations of sound in the auditory pathway. PLoS Biol. 11, 1710–1710 (2013). This study shows how increasing levels of adaptation as one ascends the auditory processing stream result in noise-invariant representation of signals at the higher levels.
Google Scholar
Middlebrooks, J. C. & Bremen, P. Spatial stream segregation by auditory cortical neurons. J. Neurosci. 33, 10986–11001 (2013).
CAS PubMed PubMed Central Google Scholar
Fishman, Y. I., Reser, D. H., Arezzo, J. C. & Steinschneider, M. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear. Res. 151, 167–187 (2001).
CAS PubMed Google Scholar
Bee, M. A. & Klump, G. M. Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J. Neurophysiol. 92, 1088–1104 (2004).
PubMed Google Scholar
Greenlee, J. D. W. et al. Human auditory cortical activation during self-vocalization. PLoS ONE 6, e14744 (2011).
CAS PubMed PubMed Central Google Scholar
Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012).
Regev, M., Honey, C. J., Simony, E. & Hasson, U. Selective and invariant neural responses to spoken and written narratives. J. Neurosci. 33, 15978–15988 (2013).
Al-Mana, D., Ceranic, B., Djahanbakhch, O. & Luxon, L. M. Hormones and the auditory system: a review of physiology and pathophysiology. Neuroscience 153, 881–900 (2008).
Teramoto, W., Sakamoto, S., Furune, F., Gyoba, J. & Suzuki, Y. Compression of auditory space during forward self-motion. PLoS ONE 7, e39402 (2012).
Beckers, G. J. L. & Gahr, M. Large-scale synchronized activity during vocal deviance detection in the zebra finch auditory forebrain. J. Neurosci. 32, 10594–10608 (2012).
Schneidman, E. et al. Synergy from silence in a combinatorial neural code. J. Neurosci. 31, 15732–15741 (2011).

Acknowledgements

The authors thank the members of the Theunissen laboratory and the three anonymous reviewers for critical feedback and suggestions.

Author information

Authors and Affiliations

Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, 94720, California, USA
Frédéric E. Theunissen & Julie E. Elie

Authors

Frédéric E. Theunissen
Julie E. Elie

Corresponding authors

Correspondence to Frédéric E. Theunissen or Julie E. Elie.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Neural tuning: This term refers to the response property of brain cells by which they selectively represent a particular type of sensory, motor or cognitive information.
Machine learning: A branch of computer science that combines statistical algorithms and artificial intelligence to extract information from complex data sets.
Sound spectrum: (Also called the frequency spectrum). The representation of a waveform in the frequency domain. 'Sound spectrum' is short for the frequency power spectrum of a sound. It is obtained by taking the squared amplitude of the Fourier transform of the waveform and shows the energy in a signal as a function of frequency.
Envelope: The smooth representation (low-pass filtering) of the amplitude of a signal as a function of frequency (spectral envelope) or time (temporal envelope).
Formants: The peaks in the spectral envelope that correspond to a resonance in the sound source. In animal vocalizations and human speech, formants are resonances of the upper vocal tract. Formants are also the distinguishing or meaningful frequency components of voiced human speech.
Spectrogram: The representation of the sound in the time and frequency domain that shows how the sound spectrum varies over time. This representation is often obtained by calculating the sound spectrum in a short-windowed section of sound and repeating this calculation by moving the window in time. This method of calculation is also called the short-time Fourier transform.
Modulation power spectrum: The spectrum obtained from the log of the amplitude of a sound spectrogram. The name comes from the fact that each 'row' in a spectrogram pictorially represents in a narrow frequency channel the amplitude envelope of the sound in that channel. These envelopes 'modulate' the intensity of the signals, and the modulation power spectrum shows the power of these modulations for different rates (that is, the temporal modulation frequencies on the x axis). Similarly, a column in the spectrogram shows the spectral envelope at a particular time point. The power spectrum of this spectral envelope shows the power at particular spectral modulation frequencies (that is, on the y axis). More generally, a spectrogram jointly tracks amplitude envelopes in both time and in frequency and the full modulation power spectrum shows the power of both spectral and temporal modulations.
Harmonic: A complex sound with a strong pitch percept that is made of a tone at a fundamental frequency or first frequency (f₀, corresponding to the perceived pitch) and a series of overtones at multiples of this fundamental frequency.
Timbre: The quality of a sound (also known as tone colour or tone quality from psychoacoustics) that can be used to distinguish different types of sound production (voices or various musical instruments) and that is determined by certain physical properties, such as the sound spectrum.
Invariance: The property of something that does not change under a transformation (for example, in rotation-invariant face neurons, a particular face will elicit very similar responses irrespective of the face orientation).
Decorrelation: The process used to reduce the autocorrelation or the redundancy of information within a signal.
Inferior colliculus: The principal auditory nucleus of the mammalian midbrain, which receives inputs from several peripheral brainstem nuclei, including direct and indirect inputs from the cochlear nucleus.
Gain: For neurons, the gain is the sensitivity of a neural output to the input signal.
Frequency filters: Devices for selectively transmitting specified frequencies of the input signal by attenuating, or filtering out, unwanted frequencies.
Spike-triggered average: (STA). A tool for estimating the stimulus–response function of a neuron by using the average stimulus before each spike. For linear neurons and for white noise stimuli, the STA will yield the neuron's receptive field.
Spectrotemporal receptive field: (STRF). In the most general sense, the STRF is used to label the stimulus–response function of a neuron that uses any spectrotemporal description of the sound as the stimulus (for example, the spectrogram). That general STRF can be a linear or non-linear model.
Regularization: In statistics and machine learning, regularization methods are used for model selection, in particular to prevent overfitting by penalizing models with extreme parameter values or an extreme number of parameters. For example, principal component analysis applied to a data set can be used to reduce the dimensionality of the data set. The resulting model is a principal component regression or subspace regression.
Priors: Probability distributions attached to parameters before certain data are observed. Prior is short for 'prior probability'.
Principal component regression: The regularization procedure that uses principal component analysis in conjunction with a linear regression analysis to reduce the dimensionality of a data set.
Ridge regression: The combination of linear regression with a regularization procedure that assumes a priori that the coefficients of the regression are normally distributed around zero. By varying the spread of this normal distribution, one can constrain the coefficients to be very close to zero unless there is strong evidence against it in the data. Ridge regression is therefore useful when there are too many parameters in the linear regression to be fitted with limited data (that is, to prevent overfitting). In those cases, most parameters will be shrunk close to zero and only the few that really matter will retain substantial non-zero values. It is in this sense that ridge regression reduces the effective dimensionality of the problem. Ridge regression is also known as L2 regularization, and statisticians also call regularization procedures of this type shrinkage.
Zero-mean Gaussian: A quality of a distribution that is normal with a mean of zero.
Generalized linear models: (GLMs). A generalization of ordinary linear regression that allows for the random aspects of response variables (that is, the noise) to have distributions other than the normal distribution (for example, a Poisson distribution).
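
The glossary entries for the sound spectrum, the spectrogram and the modulation power spectrum describe a chain of calculations that can be sketched in a few lines of NumPy. The example below is only an illustration under stated assumptions: the Hann window, the window length and hop size, the linear frequency axis and the amplitude-modulated test tone are all choices made here for demonstration, not values taken from the article.

```python
import numpy as np

def power_spectrum(x, fs):
    """Squared amplitude of the Fourier transform of the waveform x."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(x)) ** 2

def spectrogram(x, fs, win_len=256, hop=64):
    """Short-time Fourier transform: power spectrum in a sliding window."""
    window = np.hanning(win_len)                       # assumed window shape
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # shape (time, frequency)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (starts + win_len / 2) / fs                # window centres in seconds
    return times, freqs, power.T                       # rows = frequency channels

def modulation_power_spectrum(spec, eps=1e-10):
    """2D power spectrum of the log spectrogram (mean removed)."""
    log_spec = np.log(spec + eps)                      # eps avoids log(0)
    log_spec = log_spec - log_spec.mean()
    return np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2

# Assumed test signal: a 1 kHz tone whose amplitude is modulated at 8 Hz
fs, hop = 8000, 64
t = np.arange(0, 2.0, 1.0 / fs)
x = (1 + 0.8 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)

freqs, ps = power_spectrum(x, fs)
peak_freq = freqs[np.argmax(ps)]                       # frequency with most energy

times, sfreqs, spec = spectrogram(x, fs, hop=hop)
mps = modulation_power_spectrum(spec)

# Axes of the modulation power spectrum
temporal_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=hop / fs))            # Hz
spectral_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=sfreqs[1] - sfreqs[0]))  # cycles per Hz
```

In this sketch, rows of `mps` index spectral modulations and columns index temporal modulations, matching the description of spectrogram rows and columns in the glossary entry.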
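
The spike-triggered average entry can likewise be illustrated with a small simulation. This is a minimal sketch, not a method from the article: the temporal filter `true_filter`, the rectifying output stage, the scaling of the firing rate and the Poisson spike generator are all hypothetical choices made here. With Gaussian white noise as the stimulus, the STA is expected to be approximately proportional to the underlying linear filter.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_lags = 50_000, 20
stimulus = rng.standard_normal(n_samples)            # Gaussian white noise

# Hypothetical "true" filter: entry k weights the stimulus k samples in the past
lags = np.arange(n_lags)
true_filter = np.exp(-lags / 4.0) * np.sin(2 * np.pi * lags / 10.0)

# Linear stage, then rectification and Poisson spiking (assumed model neuron)
drive = np.convolve(stimulus, true_filter)[:n_samples]
rate = 0.2 * np.maximum(drive, 0.0)                   # expected spikes per sample
spikes = rng.poisson(rate)

def spike_triggered_average(stimulus, spikes, n_lags):
    """Average stimulus segment preceding each spike, weighted by spike count."""
    sta = np.zeros(n_lags)
    for t in range(n_lags - 1, len(stimulus)):
        if spikes[t] > 0:
            # Reverse so that entry k holds stimulus[t - k]
            segment = stimulus[t - n_lags + 1:t + 1][::-1]
            sta += spikes[t] * segment
    return sta / spikes[n_lags - 1:].sum()

sta = spike_triggered_average(stimulus, spikes, n_lags)

# Shape agreement between the estimate and the hypothetical filter
similarity = np.corrcoef(sta, true_filter)[0, 1]
```

For natural sounds, which are not white, the raw STA is biased by the stimulus correlations, which is where the regularized regression methods defined in the glossary come in.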
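
The ridge regression and principal component regression entries can be compared on simulated data. The sketch below uses the standard closed-form ridge solution and a singular value decomposition for the principal component regression; the data set, the penalty `lam`, the number of retained components and the omission of column centring are assumptions made here for brevity.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def pcr_fit(X, y, n_components):
    """Regression restricted to the leading principal directions of X.
    Assumes the columns of X are already (approximately) zero mean."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

# Hypothetical data: many parameters, few observations, three that matter
rng = np.random.default_rng(1)
n_samples, n_features = 40, 30
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + 0.5 * rng.standard_normal(n_samples)

w_ols = ridge_fit(X, y, lam=0.0)        # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)     # Gaussian prior around zero
w_pcr = pcr_fit(X, y, n_components=5)   # keep 5 of 30 principal components

norm_ols = np.linalg.norm(w_ols)
norm_ridge = np.linalg.norm(w_ridge)
norm_pcr = np.linalg.norm(w_pcr)
```

In this sketch both regularized solutions have a smaller coefficient norm than the unpenalized fit. The ridge penalty shrinks every coefficient towards zero, whereas principal component regression discards whole directions of the stimulus space.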

About this article

Cite this article

Theunissen, F., Elie, J. Neural processing of natural sounds. Nat Rev Neurosci 15, 355–366 (2014). https://doi.org/10.1038/nrn3731

Published: 20 May 2014
Issue Date: June 2014
DOI: https://doi.org/10.1038/nrn3731

This article is cited by

A perspective on neuroethology: what the past teaches us about the future of neuroethology
- M. Jerome Beetz
Journal of Comparative Physiology A (2024)
Is song processing distinct and special in the auditory cortex?
- Ilana Harris
- Efe C. Niven
- Sophie K. Scott
Nature Reviews Neuroscience (2023)
Stimulation with acoustic white noise enhances motor excitability and sensorimotor integration
- Giovanni Pellegrino
- Mattia Pinardi
- Giovanni Di Pino
Scientific Reports (2022)
Neuronal On- and Off-type heterogeneities improve population coding of envelope signals in the presence of stimulus-induced noise
- Volker Hofmann
- Maurice J. Chacron
Scientific Reports (2020)
Stimulus- and goal-oriented frameworks for understanding natural vision
- Maxwell H. Turner
- Luis Gonzalo Sanchez Giraldo
- Fred Rieke
Nature Neuroscience (2019)