Key Points
- Natural sounds include animal vocalizations, environmental sounds such as wind, water and fire noises, and non-vocal sounds made by animals and humans for communication. These natural sounds have characteristic statistical properties that make them perceptually salient and that drive auditory neurons in optimal regimes for information transmission.
- Recent advances in statistics and computer science have enabled neurophysiologists to extract the stimulus–response function of complex auditory neurons from responses to natural sounds. These studies have revealed a hierarchical processing that leads to the neural detection of progressively more complex natural sound features, and they have demonstrated the importance of the acoustical and behavioural context for the neural responses.
- High-level auditory neurons have been shown to be exquisitely selective for conspecific calls. This fine selectivity could have an important role in species recognition, in vocal learning in songbirds and, in the case of bats, in the processing of the sounds used in echolocation. Research that investigates how communication sounds are categorized into behaviourally meaningful groups (for example, call types in animals and words in human speech) remains in its infancy.
- Animals and humans also excel at separating communication sounds from each other and from background noise. Neurons that detect communication calls in noise have been found, but the neural computations involved in sound source separation and natural auditory scene analysis remain poorly understood overall. Thus, future auditory research will have to focus not only on how natural sounds are processed by the auditory system but also on the computations that enable this processing to occur in natural listening situations.
- The complexity of the computations needed in the natural hearing task might require a high-dimensional representation provided by an ensemble of neurons, and the use of natural sounds might be the best solution for understanding the ensemble neural code.
Abstract
We might be forced to listen to a high-frequency tone at our audiologist's office or we might enjoy falling asleep with a white-noise machine, but the sounds that really matter to us are the voices of our companions or music from our favourite radio station. The auditory system has evolved to process behaviourally relevant natural sounds. Research has shown not only that our brain is optimized for natural hearing tasks but also that using natural sounds to probe the auditory system is the best way to understand the neural computations that enable us to comprehend speech or appreciate music.
Acknowledgements
The authors thank the members of the Theunissen laboratory and the three anonymous reviewers for critical feedback and suggestions.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Neural tuning
The response property of brain cells by which they selectively represent a particular type of sensory, motor or cognitive information.
- Machine learning
A branch of computer science that combines statistical algorithms and artificial intelligence to extract information from complex data sets.
- Sound spectrum
(Also called the frequency spectrum.) The representation of a waveform in the frequency domain. 'Sound spectrum' is short for the frequency power spectrum of a sound. It is obtained by taking the squared amplitude of the Fourier transform of the waveform and shows the energy in a signal as a function of frequency.
- Envelope
The smooth representation (low-pass filtering) of the amplitude of a signal as a function of frequency (spectral envelope) or time (temporal envelope).
- Formants
The peaks in the spectral envelope that correspond to a resonance in the sound source. In animal vocalizations and human speech, formants are resonances of the upper vocal tract. Formants are also the distinguishing or meaningful frequency components of voiced human speech.
- Spectrogram
The representation of a sound in the time and frequency domains that shows how the sound spectrum varies over time. This representation is often obtained by calculating the sound spectrum in a short-windowed section of sound and repeating this calculation while moving the window in time. This method of calculation is also called the short-time Fourier transform.
- Modulation power spectrum
The two-dimensional power spectrum of the log amplitude of a sound spectrogram. The name comes from the fact that each 'row' in a spectrogram represents the amplitude envelope of the sound in a narrow frequency channel. These envelopes 'modulate' the intensity of the signals, and the modulation power spectrum shows the power of these modulations for different rates (that is, the temporal modulation frequencies on the x axis). Similarly, a column in the spectrogram shows the spectral envelope at a particular time point. The power spectrum of this spectral envelope shows the power at particular spectral modulation frequencies (that is, on the y axis). More generally, a spectrogram jointly tracks amplitude envelopes in both time and frequency, and the full modulation power spectrum shows the power of both spectral and temporal modulations.
- Harmonic
A complex sound with a strong pitch percept that is made of a tone at a fundamental frequency or first frequency (f0, corresponding to the perceived pitch) and a series of overtones at integer multiples of this fundamental frequency.
- Timbre
The quality of a sound (also known in psychoacoustics as tone colour or tone quality) that can be used to distinguish different types of sound production (voices or various musical instruments) and that is determined by certain physical properties, such as the sound spectrum.
- Invariance
The property of something that does not change under a transformation (for example, in rotation-invariant face neurons, a particular face will elicit very similar responses irrespective of the face orientation).
- Decorrelation
The process used to reduce the autocorrelation or the redundancy of information within a signal.
- Inferior colliculus
The principal auditory nucleus of the mammalian midbrain, which receives inputs from several peripheral brainstem nuclei, including direct and indirect inputs from the cochlear nucleus.
- Gain
For neurons, the gain is the sensitivity of a neural output to the input signal.
- Frequency filters
Devices for selectively transmitting specified frequencies of the input signal by attenuating, or filtering out, unwanted frequencies.
- Spike-triggered average
(STA). A tool for estimating the stimulus–response function of a neuron by using the average stimulus before each spike. For linear neurons and for white noise stimuli, the STA will yield the neuron's receptive field.
- Spectrotemporal receptive field
(STRF). In the most general sense, the STRF is used to label the stimulus–response function of a neuron that uses any spectrotemporal description of the sound as the stimulus (for example, the spectrogram). That general STRF can be a linear or non-linear model.
- Regularization
In statistics and machine learning, regularization methods are used for model selection, in particular to prevent overfitting by penalizing models with extreme parameter values or an extreme number of parameters. For example, principal component analysis applied to a data set can be used to reduce the dimensionality of the data set. The resulting model is a principal component regression or subspace regression.
- Priors
Probability distributions attached to parameters before certain data are observed. Prior is short for 'prior probability'.
- Principal component regression
The regularization procedure that uses principal component analysis in conjunction with a linear regression analysis to reduce the dimensionality of a data set.
- Ridge regression
The combination of linear regression with a regularization procedure that assumes a priori that the coefficients of the regression are normally distributed around zero. By varying the spread of this normal distribution, one can constrain the coefficients to be very close to zero unless there is strong evidence against it in the data. Ridge regression is therefore useful when there are too many parameters in the linear regression to be fitted with limited data (that is, to prevent overfitting). In those cases, most parameters will be shrunk close to zero and only the few that are strongly supported by the data will retain sizeable values. It is in this sense that ridge regression reduces the effective dimensionality of the problem. Ridge regression is also known as L2 regularization, and statisticians also call regularization procedures of this type shrinkage.
- Zero-mean Gaussian
A distribution that is normal with a mean of zero.
- Generalized linear models
(GLMs). A generalization of ordinary linear regression that allows for the random aspects of response variables (that is, the noise) to have distributions other than the normal distribution (for example, a Poisson distribution).
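The spectrogram and modulation power spectrum entries above describe a two-step calculation that can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the procedure used in any of the studies reviewed: the Hann window, the window and hop lengths, the 80 dB dynamic-range floor and the function names `spectrogram` and `modulation_power_spectrum` are all choices made here for the example.

```python
import numpy as np

def spectrogram(signal, fs, win_len=256, hop=64):
    """Amplitude spectrogram by short-time Fourier transform.

    Returns amp with shape (n_freqs, n_frames): rows are frequency
    channels, columns are time frames.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    amp = np.abs(np.fft.rfft(frames, axis=0))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)            # Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # s, window centres
    return amp, freqs, times

def modulation_power_spectrum(amp, df, dt, floor_db=-80.0):
    """2D power spectrum of the log-amplitude spectrogram.

    df and dt are the spacing of spectrogram rows (Hz) and columns (s).
    """
    log_amp = 20.0 * np.log10(amp / amp.max() + 1e-12)  # dB relative to the peak
    log_amp = np.maximum(log_amp, floor_db)             # assumed dynamic-range floor
    log_amp = log_amp - log_amp.mean()                  # remove the DC term
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_amp))) ** 2
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(amp.shape[0], d=df))  # cycles/Hz
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(amp.shape[1], d=dt))  # Hz
    return mps, spectral_mod, temporal_mod

# A 2 kHz tone: the time-averaged sound spectrum peaks in the 2 kHz channel.
fs = 16_000
t = np.arange(fs // 2) / fs
amp, freqs, times = spectrogram(np.sin(2 * np.pi * 2000 * t), fs)
peak_hz = freqs[(amp ** 2).mean(axis=1).argmax()]

# A synthetic 'ripple' spectrogram (a hypothetical test pattern): a 10 dB
# modulation at 1.25 cycles/kHz spectrally and 10 Hz temporally.
df, dt = 50.0, 0.005
f_axis = np.arange(64) * df
t_axis = np.arange(100) * dt
ripple_db = 10.0 * np.cos(
    2 * np.pi * (1.25e-3 * f_axis[:, None] + 10.0 * t_axis[None, :])
)
mps, spectral_mod, temporal_mod = modulation_power_spectrum(
    10 ** (ripple_db / 20), df, dt
)
i, j = np.unravel_index(mps.argmax(), mps.shape)
peak_spectral, peak_temporal = spectral_mod[i], temporal_mod[j]
```

The ripple pattern is used because its modulation content is known in advance: all of its power sits at a single pair of spectral and temporal modulation frequencies, so the peak of the modulation power spectrum can be checked directly against the values used to build it.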
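The spike-triggered average entry above states that, for a linear neuron driven by white noise, the STA yields the receptive field. The sketch below illustrates that statement on simulated data only. The temporal filter, the baseline and gain of the model neuron and the Poisson spike generator are hypothetical choices made for the example, and the stimulus is a one-dimensional white noise sequence rather than a spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_lags = 100_000, 20

# Gaussian white noise stimulus, one value per time bin.
stimulus = rng.standard_normal(n_samples)

# Hypothetical 'true' temporal filter: entry k weights the stimulus k bins in the past.
lags = np.arange(n_lags)
true_filter = np.exp(-lags / 4.0) * np.sin(2 * np.pi * lags / 10.0)

# Linear model neuron: the expected spike count per bin is an affine function
# of the filtered stimulus (clipped at zero only as a safeguard).
drive = np.convolve(stimulus, true_filter)[:n_samples]
rate = np.clip(0.2 + 0.05 * drive, 0.0, None)
spikes = rng.poisson(rate)

def spike_triggered_average(stimulus, spikes, n_lags):
    """Average stimulus at each lag before a spike, weighted by spike count."""
    start = n_lags - 1              # first bin with a full stimulus history
    counts = spikes[start:]
    sta = np.empty(n_lags)
    for k in range(n_lags):
        past = stimulus[start - k:len(stimulus) - k]  # stimulus k bins before each bin
        sta[k] = counts @ past
    return sta / counts.sum()

sta = spike_triggered_average(stimulus, spikes, n_lags)

# The STA is expected to be proportional to the filter, so compare shapes
# rather than absolute values.
similarity = np.corrcoef(sta, true_filter)[0, 1]
print(f"correlation between STA and true filter: {similarity:.3f}")
```

With a stimulus that has correlations, as natural sounds do, this simple average is biased by the stimulus statistics, which is why the normalized and regularized regression methods listed in this glossary are used instead.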
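The ridge regression entry above can be made concrete with the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy, in which the penalty λ plays the role of the ratio between the noise variance and the variance of the zero-mean Gaussian prior on the coefficients. The following is a minimal sketch on simulated data: the correlated design matrix, the sparse 'true' weights and the list of λ values are assumptions made for illustration, and in practice λ would be chosen by cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
n_train, n_test, n_features = 80, 500, 40

def correlated_design(n_rows):
    """Stimulus matrix whose neighbouring columns are correlated (assumed rho = 0.9)."""
    X = np.empty((n_rows, n_features))
    X[:, 0] = rng.standard_normal(n_rows)
    for j in range(1, n_features):
        X[:, j] = 0.9 * X[:, j - 1] + np.sqrt(1 - 0.9 ** 2) * rng.standard_normal(n_rows)
    return X

# Hypothetical ground truth: only five of the forty coefficients matter.
true_w = np.zeros(n_features)
true_w[10:15] = [0.5, 1.0, 1.5, 1.0, 0.5]

X_train, X_test = correlated_design(n_train), correlated_design(n_test)
y_train = X_train @ true_w + rng.standard_normal(n_train)
y_test = X_test @ true_w + rng.standard_normal(n_test)

# lam = 0 is ordinary least squares; larger values shrink the coefficients more.
lambdas = (0.0, 1.0, 10.0, 100.0)
norms = []
for lam in lambdas:
    w = ridge_fit(X_train, y_train, lam)
    norms.append(np.linalg.norm(w))
    test_mse = np.mean((X_test @ w - y_test) ** 2)
    print(f"lambda={lam:6.1f}  ||w||={norms[-1]:.3f}  held-out MSE={test_mse:.3f}")
```

The norm of the fitted coefficients can only decrease as λ grows, which is the shrinkage referred to in the entry. Note that in this sketch the penalized coefficients become small rather than exactly zero.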
Cite this article
Theunissen, F., Elie, J. Neural processing of natural sounds. Nat Rev Neurosci 15, 355–366 (2014). https://doi.org/10.1038/nrn3731
DOI: https://doi.org/10.1038/nrn3731