Neural processing of natural sounds

Key Points

  • Natural sounds include animal vocalizations, environmental sounds such as wind, water and fire noises, and non-vocal sounds made by animals and humans for communication. These natural sounds have characteristic statistical properties that make them perceptually salient and that drive auditory neurons in optimal regimes for information transmission.

  • Recent advances in statistics and computer sciences have enabled neurophysiologists to extract the stimulus–response function of complex auditory neurons from responses to natural sounds. These studies have shown a hierarchical processing that leads to the neural detection of progressively more complex natural sound features and have demonstrated the importance of the acoustical and behavioural context for the neural responses.

  • High-level auditory neurons have been shown to be exquisitely selective for conspecific calls. This fine selectivity could have an important role in species recognition, in vocal learning in songbirds and, in the case of bats, in the processing of the sounds used in echolocation. Research that investigates how communication sounds are categorized into behaviourally meaningful groups (for example, call types in animals and words in human speech) remains in its infancy.

  • Animals and humans also excel at separating communication sounds from each other and from background noise. Neurons that detect communication calls in noise have been found, but the neural computations involved in sound source separation and natural auditory scene analysis remain poorly understood overall. Thus, future auditory research will have to focus not only on how natural sounds are processed by the auditory system but also on the computations that enable this processing to occur in natural listening situations.

  • The complexity of the computations needed in the natural hearing task might require a high-dimensional representation provided by an ensemble of neurons, and the use of natural sounds might be the best solution for understanding the ensemble neural code.
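
The extraction of a stimulus–response function from responses to correlated stimuli, mentioned in the Key Points, can be illustrated with a small simulation. The following is a minimal sketch, not the method of any particular study. It assumes a purely linear model neuron whose stimulus–response function is a spectro-temporal receptive field (STRF), and it uses a simulated, temporally correlated spectrogram as a stand-in for a natural sound. The function names (`lagged_design`, `fit_strf_ridge`, `demo`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lagged_design(spec, n_lags):
    """Build a design matrix of time-lagged spectrogram frames.

    spec: array of shape (n_times, n_freqs).
    Row t of the result holds spec[t], spec[t-1], ..., spec[t-n_lags+1]
    (zeros where the lag reaches before the start), flattened.
    """
    n_times, n_freqs = spec.shape
    X = np.zeros((n_times, n_lags, n_freqs))
    for lag in range(n_lags):
        X[lag:, lag, :] = spec[:n_times - lag, :]
    return X.reshape(n_times, n_lags * n_freqs)


def fit_strf_ridge(spec, rate, n_lags, ridge):
    """Estimate a linear STRF by ridge regression.

    Solving (X^T X + ridge * I) w = X^T y divides out the stimulus
    autocorrelation, which matters when the stimulus is correlated
    (as natural sounds are); the ridge term regularizes the solution.
    """
    X = lagged_design(spec, n_lags)
    X = X - X.mean(axis=0)
    y = rate - rate.mean()
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    return w.reshape(n_lags, spec.shape[1])


def demo(seed=0):
    """Simulate a linear neuron and recover its STRF (illustrative values)."""
    rng = np.random.default_rng(seed)
    n_times, n_freqs, n_lags = 5000, 8, 6

    # Assumption: a toy "spectrogram" made of white noise smoothed in time,
    # so that neighbouring time bins are correlated.
    white = rng.standard_normal((n_times, n_freqs))
    kernel = np.array([0.5, 0.3, 0.2])
    spec = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, white
    )

    # Hypothetical ground-truth STRF: excitation followed by inhibition
    # in one frequency band, plus weaker excitation in a neighbouring band.
    true_strf = np.zeros((n_lags, n_freqs))
    true_strf[1, 3] = 1.0
    true_strf[2, 3] = -0.5
    true_strf[1, 4] = 0.5

    # Linear response plus additive Gaussian noise.
    rate = lagged_design(spec, n_lags) @ true_strf.ravel()
    rate = rate + 0.1 * rng.standard_normal(n_times)

    est = fit_strf_ridge(spec, rate, n_lags, ridge=1.0)
    return true_strf, est


if __name__ == "__main__":
    true_strf, est = demo()
    r = np.corrcoef(true_strf.ravel(), est.ravel())[0, 1]
    print(f"correlation between true and estimated STRF: {r:.3f}")
```

In this sketch the ridge parameter is fixed by hand; in practice it would typically be chosen by cross-validation, and real neurons would also require a model of their nonlinearities.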

Abstract

We might be forced to listen to a high-frequency tone at our audiologist's office or we might enjoy falling asleep with a white-noise machine, but the sounds that really matter to us are the voices of our companions or music from our favourite radio station. The auditory system has evolved to process behaviourally relevant natural sounds. Research has shown not only that our brain is optimized for natural hearing tasks but also that using natural sounds to probe the auditory system is the best way to understand the neural computations that enable us to comprehend speech or appreciate music.

Figure 1: Historical approaches to auditory neurosciences.
Figure 2: Natural sound statistics.
Figure 3: Stimulus–response characterization.
Figure 4: STRFs at different levels of the auditory system.

References

  1. Darrigol, O. Number and measure: Hermann von Helmholtz at the crossroads of mathematics, physics, and psychology. Stud. Hist. Philos. Sci. 34, 515–573 (2003).

    PubMed  Google Scholar 

  2. Robles, L. & Ruggero, M. A. Mechanics of the mammalian cochlea. Physiol. Rev. 81, 1305–1352 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  3. Woolley, S. M. & Casseday, J. H. Response properties of single neurons in the zebra finch auditory midbrain: response patterns, frequency coding, intensity coding, and spike latencies. J. Neurophysiol. 91, 136–151 (2004).

    PubMed  Google Scholar 

  4. Eggermont, J. J. Between sound and perception: reviewing the search for a neural code. Hear. Res. 157, 1–42 (2001).

    CAS  PubMed  Google Scholar 

  5. Suga, N., O'Neill, W. E. & Manabe, T. Cortical neurons sensitive to combinations of information-bearing elements of biosonar signals in the moustache bat. Science 200, 778–781 (1978).

    CAS  PubMed  Google Scholar 

  6. Margoliash, D. & Fortune, E. S. Temporal and harmonic combination-sensitive neurons in the zebra finch's HVc. J. Neurosci. 12, 4309–4326 (1992).

    CAS  PubMed  Google Scholar 

  7. Margoliash, D. & Konishi, M. Auditory representation of autogenous song in the song system of white-crowned sparrows. Proc. Natl Acad. Sci. USA 82, 5997–6000 (1985).

    CAS  PubMed  Google Scholar 

  8. Nelken, I., Rotman, Y. & Bar Yosef, O. Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397, 154–157 (1999). This study reports a very surprising result: auditory cortical neurons can be more sensitive to the natural auditory context than to the main component of the sound signal.

    CAS  PubMed  Google Scholar 

  9. Hsu, A., Woolley, S. M., Fremouw, T. E. & Theunissen, F. E. Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. J. Neurosci. 24, 9201–9211 (2004). Using estimations of mutual information, this study shows that sounds that are otherwise identical in intensity and frequency but have a modulation power spectrum with natural distributions are more efficiently encoded in the avian auditory cortex.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Rieke, F., Bodnar, D. A. & Bialek, W. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B 262, 259–265 (1995).

    CAS  Google Scholar 

  11. Hedwig, B. Pulses, patterns and paths: neurobiology of acoustic behaviour in crickets. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 192, 677–689 (2006).

    PubMed  Google Scholar 

  12. Arcadi, A. C., Robert, D. & Boesch, C. Buttress drumming by wild chimpanzees: temporal patterning, phrase integration into loud calls, and preliminary evidence for individual distinctiveness. Primates 39, 505–518 (1998).

    Google Scholar 

  13. Voss, R. F. & Clarke, J. 1/f noise in music and speech. Nature 258, 317–318 (1975). In this first study on the statistical structure in music and speech sounds, Voss and Clarke show that the amplitude envelope and other features of these natural sounds follow a power law relationship and discuss the significance of this relationship.

    Google Scholar 

  14. Attias, H. & Schreiner, C. E. in Advances in Neural Information Processing Systems (eds Mozer, M. C., Jordan, M. I. & Petsche, T.) 27–33 (MIT Press, 1997).

    Google Scholar 

  15. Singh, N. C. & Theunissen, F. E. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411 (2003). This paper introduces the joint temporal and spectral modulation power spectrum of sounds and shows that natural sounds have a characteristic signature in this representation.

    PubMed  Google Scholar 

  16. Chen, J. D., Paliwal, K. K. & Nakamura, S. Cepstrum derived from differentiated power spectrum for robust speech recognition. Speech Comm. 41, 469–484 (2003).

    Google Scholar 

  17. Cohen, L. in Time–Frequency Analysis Ch. 3 50–52 (Prentice Hall, 1995).

    Google Scholar 

  18. Bialek, W., Nemenman, I. & Tishby, N. Predictability, complexity, and learning. Neural Comput. 13, 2409–2463 (2001).

    CAS  PubMed  Google Scholar 

  19. Elliott, T. M., Hamilton, L. S. & Theunissen, F. E. Acoustic structure of the five perceptual dimensions of timbre in orchestral instrument tones. J. Acoust. Soc. Am. 133, 389–404 (2013).

    PubMed  PubMed Central  Google Scholar 

  20. Garcia-Lazaro, J. A., Ahmed, B. & Schnupp, J. W. H. Emergence of tuning to natural stimulus statistics along the central auditory pathway. PLoS ONE 6, e22584 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  21. Srivastava, A., Lee, A. B., Simoncelli, E. P. & Zhu, S. C. On advances in statistical modeling of natural images. J. Math. Imaging Vis. 18, 17–33 (2003).

    Google Scholar 

  22. Ruderman, D. L. Origins of scaling in natural images. Vis. Res. 37, 3385–3398 (1997).

    CAS  PubMed  Google Scholar 

  23. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).

    CAS  PubMed  Google Scholar 

  24. Rodriguez, F. A., Chen, C., Read, H. L. & Escabi, M. A. Neural modulation tuning characteristics scale to efficiently encode natural sound statistics. J. Neurosci. 30, 15969–15980 (2010). In this study, Rodriguez et al . show that the modulation tuning of neurons in the inferior colliculus balances out the 1/ f relationship found in the modulation power spectrum of natural sounds to transmit modulation information efficiently, with reduced redundancy.

    CAS  PubMed  PubMed Central  Google Scholar 

  25. Woolley, S. M., Fremouw, T. E., Hsu, A. & Theunissen, F. E. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nature Neurosci. 8, 1371–1379 (2005).

    CAS  PubMed  Google Scholar 

  26. Gerhardt, H. C. & Huber, F. (eds) Acoustic Communication in Insects and Anurans: Common Problems and Diverse Solutions Ch. 4 82–128 (University of Chicago Press, 2002).

    Google Scholar 

  27. Koppl, C., Gleich, O. & Manley, G. A. An auditory fovea in the barn owl cochlea. J. Comp. Physiol. 171, 695–704 (1993).

    Google Scholar 

  28. Bruns, V. & Schmieszek, E. Cochlear innervation in the greater horseshoe bat: demonstration of an acoustic fovea. Hear. Res. 3, 27–43 (1980).

    CAS  PubMed  Google Scholar 

  29. Lewicki, M. S. Efficient coding of natural sounds. Nature Neurosci. 5, 356–363 (2002). Lewicki shows that the particular decomposition of sound into frequency channels found at the level of the auditory nerve results in an efficient representation of a mixture of animal communication sounds and environmental sounds.

    CAS  PubMed  Google Scholar 

  30. Moore, R. C., Lee, T. & Theunissen, F. E. Noise-invariant neurons in the avian auditory cortex: hearing the song in noise. PLoS Comput. Biol. 9, e1002942 (2013). This study demonstrates the presence of neurons that robustly encode song in noise in higher levels of the avian auditory system and, inspired by the spectrotemporal filtering of these neurons, proposes an algorithm for real-time noise reduction.

    CAS  PubMed  PubMed Central  Google Scholar 

  31. McDermott, J. H., Schemitsch, M. & Simoncelli, E. P. Summary statistics in auditory perception. Nature Neurosci. 16, 493–498 (2013). McDermott et al . show that the acoustical statistical structure perceived depends on the length of the sound and that, for longer, complex sounds, only averaged statistics are perceived, leading to percepts of sound texture.

    CAS  PubMed  Google Scholar 

  32. Smith, E. C. & Lewicki, M. S. Efficient auditory coding. Nature 439, 978–982 (2006).

    CAS  PubMed  Google Scholar 

  33. Marmarelis, P. Z. & Marmarelis, V. Z. Analysis of Physiological Systems: The White Noise Approach (Plenum, 1978).

    Google Scholar 

  34. Aertsen, A. M. & Johannesma, P. I. The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol. Cybern. 42, 133–143 (1981).

    CAS  PubMed  Google Scholar 

  35. Eggermont, J. J., Aertsen, A. M. & Johannesma, P. I. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hear. Res. 10, 167–190 (1983).

    CAS  PubMed  Google Scholar 

  36. Eggermont, J. J., Aertsen, A. M. & Johannesma, P. I. Prediction of the responses of auditory neurons in the midbrain of the grass frog based on the spectro-temporal receptive field. Hear. Res. 10, 191–202 (1983).

    CAS  PubMed  Google Scholar 

  37. Schafer, M., Rubsamen, R., Dorrscheidt, G. J. & Knipschild, M. Setting complex tasks to single units in the avian auditory forebrain. II. Do we really need natural stimuli to describe neuronal response characteristics? Hear. Res. 57, 231–244 (1992).

    CAS  PubMed  Google Scholar 

  38. Andoni, S., Li, N. & Pollak, G. D. Spectrotemporal receptive fields in the inferior colliculus revealing selectivity for spectral motion in conspecific vocalizations. J. Neurosci. 27, 4882–4893 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  39. Ulanovsky, N., Las, L. & Nelken, I. Processing of low-probability sounds by cortical neurons. Nature Neurosci. 6, 391–398 (2003).

    CAS  PubMed  Google Scholar 

  40. Asari, H. & Zador, A. M. Long-lasting context dependence constrains neural encoding models in rodent auditory cortex. J. Neurophysiol. 102, 2638–2656 (2009).

    PubMed  PubMed Central  Google Scholar 

  41. Woolley, S. M., Gill, P. R. & Theunissen, F. E. Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. J. Neurosci. 26, 2499–2512 (2006).

    CAS  PubMed  Google Scholar 

  42. Theunissen, F. E., Sen, K. & Doupe, A. J. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci. 20, 2315–2331 (2000).

    CAS  PubMed  Google Scholar 

  43. Theunissen, F. E. et al. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network 12, 289–316 (2001). This methods paper describes the analytical solution for the simplest normalized and regularized linear regression solution for the estimation of STRFs from responses to natural stimuli.

    CAS  PubMed  Google Scholar 

  44. Sahani, M. & Linden, J. in Advances in Neural Infomation Processing Systems (eds Becker, S., Thrun, S. & Obermeyer, K.) 301–308 (MIT Press, 2003).

    Google Scholar 

  45. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000).

    Google Scholar 

  46. David, S. V., Mesgarani, N. & Shamma, S. A. Estimating sparse spectro-temporal receptive fields with natural stimuli. Network 18, 191–212 (2007).

    PubMed  Google Scholar 

  47. Christianson, G. B., Sahani, M. & Linden, J. F. The consequences of response nonlinearities for interpretation of spectrotemporal receptive fields. J. Neurosci. 28, 446–455 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  48. Schneider, D. M. & Woolley, S. M. N. Extra-classical tuning predicts stimulus-dependent receptive fields in auditory neurons. J. Neurosci. 31, 11867–11878 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  49. Gill, P., Zhang, J., Woolley, S. M., Fremouw, T. & Theunissen, F. E. Sound representation methods for spectro-temporal receptive field estimation. J. Comput. Neurosci. 21, 5–20 (2006).

    PubMed  Google Scholar 

  50. David, S. V. & Shamma, S. A. Integration over multiple timescales in primary auditory cortex. J. Neurosci. 33, 19154–19166 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  51. Gill, P., Woolley, S. M., Fremouw, T. & Theunissen, F. E. What's that sound? Auditory area CLM encodes stimulus surprise, not intensity or intensity changes. J. Neurophysiol. 99, 2809–2820 (2008).

    PubMed  Google Scholar 

  52. Calabrese, A., Schumacher, J. W., Schneider, D. M., Paninski, L. & Woolley, S. M. N. A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PLoS ONE 6, e16104 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  53. McFarland, J. M., Cui, Y. W. & Butts, D. A. Inferring nonlinear neuronal computation based on physiologically plausible inputs. PloS Comput. Biol. 9, e1003143 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. Sharpee, T., Rust, N. C. & Bialek, W. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Comput. 16, 223–250 (2004).

    PubMed  Google Scholar 

  55. Atencio, C. A., Sharpee, T. O. & Schreiner, C. E. Receptive field dimensionality increases from the auditory midbrain to cortex. J. Neurophysiol. 107, 2594–2603 (2012). In this study, Atencio et al . use the maximally informative dimension algorithm to extract multicomponent STRFs and show that the number of components needed to characterize auditory neurons is greater in the cortex than in the thalamus.

    PubMed  PubMed Central  Google Scholar 

  56. Depireux, D. A. & Elhilali, M. (eds) Handbook of Modern Techniques in Auditory Cortex (Nova Science, 2014).

    Google Scholar 

  57. Ahrens, M. B., Linden, J. F. & Sahani, M. Nonlinearities and contextual influences in auditory cortical responses modeled with multilinear spectrotemporal methods. J. Neurosci. 28, 1929–1942 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  58. Woolley, S. M., Gill, P. R., Fremouw, T. & Theunissen, F. E. Functional groups in the avian auditory system. J. Neurosci. 29, 2780–2793 (2009). This study describes the STRFs found in the avian inferior colliculus and avian auditory cortex and shows how they form unique functional groups that extract distinct features of natural sounds that are in turn important for distinct percepts.

    CAS  PubMed  PubMed Central  Google Scholar 

  59. Amin, N., Gill, P. & Theunissen, F. E. Role of the zebra finch auditory thalamus in generating complex representations for natural sounds. J. Neurophysiol. 104, 784–798 (2010).

    PubMed  PubMed Central  Google Scholar 

  60. Nagel, K. I. & Doupe, A. J. Organizing principles of spectro-temporal encoding in the avian primary auditory area field L. Neuron 58, 938–955 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  61. Kim, G. & Doupe, A. Organized representation of spectrotemporal features in songbird auditory forebrain. J. Neurosci. 31, 16977–16990 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  62. Miller, L. M., Escabi, M. A., Read, H. L. & Schreiner, C. E. Functional convergence of response properties in the auditory thalamocortical system. Neuron 32, 151–160 (2001). This study compares the STRFs in the mammalian auditory thalamus with those in the auditory cortex, illustrating the hierarchical processing found in the ascending auditory processing stream.

    CAS  PubMed  Google Scholar 

  63. Miller, L. M., Escabi, M. A., Read, H. L. & Schreiner, C. E. Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J. Neurophysiol. 87, 516–527 (2002).

    PubMed  Google Scholar 

  64. Depireux, D. A., Simon, J. Z., Klein, D. J. & Shamma, S. A. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. J. Neurophysiol. 85, 1220–1234 (2001).

    CAS  PubMed  Google Scholar 

  65. Chi, T., Ru, P. & Shamma, S. A. Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).

    PubMed  Google Scholar 

  66. Escabi, M. A. & Schreiner, C. E. Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. J. Neurosci. 22, 4114–4131 (2002).

    CAS  PubMed  Google Scholar 

  67. Escabi, M. A., Miller, L. M., Read, H. L. & Schreiner, C. E. Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J. Neurosci. 23, 11489–11504 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. Carlson, N. L., Ming, V. L. & Deweese, M. R. Sparse codes for speech predict spectrotemporal receptive fields in the inferior colliculus. PLoS Comput. Biol. 8, e1002594 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. Fritz, J., Shamma, S., Elhilali, M. & Klein, D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neurosci. 6, 1216–1223 (2003).

    CAS  PubMed  Google Scholar 

  70. David, S. V., Fritz, J. B. & Shamma, S. A. Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proc. Natl Acad. Sci. USA 109, 2144–2149 (2012).

    CAS  PubMed  Google Scholar 

  71. Rabinowitz, N. C., Willmore, B. D. B., Schnupp, J. W. H. & King, A. J. Spectrotemporal contrast kernels for neurons in primary auditory cortex. J. Neurosci. 32, 11271–11284 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  72. Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex. J. Neurophysiol. 102, 3329–3339 (2009).

    PubMed  PubMed Central  Google Scholar 

  73. Amin, N., Gastpar, M. & Theunissen, F. E. Selective and efficient neural coding of communication signals depends on early acoustic and social environment. PLoS ONE 8, e61417 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  74. Mesgarani, N., Slaney, M. & Shamma, S. A. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations. IEEE Trans. Speech Audio Process. 14, 920–930 (2006).

    Google Scholar 

  75. Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Phoneme representation and classification in primary auditory cortex. J. Acoust. Soc. Am. 123, 899–909 (2008).

    PubMed  Google Scholar 

  76. McDermott, J. H. & Simoncelli, E. P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  77. Libersat, F., Murray, J. A. & Hoy, R. R. Frequency as a releaser in the courtship song of 2 crickets, Gryllus-bimaculatus (de Geer) and Teleogryllus oceanicus: a neuroethological analysis. J. Comp. Physiol. A 174, 485–494 (1994).

    CAS  PubMed  Google Scholar 

  78. Feng, A. S., Hall, J. C. & Gooler, D. M. Neural basis of sound pattern-recognition in anurans. Prog. Neurobiol. 34, 313–329 (1990).

    CAS  PubMed  Google Scholar 

  79. Grimsley, J. M., Shanbhag, S. J., Palmer, A. R. & Wallace, M. N. Processing of communication calls in Guinea pig auditory cortex. PLoS ONE 7, e51646 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  80. McCasland, J. S. & Konishi, M. Interactions between auditory and motor activities in an avian song control nucleus. Proc. Natl Acad. Sci. USA 78, 7815–7819 (1981).

    CAS  PubMed  Google Scholar 

  81. Doupe, A. J. & Konishi, M. Song-selective auditory circuits in the vocal control system of the zebra finch. Proc. Natl Acad. Sci. USA 88, 11339–11343 (1991).

    CAS  PubMed  Google Scholar 

  82. Grace, J. A., Amin, N., Singh, N. C. & Theunissen, F. E. Selectivity for conspecific song in the zebra finch auditory forebrain. J. Neurophysiol. 89, 472–487 (2003).

    PubMed  Google Scholar 

  83. Newman, J. & Wollberg, Z. Multiple coding of species-specific vocalizations in the auditory cortex of squirrel monkeys. Brain Res. 54, 287–304 (1978).

    Google Scholar 

  84. Huetz, C., Gourevitch, B. & Edeline, J. M. Neural codes in the thalamocortical auditory system: from artificial stimuli to communication sounds. Hear. Res. 271, 147–158 (2011).

    PubMed  Google Scholar 

  85. Ter-Mikaelian, M., Semple, M. N. & Sanes, D. H. Effects of spectral and temporal disruption on cortical encoding of gerbil vocalizations. J. Neurophysiol. 110, 1190–1204 (2013).

    PubMed  PubMed Central  Google Scholar 

  86. Suta, D., Popelar, J. & Syka, J. Coding of communication calls in the subcortical and cortical structures of the auditory system. Physiol. Res. 57 (Suppl. 3), 149–159 (2008).

    Google Scholar 

  87. Wallace, M. N., Grimsley, J. M., Anderson, L. A. & Palmer, A. R. Representation of individual elements of a complex call sequence in primary auditory cortex. Front. Syst. Neurosci. 7, 72 (2013).

    PubMed  PubMed Central  Google Scholar 

  88. Theunissen, F. E. & Doupe, A. J. Temporal and spectral sensitivity of complex auditory neurons in the nucleus HVc of male zebra finches. J. Neurosci. 18, 3786–3802 (1998).

    CAS  PubMed  Google Scholar 

  89. Lewicki, M. S. & Konishi, M. Mechanisms underlying the sensitivity of songbird forebrain neurons to temporal order. Proc. Natl Acad. Sci. USA 92, 5582–5586 (1995).

    CAS  PubMed  Google Scholar 

  90. Marler, P. R. Avian and primate communication: the problem of natural categories. Neurosci. Biobehav. Rev. 6, 87–94 (1982).

    CAS  PubMed  Google Scholar 

  91. Walker, K. M. M., Bizley, J. K., King, A. J. & Schnupp, J. W. H. Multiplexed and robust representations of sound features in auditory cortex. J. Neurosci. 31, 14565–14576 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Meliza, C. D. & Margoliash, D. Emergence of selectivity and tolerance in the avian auditory cortex. J. Neurosci. 32, 15158–15168 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. George, I., Cousillas, H., Richard, J. P. & Hausberger, M. A potential neural substrate for processing functional classes of complex acoustic signals. PLoS ONE 3, e2203 (2008).

    PubMed  PubMed Central  Google Scholar 

  94. Petkov, C. I. et al. A voice region in the monkey brain. Nature Neurosci. 11, 367–374 (2008).

    CAS  PubMed  Google Scholar 

  95. Perrodin, C., Kayser, C., Logothetis, N. K. & Petkov, C. I. Voice cells in the primate temporal lobe. Curr. Biol. 21, 1408–1415 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  96. Cohen, Y. E. et al. A functional role for the ventrolateral prefrontal cortex in non-spatial auditory cognition. Proc. Natl Acad. Sci. USA 106, 20045–20050 (2009).

    CAS  PubMed  Google Scholar 

  97. Gifford, G. W., MacLean, K. A., Hauser, M. D. & Cohen, Y. E. The neurophysiology of functionally meaningful categories: macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. J. Cogn. Neurosci. 17, 1471–1482 (2005).

    PubMed  Google Scholar 

  98. Cohen, Y. E., Theunissen, F., Russ, B. E. & Gill, P. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. J. Neurophysiol. 97, 1470–1484 (2007).

    PubMed  Google Scholar 

  99. Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nature Neurosci. 12, 718–724 (2009).

    CAS  PubMed  Google Scholar 

  100. Marler, P. Bird calls: their potential for behavioral neurobiology. Ann. NY Acad. Sci. 1016, 31–44 (2004).

    PubMed  Google Scholar 

  101. Prather, J. F., Nowicki, S., Anderson, R. C., Peters, S. & Mooney, R. Neural correlates of categorical perception in learned vocal communication. Nature Neurosci. 12, 221–228 (2009).

    CAS  PubMed  Google Scholar 

  102. Marler, P. Birdsong: the acquisition of a learned motor skill. Trends Neurosci. 4, 88–94 (1981).

    Google Scholar 

  103. Seyfarth, R. M. & Cheney, D. L. Vocal development in vervet monkeys. Animal Behav. 34, 1640–1658 (1986).

    Google Scholar 

  104. Miranda, J. A. & Liu, R. C. Dissecting natural sensory plasticity: hormones and experience in a maternal context. Hear. Res. 252, 21–28 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  105. Menardy, F. et al. Social experience affects neuronal responses to male calls in adult female zebra finches. Eur. J. Neurosci. 35, 1322–1336 (2012).

    CAS  PubMed  Google Scholar 

  106. Gentner, T. Q. & Margoliash, D. Neuronal populations and single cells representing learned auditory objects. Nature 424, 669–674 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  107. Woolley, S. M. N., Hauber, M. E. & Theunissen, F. E. Developmental experience alters information coding in auditory midbrain and forebrain neurons. Dev. Neurobiol. 70, 235–252 (2010).

    PubMed  PubMed Central  Google Scholar 

  108. Hauber, M. E., Woolley, S. M. N., Cassey, P. & Theunissen, F. E. Experience dependence of neural responses to different classes of male songs in the primary auditory forebrain of female songbirds. Behav. Brain Res. 243, 184–190 (2013).

    PubMed  PubMed Central  Google Scholar 

  109. Sanes, D. H. & Bao, S. W. Tuning up the developing auditory CNS. Curr. Opin. Neurobiol. 19, 188–199 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  110. Doupe, A. J. Song- and order-selective neurons in the songbird anterior forebrain and their emergence during vocal development. J. Neurosci. 17, 1147–1167 (1997).

    CAS  PubMed  Google Scholar 

  111. Solis, M. M. & Doupe, A. J. Contributions of tutor and bird's own song experience to neural selectivity in the songbird anterior forebrain. J. Neurosci. 19, 4559–4584 (1999).

    CAS  PubMed  Google Scholar 

  112. Volman, S. F. Development of neural selectivity for birdsong during vocal learning. J. Neurosci. 13, 4737–4747 (1993).

    CAS  PubMed  Google Scholar 

  113. Kover, H., Gill, K., Tseng, Y. T. L. & Bao, S. W. Perceptual and neuronal boundary learned from higher-order stimulus probabilities. J. Neurosci. 33, 3699–3705 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  114. Moss, C. F. & Surlykke, A. Probing the natural scene by echolocation in bats. Front. Behav. Neurosci. 4, e00033 (2010).

    Google Scholar 

  115. Cheney, D. L. & Seyfarth, R. M. Assessment of meaning and the detection of unreliable signals by vervet monkeys. Animal Behav. 36, 477–486 (1988).

    Google Scholar 

  116. Elie, J. E. et al. Vocal communication at the nest between mates in wild zebra finches: a private vocal duet? Animal Behav. 80, 597–605 (2010).

    Google Scholar 

  117. Penna, M., Llusia, D. & Marquez, R. Propagation of natural toad calls in a Mediterranean terrestrial environment. J. Acoust. Soc. Am. 132, 4025–4031 (2012).

    PubMed  Google Scholar 

  118. Aubin, T. & Jouventin, P. in Advances in the Study of Behavior Vol. 31 (eds Slater, P. J. B., Rosenblatt, J. S., Snowdon, C. T. & Roper, T. J.) 243–277 (Academic, 2002).

    Google Scholar 

  119. Vignal, C., Mathevon, N. & Mottin, S. Audience drives male songbird response to partner's voice. Nature 430, 448–451 (2004).

    CAS  PubMed  Google Scholar 

  120. Kawasaki, M., Margoliash, D. & Suga, N. Delay-tuned combination-sensitive neurons in the auditory-cortex of the vocalizing mustached bat. J. Neurophysiol. 59, 623–635 (1988).

    CAS  PubMed  Google Scholar 

  121. Suga, N. & Shimozaw, T. Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science 183, 1211–1213 (1974).

    CAS  PubMed  Google Scholar 

  122. Eliades, S. J. & Wang, X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453, 1102–1106 (2008).

    CAS  PubMed  Google Scholar 

  123. Keller, G. B. & Hahnloser, R. H. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–190 (2009).

    CAS  PubMed  Google Scholar 

  124. Miller, C. T. & Wang, X. Q. Sensory–motor interactions modulate a primate vocal behavior: antiphonal calling in common marmosets. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 192, 27–38 (2006).

    PubMed  Google Scholar 

  125. Fortune, E. S., Rodriguez, C., Li, D., Ball, G. F. & Coleman, M. J. Neural mechanisms for the coordination of duet singing in wrens. Science 334, 666–670 (2011).

    CAS  PubMed  Google Scholar 

  126. Lewicki, M. S., Olshausen, B. A., Surlykke, A. & Moss, C. F. Scene analysis in the natural environment. Front. Psychol. 5, 199 (2014).

    PubMed  PubMed Central  Google Scholar 

  127. Schneider, D. M. & Woolley, S. M. N. Sparse and background-invariant coding of vocalizations in auditory scenes. Neuron 79, 141–152 (2013).

  128. Ding, N. & Simon, J. Z. Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J. Neurosci. 33, 5728–5735 (2013).

  129. Rabinowitz, N. C., Willmore, B. D. B., King, A. J. & Schnupp, J. W. H. Constructing noise-invariant representations of sound in the auditory pathway. PLoS Biol. 11, e1001710 (2013). This study shows how increasing levels of adaptation as one ascends the auditory processing stream result in a noise-invariant representation of signals at the higher levels.

  130. Middlebrooks, J. C. & Bremen, P. Spatial stream segregation by auditory cortical neurons. J. Neurosci. 33, 10986–11001 (2013).

  131. Fishman, Y. I., Reser, D. H., Arezzo, J. C. & Steinschneider, M. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear. Res. 151, 167–187 (2001).

  132. Bee, M. A. & Klump, G. M. Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J. Neurophysiol. 92, 1088–1104 (2004).

  133. Greenlee, J. D. W. et al. Human auditory cortical activation during self-vocalization. PLoS ONE 6, e14744 (2011).

  134. Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012).

  135. Regev, M., Honey, C. J., Simony, E. & Hasson, U. Selective and invariant neural responses to spoken and written narratives. J. Neurosci. 33, 15978–15988 (2013).

  136. Al-Mana, D., Ceranic, B., Djahanbakhch, O. & Luxon, L. M. Hormones and the auditory system: a review of physiology and pathophysiology. Neuroscience 153, 881–900 (2008).

  137. Teramoto, W., Sakamoto, S., Furune, F., Gyoba, J. & Suzuki, Y. Compression of auditory space during forward self-motion. PLoS ONE 7, e39402 (2012).

  138. Beckers, G. J. L. & Gahr, M. Large-scale synchronized activity during vocal deviance detection in the zebra finch auditory forebrain. J. Neurosci. 32, 10594–10608 (2012).

  139. Schneidman, E. et al. Synergy from silence in a combinatorial neural code. J. Neurosci. 31, 15732–15741 (2011).

Acknowledgements

The authors thank the members of the Theunissen laboratory and the three anonymous reviewers for critical feedback and suggestions.

Author information

Corresponding authors

Correspondence to Frédéric E. Theunissen or Julie E. Elie.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Neural tuning

This term refers to the response property of brain cells by which they selectively represent a particular type of sensory, motor or cognitive information.

Machine learning

A branch of computer science that combines statistical algorithms and artificial intelligence to extract information from complex data sets.

Sound spectrum

(Also called the frequency spectrum.) The representation of a waveform in the frequency domain. 'Sound spectrum' is short for the frequency power spectrum of a sound. It is obtained by taking the squared amplitude of the Fourier transform of the waveform and shows the energy in the signal as a function of frequency.
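
As an illustration only, the squared amplitude of the Fourier transform can be sketched with a direct discrete Fourier transform in plain Python. The function name `power_spectrum` and the test tone are assumptions made for this example, not part of the original glossary; real analyses would normally use an FFT library.

```python
import cmath
import math

def power_spectrum(waveform):
    """Return |DFT|^2 for the non-negative frequency bins of a real waveform."""
    n = len(waveform)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct (slow) discrete Fourier transform coefficient for bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(waveform))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

# Hypothetical example: a pure tone with 4 cycles across 32 samples.
n_samples = 32
tone = [math.sin(2 * math.pi * 4 * t / n_samples) for t in range(n_samples)]
spectrum = power_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

In this sketch all of the energy of the tone lands in the frequency bin that matches its number of cycles, which is what "energy as a function of frequency" means in practice.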

Envelope

The smooth representation (low-pass filtering) of the amplitude of a signal as a function of frequency (spectral envelope) or time (temporal envelope).
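
One simple way to picture a temporal envelope is to rectify the waveform and smooth the result. The sketch below assumes a moving average as the low-pass filter and an amplitude-modulated tone as the input; both choices, and the name `temporal_envelope`, are hypothetical and chosen only for illustration.

```python
import math

def temporal_envelope(waveform, window_len):
    """Rectify the waveform, then smooth it with a centred moving average."""
    rectified = [abs(x) for x in waveform]
    half = window_len // 2
    envelope = []
    for t in range(len(rectified)):
        lo, hi = max(0, t - half), min(len(rectified), t + half)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope

# Hypothetical amplitude-modulated tone: fast carrier (8-sample period)
# multiplied by a slow modulator (128-sample period).
n = 128
signal = [0.5 * (1 + math.sin(2 * math.pi * t / 128)) * math.sin(2 * math.pi * t / 8)
          for t in range(n)]
env = temporal_envelope(signal, window_len=16)
```

The smoothed trace follows the slow modulator and ignores the fast carrier: it is large near sample 32, where the modulator peaks, and close to zero near sample 96, where the modulator vanishes.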

Formants

The peaks in the spectral envelope that correspond to resonances in the sound source. In animal vocalizations and human speech, formants arise from resonances of the upper vocal tract. Formants are also the distinguishing or meaningful frequency components of voiced human speech.

Spectrogram

The representation of the sound in the time and frequency domain that shows how the sound spectrum varies over time. This representation is often obtained by calculating the sound spectrum in a short-windowed section of sound and repeating this calculation by moving the window in time. This method of calculation is also called the short-time Fourier transform.
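
The sliding-window calculation described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a Hann window, a hop of half a window, and a synthetic two-tone signal; the names `spectrogram` and `power_spectrum` are hypothetical helpers, not a reference implementation.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared amplitude of the DFT for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def spectrogram(waveform, window_len, hop):
    """Short-time Fourier transform power: one spectrum per window position."""
    # Hann taper (an assumed choice) reduces leakage at the window edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / window_len)
              for t in range(window_len)]
    frames = []
    for start in range(0, len(waveform) - window_len + 1, hop):
        frame = [waveform[start + t] * window[t] for t in range(window_len)]
        frames.append(power_spectrum(frame))
    return frames  # frames[i][k]: power in time window i, frequency bin k

# Hypothetical signal: a low tone (2 cycles per window) followed by a
# higher tone (6 cycles per window).
window_len, hop = 16, 8
low = [math.sin(2 * math.pi * 2 * t / window_len) for t in range(64)]
high = [math.sin(2 * math.pi * 6 * t / window_len) for t in range(64)]
spec = spectrogram(low + high, window_len, hop)
peak_bins = [max(range(len(row)), key=lambda k: row[k]) for row in spec]
```

Reading `peak_bins` from first to last window shows the dominant frequency bin moving from the low tone to the high tone, which is the time-varying spectrum that a spectrogram is meant to display.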

Modulation power spectrum

The two-dimensional power spectrum of the log amplitude of a sound spectrogram. The name comes from the fact that each 'row' in a spectrogram depicts the amplitude envelope of the sound in one narrow frequency channel. These envelopes 'modulate' the intensity of the signals, and the modulation power spectrum shows the power of these modulations at different rates (that is, the temporal modulation frequencies on the x axis). Similarly, a column in the spectrogram shows the spectral envelope at a particular time point. The power spectrum of this spectral envelope shows the power at particular spectral modulation frequencies (that is, on the y axis). More generally, a spectrogram jointly tracks amplitude envelopes in both time and frequency, and the full modulation power spectrum shows the power of both spectral and temporal modulations.
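
A rough sketch of this calculation is a two-dimensional Fourier transform applied to a log-amplitude spectrogram. In the example below a synthetic 'ripple' stands in for a real log spectrogram, and the function name `modulation_power_spectrum` is hypothetical; both are assumptions for illustration.

```python
import cmath
import math

def modulation_power_spectrum(log_spec):
    """2D power spectrum of a log spectrogram given as log_spec[freq][time]."""
    n_f, n_t = len(log_spec), len(log_spec[0])
    mean = sum(map(sum, log_spec)) / (n_f * n_t)
    mps = [[0.0] * n_t for _ in range(n_f)]
    for kf in range(n_f):          # spectral modulation index (y axis)
        for kt in range(n_t):      # temporal modulation index (x axis)
            coeff = sum((log_spec[f][t] - mean)
                        * cmath.exp(-2j * math.pi * (kf * f / n_f + kt * t / n_t))
                        for f in range(n_f) for t in range(n_t))
            mps[kf][kt] = abs(coeff) ** 2
    return mps

# Hypothetical log spectrogram: a ripple with 1 cycle across frequency
# and 2 cycles across time on an 8 x 8 grid.
n_f, n_t = 8, 8
ripple = [[math.cos(2 * math.pi * (1 * f / n_f + 2 * t / n_t))
           for t in range(n_t)] for f in range(n_f)]
mps = modulation_power_spectrum(ripple)
```

For this ripple, the power concentrates at the joint modulation index (spectral 1, temporal 2) and its mirror image, which illustrates how the modulation power spectrum captures spectral and temporal modulations together rather than separately.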

Harmonic

A complex sound with a strong pitch percept that is made of a tone at a fundamental frequency or first frequency (f0, corresponding to the perceived pitch) and a series of overtones at multiples of this fundamental frequency.

Timbre

The quality of a sound (also known as tone colour or tone quality from psychoacoustics) that can be used to distinguish different types of sound production (voices or various musical instruments) and that is determined by certain physical properties, such as the sound spectrum.

Invariance

The property of something that does not change under a transformation (for example, in rotation-invariant face neurons, a particular face will elicit very similar responses irrespective of the face orientation).

Decorrelation

The process used to reduce the autocorrelation or the redundancy of information within a signal.

Inferior colliculus

The principal auditory nucleus of the mammalian midbrain, which receives inputs from several peripheral brainstem nuclei, including direct and indirect inputs from the cochlear nucleus.

Gain

For neurons, the gain is the sensitivity of a neural output to the input signal.

Frequency filters

Devices for selectively transmitting specified frequencies of the input signal by attenuating, or filtering out, unwanted frequencies.

Spike-triggered average

(STA). A tool for estimating the stimulus–response function of a neuron by using the average stimulus before each spike. For linear neurons and for white noise stimuli, the STA will yield the neuron's receptive field.
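
The averaging step can be sketched in a few lines. The simulated neuron below (a linear filter followed by a spike threshold, driven by Gaussian white noise) is a hypothetical stand-in chosen for illustration, as are the filter values, the threshold and the function name `spike_triggered_average`.

```python
import random

def spike_triggered_average(stimulus, spikes, n_lags):
    """Average the stimulus over the n_lags samples up to and including each spike."""
    total = [0.0] * n_lags
    n_spikes = 0
    for t in range(n_lags - 1, len(stimulus)):
        if spikes[t]:
            for lag in range(n_lags):
                total[lag] += spikes[t] * stimulus[t - lag]
            n_spikes += spikes[t]
    return [s / n_spikes for s in total]  # sta[lag]: mean stimulus lag samples before a spike

# Hypothetical model neuron: linear filter followed by a spike threshold.
rng = random.Random(0)
stimulus = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # white-noise stimulus
true_filter = [0.2, 1.0, 0.4, 0.0]                      # weight at each lag
threshold = 1.0
spikes = [0] * len(stimulus)
for t in range(len(true_filter) - 1, len(stimulus)):
    drive = sum(w * stimulus[t - lag] for lag, w in enumerate(true_filter))
    spikes[t] = 1 if drive > threshold else 0

sta = spike_triggered_average(stimulus, spikes, n_lags=len(true_filter))
```

With a white-noise stimulus, the resulting `sta` is expected to have the same shape as `true_filter` up to a scale factor. With natural sounds, whose samples are correlated, this simple average would be biased by the stimulus statistics.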

Spectrotemporal receptive field

(STRF). In the most general sense, the STRF is used to label the stimulus–response function of a neuron that uses any spectrotemporal description of the sound as the stimulus (for example, the spectrogram). That general STRF can be a linear or non-linear model.

Regularization

In statistics and machine learning, regularization methods are used for model selection, in particular to prevent overfitting by penalizing models with extreme parameter values or an excessive number of parameters. For example, principal component analysis can be applied to a data set to reduce its dimensionality before a regression is fitted; the resulting model is a principal component regression or subspace regression.

Priors

Probability distributions attached to parameters before certain data are observed. Prior is short for 'prior probability'.

Principal component regression

The regularization procedure that uses principal component analysis in conjunction with a linear regression analysis to reduce the dimensionality of a data set.

Ridge regression

The combination of linear regression with a regularization procedure that assumes a priori that the coefficients of the regression are normally distributed around zero. By varying the spread of this normal distribution, one can constrain the coefficients to be very close to zero unless there is strong evidence against it in the data. Ridge regression is therefore useful when there are too many parameters in the linear regression to be fitted with limited data (that is, to prevent overfitting). In those cases, most parameters will be shrunk close to zero and only the few that really matter will keep appreciable values. It is in this sense that ridge regression reduces the effective dimensionality of the problem. Ridge regression is also known as L2 regularization, and statisticians also call regularization procedures of this type shrinkage.
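
As a minimal sketch, ridge regression can be written in closed form as w = (XᵀX + λI)⁻¹Xᵀy, where λ is the penalty strength (a larger λ corresponds to a narrower prior around zero). The tiny data set and the helper names `solve` and `ridge` below are hypothetical and serve only to show the shrinkage effect.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge(X, y, lam):
    """Return w minimizing ||y - Xw||^2 + lam * ||w||^2."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]
    return solve(gram, xty)

# Hypothetical data generated exactly by w = [1, 2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w_plain = ridge(X, y, lam=0.0)    # ordinary least squares
w_mild = ridge(X, y, lam=1.0)     # mild shrinkage
w_strong = ridge(X, y, lam=10.0)  # strong shrinkage
```

In this example the coefficients move from [1, 2] with no penalty to [0.875, 1.375] with λ = 1 and shrink further towards zero with λ = 10, without ever becoming exactly zero.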

Zero-mean Gaussian

Describes a distribution that is normal (Gaussian) with a mean of zero.

Generalized linear models

(GLMs). A generalization of ordinary linear regression that allows for the random aspects of response variables (that is, the noise) to have distributions other than the normal distribution (for example, a Poisson distribution).
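
To make the idea concrete, the sketch below fits a Poisson GLM with a log link to spike counts, using Newton's method on the log-likelihood. The stimulus values, the counts and the function name `fit_poisson_glm` are hypothetical, and the two-parameter model is an assumption kept deliberately small.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(rate) = b0 + b1 * x by Newton's method on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the mean count
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Entries of the negative Hessian (Fisher information).
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: stimulus intensity and spike count in each trial.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2, 2, 3, 4, 5]
b0, b1 = fit_poisson_glm(x, y)
rates = [math.exp(b0 + b1 * xi) for xi in x]  # fitted mean counts
```

Unlike ordinary least squares, this fit treats the counts as Poisson distributed, so the predicted rates are always positive and the variance is allowed to grow with the mean.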

About this article

Cite this article

Theunissen, F., Elie, J. Neural processing of natural sounds. Nat Rev Neurosci 15, 355–366 (2014). https://doi.org/10.1038/nrn3731

  • DOI: https://doi.org/10.1038/nrn3731
