Abstract
Sounds are generated by interactions between objects in the world and carry information about the sound’s sources and the objects’ sound-generating actions. This dual nature of auditory information poses a problem for defining and investigating auditory object representations in staged theories of perception. In this Review, we describe a framework for separating auditory source and action representations. Auditory source and action representations differ from each other in how they are formed, their relation to prediction, the information they carry, how they are experienced and remembered, and the brain responses associated with them. We also suggest that auditory source and action representations are part of event segmentation: structuring information about the environment and what is happening in it. In real life, auditory scenes are resolved together with other modalities, producing an integrated episodic description of the environment. Thus, event segmentation can guide the integration of information from different modalities and mediate the effects of learned knowledge on auditory scene analysis. We end by discussing how these insights offer important advantages for the development of more comprehensive theories and computational models of sound perception in natural scenes.
References
Bregman, A. S. Auditory Scene Analysis. The Perceptual Organization of Sound (MIT Press, 1990).
Nudds, M. What are auditory objects? Rev. Philos. Psychol. 1, 105–122 (2010).
Lahav, A., Saltzman, E. & Schlaug, G. Action representation of sound: audiomotor recognition network while listening to newly acquired actions. J. Neurosci. 27, 308–314 (2007).
Pennartz, C. M. A. Consciousness, representation, action: the importance of being goal-directed. Trends Cogn. Sci. 22, 137–153 (2018).
Grinfeder, E., Lorenzi, C., Haupert, S. & Sueur, J. What do we mean by “soundscape”? A functional description. Front. Ecol. Evol. 10, 894232 (2022).
Bizley, J. K. & Cohen, Y. E. The what, where and how of auditory-object perception. Nat. Rev. Neurosci. 14, 693–707 (2013).
Green, E. J. A theory of perceptual objects. Philos. Phenomenol. Res. 99, 663–693 (2019).
Griffiths, T. D. & Warren, J. D. What is an auditory object? Nat. Rev. Neurosci. 5, 887–892 (2004).
Hermes, D. J. The Perceptual Structure of Sound (Springer, 2023).
O’Callaghan, C. Object perception: vision and audition. Philos. Compass 3–4, 803–829 (2008).
Santarcangelo, V. Auditory objects as higher-order objects. Riv. Estet. 66, 8–21 (2017).
Snyder, J. S., Gregg, M. K., Weintraub, D. M. & Alain, C. Attention, awareness, and the perception of auditory scenes. Front. Psychol. 3, 15 (2012).
Winkler, I., Denham, S. L. & Nelken, I. Modeling the auditory scene: predictive regularity representations and perceptual objects. Trends Cogn. Sci. 13, 532–540 (2009).
Shams, L. & Beierholm, U. Bayesian causal inference: a unifying neuroscience theory. Neurosci. Biobehav. Rev. 137, 104619 (2022).
Gregory, R. L. Perceptions as hypotheses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 290, 181–197 (1980).
Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Denham, S. L. & Winkler, I. Predictive coding in auditory perception: challenges and unresolved questions. Eur. J. Neurosci. 51, 1151–1160 (2020).
Köhler, W. Gestalt Psychology: An Introduction to New Concepts in Modern Psychology (Liveright, 1947).
Treisman, A. M. Feature binding, attention and object perception. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353, 1295–1306 (1998).
Cowan, N. On short and long auditory stores. Psychol. Bull. 96, 341–370 (1984).
Zacks, J. M. Event perception and memory. Annu. Rev. Psychol. 71, 165–191 (2020).
Pressnitzer, D., Patterson, R. D. & Krumbholz, K. The lower limit of melodic pitch. J. Acoust. Soc. Am. 109, 2074–2084 (2001).
Marrone, N., Mason, C. R. & Kidd, G. Jr Tuning in the spatial dimension: evidence from a masked speech identification task. J. Acoust. Soc. Am. 124, 1146–1158 (2008).
Middlebrooks, J. C. & Waters, M. F. Spatial mechanisms for segregation of competing sounds, and a breakdown in spatial hearing. Front. Neurosci. 14, 571095 (2020).
Reed, D. K., Chait, M., Tóth, B., Winkler, I. & Shinn-Cunningham, B. Spatial cues can support auditory figure–ground segregation. J. Acoust. Soc. Am. 147, 3814–3818 (2020).
Kitterick, P. T., Bailey, P. J. & Summerfield, A. Q. Benefits of knowing who, where, and when in multi-talker listening. J. Acoust. Soc. Am. 127, 2498–2508 (2010).
Kreitewolf, J., Mathias, S. R., Trapeau, R., Obleser, J. & Schönwiesner, M. Perceptual grouping in the cocktail party: contributions of voice-feature continuity. J. Acoust. Soc. Am. 144, 2178–2188 (2018).
Yeark, M., Paton, B. & Todd, J. The impact of spatial variance on precision estimates in an auditory oddball paradigm. Cortex 165, 1–13 (2023).
Tosi, P., Sbarra, P. & Rubeis, V. Earthquake sound perception. Geophys. Res. Lett. 39, 24301 (2012).
Arnal, L. H., Poeppel, D. & Giraud, A.-L. in Handbook of Clinical Neurology 3rd Series (eds Celesia, G. G. & Hickok, G.) vol. 129, 85–98 (Elsevier, 2015).
Booras, A., Stevenson, T., McCormack, C. N., Rhoads, M. E. & Hanks, T. D. Change point detection with multiple alternatives reveals parallel evaluation of the same stream of evidence along distinct timescales. Sci. Rep. 11, 13098 (2021).
Rimmele, J. M., Morillon, B., Poeppel, D. & Arnal, L. H. Proactive sensing of periodic and aperiodic auditory patterns. Trends Cogn. Sci. 22, 870–882 (2018).
van Noorden, L. P. A. S. Temporal Coherence in the Perception of Tone Sequences (Institute for Perception Research, Technical Univ. Eindhoven, 1975).
Jones, M. R. Time, our lost dimension: toward a new theory of perception, attention, and memory. Psychol. Rev. 83, 323–355 (1976).
Andreou, L.-V., Kashino, M. & Chait, M. The role of temporal regularity in auditory segregation. Hear. Res. 280, 228–235 (2011).
Bendixen, A., Denham, S. L. & Winkler, I. Feature predictability flexibly supports auditory stream segregation or integration. Acta Acust. U Acust 100, 888–899 (2014).
Rajendran, V. G., Harper, N. S., Willmore, B. D., Hartmann, W. M. & Schnupp, J. W. H. Temporal predictability as a grouping cue in the perception of auditory streams. J. Acoust. Soc. Am. 134, 98–104 (2013).
Woods, K. J. P. & McDermott, J. H. Schema learning for the cocktail party problem. Proc. Natl Acad. Sci. USA 115, 3313–3322 (2018).
Näätänen, R. & Picton, T. The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24, 375–425 (1987).
Bläsing, B., Tenenbaum, G. & Schack, T. The cognitive structure of movements in classical dance. Psychol. Sport. Exerc. 10, 350–360 (2009).
Zacks, J. M., Tversky, B. & Iyer, G. Perceiving, remembering, and communicating structure in events. J. Exp. Psychol. Gen. 130, 29–58 (2001).
Newtson, D. Attribution and the unit of perception of ongoing behavior. J. Pers. Soc. Psychol. 28, 28–38 (1973).
Swallow, K. M., Kemp, J. T. & Candan Simsek, A. The role of perspective in event segmentation. Cognition 177, 249–262 (2018).
Huff, M., Meitz, T. G. K. & Papenmeier, F. Changes in situation models modulate processes of event perception in audiovisual narratives. J. Exp. Psychol. Learn. Mem. Cogn. 40, 1377–1388 (2014).
Kubovy, M. & Valkenburg, D. Auditory and visual objects. Cognition 80, 97–126 (2001).
Shamma, S. A., Elhilali, M. & Micheyl, C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34, 114–123 (2011).
Raccah, O., Doelling, K. B., Davachi, L. & Poeppel, D. Acoustic features drive event segmentation in speech. J. Exp. Psychol. Learn. Mem. Cogn. 49, 1494–1504 (2023).
Newtson, D., Engquist, G. A. & Bois, J. The objective basis of behavior units. J. Pers. Soc. Psychol. 35, 847–862 (1977).
Zacks, J. M. Using movement and intentions to understand simple events. Cogn. Sci. 28, 979–1008 (2004).
Poeppel, D. & Assaneo, M. F. Speech rhythms and their neural foundations. Nat. Rev. Neurosci. 21, 322–334 (2020).
Rimmele, J. M., Sussman, E. & Poeppel, D. The role of temporal structure in the investigation of sensory memory, auditory scene analysis, and speech perception: a healthy-aging perspective. Int. J. Psychophysiol. 95, 175–183 (2015).
Kurby, C. A. & Zacks, J. M. Segmentation in the perception and memory of events. Trends Cogn. Sci. 12, 72–79 (2008).
Berto, M., Ricciardi, E., Pietrini, P., Weisz, N. & Bottari, D. Distinguishing fine structure and summary representation of sound textures from neural activity. eNeuro 10, 2023 (2023).
Jeunehomme, O. & D’Argembeau, A. Event segmentation and the temporal compression of experience in episodic memory. Psychol. Res. 84, 481–490 (2020).
Sun, Y. & Poeppel, D. Syllables and their beginnings have a special role in the mental lexicon. Proc. Natl Acad. Sci. USA 120, 2215710120 (2023).
Weise, A., Grimm, S., Müller, D. & Schröger, E. A temporal constraint for automatic deviance detection and object formation: a mismatch negativity study. Brain Res. 1331, 88–95 (2010).
Weise, A., Grimm, S., Rimmele, J. M. & Schröger, E. Auditory representations for long lasting sounds: insights from event-related brain potentials and neural oscillations. Brain Lang. 237, 105221 (2023).
Neuhoff, J. G. Perceptual bias for rising tones. Nature 395, 123–124 (1998).
Ignatiadis, K., Baier, D., Tóth, B. & Baumgartner, R. Neural mechanisms underlying the auditory looming bias. Audit. Percept. Cogn. 4, 60–73 (2021).
Lee, S., Potamianos, A. & Narayanan, S. Acoustics of children’s speech: developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 105, 1455–1468 (1999).
Barthel, H. & Quené, H. in Proc. 18th Int. Congress of Phonetic Sciences (eds Wolters, M. et al.) https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0337.pdf (Univ. of Glasgow, 2015).
Brefczynski-Lewis, J. A. & Lewis, J. W. Auditory object perception: a neurobiological model and prospective review. Neuropsychologia 105, 223–242 (2017).
Andics, A., Gacsi, M., Farago, T., Kis, A. & Miklosi, A. Voice-sensitive regions in the dog and human brain are revealed by comparative fMRI. Curr. Biol. 24, 574–578 (2014).
Engel, L. R., Frum, C., Puce, A., Walker, N. A. & Lewis, J. W. Different categories of living and non-living sound-sources activate distinct cortical networks. NeuroImage 47, 1778–1791 (2009).
Lewis, J. W., Talkington, W. J., Puce, A., Engel, L. R. & Frum, C. Cortical networks representing object categories and high-level attributes of familiar real-world action sounds. J. Cogn. Neurosci. 23, 2079–2101 (2011).
Lewis, J. W., Talkington, W. J., Tallaksen, K. C. & Frum, C. A. Auditory object salience: human cortical processing of non-biological action sounds and their acoustic signal attributes. Front. Syst. Neurosci. 6, 27 (2012).
Webster, P. J. et al. Divergent human cortical regions for processing distinct acoustic-semantic categories of natural sounds: animal action sounds vs. vocalizations. Front. Neurosci. 10, 579 (2017).
Ogg, M., Moraczewski, D., Kuchinsky, S. E. & Slevc, L. R. Separable neural representations of sound sources: speaker identity and musical timbre. NeuroImage 191, 116–126 (2019).
Webb, A. R., Heller, H. T., Benson, C. B. & Lahav, A. Mother’s voice and heartbeat sounds elicit auditory plasticity in the human brain before full gestation. Proc. Natl Acad. Sci. USA 112, 3152–3157 (2015).
Ogg, M. & Slevc, L. R. Acoustic correlates of auditory object and event perception: speakers, musical timbres, and environmental sounds. Front. Psychol. 10, 1594 (2019).
Schellenberg, E. G. & Habashi, P. Remembering the melody and timbre, forgetting the key and tempo. Mem. Cognit. 43, 1021–1031 (2015).
Lemaitre, G., Grimault, N. & Suied, C. in Computational Analysis of Sound Scenes and Events (eds Virtanen, T., Plumbley, M. D. & Ellis, D.) 41–67 (Springer, 2018).
Campeanu, S., Craik, F. I. M. & Alain, C. Speaker’s voice as a memory cue. Int. J. Psychophysiol. 95, 167–174 (2015).
Mathias, S. R. & Kriegstein, K. How do we recognise who is speaking. Front. Biosci. Sch. Ed. 6, 92–109 (2014).
Tuninetti, A., Chládková, K., Peter, V., Schiller, N. O. & Escudero, P. When speaker identity is unavoidable: neural processing of speaker identity cues in natural speech. Brain Lang. 174, 42–49 (2017).
Best, V., Ozmeral, E. J., Kopčo, N. & Shinn-Cunningham, B. G. Object continuity enhances selective auditory attention. Proc. Natl Acad. Sci. USA 105, 13174–13178 (2008).
Fischer, M., Soden, K., Thoret, E., Montrey, M. & McAdams, S. Instrument timbre enhances perceptual segregation in orchestral music. Music. Percept. 38, 473–498 (2021).
Wei, Y., Gan, L. & Huang, X. A review of research on the neurocognition for timbre perception. Front. Psychol. 13, 869475 (2022).
Caclin, A., McAdams, S., Smith, B. K. & Winsberg, S. Acoustic correlates of timbre space dimensions: a confirmatory study using synthetic tones. J. Acoust. Soc. Am. 118, 471–482 (2005).
Misdariis, N. et al. Environmental sound perception: metadescription and modeling based on independent primary studies. EURASIP J. Audio Speech Music. Process. 2010, 362013 (2010).
Ciocca, V. The auditory organization of complex sounds. Front. Biosci. 13, 148–169 (2008).
Bigand, E. & Pineau, M. Global context effects on musical expectancy. Percept. Psychophys. 59, 1098–1107 (1997).
Micheyl, C. & Oxenham, A. J. Pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings. Hear. Res. 266, 36–51 (2010).
Moore, B. C. J., Glasberg, B. R. & Peters, R. W. Thresholds for hearing mistuned partials as separate tones in harmonic complexes. J. Acoust. Soc. Am. 80, 479–483 (1986).
Koulaguina, E. et al. The perception of concurrent sound objects through the use of harmonic enhancement: a study of auditory attention. Atten. Percept. Psychophys. 77, 922–929 (2015).
Alain, C., Arnott, S. R. & Picton, T. W. Bottom-up and top-down influences on auditory scene analysis: evidence from event-related brain potentials. J. Exp. Psychol. Hum. Percept. Perform. 27, 1072–1089 (2001).
Tóth, B. et al. EEG signatures accompanying auditory figure–ground segregation. Neuroimage 141, 108–119 (2016).
Bendixen, A. et al. Newborn infants detect cues of concurrent sound segregation. Dev. Neurosci. 37, 172–181 (2015).
Virtala, P., Huotilainen, M., Partanen, E., Fellman, V. & Tervaniemi, M. Newborn infants’ auditory system is sensitive to western music chord categories. Front. Psychol. 4, 492 (2013).
Fishman, Y. I. et al. Consonance and dissonance of musical chords: neural correlates in auditory cortex of monkeys and humans. J. Neurophysiol. 86, 2761–2788 (2001).
Fishman, Y. I. & Steinschneider, M. Neural correlates of auditory scene analysis based on inharmonicity in monkey primary auditory cortex. J. Neurosci. 30, 12480–12494 (2010).
Młynarski, W. & McDermott, J. H. Ecological origins of perceptual grouping principles in the auditory system. Proc. Natl Acad. Sci. USA 116, 25355–25364 (2019).
Strori, D., Zaar, J., Cooke, M. & Mattys, S. L. Sound specificity effects in spoken word recognition: the effect of integrality between words and sounds. Atten. Percept. Psychophys. 80, 222–241 (2018).
Teki, S., Chait, M., Kumar, S., Kriegstein, K. & Griffiths, T. D. Brain bases for auditory stimulus-driven figure–ground segregation. J. Neurosci. 31, 164–171 (2011).
Schneider, F. et al. Neuronal figure–ground responses in primate primary auditory cortex. Cell Rep. 35, 109242 (2021).
Rezaeizadeh, M. & Shamma, S. Binding the acoustic features of an auditory source through temporal coherence. Cereb. Cortex Commun. 2, 060 (2021).
O’Sullivan, J. A., Shamma, S. A. & Lalor, E. C. Evidence for neural computations of temporal coherence in an auditory scene and their enhancement during active listening. J. Neurosci. 35, 7256–7263 (2015).
Teki, S. et al. Neural correlates of auditory figure–ground segregation based on temporal coherence. Cereb. Cortex 26, 3669–3680 (2016).
Aller, M. & Noppeney, U. To integrate or not to integrate: temporal dynamics of hierarchical Bayesian causal inference. PLoS Biol. 17, 3000210 (2019).
Sawai, K. I., Sato, Y. & Aihara, K. Auditory time-interval perception as causal inference on sound sources. Front. Psychol. 3, 524 (2012).
Schwartz, J.-L., Grimault, N., Hupé, J.-M., Moore, B. C. J. & Pressnitzer, D. Multistability in perception: binding sensory modalities, an overview. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 896–905 (2012).
Rankin, J., Osborn Popp, P. J. & Rinzel, J. Stimulus pauses and perturbations differentially delay or promote the segregation of auditory objects: psychoacoustics and modeling. Front. Neurosci. 11, 198 (2017).
Schröger, E., Roeber, U. & Coy, N. Markov chains as a proxy for the predictive memory representations underlying mismatch negativity (MMN). Front. Hum. Neurosci. 17, 1249413 (2023).
Goncalves, N. R. & Welchman, A. E. “What not” detectors help the brain see in depth. Curr. Biol. 27, 1403–1412 8 (2017).
Rideaux, R. & Welchman, A. E. Proscription supports robust perceptual integration by suppression in human visual cortex. Nat. Commun. 9, 1502 (2018).
de Boer, E. in Auditory System Vol. 3 (eds Keidel, W. D. & Neff, W. D.) 479–583 (Springer, 1976).
Miller, G. A. & Licklider, J. C. R. The intelligibility of interrupted speech. J. Acoust. Soc. Am. 22, 167–173 (1950).
Warren, R. M. Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970).
Warren, R. M. & Warren, R. P. Auditory illusions and confusions. Sci. Am. 223, 30–37 (1970).
Riecke, L., Micheyl, C. & Oxenham, A. J. Global not local masker features govern the auditory continuity illusion. J. Neurosci. 32, 4660–4664 (2012).
Bidelman, G. M. & Patro, C. Auditory perceptual restoration and illusory continuity correlates in the human brainstem. Brain Res. 1646, 84–90 (2016).
Brodbeck, C., Jiao, A., Hong, L. E. & Simon, J. Z. Neural speech restoration at the cocktail party: auditory cortex recovers masked speech of both attended and ignored speakers. PLoS Biol. 18, 3000883 (2020).
Petkov, C. I., O’Connor, K. N. & Sutter, M. L. Illusory sound perception in macaque monkeys. J. Neurosci. 23, 9155–9161 (2003).
Tomlinson, R. W. W. & Schwarz, D. W. F. Perception of the missing fundamental in nonhuman primates. J. Acoust. Soc. Am. 84, 560–565 (1988).
Sollini, J., Poole, K. C., Blauth-Muszkowski, D. & Bizley, J. K. The role of temporal coherence and temporal predictability in the build-up of auditory grouping. Sci. Rep. 12, 14493 (2022).
Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J. & Chait, M. Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns. Proc. Natl Acad. Sci. USA 113, 616–625 (2016).
Ringer, H., Schröger, E. & Grimm, S. Neural signatures of automatic repetition detection in temporally regular and jittered acoustic sequences. PLoS ONE 18, 0284836 (2023).
Coffman, B. A., Haigh, S. M., Murphy, T. K. & Salisbury, D. F. Event-related potentials demonstrate deficits in acoustic segmentation in schizophrenia. Schizophr. Res. 173, 109–115 (2016).
Coffman, B. A., Haigh, S. M., Murphy, T. K., Leiter-Mcbeth, J. & Salisbury, D. F. Reduced auditory segmentation potentials in first-episode schizophrenia. Schizophr. Res. 195, 421–427 (2018).
Hemeren, P. E. & Thill, S. Deriving motor primitives through action segmentation. Front. Psychol. 1, 243 (2011).
Kushnerenko, E. V., Bergh, B. R. H. & Winkler, I. Separating acoustic deviance from novelty during the first year of life: a review of event-related potential evidence. Front. Psychol. 4, 595 (2013).
Háden, G. P., Németh, R., Török, M. & Winkler, I. Predictive processing of pitch trends in newborn infants. Brain Res. 1626, 14–20 (2015).
Chait, M. How the brain discovers structure in sound sequences. Acoust. Sci. Technol. 41, 48–53 (2020).
Kaernbach, C. The memory of noise. Exp. Psychol. 51, 240–248 (2004).
Ringer, H., Schröger, E. & Grimm, S. Within- and between-subject consistency of perceptual segmentation in periodic noise: a combined behavioral tapping and EEG study. Psychophysiology 60, 14174 (2023).
Kang, H., Agus, T. R. & Pressnitzer, D. Auditory memory for random time patterns. J. Acoust. Soc. Am. 142, 2219–2232 (2017).
Bader, M., Schröger, E. & Grimm, S. Auditory pattern representations under conditions of uncertainty — an ERP study. Front. Hum. Neurosci. 15, 682820 (2021).
Bendixen, A., Roeber, U. & Schröger, E. Regularity extraction and application in dynamic auditory stimulus sequences. J. Cogn. Neurosci. 19, 1664–1677 (2007).
Cowan, N., Winkler, I., Teder, W. & Näätänen, R. Memory prerequisites of mismatch negativity in the auditory event-related potential (ERP). J. Exp. Psychol. Learn. Mem. Cogn. 19, 909–921 (1993).
Bianco, R. et al. Long-term implicit memory for sequential auditory patterns in humans. eLife 9, e56073 (2020).
Ringer, H., Schröger, E. & Grimm, S. Perceptual learning of random acoustic patterns: impact of temporal regularity and attention. Eur. J. Neurosci. 1, 24 (2023).
Terry, J., Stevens, C. J., Weidemann, G. & Tillmann, B. Implicit learning of between-group intervals in auditory temporal structures. Atten. Percept. Psychophys. 78, 1728–1743 (2016).
Sussman, E. S. A new view on the MMN and attention debate — the role of context in processing auditory events. J. Psychophysiol. 21, 164–175 (2007).
Bendixen, A., SanMiguel, I. & Schröger, E. Early electrophysiological indicators for predictive processing in audition: a review. Int. J. Psychophysiol. 83, 120–131 (2012).
Fitzgerald, K. & Todd, J. Making sense of mismatch negativity. Front. Psychiatry 11, 468 (2020).
Winkler, I. & Czigler, I. Evidence from auditory and visual event-related potential (ERP) studies of deviance detection (MMN and vMMN) linking predictive coding theories and perceptual object representations. Int. J. Psychophysiol. 83, 132–143 (2012).
Paavilainen, P., Kaukinen, C., Koskinen, O., Kylmälä, J. & Rehn, L. Mismatch negativity (MMN) elicited by abstract regularity violations in two concurrent auditory streams. Heliyon 4, 00608 (2018).
Ritter, W., Sussman, E. & Molholm, S. Evidence that the mismatch negativity system works on the basis of objects. NeuroReport 11, 61–63 (2000).
Yabe, H. et al. Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration. Brain Res. 897, 222–227 (2001).
Tiitinen, H., May, P., Reinikainen, K. & Näätänen, R. Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature 372, 90–92 (1994).
Winkler, I. et al. The effect of small variation of the frequent auditory stimulus on the event-related brain potential to the infrequent stimulus. Psychophysiology 27, 228–235 (1990).
Sussman, E., Ritter, W. & Vaughan, H. G. Jr Predictability of stimulus deviance and the mismatch negativity. NeuroReport 9, 4167–4170 (1998).
Sussman, E. & Gumenyuk, V. Organization of sequential sounds in auditory memory. NeuroReport 16, 1519–1523 (2005).
Sussman, E., Winkler, I., Huotilainen, M., Ritter, W. & Näätänen, R. Top-down effects can modify the initially stimulus-driven auditory organization. Brain Res. Cogn. Brain Res. 13, 393–405 (2002).
Stefanics, G. et al. Auditory temporal grouping in newborn infants. Psychophysiology 44, 697–702 (2007).
van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R. & Tervaniemi, M. Auditory organization of sound sequences by a temporal or numerical regularity — a mismatch negativity study comparing musicians and non-musicians. Cogn. Brain Res. 23, 270–276 (2005).
Tervaniemi, M., Huotilainen, M. & Brattico, E. Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding. Front. Hum. Neurosci. 8, 496 (2014).
Denham, S. L. et al. in Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing (eds. van Dijk, P. et al.) 409–417 (Springer, 2016).
Breska, A. & Deouell, L. Y. Neural mechanisms of rhythm-based temporal prediction: delta phase-locking reflects temporal predictability but not rhythmic entrainment. PLOS Biol. 15, 2001665 (2017).
Morillon, B., Schroeder, C. E., Wyart, V. & Arnal, L. H. Temporal prediction in lieu of periodic stimulation. J. Neurosci. 36, 2342–2347 (2016).
Haegens, S. & Zion Golumbic, E. Rhythmic facilitation of sensory processing: a critical review. Neurosci. Biobehav. Rev. 86, 150–165 (2018).
Cohn, N., Paczynski, M. & Kutas, M. Not so secret agents: event-related potentials to semantic roles in visual event comprehension. Brain Cogn. 119, 1–9 (2017).
Hafri, A., Papafragou, A. & Trueswell, J. C. Getting the gist of events: recognition of two-participant actions from brief displays. J. Exp. Psychol. Gen. 142, 880–905 (2013).
Bertelson, P. & Radeau, M. Cross-modal bias and perceptual fusion with auditory–visual spatial discordance. Percept. Psychophys. 29, 578–584 (1981).
Choe, C. S., Welch, R. B., Gilford, R. M. & Juola, J. F. The “ventriloquist effect”: visual dominance or response bias? Percept. Psychophys. 18, 55–60 (1975).
Bruns, P. The ventriloquist illusion as a tool to study multisensory processing: an update. Front. Integr. Neurosci. 13, 51 (2019).
Wang, X. & Xu, L. Speech perception in noise: masking and unmasking. J. Otol. 16, 109–119 (2020).
Greenlaw, K. M., Puschmann, S. & Coffey, E. B. J. Decoding of envelope vs. fundamental frequency during complex auditory stream segregation. Neurobiol. Lang. 1, 268–287 (2020).
Holmes, E., Parr, T., Griffiths, T. D. & Friston, K. J. Active inference, selective attention, and the cocktail party problem. Neurosci. Biobehav. Rev. 131, 1288–1304 (2021).
Puvvada, K. C. & Simon, J. Z. Cortical representations of speech in a multitalker auditory scene. J. Neurosci. 37, 9189–9196 (2017).
Thomassen, S. & Bendixen, A. in Proc. 23rd Int. Congress on Acoustics (eds Ochmann, M., Vorländer, M. & Fels, J.) 5685–5691 (Deutsche Gesellschaft für Akustik, 2019).
Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S. & Reynolds, J. R. Event perception: a mind–brain perspective. Psychol. Bull. 133, 273–293 (2007).
Tauzin, T. Simple visual cues of event boundaries. Acta Psychol. 158, 8–18 (2015).
Aman, L., Picken, S., Andreou, L.-V. & Chait, M. Sensitivity to temporal structure facilitates perceptual analysis of complex auditory scenes. Hear. Res. 400, 108111 (2021).
Cervantes Constantino, F., Pinggera, L., Paranamana, S., Kashino, M. & Chait, M. Detection of appearing and disappearing objects in complex acoustic scenes. PLoS ONE 7, 46167 (2012).
Sohoglu, E. & Chait, M. Detecting and representing predictable structure during auditory scene analysis. eLife 5, e19113 (2016).
Siedenburg, K. & Müllensiefen, D. in Timbre: Acoustics, Perception, and Cognition (eds Siedenburg, K., Saitis, C., McAdams, S., Popper, A. & Fay, R.) 87–118 (Springer Cham, 2019).
Eisenberg, M. L., Zacks, J. M. & Flores, S. Dynamic prediction during perception of everyday events. Cogn. Res. Princ. Implic. 3, 53 (2018).
Monroy, C. D., Gerson, S. A. & Hunnius, S. Translating visual information into action predictions: statistical learning in action and nonaction contexts. Mem. Cognit. 46, 600–613 (2018).
Winkler, I. & Schröger, E. Auditory perceptual objects as generative models: setting the stage for communication by sound. Brain Lang. 148, 1–22 (2015).
Coy, N., Bendixen, A., Grimm, S., Roeber, U. & Schröger, E. Deviants violating higher-order auditory regularities can become predictive and facilitate behaviour. Atten. Percept. Psychophys. 85, 2731–2750 (2023).
Herman, D. et al. Mismatch negativity as a marker of auditory pattern separation. Cereb. Cortex 33, 10181–10193 (2023).
Winkler, I., Zuijen, T. L., Sussman, E., Horváth, J. & Näätänen, R. Object representation in the human auditory system. Eur. J. Neurosci. 24, 625–634 (2006).
Bizley, J. K., Maddox, R. K. & Lee, A. K. C. Defining auditory–visual objects: behavioral tests and physiological mechanisms. Trends Neurosci. 39, 74–85 (2016).
Colonius, H. & Diederich, A. Formal models and quantitative measures of multisensory integration: a selective overview. Eur. J. Neurosci. 51, 1161–1178 (2020).
Cornelio, P., Velasco, C. & Obrist, M. Multisensory integration as per technological advances: a review. Front. Neurosci. 15, 652611 (2021).
Rohe, T. & Noppeney, U. Cortical hierarchies perform Bayesian causal inference in multisensory perception. PLoS Biol. 13, 1002073 (2015).
Spence, C. & Di Stefano, N. Sensory translation between audition and vision. Psychon. Bull. Rev. https://doi.org/10.3758/s13423-023-02343-w (2023).
Turner, B. M., Gao, J., Koenig, S., Palfy, D. & McClelland, J. L. The dynamics of multimodal integration: the averaging diffusion model. Psychon. Bull. Rev. 24, 1819–1843 (2017).
Zmigrod, S. & Hommel, B. Feature integration across multimodal perception and action: a review. Multisens. Res. 26, 143–157 (2013).
Zhang, W.-H. et al. Complementary congruent and opposite neurons achieve concurrent multisensory integration and segregation. eLife 8, e43753 (2019).
Sommers, M. S., Tye-Murray, N. & Spehar, B. Auditory–visual speech perception and auditory–visual enhancement in normal-hearing younger and older adults. Ear Hear. 26, 263–275 (2005).
Lewkowicz, D. J., Schmuckler, M. & Agrawal, V. The multisensory cocktail party problem in adults: perceptual segregation of talking faces on the basis of audiovisual temporal synchrony. Cognition 214, 104743 (2021).
Fornaciai, M. & Luca, M. Causality shifts the perceived temporal order of audiovisual events. J. Exp. Psychol. Hum. Percept. Perform. 46, 890–900 (2020).
Chalas, N., Omigie, D., Poeppel, D. & Wassenhove, V. Hierarchically nested networks optimize the analysis of audiovisual speech. iScience 6, 106257 (2023).
Mallick, D. B., Magnotti, J. F. & Beauchamp, M. S. Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type. Psychon. Bull. Rev. 22, 1299–1307 (2015).
McGurk, H. & MacDonald, J. Hearing lips and seeing voices. Nature 264, 746–748 (1976).
Spence, C. & Soto-Faraco, S. in Oxford Handbook of Auditory Science: Hearing, Oxford Library of Psychology (ed.Plack, C. J.) 271–296 (Oxford Academic, 2010).
Stekelenburg, J. J. & Vroomen, J. Neural correlates of multisensory integration of ecologically valid audiovisual events. J. Cogn. Neurosci. 19, 1964–1973 (2007).
Czigler, I. & Kojouharova, P. Visual mismatch negativity: a mini-review of non-pathological studies with special populations and stimuli. Front. Hum. Neurosci. 15, 781234 (2022).
Grundei, M., Schröder, P., Gijsen, S. & Blankenburg, F. EEG mismatch responses in a multimodal roving stimulus paradigm provide evidence for probabilistic inference across audition, somatosensation, and vision. Hum. Brain Mapp. 44, 3644–3668 (2023).
Shen, G., Smyk, N. J., Meltzoff, A. N. & Marshall, P. J. Neuropsychology of human body parts: exploring categorical boundaries of tactile perception using somatosensory mismatch responses. J. Cogn. Neurosci. 30, 1858–1869 (2018).
Grundei, M., Schmidt, T. T. & Blankenburg, F. A multimodal cortical network of sensory expectation violation revealed by fMRI. Hum. Brain Mapp. 44, 5871–5891 (2023).
Snyder, J. S. & Elhilali, M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann. N. Y. Acad. Sci. 1396, 39–55 (2017).
Szabó, B. T., Denham, S. L. & Winkler, I. Computational models of auditory scene analysis: a review. Front. Neurosci. 10, 524 (2016).
Krishnan, L. E. M. & Shamma, S. Segregating complex sound sources through temporal coherence. PLoS Comput. Biol. 10, 1003985 (2014).
Mill, R. W., Bőhm, T. M., Bendixen, A., Winkler, I. & Denham, S. L. Modelling the emergence and dynamics of perceptual organisation in auditory streaming. PLoS Comput. Biol. 9, 1002925 (2013).
Altmann, G. T. M. (ed.) Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives (MIT Press, 1995).
Benetos, E., Dixon, S., Duan, Z. & Ewert, S. Automatic music transcription: an overview. IEEE Signal. Process. Mag. 36, 20–30 (2019).
Koelsch, S. Toward a neural basis of music perception — a review and updated model. Front. Psychol. 2, 110 (2011).
Large, E. W. et al. Dynamic models for musical rhythm perception and coordination. Front. Comput. Neurosci. 17, 1151895 (2023).
Harrison, P. M. C., Bianco, R., Chait, M. & Pearce, M. T. PPM-Decay: a computational model of auditory prediction with memory decay. PLoS Comput. Biol. 16, 1008304 (2020).
Winkler, I., Denham, S., Mill, R., Bőhm, T. M. & Bendixen, A. Multistability in auditory stream segregation: a predictive coding view. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 1001–1012 (2012).
Alain, C. et al. Neural α oscillations index context-driven perception of ambiguous vowel sequences. iScience 26, 108457 (2023).
Kondo, H. M., Farkas, D., Denham, S. L., Asai, T. & Winkler, I. Auditory multistability and neurotransmitter concentrations in the human brain. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372, 20160110 (2017).
Caras, M. L. et al. Non-sensory influences on auditory learning and plasticity. J. Assoc. Res. Otolaryngol. 23, 151–166 (2022).
Reynolds, J. R., Zacks, J. M. & Braver, T. S. A computational model of event segmentation from perceptual prediction. Cogn. Sci. 31, 613–643 (2007).
Franklin, N. T., Norman, K. A., Ranganath, C., Zacks, J. M. & Gershman, S. J. Structured event memory: a neuro-symbolic model of event cognition. Psychol. Rev. 127, 327–361 (2020).
Cusimano, M., Hewitt, L. B. & McDermott, J. H. Bayesian auditory scene synthesis explains human perception of illusions and everyday sounds. Preprint at bioRxiv https://doi.org/10.1101/2023.04.27.538626 (2023).
Bekinschtein, T. A. et al. Neural signature of the conscious processing of auditory regularities. Proc. Natl Acad. Sci. USA 106, 1672–1677 (2009).
Horváth, J., Czigler, I., Sussman, E. & Winkler, I. Simultaneously active pre-attentive representations of local and global rules for sound sequences. Cogn. Brain Res. 12, 131–144 (2001).
Skerritt-Davis, B. & Elhilali, M. Neural encoding of auditory statistics. J. Neurosci. 41, 6726–6739 (2021).
Sussman, E., Ritter, W. & Vaughan, H. G. An investigation of the auditory streaming effect using event-related brain potentials. Psychophysiology 36, 22–34 (1999).
Aiken, S. J. & Picton, T. W. Human cortical responses to the speech envelope. Ear Hear. 29, 139–157 (2008).
Ding, N. & Simon, J. Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl Acad. Sci. USA 109, 11854–11859 (2012).
Reetzke, R., Gnanateja, G. N. & Chandrasekaran, B. Neural tracking of the speech envelope is differentially modulated by attention and language experience. Brain Lang. 213, 104891 (2021).
Hommel, B. Theory of event coding (TEC) V2.0: representing and controlling perception and action. Atten. Percept. Psychophys. 81, 2139–2154 (2019).
Korka, B., Widmann, A., Waszak, F., Darriba, Á. & Schröger, E. The auditory brain in action: intention determines predictive processing in the auditory system — a review of current paradigms and findings. Psychon. Bull. Rev. 29, 321–342 (2022).
He, C. & Trainor, L. J. Finding the pitch of the missing fundamental in infants. J. Neurosci. 29, 7718–7722 (2009).
Luck, S. J. An Introduction to the Event-Related Potential Technique (MIT Press, 2014).
Escera, C. Contributions of the subcortical auditory system to predictive coding and the neural encoding of speech. Curr. Opin. Behav. Sci. 54, 101324 (2023).
Pantev, C., Hoke, M., Lütkenhöner, B. & Lehnertz, K. Tonotopic organization of the auditory cortex: pitch versus frequency representation. Science 246, 486–488 (1989).
Bendixen, A., Jones, S. J., Klump, G. & Winkler, I. Probability dependence and functional separation of the object-related and mismatch negativity event-related potential components. Neuroimage 50, 285–290 (2010).
Näätänen, R., Gaillard, A. W. K. & Mäntysalo, S. Early selective-attention effect on evoked potential reinterpreted. Acta Psychol. 42, 313–329 (1978).
Winkler, I. Interpreting the mismatch negativity. J. Psychophysiol. 21, 147–163 (2007).
Winkler, I., Teder-Sälejärvi, W. A., Horváth, J., Näätänen, R. & Sussman, E. Human auditory cortex tracks task-irrelevant sound sources. NeuroReport 14, 2053–2056 (2003).
Bendixen, A., Prinz, W. G., Horváth, J., Trujillo-Barreto, N. J. & Schröger, E. Rapid extraction of auditory feature contingencies. Neuroimage 41, 1111–1119 (2008).
Mittag, M., Takegata, R. & Winkler, I. Transitional probabilities are prioritized over stimulus/pattern probabilities in auditory deviance detection: memory basis for predictive sound processing. J. Neurosci. 36, 9572–9579 (2016).
Paavilainen, P., Arajärvi, P. & Takegata, R. Preattentive detection of nonsalient contingencies between auditory features. NeuroReport 18, 159–163 (2007).
Garrido, M. I., Kilner, J. M., Stephan, K. E. & Friston, K. J. The mismatch negativity: a review of underlying mechanisms. Clin. Neurophysiol. 120, 453–463 (2009).
Poublan-Couzardot, A. et al. Time-resolved dynamic computational modeling of human EEG recordings reveals gradients of generative mechanisms for the MMN response. PLoS Comput. Biol. 19, 1010557 (2023).
Fink, L., Hörster, M., Poeppel, D., Wald-Fuhrmann, M. & Larrouy-Maestri, P. Features underlying speech versus music as categories of auditory experience. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/2635u (2023).
Bigand, E. & Poulin-Charronnat, B. Are we “experienced listeners”? A review of the musical capacities that do not depend on formal musical training. Cognition 100, 100–130 (2006).
Ren, Y. & Brown, T. I. Beyond the ears: a review exploring the interconnected brain behind the hierarchical memory of music. Psychon. Bull. Rev. https://doi.org/10.3758/s13423-023-02376-1 (2023).
Gervain, J., Cruz‐Pavía, I. & Gerken, L. Behavioral and imaging studies of infant artificial grammar learning. Top. Cogn. Sci. 12, 815–827 (2020).
Ragert, M., Fairhurst, M. T. & Keller, P. E. Segregation and integration of auditory streams when listening to multi-part music. PLoS ONE 9, 84085 (2014).
Tóth, B. et al. The effects of speech processing units on auditory stream segregation and selective attention in a multi-talker (cocktail party) situation. Cortex 130, 387–400 (2020).
Di Liberto, G. M. et al. Cortical encoding of melodic expectations in human temporal cortex. eLife 9, e51784 (2020).
Gwilliams, L., King, J.-R., Marantz, A. & Poeppel, D. Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nat. Commun. 13, 6606 (2022).
Gwilliams, L., Marantz, A., Poeppel, D. & King, J.-R. Top-down information shapes lexical processing when listening to continuous speech. Lang. Cogn. Neurosci. https://doi.org/10.1080/23273798.2023.2171072 (2023).
Koelsch, S., Schröger, E. & Tervaniemi, M. Superior pre-attentive auditory processing in musicians. NeuroReport 10, 1309–1313 (1999).
Micheyl, C., Delhommeau, K., Perrot, X. & Oxenham, A. J. Influence of musical and psychoacoustical training on pitch discrimination. Hear. Res. 219, 36–47 (2006).
Tervaniemi, M., Just, V., Koelsch, S., Widmann, A. & Schröger, E. Pitch discrimination accuracy in musicians vs nonmusicians: an event-related potential and behavioral study. Exp. Brain Res. 161, 1–10 (2004).
Chartrand, J.-P. & Belin, P. Superior voice timbre processing in musicians. Neurosci. Lett. 405, 164–167 (2006).
Chartrand, J.-P., Peretz, I. & Belin, P. Auditory recognition expertise and domain specificity. Brain Res. 1220, 191–198 (2008).
Münzer, S., Berti, S. & Pechmann, T. Encoding of timbre, speech and tones: musicians vs. non-musicians. Psychol. Beitr. 44, 187–202 (2002).
Jacobsen, T. et al. Pre-attentive auditory processing of lexicality. Brain Lang. 88, 54–67 (2004).
Winkler, I. et al. Brain responses reveal the learning of foreign language phonemes. Psychophysiology 36, 638–642 (1999).
Zaltz, Y., Globerson, E. & Amir, N. Auditory perceptual abilities are associated with specific auditory experience. Front. Psychol. 8, 2080 (2017).
Bögels, S., Magyari, L. & Levinson, S. C. Neural signatures of response planning occur midway through an incoming question in conversation. Sci. Rep. 5, 12881 (2015).
Magyari, L., Bastiaansen, M. C. M., de Ruiter, J. P. & Levinson, S. C. Early anticipation lies behind the speed of response in conversation. J. Cogn. Neurosci. 26, 2530–2539 (2014).
François, C. & Schön, D. Neural sensitivity to statistical regularities as a fundamental biological process that underlies auditory learning: the role of musical practice. Hear. Res. 308, 122–128 (2014).
Koelsch, S., Vuust, P. & Friston, K. Predictive processes and the peculiar case of music. Trends Cogn. Sci. 23, 63–77 (2019).
Jacobsen, T., Schröger, E., Winkler, I. & Horváth, J. Familiarity affects the processing of task-irrelevant ignored sounds. J. Cogn. Neurosci. 17, 1704–1713 (2005).
Belin, P., Zatorre, R. J. & Ahad, P. Human temporal-lobe response to vocal sounds. Cogn. Brain Res. 13, 17–26 (2002).
Bermudez, P., Lerch, J. P., Evans, A. C. & Zatorre, R. J. Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cereb. Cortex 19, 1583–1596 (2008).
Criscuolo, A., Pando-Naude, V., Bonetti, L., Vuust, P. & Brattico, E. An ALE meta-analytic review of musical expertise. Sci. Rep. 12, 11726 (2022).
Pallesen, K. J. et al. Cognitive control in auditory working memory is enhanced in musicians. PLoS ONE 5, 11120 (2010).
Strait, D. L. & Kraus, N. Biological impact of auditory expertise across the life span: musicians as a model of auditory learning. Hear. Res. 308, 109–121 (2014).
Pettijohn, K. A. & Radvansky, G. A. Narrative event boundaries, reading times, and expectation. Mem. Cognit. 44, 1064–1075 (2016).
Speer, N. K., Zacks, J. M. & Reynolds, J. R. Human brain activity time-locked to narrative event boundaries. Psychol. Sci. 18, 449–455 (2007).
Acknowledgements
The authors thank I. Czigler and M. Racsmány for discussing ideas that shaped the review, and D. Salisbury and B. A. Coffman for personally communicating the illustrations in parts d, e and f of Box 1. The writing of this review was supported by the Hungarian National Research, Development and Innovation Office (grant K132642 to I.W.).
Author information
Contributions
The authors contributed equally to all aspects of the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Psychology thanks Robert Baumgartner, Sabine Grimm and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Winkler, I., Denham, S.L. The role of auditory source and action representations in segmenting experience into events. Nat Rev Psychol 3, 223–241 (2024). https://doi.org/10.1038/s44159-024-00287-z
DOI: https://doi.org/10.1038/s44159-024-00287-z