Children’s early speech often bears little resemblance to that of adults, and yet parents and other caregivers are able to interpret that speech and react accordingly. Here we investigate how adult listeners’ inferences reflect sophisticated beliefs about what children are trying to communicate, as well as how children are likely to pronounce words. Using a Bayesian framework for modelling spoken word recognition, we find that computational models can replicate adult interpretations of children’s speech only when they include strong, context-specific prior expectations about the messages that children will want to communicate. This points to a critical role of adult cognitive processes in supporting early communication and reveals how children can actively prompt adults to take actions on their behalf even when they have only a nascent understanding of the adult language. We discuss the wide-ranging implications of the powerful listening capabilities of adults for theories of first language acquisition.
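The Bayesian framing described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the three-word vocabulary, the prior probabilities and the scaling parameter `beta` are hypothetical, and an exponentiated Levenshtein distance over spellings stands in for a real model of how children pronounce words. It shows how a strong, context-specific prior can change which word a listener infers from the same imperfect production.

```python
import math
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the Wagner-Fischer dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def posterior(heard: str, prior: Dict[str, float], beta: float = 1.0) -> Dict[str, float]:
    """P(word | heard) is proportional to P(heard | word) * P(word).

    The likelihood exp(-beta * distance) is an assumed stand-in, not a fitted model.
    """
    scores = {w: p * math.exp(-beta * edit_distance(heard, w)) for w, p in prior.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


# Hypothetical priors: one uninformative, one reflecting a context
# in which the child is playing with a ball.
flat_prior = {"ball": 1 / 3, "bat": 1 / 3, "bath": 1 / 3}
context_prior = {"ball": 0.8, "bat": 0.1, "bath": 0.1}

heard = "ba"  # the child's truncated production
flat_post = posterior(heard, flat_prior)
context_post = posterior(heard, context_prior)

print(max(flat_post, key=flat_post.get))        # closest form wins under a flat prior
print(max(context_post, key=context_post.get))  # the contextual prior can override form
```

With a flat prior, the inferred word is simply the candidate closest in form to what was heard. With the contextual prior, a more distant candidate can still win, which is the qualitative pattern the abstract attributes to adult listeners.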
All data used to train language models come from public child language transcripts retrieved through the Child Language Data Exchange System (CHILDES; ref. 50) using childes-db (ref. 74). Test datasets come from the Providence corpus (ref. 27), which has been made publicly available through the PhonBank project (ref. 75; https://phonbank.talkbank.org/phon/Eng-NA/Providence.zip). For this project, the data were obtained through childes-db (ref. 74; https://childes-db.stanford.edu).
All model training and analysis code is available through our GitHub repository at https://github.com/smeylan/child-directed-listening. Fine-tuned models and pre-processed child transcripts can be accessed through our Open Science Framework repository at https://osf.io/v7c3e/.
Chomsky, N. Aspects of the Theory of Syntax (MIT Press, 1965).
Pinker, S. Formal models of language learning. Cognition 7, 217–283 (1979).
Saffran, J. R., Aslin, R. N. & Newport, E. L. Statistical learning by 8-month-old infants. Science 274, 1926–1928 (1996).
Landauer, T. K. & Dumais, S. T. A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211 (1997).
Dupoux, E. Cognitive science in the era of artificial intelligence: a roadmap for reverse-engineering the infant language-learner. Cognition 173, 43–59 (2018).
Hoff, E. How social contexts support and shape language development. Dev. Rev. 26, 55–88 (2006).
Onnis, L. Caregiver communication to the child as moderator and mediator of genes for language. Behav. Brain Res. 325, 197–202 (2017).
Markus, J., Mundy, P., Morales, M., Delgado, C. E. F. & Yale, M. Individual differences in infant skills as predictors of child–caregiver joint attention and language. Soc. Dev. 9, 302–315 (2000).
Roseberry, S., Hirsh-Pasek, K. & Golinkoff, R. M. Skype me! Socially contingent interactions help toddlers learn language. Child Dev. 85, 956–970 (2014).
Rowland, C. F., Pine, J. M., Lieven, E. V. & Theakston, A. L. Determinants of acquisition order in wh-questions: re-evaluating the role of caregiver speech. J. Child Lang. 30, 609–635 (2003).
Stein, A., Malmberg, L. E., Sylva, K., Barnes, J. & Leach, P. The influence of maternal depression, caregiving, and socioeconomic status in the post-natal year on children’s language development. Child Care Health Dev. 34, 603–612 (2008).
Fusaroli, R., Weed, E., Fein, D. & Naigles, L. Caregiver linguistic alignment to autistic and typically developing children. Cognition 236, 105422 (2023).
Newport, E. L. Motherese: The Speech of Mothers to Young Children (Univ. Pennsylvania, 1975).
Huttenlocher, J., Haight, W., Bryk, A., Seltzer, M. & Lyons, T. Early vocabulary growth: relation to language input and gender. Dev. Psychol. 27, 236 (1991).
Hart, B. & Risley, T. R. Meaningful Differences in the Everyday Experience of Young American Children (Paul H Brookes Publishing, 1995).
Rowe, M. L. A longitudinal investigation of the role of quantity and quality of child-directed speech in vocabulary development. Child Dev. 83, 1762–1774 (2012).
Golinkoff, R. M., Hoff, E., Rowe, M. L., Tamis-LeMonda, C. S. & Hirsh-Pasek, K. Language matters: denying the existence of the 30-million-word gap has serious consequences. Child Dev. 90, 985–992 (2019).
Cartmill, E. A. et al. Quality of early parent input predicts child vocabulary 3 years later. Proc. Natl Acad. Sci. USA 110, 11278–11283 (2013).
Weizman, Z. O. & Snow, C. E. Lexical input as related to children’s vocabulary acquisition: effects of sophisticated exposure and support for meaning. Dev. Psychol. 37, 265–279 (2001).
Bergelson, E. et al. What do North American babies hear? A large-scale cross-corpus analysis. Dev. Sci. 22, e12724 (2019).
Cristia, A., Dupoux, E., Gurven, M. & Stieglitz, J. Child-directed speech is infrequent in a forager-farmer population: a time allocation study. Child Dev. 90, 759–773 (2019).
Golinkoff, R. M. ‘I beg your pardon?’: the preverbal negotiation of failed messages. J. Child Lang. 13, 455–476 (1986).
Golinkoff, R. M. & Gordon, L. What makes communication run? Characteristics of immediate successes. First Lang. 8, 103–124 (1988).
Tomasello, M., Conti-Ramsden, G. & Ewert, B. Young children’s conversations with their mothers and fathers: differences in breakdown and repair. J. Child Lang. 17, 115–130 (1990).
Frank, M. C., Braginsky, M., Yurovsky, D. & Marchman, V. A. Variability and Consistency in Early Language Learning: The Wordbank Project (MIT Press, 2021).
Demuth, K., Culbertson, J. & Alter, J. Word-minimality, epenthesis and coda licensing in the early acquisition of English. Lang. Speech 49, 137–174 (2006).
Demuth, K. & McCullough, E. The prosodic (re)organization of children’s early English articles. J. Child Lang. 36, 173–200 (2009).
Shannon, C. E. Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).
Levy, R. A noisy-channel model of human sentence comprehension under uncertain input. In Proc. 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP) 234–243 (Association for Computational Linguistics, 2008).
Gibson, E., Bergen, L. & Piantadosi, S. T. Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proc. Natl Acad. Sci. USA 110, 8051–8056 (2013).
Meylan, S. C., Nair, S. & Griffiths, T. L. Evaluating models of robust word recognition with serial reproduction. Cognition 210, 104553 (2021).
Norris, D. & McQueen, J. M. Shortlist B: a Bayesian model of continuous speech recognition. Psychol. Rev. 115, 357–395 (2008).
Chater, N. & Oaksford, M. The Probabilistic Mind: Prospects for Bayesian Cognitive Science (Oxford Univ. Press, 2008).
Perfors, A., Tenenbaum, J. B., Griffiths, T. L. & Xu, F. A tutorial introduction to Bayesian models of cognitive development. Cognition 120, 302–321 (2011).
Miller, G. A., Heise, G. A. & Lichten, W. The intelligibility of speech as a function of the context of the test materials. J. Exp. Psychol. 41, 329 (1951).
Howes, D. On the relation between the intelligibility and frequency of occurrence of English words. J. Acoust. Soc. Am. 29, 296–305 (1957).
Norris, D., McQueen, J. M. & Cutler, A. Prediction, Bayesian inference and feedback in speech recognition. Lang. Cogn. Neurosci. 31, 4–18 (2016).
Rohde, H. & Ettlinger, M. Integration of pragmatic and phonetic cues in spoken word recognition. J. Exp. Psychol. Learn. Mem. Cogn. 38, 967–983 (2012).
Altmann, G. T. M. & Kamide, Y. Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition 73, 247–264 (1999).
Kamide, Y., Altmann, G. T. M. & Haywood, S. L. The time-course of prediction in incremental sentence processing: evidence from anticipatory eye movements. J. Mem. Lang. 49, 133–156 (2003).
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M. & Sedivy, J. C. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634 (1995).
Kleinschmidt, D. F. & Jaeger, T. F. Robust speech perception: recognize the familiar, generalize to the similar, and adapt to the novel. Psychol. Rev. 122, 148–203 (2015).
Reddy, D. R. (ed.) Speech Recognition: Invited Papers Presented at the 1974 IEEE Symposium (Elsevier, 1975).
Wagner, R. A. & Fischer, M. J. The string-to-string correction problem. J. ACM 21, 168–173 (1974).
Devlin, J., Chang, M., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1, 4171–4186 (Association for Computational Linguistics, 2019).
Radford, A. et al. Language Models are Unsupervised Multitask Learners (OpenAI, 2019).
Meister, C. et al. Revisiting the uniform information density hypothesis. In Proc. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) 963–980 (Association for Computational Linguistics, 2021).
Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2105646118 (2021).
Manning, C. & Schütze, H. Foundations of Statistical Natural Language Processing (MIT Press, 1999).
MacWhinney, B. The CHILDES Project: Tools for Analyzing Talk. Transcription Format and Programs Vol. 1 (Psychology Press, 2000).
Godfrey, J. J., Holliman, E. C. & McDaniel, J. Switchboard: telephone speech corpus for research and development. In IEEE International Conference on Acoustics, Speech, and Signal Processing Vol. 1, 517–520 (IEEE Computer Society, 1992).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Levy, R. Expectation-based syntactic comprehension. Cognition 106, 1126–1177 (2008).
Hale, J. A probabilistic Earley parser as a psycholinguistic model. In Proc. 2nd Meeting of the North American Chapter of the Association for Computational Linguistics N01-1021 (Association for Computational Linguistics, 2001).
Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. https://doi.org/10.1016/j.jml.2012.11.001 (2013).
Chouinard, M. M. & Clark, E. V. Adult reformulations of child errors as negative evidence. J. Child Lang. 30, 637–669 (2003).
Marcus, G. F. Negative evidence in language acquisition. Cognition 46, 53–85 (1993).
Demetras, M. J., Post, K. N. & Snow, C. E. Feedback to first language learners: the role of repetitions and clarification questions. J. Child Lang. 13, 275–292 (1986).
Dore, J. Holophrases, speech acts and language universals. J. Child Lang. 2, 21–40 (1975).
Fenson, L. et al. MacArthur-Bates Communicative Development Inventories (Paul H. Brookes Publishing Company, 2007).
Mohri, M., Pereira, F. & Riley, M. Weighted finite-state transducers in speech recognition. Comput. Speech Lang. 16, 69–88 (2002).
Gorman, K. et al. The SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. In Proc. 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology 40–50 (Association for Computational Linguistics, 2020).
Novak, J. R., Minematsu, N. & Hirose, K. Phonetisaurus: exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework. Nat. Lang. Eng. 22, 907–938 (2016).
Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–22 (1977).
Gorman, K., Kirov, C., Roark, B. & Sproat, R. Structured abbreviation expansion in context. In Findings of the Association for Computational Linguistics: EMNLP 2021 995–1005 (Association for Computational Linguistics, 2021).
Galescu, L. & Allen, J. F. Bi-directional conversion between graphemes and phonemes using a joint n-gram model. In 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis (International Speech Communication Association, 2001).
Novak, J. R., Minematsu, N. & Hirose, K. WFST-based grapheme-to-phoneme conversion: open source tools for alignment, model-building and decoding. In Proc. 10th International Workshop on Finite State Methods and Natural Language Processing 45–49 (Association for Computational Linguistics, 2012).
Salazar, J., Liang, D., Nguyen, T. Q. & Kirchhoff, K. Masked language model scoring. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 2699–2712 (Association for Computational Linguistics, 2020).
Jawahar, G., Sagot, B. & Seddah, D. What does BERT learn about the structure of language? In Proc. 57th Annual Meeting of the Association for Computational Linguistics 3651–3657 (Association for Computational Linguistics, 2019).
Wolf, T. et al. Transformers: state-of-the-art natural language processing. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (Association for Computational Linguistics, 2020).
Hofmann, V., Pierrehumbert, J. & Schütze, H. Superbizarre is not superb: derivational morphology improves BERT’s interpretation of complex words. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing Vol. 1, 3594–3608 (Association for Computational Linguistics, 2021).
Shibata, Y. et al. Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching Technical Report DOI-TR-161 (Department of Informatics, Kyushu University, 1999).
Chen, S. F. & Goodman, J. An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13, 359–394 (1999).
Sanchez, A. et al. childes-db: a flexible and reproducible interface to the child language data exchange system. Behav. Res. Methods 51, 1928–1941 (2019).
Rose, Y. & MacWhinney, B. in The Oxford Handbook of Corpus Phonology (eds Durand, J. et al.) 380–401 (Oxford Univ. Press, 2014).
Child-directed listening. Open Science Framework https://osf.io/v7c3e/ (2021).
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
We thank J. Mankewitz, S. Nair and R. Jansen for providing feedback on early drafts, as well as members of the Computational Psycholinguistics Lab at MIT and the Bergelson Lab at Duke for valuable discussion. We thank K. Gorman, T. Eisape and P. Qian for several helpful technical consultations. S. Zhi contributed to the implementation of the pronunciation module. This work was supported by NSF grants BCS-1551866 (R.P.L.), BCS-1844710 (R.P.L.) and BCS-2121074 (R.P.L.); NIH grants 1F32HD097982 (S.C.M.) and DP5 OD019812-01 (E.B.); and the CONVO grant to MIT Brain and Cognitive Sciences from the Simons Center for the Social Brain (R.P.L., S.C.M. and N.H.W.). R.F. received no specific funding for this work. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing interests.
Peer review information
Nature Human Behaviour thanks Riccardo Fusaroli, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Meylan, S.C., Foushee, R., Wong, N.H. et al. How adults understand what young children say. Nat Hum Behav (2023). https://doi.org/10.1038/s41562-023-01698-3