Cultural influences on word meanings revealed through large-scale semantic alignment

Thompson, Bill; Roberts, Seán G.; Lupyan, Gary

doi:10.1038/s41562-020-0924-8

Article
Published: 10 August 2020

Nature Human Behaviour volume 4, pages 1029–1038 (2020)

Abstract

If the structure of language vocabularies mirrors the structure of natural divisions that are universally perceived, then the meanings of words in different languages should closely align. By contrast, if shared word meanings are a product of shared culture, history and geography, they may differ between languages in substantial but predictable ways. Here, we analysed the semantic neighbourhoods of 1,010 meanings in 41 languages. The most-aligned words were from semantic domains with high internal structure (number, quantity and kinship). Words denoting natural kinds, common actions and artefacts aligned much less well. Languages that are more geographically proximate, more historically related and/or spoken by more-similar cultures had more aligned word meanings. These results provide evidence that the meanings of common words vary in ways that reflect the culture, history and geography of their users.
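As a rough illustration of what comparing "semantic neighbourhoods" across languages can look like, the sketch below scores one translation pair by correlating how similar each word is to a shared set of neighbours in its own language. It is a minimal sketch under stated assumptions, not the authors' algorithm (their code is at the OSF link in the Code availability section): the two-dimensional vectors, the tiny English-Danish dictionary, the neighbourhood size `k`, and the function names `neighbours` and `alignment` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def neighbours(word, space, k):
    """The k words in `space` most similar to `word` (excluding itself)."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return others[:k]


def alignment(word_a, word_b, space_a, space_b, a_to_b, k=3):
    """Hypothetical alignment score for a translation pair.

    Pools the neighbours of each word (mapped through the dictionary),
    then correlates the within-language similarities over that pool.
    """
    b_to_a = {b: a for a, b in a_to_b.items()}
    pool = {n for n in neighbours(word_a, space_a, k) if n in a_to_b}
    pool |= {b_to_a[n] for n in neighbours(word_b, space_b, k) if n in b_to_a}
    pool = sorted(pool)
    sims_a = [cosine(space_a[word_a], space_a[n]) for n in pool]
    sims_b = [cosine(space_b[word_b], space_b[a_to_b[n]]) for n in pool]
    return pearson(sims_a, sims_b)


# Toy, made-up embeddings (assumption: real spaces have hundreds of dimensions).
space_en = {"tuesday": [1.0, 0.1], "monday": [0.9, 0.2],
            "week": [0.7, 0.5], "dog": [0.1, 1.0]}
space_da = {"tirsdag": [1.0, 0.1], "mandag": [0.9, 0.2],
            "uge": [0.7, 0.5], "hund": [0.1, 1.0]}
en_to_da = {"tuesday": "tirsdag", "monday": "mandag",
            "week": "uge", "dog": "hund"}

# Same neighbourhood structure in both languages: high alignment.
aligned = alignment("tuesday", "tirsdag", space_en, space_da, en_to_da)

# Move the Danish word next to "hund": its neighbourhood no longer matches.
space_da_shifted = dict(space_da, tirsdag=[0.1, 0.9])
shifted = alignment("tuesday", "tirsdag", space_en, space_da_shifted, en_to_da)

print(f"aligned: {aligned:.3f}, shifted: {shifted:.3f}")
```

In this toy setup the first score is high because both words relate to their neighbours in the same way, while the second drops because the Danish vector has been moved toward an unrelated word. Correlating within-language similarities, rather than comparing vectors directly, is one way to sidestep the fact that independently trained embedding spaces do not share coordinates.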

**Fig. 1: High alignment between English (‘Tuesday’) and Danish (‘Tirsdag’).**

**Fig. 2: Low alignment between English (‘beautiful’) and French (‘beau’).**

**Fig. 3: Semantic alignment of 21 semantic domains.**

**Fig. 4: Semantic alignment of number words.**

**Fig. 5: Semantic alignment by part of speech.**

**Fig. 6: Semantic distances for Indo-European languages.**

Data availability

Data and reproducible analyses are available at https://osf.io/tngba/.

Code availability

Code to implement the alignment algorithm is available at https://osf.io/tngba/.

Acknowledgements

We thank J. Dellert and A. Majid. B.T. and G.L. acknowledge support from LEVINSON fellowships at the Max Planck Institute for Psycholinguistics. S.G.R. was partially supported by a Leverhulme early career fellowship (ECF-2016-435). G.L. was partially supported by NSF-PAC 1734260. The funders had no role in the conceptualization, design, data collection, analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Computer Science, Princeton University, Princeton, NJ, USA
Bill Thompson
School of English, Communication and Philosophy, Cardiff University, Cardiff, UK
Seán G. Roberts
Department of Anthropology and Archaeology, University of Bristol, Bristol, UK
Seán G. Roberts
Department of Psychology, University of Wisconsin-Madison, Madison, WI, USA
Gary Lupyan

Contributions

B.T., S.G.R. and G.L. designed the research, collected and analysed data, and contributed to the writing of the manuscript.

Corresponding author

Correspondence to Bill Thompson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Charlotte Payne.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Information, sections 1–5.

Reporting Summary

About this article

Cite this article

Thompson, B., Roberts, S.G. & Lupyan, G. Cultural influences on word meanings revealed through large-scale semantic alignment. Nat Hum Behav 4, 1029–1038 (2020). https://doi.org/10.1038/s41562-020-0924-8

Received: 27 September 2019
Accepted: 02 July 2020
Published: 10 August 2020
Issue Date: October 2020
DOI: https://doi.org/10.1038/s41562-020-0924-8
