Abstract
If the structure of language vocabularies mirrors the structure of natural divisions that are universally perceived, then the meanings of words in different languages should closely align. By contrast, if shared word meanings are a product of shared culture, history and geography, they may differ between languages in substantial but predictable ways. Here, we analysed the semantic neighbourhoods of 1,010 meanings in 41 languages. The most-aligned words were from semantic domains with high internal structure (number, quantity and kinship). Words denoting natural kinds, common actions and artefacts aligned much less well. Languages that are more geographically proximate, more historically related and/or spoken by more-similar cultures had more aligned word meanings. These results provide evidence that the meanings of common words vary in ways that reflect the culture, history and geography of their users.
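One way to read "alignment of semantic neighbourhoods" can be sketched in a few lines. This is an illustrative sketch only, not the authors' algorithm (their implementation is the code at https://osf.io/tngba/). It assumes that each language has its own embedding space keyed by shared meaning labels, and that a meaning is well aligned when its similarities to its nearest neighbours correlate across the two languages. The function names (`cosine`, `pearson`, `neighbours`, `alignment`), the neighbourhood size `k` and the toy vectors are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def neighbours(meaning, space, k):
    """The k meanings closest to `meaning` within one language's space."""
    others = [m for m in space if m != meaning]
    others.sort(key=lambda m: cosine(space[meaning], space[m]), reverse=True)
    return others[:k]


def alignment(meaning, space_a, space_b, k=3):
    """Correlate within-language similarities over the pooled neighbourhood."""
    hood = sorted(set(neighbours(meaning, space_a, k))
                  | set(neighbours(meaning, space_b, k)))
    sims_a = [cosine(space_a[meaning], space_a[m]) for m in hood]
    sims_b = [cosine(space_b[meaning], space_b[m]) for m in hood]
    return pearson(sims_a, sims_b)


# Toy data (made up): one small 2-D space per language, keyed by meaning.
space_a = {"one": (1.0, 0.0), "two": (0.9, 0.1), "three": (0.8, 0.3),
           "dog": (0.0, 1.0), "river": (-0.5, 0.8)}
# space_b is space_a rotated by 90 degrees: same neighbourhood structure.
space_b = {m: (-y, x) for m, (x, y) in space_a.items()}
# space_c places "one" next to "dog" instead of the other numerals.
space_c = {"one": (1.0, 0.0), "two": (0.0, 1.0), "three": (0.7, 0.7),
           "dog": (0.9, 0.1), "river": (-0.5, 0.8)}

same_structure = alignment("one", space_a, space_b)
different_structure = alignment("one", space_a, space_c)
```

Because only within-language similarities are compared, the two spaces never need to share a coordinate system. In the toy data, `space_b` is a rotated copy of `space_a`, so `same_structure` is maximal, while `space_c` rearranges the neighbours of "one" and `different_structure` comes out lower.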
Data availability
Data and reproducible analyses are available at https://osf.io/tngba/.
Code availability
Code to implement the alignment algorithm is available at https://osf.io/tngba/.
Acknowledgements
We thank J. Dellert and A. Majid. B.T. and G.L. acknowledge support from LEVINSON fellowships at the Max Planck Institute for Psycholinguistics. S.G.R. was partially supported by a Leverhulme early career fellowship (ECF-2016-435). G.L. was partially supported by NSF-PAC 1734260. The funders had no role in the conceptualization, design, data collection, analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
B.T., S.G.R. and G.L. designed the research, collected and analysed data, and contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Primary Handling Editor: Charlotte Payne.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information, sections 1–5.
Cite this article
Thompson, B., Roberts, S.G. & Lupyan, G. Cultural influences on word meanings revealed through large-scale semantic alignment. Nat Hum Behav 4, 1029–1038 (2020). https://doi.org/10.1038/s41562-020-0924-8
DOI: https://doi.org/10.1038/s41562-020-0924-8