
Cultural influences on word meanings revealed through large-scale semantic alignment

Abstract

If the structure of language vocabularies mirrors the structure of natural divisions that are universally perceived, then the meanings of words in different languages should closely align. By contrast, if shared word meanings are a product of shared culture, history and geography, they may differ between languages in substantial but predictable ways. Here, we analysed the semantic neighbourhoods of 1,010 meanings in 41 languages. The most-aligned words were from semantic domains with high internal structure (number, quantity and kinship). Words denoting natural kinds, common actions and artefacts aligned much less well. Languages that are more geographically proximate, more historically related and/or spoken by more-similar cultures had more aligned word meanings. These results provide evidence that the meanings of common words vary in ways that reflect the culture, history and geography of their users.
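To make the notion of comparing semantic neighbourhoods concrete, the following is a minimal sketch of one way an alignment score for a translation pair could be computed from word embeddings. It is not the authors' algorithm (their code is at the OSF link under Code availability). It assumes that alignment is measured by taking a word's nearest neighbours in one language, translating them, and correlating the word-to-neighbour similarities across the two languages. The function names, the parameter `k`, the toy two-dimensional vectors and the translation table are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def neighbourhood_alignment(word_a, emb_a, emb_b, translate, k=3):
    """Hypothetical alignment score for word_a and its translation.

    Assumption: alignment is the correlation between (i) similarities of
    word_a to its k nearest neighbours in language A and (ii) similarities
    of the translated word to the translated neighbours in language B.
    """
    word_b = translate[word_a]
    # Only consider neighbours whose translation has a vector in language B.
    candidates = [w for w in emb_a
                  if w != word_a and translate.get(w) in emb_b]
    neighbours = sorted(candidates,
                        key=lambda w: cosine(emb_a[word_a], emb_a[w]),
                        reverse=True)[:k]
    sims_a = [cosine(emb_a[word_a], emb_a[w]) for w in neighbours]
    sims_b = [cosine(emb_b[word_b], emb_b[translate[w]]) for w in neighbours]
    return pearson(sims_a, sims_b)

# Toy data (invented for illustration, not taken from the study).
emb_a = {
    "tuesday": (1.0, 0.0), "monday": (0.9, 0.1), "week": (0.6, 0.4),
    "beautiful": (0.0, 1.0), "lovely": (0.1, 0.9),
}
translate = {"tuesday": "tirsdag", "monday": "mandag", "week": "uge",
             "beautiful": "smuk", "lovely": "dejlig"}
# Language B is language A rotated by 90 degrees, so every pairwise
# similarity is preserved and the neighbourhoods match perfectly.
emb_b = {translate[w]: (-y, x) for w, (x, y) in emb_a.items()}

score = neighbourhood_alignment("tuesday", emb_a, emb_b, translate, k=3)
print(round(score, 6))
```

Because the second toy space is only a rotation of the first, the neighbourhood structure is identical and the score is at its maximum. In real embeddings, a word whose translated neighbours sit at different relative distances would score lower under this assumed measure.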

Fig. 1: High alignment between English (‘Tuesday’) and Danish (‘Tirsdag’).
Fig. 2: Low alignment between English (‘beautiful’) and French (‘beau’).
Fig. 3: Semantic alignment of 21 semantic domains.
Fig. 4: Semantic alignment of number words.
Fig. 5: Semantic alignment by part of speech.
Fig. 6: Semantic distances for Indo-European languages.

Data availability

Data and reproducible analyses are available at https://osf.io/tngba/.

Code availability

Code to implement the alignment algorithm is available at https://osf.io/tngba/.

Acknowledgements

We thank J. Dellert and A. Majid. B.T. and G.L. acknowledge support from LEVINSON fellowships at the Max Planck Institute for Psycholinguistics. S.G.R. was partially supported by a Leverhulme early career fellowship (ECF-2016-435). G.L. was partially supported by NSF-PAC 1734260. The funders had no role in the conceptualization, design, data collection, analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

B.T., S.G.R. and G.L. designed the research, collected and analysed data, and contributed to the writing of the manuscript.

Corresponding author

Correspondence to Bill Thompson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Charlotte Payne.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Information, sections 1–5.

About this article

Cite this article

Thompson, B., Roberts, S.G. & Lupyan, G. Cultural influences on word meanings revealed through large-scale semantic alignment. Nat Hum Behav 4, 1029–1038 (2020). https://doi.org/10.1038/s41562-020-0924-8

  • DOI: https://doi.org/10.1038/s41562-020-0924-8
