
Gender stereotypes are reflected in the distributional structure of 25 languages


Cultural stereotypes, such as the idea that men are more suited for paid work and women are more suited for taking care of the home and family, may contribute to gender imbalances in science, technology, engineering and mathematics (STEM) fields, among other undesirable gender disparities. Might these stereotypes be learned from language? Here we examine whether gender stereotypes are reflected in the large-scale distributional structure of natural language semantics. We measure gender associations embedded in the statistics of 25 languages and relate these to an international dataset of psychological gender associations (N = 656,636). People’s implicit gender associations are strongly predicted by gender associations encoded in the statistics of the language they speak. These associations are further related to the extent to which languages mark gender in occupation terms (for example, ‘waiter’/‘waitress’). Our pattern of findings is consistent with the possibility that linguistic associations shape people’s implicit judgements.
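The core measurement idea can be sketched in a few lines: a word's gender association is the difference between its embedding's mean cosine similarity to male anchor words and to female anchor words. A minimal sketch in Python with toy vectors (the anchor words and the `embeddings` dictionary here are illustrative stand-ins, not the vectors used in the study):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def gender_association(word_vec, male_vecs, female_vecs):
    """Mean similarity to male anchors minus mean similarity to female anchors.
    Positive values indicate a male association, negative a female association."""
    male_sim = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    female_sim = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    return male_sim - female_sim

# Toy 3-dimensional embeddings (illustrative only; real models use hundreds of dimensions).
embeddings = {
    "he": [1.0, 0.1, 0.0], "man": [0.9, 0.2, 0.1],
    "she": [0.1, 1.0, 0.0], "woman": [0.2, 0.9, 0.1],
    "career": [0.8, 0.3, 0.5], "family": [0.3, 0.8, 0.5],
}
male = [embeddings["he"], embeddings["man"]]
female = [embeddings["she"], embeddings["woman"]]
print(gender_association(embeddings["career"], male, female) > 0)  # True: 'career' leans male in this toy space
print(gender_association(embeddings["family"], male, female) < 0)  # True: 'family' leans female
```

In the study, the same kind of score is computed from word embeddings trained separately on each of the 25 languages and then compared with the behavioural data.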


Fig. 1: Human judgements of word gender association as a function of gender association from the subtitle-trained embedding model.
Fig. 2: Implicit versus linguistic associations.
Fig. 3: Implicit male–career association and mean gender association.

Data availability

The data that support the findings of this study are available online. Source data are provided with this paper.

Code availability

All code that supports the findings of this study is available at


References

  1. Gelman, S. A., Taylor, M. G., Nguyen, S. P., Leaper, C. & Bigler, R. S. Mother–child conversations about gender: understanding the acquisition of essentialist beliefs. Monogr. Soc. Res. Child Dev. 69, 1–142 (2004).
  2. Bian, L., Leslie, S. J. & Cimpian, A. Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science 355, 389–391 (2017).
  3. Ceci, S. J. & Williams, W. M. Understanding current causes of women’s underrepresentation in science. Proc. Natl Acad. Sci. USA 108, 3157–3162 (2011).
  4. Leslie, S. J., Cimpian, A., Meyer, M. & Freeland, E. Expectations of brilliance underlie gender distributions across academic disciplines. Science 347, 262–265 (2015).
  5. Miller, D. I., Eagly, A. H. & Linn, M. C. Women’s representation in science predicts national gender–science stereotypes: evidence from 66 nations. J. Educ. Psychol. 107, 631 (2015).
  6. Stoet, G. & Geary, D. C. The gender-equality paradox in science, technology, engineering, and mathematics education. Psychol. Sci. 29, 581–593 (2018).
  7. Dryer, M. S. & Haspelmath, M. (eds) WALS Online (Max Planck Institute for Evolutionary Anthropology, 2013).
  8. Rhodes, M. & Brickman, D. Preschoolers’ responses to social comparisons involving relative failure. Psychol. Sci. 19, 968–972 (2008).
  9. Cimpian, A., Mu, Y. & Erickson, L. C. Who is good at this game? Linking an activity to a social category undermines children’s achievement. Psychol. Sci. 23, 533–541 (2012).
  10. Cimpian, A. & Markman, E. M. The generic/nongeneric distinction influences how children interpret new information about social others. Child Dev. 82, 471–492 (2011).
  11. Rhodes, M., Leslie, S. J., Yee, K. M. & Saunders, K. Subtle linguistic cues increase girls’ engagement in science. Psychol. Sci. 30, 455–466 (2019).
  12. Greenwald, A. G., McGhee, D. E. & Schwartz, J. L. Measuring individual differences in implicit cognition: the implicit association test. J. Pers. Soc. Psychol. 74, 1464–1480 (1998).
  13. Nosek, B. A., Banaji, M. R. & Greenwald, A. G. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dyn. Theory Res. Pract. 6, 101–115 (2002).
  14. Payne, B. K., Vuletich, H. A. & Brown-Iannuzzi, J. L. Historical roots of implicit bias in slavery. Proc. Natl Acad. Sci. USA 116, 11693–11698 (2019).
  15. Hehman, E., Calanchini, J., Flake, J. K. & Leitner, J. B. Establishing construct validity evidence for regional measures of explicit and implicit racial bias. J. Exp. Psychol. Gen. 148, 1022–1040 (2019).
  16. Charlesworth, T. E. & Banaji, M. R. Patterns of implicit and explicit attitudes: I. Long-term change and stability from 2007 to 2016. Psychol. Sci. 30, 174–192 (2019).
  17. Firth, J. R. Studies in Linguistic Analysis (Philological Society, 1957).
  18. Landauer, T. K. & Dumais, S. T. A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240 (1997).
  19. Lund, K. & Burgess, C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Methods Instrum. Comput. 28, 203–208 (1996).
  20. Lenci, A. Distributional semantics in linguistic and cognitive research. Ital. J. Linguist. 20, 1–31 (2008).
  21. Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
  22. Bhatia, S. The semantic representation of prejudice and stereotypes. Cognition 164, 46–60 (2017).
  23. von der Malsburg, T., Poppels, T. & Levy, R. Implicit gender bias in linguistic descriptions for expected events: the cases of the 2016 US and 2017 UK elections. Psychol. Sci. 31, 115–128 (2020).
  24. Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).
  25. Greenwald, A. G. An AI stereotype catcher. Science 356, 133–134 (2017).
  26. Lupyan, G. & Lewis, M. From words-as-mappings to words-as-cues: the role of language in semantic knowledge. Lang. Cogn. Neurosci. 34, 1319–1337 (2019).
  27. Greenwald, A. G., Nosek, B. A. & Banaji, M. R. Understanding and using the implicit association test: I. An improved scoring algorithm. J. Pers. Soc. Psychol. 85, 197 (2003).
  28. Forscher, P. S. et al. A meta-analysis of procedures to change implicit measures. J. Pers. Soc. Psychol. 117, 522–559 (2019).
  29. CIA The CIA World Factbook 2017 (2016).
  30. Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv (2013).
  31. van Paridon, J. & Thompson, B. subs2vec: word embeddings from subtitles in 55 languages. Preprint at OSF (2019).
  32. Lison, P. & Tiedemann, J. OpenSubtitles2016: extracting large parallel corpora from movie and TV subtitles. In Proc. 10th International Conference on Language Resources and Evaluation (ELRA, 2016).
  33. Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5, 135–146 (2017).
  34. Hussey, I. et al. The Attitudes, Identities, and Individual Differences (AIID) Study and Dataset (2019).
  35. Falk, A. & Hermle, J. Relationship of gender differences in preferences to economic development and gender equality. Science 362, eaas9899 (2018).
  36. Lane, K. A., Banaji, M. R., Nosek, B. A. & Greenwald, A. G. in Implicit Measures of Attitudes (eds Wittenbrink, B. & Schwarz, N.) 59–102 (2007).
  37. Fazio, R. H. & Olson, M. A. Implicit measures in social cognition research: their meaning and use. Annu. Rev. Psychol. 54, 297–327 (2003).
  38. Payne, B. K., Vuletich, H. A. & Lundberg, K. B. The bias of crowds: how implicit bias bridges personal and systemic prejudice. Psychol. Inq. 28, 233–248 (2017).
  39. Marian, V. & Kaushanskaya, M. Language context guides memory content. Psychon. Bull. Rev. 14, 925–933 (2007).
  40. Athanasopoulos, P. Cognitive representation of colour in bilinguals: the case of Greek blues. Biling. Lang. Cogn. 12, 83–95 (2009).
  41. Scott, G. G., Keitel, A., Becirspahic, M., Yao, B. & Sereno, S. C. The Glasgow Norms: ratings of 5,500 words on nine scales. Behav. Res. Methods 51, 1258–1270 (2019).
  42. Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. Bag of tricks for efficient text classification. Preprint at arXiv (2016).
  43. Simons, G. F. & Fennig, C. D. (eds) Ethnologue: Languages of the World (SIL International, 2018).
  44. Burnard, L. Users Reference Guide for the British National Corpus (Oxford Univ. Computing Services, 1995).
  45. Davies, M. The Corpus of Contemporary American English (2008).
  46. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
  47. Misersky, J. et al. Norms on the gender perception of role nouns in Czech, English, French, German, Italian, Norwegian, and Slovak. Behav. Res. Methods 46, 841–871 (2014).
  48. Haspelmath, M., Dryer, M. S., Gil, D. & Comrie, B. (eds) The World Atlas of Language Structures Online (Max Planck Institute for Evolutionary Anthropology, 2008).
  49. The World Bank. World Development Indicators (2017).



Acknowledgements

The authors acknowledge the National Science Foundation (Perception, Action and Cognition award no. 1734260) for funding support for this work. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information




Contributions

M.L. and G.L. designed the research and wrote the manuscript. M.L. conducted the data analysis.

Corresponding author

Correspondence to Molly Lewis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary handling editor: Aisha Bradshaw

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sample size and demographic characteristics of Project Implicit data.

a, Number of participants by country after exclusions (US participants are excluded from the visualization because of their large sample size; N = 545,673). Our final sample included 657,335 participants from 39 countries (see Supplementary Information for exclusion criteria). b, Gender distribution of participants by country after exclusions. Across countries, there tended to be more female than male participants (mean proportion female = 0.64; s.d. = 0.06). c, Age distribution of participants by country after exclusions. Ranges correspond to 95% CIs. Red points show median age by country.

Extended Data Fig. 2 Models predicting IAT effect size at the participant level.

a, Median country age predicts IAT effect size over and above participant age at the participant level: countries with older populations tend to have individuals with stronger implicit career-gender associations, even after controlling for participant age. The table presents an additive mixed-effect regression predicting IAT D-score at the participant level with participant age and median country age, controlling for participant sex and trial order. The model includes by-country random intercepts. b, The relationship between median country age and IAT effect size holds even after controlling for the percentage of women in STEM. The table presents an additive mixed-effect model predicting IAT D-score at the participant level with participant age, median country age and percentage of women in STEM in the country, controlling for participant sex and trial order. The model includes by-country random intercepts.
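The IAT D-score that these models predict is, at its core, a standardized latency difference. A simplified sketch of that core computation (the full improved scoring algorithm of Greenwald et al. (2003)27 adds error penalties and latency-based trial exclusions, which are omitted here; the latencies below are illustrative, not real data):

```python
import statistics

def d_score(compatible_rts, incompatible_rts):
    """Simplified IAT D-score: difference between mean response latencies in
    incompatible and compatible blocks, divided by the standard deviation of
    latencies pooled across both blocks. Positive values indicate a stronger
    stereotype-consistent (for example, male-career) association."""
    pooled_sd = statistics.stdev(compatible_rts + incompatible_rts)
    return (statistics.mean(incompatible_rts) - statistics.mean(compatible_rts)) / pooled_sd

# Illustrative latencies in milliseconds (toy data).
compatible = [610, 640, 620, 650, 600]     # e.g. male+career / female+family pairing
incompatible = [780, 820, 790, 850, 760]   # e.g. male+family / female+career pairing
print(round(d_score(compatible, incompatible), 2))  # prints 1.82
```

Scaling by the pooled standard deviation makes D-scores comparable across participants who differ in overall response speed, which is why the models above can pool participants across countries.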

Extended Data Fig. 3 Geographic distribution of IAT scores.

a, Residualized implicit career-gender association (IAT score) shown by country. IAT scores are residualized for participant age, gender and task order (N = 657,335). Larger values (blue) indicate a larger bias to associate men with the concept of career and women with the concept of family. Countries in white are those for which there were insufficient data to estimate the country-level career-gender association. Inset shows IAT scores for European countries only. Note that although Hindi is identified as the most frequently spoken language in India, India is highly multilingual, so Hindi embeddings may be a poor representation of the linguistic statistics for speakers in India as a group. b, Distribution of raw (unresidualized) implicit career-gender association (IAT D-score) across countries. All countries in our sample showed a tendency to associate men with career and women with family.

Extended Data Fig. 4 Replication of Caliskan et al. (2017) with our corpora.

We replicate the original set of Caliskan, Bryson, and Narayanan (2017; CBN)21 findings using the English-trained versions of the models used in our main analyses (models trained on the Wikipedia and Subtitle corpora). For each model, we calculate an effect size for each of the 10 IAT types reported in CBN: flowers/insects-pleasant/unpleasant, instruments/weapons-pleasant/unpleasant, European-American/Afro-American-pleasant/unpleasant, males/females-career/family, math/arts-male/female, science/arts-male/female, mental-disease/physical-disease-permanent/temporary, and young/old-pleasant/unpleasant (labelled as Word-Embedding Association Test (WEAT) 1–10 in CBN). We calculate the bias using the same effect size metric described in CBN: a standardized difference score of the relative similarity of the target words to the target attributes (that is, relative similarity of male to career versus relative similarity of female to career). This measure is analogous to the behavioural effect size measure, where larger values indicate larger bias. The figure shows the effect size measure derived from the English Wikipedia corpus (a) and the English Subtitle corpus (b) plotted against effect size estimates reported by CBN from two different models (trained on Common Crawl and Google News corpora). Point colour corresponds to bias type, and point shape corresponds to the two CBN models. With the exception of biases related to race and age, effect sizes from our corpora are comparable to those reported by CBN. In particular, for the gender-career IAT (the bias relevant to our current purposes), we estimate the effect size to be 1.78 (Wikipedia)/1.65 (Subtitle), while CBN estimates it to be 1.81 (Common Crawl)/1.89 (Google News).
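The CBN effect size described above can be sketched directly: score each target word by its mean cosine similarity to attribute set A minus attribute set B, then take the difference between the two target sets' mean scores, standardized by the standard deviation of the scores across all target words. A sketch with toy vectors (the word sets and 3-dimensional vectors are illustrative, not the study's embeddings):

```python
import math
import statistics

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def assoc(w, attr_a, attr_b):
    """s(w, A, B): mean similarity of word vector w to attribute set A minus set B."""
    return (sum(cosine(w, a) for a in attr_a) / len(attr_a)
            - sum(cosine(w, b) for b in attr_b) / len(attr_b))

def weat_effect_size(targets_x, targets_y, attr_a, attr_b):
    """Standardized difference between the two target sets' attribute associations:
    mean_x s - mean_y s, divided by the s.d. of s over all target words."""
    sx = [assoc(w, attr_a, attr_b) for w in targets_x]
    sy = [assoc(w, attr_a, attr_b) for w in targets_y]
    pooled = statistics.stdev(sx + sy)
    return (statistics.mean(sx) - statistics.mean(sy)) / pooled

# Toy 3-dimensional vectors (illustrative only).
male_targets = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]     # e.g. 'he', 'man'
female_targets = [[0.1, 1.0, 0.0], [0.2, 0.9, 0.1]]   # e.g. 'she', 'woman'
career_attrs = [[0.8, 0.3, 0.5], [0.9, 0.2, 0.4]]     # e.g. 'career', 'office'
family_attrs = [[0.3, 0.8, 0.5], [0.2, 0.9, 0.4]]     # e.g. 'family', 'home'
d = weat_effect_size(male_targets, female_targets, career_attrs, family_attrs)
print(d > 0)  # True: male targets sit closer to career than female targets do
```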

Extended Data Fig. 5 Pairwise correlations partialling out the effect of median country age.

Partial correlations (Pearson’s r) for all measures in Study 1b and 2 using language as the unit of analysis, controlling for median country age. 95% CIs are given in brackets followed by the corresponding p-value. Implicit and explicit male-career association measures are residualized for participant age, gender and task order. ‘Assoc.’ = association; ‘Lang.’ = language; ‘Subt.’/‘Wiki.’ = Subtitle/Wikipedia corpora; ‘Prop. Gendered Occup. Terms’ = proportion of occupation terms that are gendered; ‘Occup. Genderness’ = degree to which occupation terms in a language tend to be associated with a particular gender in the language statistics.
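A partial correlation controlling for one covariate can be computed by residualizing both variables on the covariate and correlating the residuals. A self-contained sketch (the country-level numbers below are hypothetical, for illustration only):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def residualize(y, z):
    """Residuals of y after removing the best linear fit on a single covariate z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

def partial_r(x, y, z):
    """Correlation between x and y, controlling for z by residualizing both on z."""
    return pearson_r(residualize(x, z), residualize(y, z))

# Hypothetical country-level measures (illustrative only).
implicit_bias = [0.30, 0.35, 0.42, 0.38, 0.45, 0.50]
language_bias = [0.10, 0.18, 0.25, 0.22, 0.30, 0.33]
median_age = [25.0, 30.0, 35.0, 33.0, 40.0, 42.0]
print(round(partial_r(implicit_bias, language_bias, median_age), 2))
```

If the implicit-language relationship survives residualization on median country age, as in the table above, the association is not merely a by-product of age differences across countries.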

Extended Data Fig. 6 Replication of Study 1b on Wikipedia corpus excluding translations.

Both the Subtitle and Wikipedia corpora likely contain some documents that are translated from other languages (for example, the Wikipedia article on ‘Paris’ is written in French and then translated into English). The parallel content across languages allows us to estimate the gender bias in language statistics while holding content constant across languages. Nevertheless, content may itself be a driver of gender bias (for example, one language may have more articles about male politicians relative to another). To understand the contribution of language-specific content to gender bias, we constructed a corpus of Wikipedia articles in each language that were originally written in the target language (that is, untranslated), and trained word embedding models on the corpus in each language (see Supplementary Methods for details). We then used these models to calculate by-language male-career association scores using the same procedure as in Study 1b. Using models trained on the untranslated corpora, we replicate the key finding from Study 1b showing a positive correlation between the bias measured behaviourally with the IAT and measured in language (r = 0.60, p = 0.002; N participants = 656,636). Notably, the effect size is somewhat larger relative to the other two corpus types, presumably because additional bias is introduced by allowing the corpus content to vary across languages.

Extended Data Fig. 7 Models examining UK-US bias difference in AIID dataset (Study 1c).

a, The exact pre-registered analysis of Study 1c is presented. Pairwise correlations between all variables (language bias, behavioural bias and UK–US difference measures) are shown, averaging across estimates of language bias from the 5 model runs (N participants = 27,045). Error bars are 95% CIs. As stated in the pre-registration, the key test of our hypothesis is that the correlation between the UK–US linguistic difference (‘Language Bias Difference’) and the UK–US behavioural difference (‘Behavioural Bias Difference’) is greater than 0 (shown in red). The data are consistent with this prediction. The confirmatory dataset is shown on the right, along with the smaller exploratory dataset on the left for reference. b, The full results of the mixed-effect model described in the Main Text are presented.

Extended Data Fig. 8 Models predicting implicit male-career association with proportion of gender-distinct labels and language career-gender association (Study 2).

We predict the magnitude of implicit male-career association by language with an additive linear model. Predictors are the proportion of occupation terms that are gendered (‘Prop. Gendered Occup. Terms’) and language male-career association as measured by word embeddings of the IAT words (‘Male-Career Assoc.’). Model coefficients are shown for two models using estimates of language career-gender association from embedding models trained on Subtitle (a) and Wikipedia (b) corpora. The linear models account for 40.63% (Subtitle) and 45.32% (Wikipedia) of the variance in implicit male-career association. ‘Subt.’/‘Wiki.’ = Subtitle/Wikipedia corpora.

Extended Data Fig. 9 Gender associations in language and other psychological measures.

Several recent studies6,35 have presented novel theories to account for cases of structural inequality related to gender. Both of these studies argue that psychological differences play a causal role in the emergence of structural inequality. Here, we show that the degree of gender bias in language is correlated with these psychological differences at the country level, consistent with the idea that language experience could play a causal role in the emergence of psychological differences. a, Gender differences in preferences35 (a composite score of ‘six fundamental preferences with regard to social and nonsocial domains: willingness to take risks; patience, which captures preferences over the intertemporal timing of rewards; altruism; trust; and positive and negative reciprocity, which capture the costly willingness to reward kind actions or to punish unkind actions, respectively.’) as a function of language male-career association measured in the Subtitle corpus. These two measures are correlated (r(25) = 0.48 [0.12, 0.73], p = 0.01): countries with greater gender differences in preferences also have greater gender bias present in their languages. We also find that per capita GDP49 is correlated with language male-career association measured in both corpora (Wikipedia: r(35) = 0.64 [0.40, 0.80], p < 0.0001; Subtitle: r(31) = 0.58 [0.29, 0.77], p < 0.001). However, the magnitude of the male-career association in the language spoken in a country predicts the magnitude of the male-career association measured via the behavioural IAT, controlling for both national GDP and median country age, in an additive mixed-effect model. b, Gender difference in STEM self-efficacy6 (‘The sex difference in self-efficacy (boys − girls)’) as a function of male-career association measured in the Subtitle corpus. These two measures are correlated (r(28) = 0.59 [0.30, 0.79], p < 0.001): countries with greater gender differences in self-efficacy also have greater gender bias present in their languages. Further, self-efficacy mediated the effect of language statistics on the percentage of women in STEM (path-ab = −0.33, p = 0.01), suggesting that language statistics could be a critical causal factor underlying gender differences in STEM participation.
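The reported path-ab is a product-of-coefficients indirect effect: path a is the slope of the mediator regressed on the predictor, and path b is the mediator's coefficient when the outcome is regressed on both. A generic sketch of that computation (the data below are made-up numbers in which the outcome depends on the predictor only through the mediator; they are not the study's data, and the significance test for ab is omitted):

```python
def slope(x, y):
    """Slope of the simple regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    out = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        out[r] = (M[r][3] - sum(M[r][c] * out[c] for c in range(r + 1, 3))) / M[r][r]
    return out

def ols2(y, x1, x2):
    """Coefficients [intercept, b_x1, b_x2] for the OLS fit y ~ 1 + x1 + x2,
    via the 3x3 normal equations."""
    n = len(y)
    cols = [[1.0] * n, x1, x2]
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve3(XtX, Xty)

def mediation_ab(x, m, y):
    """Indirect effect a*b: a from m ~ x, b from the m coefficient in y ~ x + m."""
    a = slope(x, m)
    b = ols2(y, x, m)[2]
    return a * b

# Toy data: y depends on x only through the mediator m (illustrative numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # e.g. language gender bias
m = [2.1, 3.9, 6.2, 7.8, 10.0]     # e.g. self-efficacy gap; roughly 2*x
y = [3 * v for v in m]             # e.g. STEM outcome; fully mediated here
print(round(mediation_ab(x, m, y), 2))  # prints 5.91
```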

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Fig. 1 and Supplementary Tables 1–3.

Reporting Summary

Supplementary Table 1

Most frequent language spoken in each country in our sample.

Source data

Source Data Fig. 1

Statistical source data

Source Data Fig. 2

Statistical source data

Source Data Fig. 3

Statistical source data

Source Data Extended Data Fig. 1

Statistical source data

Source Data Extended Data Fig. 2

Statistical source data

Source Data Extended Data Fig. 4

Statistical source data

Source Data Extended Data Fig. 6

Statistical source data

Source Data Extended Data Fig. 7

Statistical source data

Source Data Extended Data Fig. 8

Statistical source data

Source Data Extended Data Fig. 9

Statistical source data

Rights and permissions

Reprints and Permissions

About this article


Cite this article

Lewis, M. & Lupyan, G. Gender stereotypes are reflected in the distributional structure of 25 languages. Nat. Hum. Behav. 4, 1021–1028 (2020).


