Gender stereotypes are reflected in the distributional structure of 25 languages

Lewis, Molly; Lupyan, Gary

doi:10.1038/s41562-020-0918-6

Article
Published: 03 August 2020

Nature Human Behaviour volume 4, pages 1021–1028 (2020)

Abstract

Cultural stereotypes, such as the idea that men are more suited for paid work and women are more suited for taking care of the home and family, may contribute to gender imbalances in science, technology, engineering and mathematics (STEM) fields, among other undesirable gender disparities. Might these stereotypes be learned from language? Here we examine whether gender stereotypes are reflected in the large-scale distributional structure of natural language semantics. We measure gender associations embedded in the statistics of 25 languages and relate these to an international dataset of psychological gender associations (N = 656,636). People’s implicit gender associations are strongly predicted by gender associations encoded in the statistics of the language they speak. These associations are further related to the extent that languages mark gender in occupation terms (for example, ‘waiter’/‘waitress’). Our pattern of findings is consistent with the possibility that linguistic associations shape people’s implicit judgements.

**Fig. 1: Human judgements of word gender association as a function of gender association from the subtitle-trained embedding model.**

**Fig. 2: Implicit versus linguistic associations.**

**Fig. 3: Implicit male–career association and mean gender association.**

Data availability

The data that support the findings of this study are available at https://github.com/mllewis/IATLANG. Source data are provided with this paper.

Code availability

All code that supports the findings of this study is available at https://github.com/mllewis/IATLANG.

References

1. Gelman, S. A., Taylor, M. G., Nguyen, S. P., Leaper, C. & Bigler, R. S. Mother–child conversations about gender: understanding the acquisition of essentialist beliefs. Monogr. Soc. Res. Child Dev. 69, 1–142 (2004).
2. Bian, L., Leslie, S. J. & Cimpian, A. Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science 355, 389–391 (2017).
3. Ceci, S. J. & Williams, W. M. Understanding current causes of women’s underrepresentation in science. Proc. Natl Acad. Sci. USA 108, 3157–3162 (2011).
4. Leslie, S. J., Cimpian, A., Meyer, M. & Freeland, E. Expectations of brilliance underlie gender distributions across academic disciplines. Science 347, 262–265 (2015).
5. Miller, D. I., Eagly, A. H. & Linn, M. C. Women’s representation in science predicts national gender–science stereotypes: Evidence from 66 nations. J. Educ. Psychol. 107, 631 (2015).
6. Stoet, G. & Geary, D. C. The gender-equality paradox in science, technology, engineering, and mathematics education. Psychol. Sci. 29, 581–593 (2018).
7. Dryer, M. S. & Haspelmath, M. (eds) WALS Online (Max Planck Institute for Evolutionary Anthropology, 2013); https://wals.info/
8. Rhodes, M. & Brickman, D. Preschoolers’ responses to social comparisons involving relative failure. Psychol. Sci. 19, 968–972 (2008).
9. Cimpian, A., Mu, Y. & Erickson, L. C. Who is good at this game? Linking an activity to a social category undermines children’s achievement. Psychol. Sci. 23, 533–541 (2012).
10. Cimpian, A. & Markman, E. M. The generic/nongeneric distinction influences how children interpret new information about social others. Child Dev. 82, 471–492 (2011).
11. Rhodes, M., Leslie, S. J., Yee, K. M. & Saunders, K. Subtle linguistic cues increase girls’ engagement in science. Psychol. Sci. 30, 455–466 (2019).
12. Greenwald, A. G., McGhee, D. E. & Schwartz, J. L. Measuring individual differences in implicit cognition: the implicit association test. J. Pers. Soc. Psychol. 74, 1464–1480 (1998).
13. Nosek, B. A., Banaji, M. R. & Greenwald, A. G. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dyn. Theory Res. Pract. 6, 101–115 (2002).
14. Payne, B. K., Vuletich, H. A. & Brown-Iannuzzi, J. L. Historical roots of implicit bias in slavery. Proc. Natl Acad. Sci. USA 116, 11693–11698 (2019).
15. Hehman, E., Calanchini, J., Flake, J. K. & Leitner, J. B. Establishing construct validity evidence for regional measures of explicit and implicit racial bias. J. Exp. Psychol. Gen. 148, 1022–1040 (2019).
16. Charlesworth, T. E. & Banaji, M. R. Patterns of implicit and explicit attitudes: I. Long-term change and stability from 2007 to 2016. Psychol. Sci. 30, 174–192 (2019).
17. Firth, J. R. Studies in Linguistic Analysis (Philological Society, 1957).
18. Landauer, T. K. & Dumais, S. T. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240 (1997).
19. Lund, K. & Burgess, C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Methods Instrum. Comput. 28, 203–208 (1996).
20. Lenci, A. Distributional semantics in linguistic and cognitive research. Ital. J. Linguist. 20, 1–31 (2008).
21. Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
22. Bhatia, S. The semantic representation of prejudice and stereotypes. Cognition 164, 46–60 (2017).
23. von der Malsburg, T., Poppels, T. & Levy, R. Implicit gender bias in linguistic descriptions for expected events: The cases of the 2016 US and 2017 UK election. Psychol. Sci. 31, 115–128 (2020).
24. Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).
25. Greenwald, A. G. An AI stereotype catcher. Science 356, 133–134 (2017).
26. Lupyan, G. & Lewis, M. From words-as-mappings to words-as-cues: the role of language in semantic knowledge. Lang. Cogn. Neurosci. 34, 1319–1337 (2017).
27. Greenwald, A. G., Nosek, B. A. & Banaji, M. R. Understanding and using the implicit association test: I. an improved scoring algorithm. J. Pers. Soc. Psychol. 85, 197 (2003).
28. Forscher, P. S. et al. A meta-analysis of procedures to change implicit measures. J. Pers. Soc. Psychol. 117, 522–559 (2019).
29. CIA. The CIA World Factbook 2017 (2016); https://www.cia.gov/library/publications/the-world-factbook/index.html
30. Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv https://arxiv.org/abs/1301.3781 (2013).
31. van Paridon, J. & Thompson, B. subs2vec: word embeddings from subtitles in 55 languages. Preprint at OSF https://doi.org/10.31234/osf.io/fcrmy (2019).
32. Lison, P. & Tiedemann, J. OpenSubtitles2016: extracting large parallel corpora from movie and TV subtitles. In Proc. 10th International Conference on Language Resources and Evaluation (ELRA, 2016).
33. Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5, 135–146 (2017).
34. Hussey, I. et al. The Attitudes, Identities, and Individual differences (AIID) Study and Dataset https://osf.io/pcjwf/ (2019).
35. Falk, A. & Hermle, J. Relationship of gender differences in preferences to economic development and gender equality. Science 362, eaas9899 (2018).
36. Lane, K. A., Banaji, M. R., Nosek, B. A. & Greenwald, A. G. in Implicit Measures of Attitudes (eds Wittenbrink, B. & Schwarz, N.) 59–102 (2007).
37. Fazio, R. H. & Olson, M. A. Implicit measures in social cognition research: their meaning and use. Annu. Rev. Psychol. 54, 297–327 (2003).
38. Payne, B. K., Vuletich, H. A. & Lundberg, K. B. The bias of crowds: how implicit bias bridges personal and systemic prejudice. Psychol. Inq. 28, 233–248 (2017).
39. Marian, V. & Kaushanskaya, M. Language context guides memory content. Psychon. Bull. Rev. 14, 925–933 (2007).
40. Athanasopoulos, P. Cognitive representation of colour in bilinguals: the case of Greek blues. Biling. Lang. Cogn. 12, 83–95 (2009).
41. Scott, G. G., Keitel, A., Becirspahic, M., Yao, B. & Sereno, S. C. The Glasgow norms: ratings of 5,500 words on nine scales. Behav. Res. Methods 51, 1258–1270 (2019).
42. Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. Bag of tricks for efficient text classification. Preprint at arXiv https://arxiv.org/abs/1607.01759 (2016).
43. Simons, G. F. & Charles, D. F. (eds) Ethnologue: Languages of the World (SIL International, 2018).
44. Burnard, L. Users Reference Guide for the British National Corpus (Oxford Univ. Computing Services, 1995).
45. Davies, M. The Corpus of Contemporary American English (2008); https://corpus.byu.edu/coca/
46. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
47. Misersky, J. et al. Norms on the gender perception of role nouns in Czech, English, French, German, Italian, Norwegian, and Slovak. Behav. Res. Methods 46, 841–871 (2014).
48. Haspelmath, M., Dryer, M. S., Gil, D. & Comrie, B. (eds) The World Atlas of Language Structures Online (Max Planck Institute for Evolutionary Anthropology, 2008); http://wals.info
49. The World Bank. World Development Indicators (2017); http://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Acknowledgements

The authors acknowledge National Science Foundation Perception, Action and Cognition 1734260 for funding support for this work. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
Molly Lewis
Department of Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
Molly Lewis
Psychology Department, University of Wisconsin-Madison, Madison, WI, USA
Gary Lupyan

Contributions

M.L. and G.L. designed the research and wrote the manuscript. M.L. conducted the data analysis.

Corresponding author

Correspondence to Molly Lewis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary handling editor: Aisha Bradshaw

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sample size and demographic characteristics of Project Implicit data.

a, Number of participants by country after exclusions (note that US participants are excluded from the visualization because of the large sample size; N = 545,673). Our final sample included 657,335 participants from 39 countries (see Supplementary Information for exclusion criteria). b, Gender distribution of participants by country after exclusions. Across countries, there tended to be more female participants relative to male participants (M = 0.64 proportion females; SD = 0.06). c, Age distribution of participants by country after exclusions. Ranges correspond to 95% CIs. Red points show median age by country.

Extended Data Fig. 2 Models predicting IAT effect size at the participant level.

a, Median country age predicts IAT effect size over and above participant age at the participant level: countries with older populations tend to have individuals with stronger implicit career-gender associations, even after controlling for participant age. The table presents an additive mixed-effect regression predicting IAT D-score at the participant level with participant age and median country age, controlling for participant sex and trial order. The model includes by-country random intercepts. b, The relationship between median country age and IAT effect size holds, even after controlling for the percentage of women in STEM. The table presents an additive mixed-effect model predicting IAT D-score at the participant level with participant age, median country age and percentage of women in STEM in the country, controlling for participant sex and trial order. The model includes by-country random intercepts.
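As a reading aid, the model in panel a can be written out as below. This is a sketch inferred from the caption alone: the symbol names, the coding of sex and trial order, and the normality assumptions are illustrative and are not taken from the paper's tables. Here i indexes participants and j indexes countries.

```latex
\begin{aligned}
D_{ij} &= \beta_0
        + \beta_1\,\mathrm{age}_{ij}
        + \beta_2\,\mathrm{medianAge}_{j}
        + \beta_3\,\mathrm{sex}_{ij}
        + \beta_4\,\mathrm{order}_{ij}
        + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

The term u_j is the by-country random intercept. Under the same assumptions, the panel b model would add one further country-level fixed effect for the percentage of women in STEM.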

Extended Data Fig. 3 Geographic distribution of IAT scores.

a, Residualized implicit career-gender association (IAT score) shown by country. IAT scores are residualized for participant age, gender, and task order (N = 657,335). Larger values (blue) indicate a larger bias to associate men with the concept of career and women with the concept of family. Countries in white correspond to countries for which there was insufficient data to estimate the country-level career-gender association. Inset shows IAT scores for European countries only. Note that while Hindi is identified as the most frequently spoken language in India, India is highly multilingual and so Hindi embeddings may be a poor representation of the linguistic statistics for speakers in India as a group. b, Distribution of raw (unresidualized) implicit career-gender association (IAT D-score) across countries. All countries in our sample showed a tendency to associate men with career and women with family.

Extended Data Fig. 4 Replication of Caliskan et al. (2017) with our corpora.

We replicate the original set of Caliskan, Bryson, and Narayanan (2017; CBN)²¹ findings using the English-trained versions of the models used in our main analyses (models trained on the Wikipedia and Subtitles corpora). For each model, we calculate an effect size for each of the 10 IAT types reported in CBN: flowers/insects-pleasant/unpleasant, instruments/weapons-pleasant/unpleasant, European-American/Afro-American-pleasant/unpleasant, males/females-career/family, math/arts-male/female, science/arts-male/female, mental-disease/physical-disease-permanent/temporary, and young/old-pleasant/unpleasant (labelled as Word-Embedding Association Test (WEAT) 1-10 in CBN). We calculate the bias using the same effect size metric described in CBN, a standardized difference score of the relative similarity of the target words to the target attributes (that is, the relative similarity of male to career versus the relative similarity of female to career). This measure is analogous to the behavioural effect size measure, where larger values indicate larger bias. The figure shows the effect size measure derived from the English Wikipedia corpus (a) and the English Subtitle corpus (b) plotted against effect size estimates reported by CBN from two different models (trained on Common Crawl and Google News corpora). Point color corresponds to bias type, and point shape corresponds to the two CBN models. With the exception of biases related to race and age, effect sizes from our corpora are comparable to those reported by CBN. In particular, for the gender-career IAT (the bias relevant to our current purposes), we estimate the effect size to be 1.78 (Wikipedia)/1.65 (Subtitle), while CBN estimates it to be 1.81 (Common Crawl)/1.89 (Google News).
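The standardized difference score described above can be sketched in a few lines of Python. This is a minimal illustration of the WEAT effect size as defined by CBN, not the authors' code (which is in the IATLANG repository). The two-dimensional vectors are toy stand-ins for real word embeddings, the function names are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, attrs_a, attrs_b):
    """Mean similarity of word w to attribute set A minus its mean similarity to set B."""
    sim_a = np.mean([cosine(w, a) for a in attrs_a])
    sim_b = np.mean([cosine(w, b) for b in attrs_b])
    return sim_a - sim_b

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference in mean association between target sets X and Y,
    divided by the standard deviation of associations over all targets."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    spread = np.std(s_x + s_y, ddof=1)  # ddof=1 is an assumption of this sketch
    return (np.mean(s_x) - np.mean(s_y)) / spread

# Toy 2-D "embeddings" (hypothetical values, for illustration only).
male = np.array([[1.0, 0.1], [0.9, 0.2]])
female = np.array([[0.1, 1.0], [0.2, 0.9]])
career = np.array([[1.0, 0.0], [0.8, 0.1]])
family = np.array([[0.0, 1.0], [0.1, 0.8]])

effect = weat_effect_size(male, female, career, family)
print(f"toy male-career effect size: {effect:.2f}")
```

A positive value means the first target set (here, male terms) sits closer to the first attribute set (career) than the second target set does. Swapping the two target sets flips the sign.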

Extended Data Fig. 5 Pairwise Correlations partialing out the effect of median country age.

Partial correlations (Pearson’s r) for all measures in Study 1b and 2 using language as the unit of analysis, controlling for median country age. 95% CIs are given in brackets followed by the corresponding p-value. Implicit and explicit male-career association measures are residualized for participant age, gender, and task order. ‘Assoc.’ = association; ‘Lang.’ = language; ‘Subt.’/‘Wiki.’ = Subtitle/Wikipedia corpora; ‘Prop. Gendered Occup. Terms’ = proportion of occupation terms that are gendered; ‘Occup. Genderness’ = degree to which occupation terms in a language tend to be associated with a particular gender in the language statistics.
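One common way to compute a partial correlation of this kind is to regress each measure on the covariate and correlate the residuals. The sketch below assumes that approach. The variable names and the synthetic data are hypothetical, and it returns only the point estimate, not the confidence intervals or p-values reported in the figure.

```python
import numpy as np

def residualize(values, covariate):
    """Residuals of `values` after an OLS fit on an intercept plus one covariate."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def partial_correlation(x, y, covariate):
    """Pearson correlation between x and y after removing the linear effect of the covariate."""
    return float(np.corrcoef(residualize(x, covariate),
                             residualize(y, covariate))[0, 1])

# Synthetic language-level data (25 rows), for illustration only.
rng = np.random.default_rng(0)
median_age = rng.normal(35, 5, size=25)
language_bias = 0.02 * median_age + rng.normal(0, 0.1, size=25)
implicit_bias = 0.01 * median_age + 0.5 * language_bias + rng.normal(0, 0.05, size=25)

r_partial = partial_correlation(language_bias, implicit_bias, median_age)
print(f"partial r controlling for median age: {r_partial:.2f}")
```

The residual-based estimate coincides with the closed-form expression built from the three pairwise Pearson correlations, which is a convenient sanity check.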

Extended Data Fig. 6 Replication of Study 1b on Wikipedia corpus excluding translations.

Both the Subtitle and Wikipedia corpora likely contain some documents that are translated from other languages (for example, the Wikipedia article on ‘Paris’ is written in French and then translated into English). The parallel content across languages allows us to estimate the gender bias in language statistics, while holding content constant across languages. Nevertheless, content may itself be a driver of gender bias (for example, one language may have more articles about male politicians relative to another). To understand the contribution of language-specific content to gender bias, we constructed a corpus of Wikipedia articles in each language that were originally written in the target language (that is, untranslated), and trained word embedding models on the corpus in each language (see Supplemental Methods for details). We then used these models to calculate by-language male-career association scores using the same procedure as in Study 1b. Using models trained on the untranslated corpora, we replicate the key finding from Study 1b showing a positive correlation between the bias measured behaviorally with the IAT and measured in language (r = .60; p = .002; N participants = 656,636). Notably, the effect size is somewhat larger than for the other two corpus types, presumably because additional bias is introduced by allowing the corpus content to vary across languages.

Extended Data Fig. 7 Models examining UK-US bias difference in AIID dataset (Study 1c).

a, The exact pre-registered analysis of Study 1c is presented. Pairwise correlations between all variables (language bias, behavioral bias, and UK-US difference measures) are shown, averaging across estimates of language bias from the 5 model runs (N participants = 27,045). Error bars are 95% CIs. As stated in the pre-registration, the key test of our hypothesis is that the correlation between the UK - US linguistic difference (‘Language Bias Difference’) and the UK - US behavioral difference (‘Behavioral Bias Difference’) is greater than 0 (shown in red). The data are consistent with this prediction. The confirmatory dataset is shown on the right, along with the smaller exploratory dataset on the left for reference. b, The full results of the mixed-effect model described in the Main Text are presented.

Extended Data Fig. 8 Models predicting implicit male-career association with proportion gender distinct labels and language career-gender association (Study 2).

We predict the magnitude of implicit male-career association by language with an additive linear model. Predictors are proportion of occupation terms that are gendered (‘Prop. Gendered Occup. Terms’) and language male-career association as measured by word embeddings of the IAT words (‘Male-Career Assoc.’). Model coefficients are shown for two models using estimates of language career-gender association from embedding models trained on Subtitle (a) and Wikipedia (b) corpora. The linear models account for 40.63% (Subtitle) and 45.32% (Wikipedia) of the variance in implicit male-career association. ‘Subt.’/ ‘Wiki.’ = Subtitle/Wikipedia corpora.
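An additive linear model with two predictors, and the share of variance it accounts for, can be sketched with ordinary least squares as below. The data are synthetic and the names are hypothetical. The caption does not say whether the reported percentages are plain or adjusted R², so this sketch computes plain R² as an assumption.

```python
import numpy as np

def fit_additive_model(y, predictors):
    """OLS fit of y on an intercept plus each predictor (no interaction terms).
    Returns the coefficient vector and the proportion of variance explained."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Synthetic by-language data (25 rows), for illustration only.
rng = np.random.default_rng(2)
prop_gendered_terms = rng.uniform(0, 1, size=25)
male_career_assoc = rng.normal(0, 1, size=25)
implicit_assoc = (0.3 * prop_gendered_terms
                  + 0.2 * male_career_assoc
                  + rng.normal(0, 0.2, size=25))

coef, r_squared = fit_additive_model(
    implicit_assoc, [prop_gendered_terms, male_career_assoc])
print("intercept and slopes:", np.round(coef, 2))
print(f"variance explained: {100 * r_squared:.1f}%")
```

"Additive" here means each predictor enters with its own slope and there is no product term between them.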

Extended Data Fig. 9 Gender associations in language and other psychological measures.

Several recent studies⁶,³⁵ have presented novel theories to account for cases of structural inequality related to gender. Both of these studies argue that psychological differences play a causal role in the emergence of structural inequality. Here, we show that degree of gender bias in language is correlated with these psychological differences at the country level, consistent with the idea that language experience could be playing a causal role in the emergence of psychological differences. a, Gender differences in preferences³⁵ (composite score of ‘six fundamental preferences with regard to social and nonsocial domains: willingness to take risks; patience, which captures preferences over the intertemporal timing of rewards; altruism; trust; and positive and negative reciprocity, which capture the costly willingness to reward kind actions or to punish unkind actions, respectively.’) as a function of language male-career association measured in the Subtitle corpus. These two measures are correlated (r(25) = 0.48 [0.12, 0.73], p = 0.01): countries with greater gender differences in preferences also have greater gender bias present in their languages. We also find that per capita GDP⁴⁹ is correlated with language male-career association measured in both corpora (Wikipedia: r(35) = 0.64 [0.4, 0.8], p < .0001; Subtitle: r(31) = 0.58 [0.29, 0.77], p < .001). However, the magnitude of the male-career association in the language spoken in a country predicts the magnitude of the male-career association measured via the behavioral IAT, controlling for both national GDP and median country age, in an additive mixed-effect model. b, Gender difference in STEM self-efficacy⁶ (‘The sex difference in self efficacy (boys - girls)’) as a function of male-career association measured in the Subtitle corpus. These two measures are correlated (r(28) = 0.59 [0.3, 0.79], p < .001): countries with greater gender differences in self-efficacy also have greater gender bias present in their languages. Further, self-efficacy mediated the effect of language statistics on percentage of women in STEM (path-ab = -0.33, p = 0.01), suggesting that language statistics could be a critical causal factor underlying gender differences in STEM participation.
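The ‘path-ab’ quantity is the indirect effect in a mediation analysis. The caption does not describe how it was estimated, so the sketch below assumes the standard product-of-coefficients approach with ordinary least squares. All data are synthetic and all names are hypothetical, and it omits the significance test that accompanies the reported value.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients for y on an intercept plus the given predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic country-level data, for illustration only.
rng = np.random.default_rng(1)
n = 30
language_assoc = rng.normal(size=n)                                  # predictor X
self_efficacy_gap = 0.6 * language_assoc + rng.normal(scale=0.5, size=n)   # mediator M
pct_women_stem = -0.5 * self_efficacy_gap + rng.normal(scale=0.5, size=n)  # outcome Y

path_a = ols(self_efficacy_gap, language_assoc)[1]         # X -> M
full = ols(pct_women_stem, language_assoc, self_efficacy_gap)
direct = full[1]                                           # X -> Y, holding M fixed
path_b = full[2]                                           # M -> Y, holding X fixed
total = ols(pct_women_stem, language_assoc)[1]             # X -> Y, ignoring M

indirect = path_a * path_b                                 # the "path-ab" term
print(f"indirect effect (a*b): {indirect:.2f}")
```

With least-squares fits the total effect decomposes exactly into the direct effect plus a*b, so a negative indirect effect means that the route through the mediator lowers the outcome.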

About this article

Cite this article

Lewis, M., Lupyan, G. Gender stereotypes are reflected in the distributional structure of 25 languages. Nat Hum Behav 4, 1021–1028 (2020). https://doi.org/10.1038/s41562-020-0918-6

Received: 24 June 2019
Accepted: 26 June 2020
Published: 03 August 2020
Issue Date: October 2020
DOI: https://doi.org/10.1038/s41562-020-0918-6
