Abstract
Cultural stereotypes, such as the idea that men are more suited for paid work and women for taking care of the home and family, may contribute to gender imbalances in science, technology, engineering and mathematics (STEM) fields, among other undesirable gender disparities. Might these stereotypes be learned from language? Here we examine whether gender stereotypes are reflected in the large-scale distributional structure of natural language semantics. We measure gender associations embedded in the statistics of 25 languages and relate these to an international dataset of psychological gender associations (N = 656,636). People’s implicit gender associations are strongly predicted by gender associations encoded in the statistics of the language they speak. These associations are further related to the extent to which languages mark gender in occupation terms (for example, ‘waiter’/‘waitress’). Our pattern of findings is consistent with the possibility that linguistic associations shape people’s implicit judgements.
Data availability
The data that support the findings of this study are available at https://github.com/mllewis/IATLANG. Source data are provided with this paper.
Code availability
All code that supports the findings of this study is available at https://github.com/mllewis/IATLANG.
References
Gelman, S. A., Taylor, M. G., Nguyen, S. P., Leaper, C. & Bigler, R. S. Mother–child conversations about gender: understanding the acquisition of essentialist beliefs. Monogr. Soc. Res. Child Dev. 69, 1–142 (2004).
Bian, L., Leslie, S. J. & Cimpian, A. Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science 355, 389–391 (2017).
Ceci, S. J. & Williams, W. M. Understanding current causes of women’s underrepresentation in science. Proc. Natl Acad. Sci. USA 108, 3157–3162 (2011).
Leslie, S. J., Cimpian, A., Meyer, M. & Freeland, E. Expectations of brilliance underlie gender distributions across academic disciplines. Science 347, 262–265 (2015).
Miller, D. I., Eagly, A. H. & Linn, M. C. Women’s representation in science predicts national gender–science stereotypes: Evidence from 66 nations. J. Educ. Psychol. 107, 631 (2015).
Stoet, G. & Geary, D. C. The gender-equality paradox in science, technology, engineering, and mathematics education. Psychol. Sci. 29, 581–593 (2018).
Dryer, M. S. & Haspelmath M. eds. WALS Online (Max Planck Institute for Evolutionary Anthropology, 2013); https://wals.info/
Rhodes, M. & Brickman, D. Preschoolers’ responses to social comparisons involving relative failure. Psychol. Sci. 19, 968–972 (2008).
Cimpian, A., Mu, Y. & Erickson, L. C. Who is good at this game? Linking an activity to a social category undermines children’s achievement. Psychol. Sci. 23, 533–541 (2012).
Cimpian, A. & Markman, E. M. The generic/nongeneric distinction influences how children interpret new information about social others. Child Dev. 82, 471–492 (2011).
Rhodes, M., Leslie, S. J., Yee, K. M. & Saunders, K. Subtle linguistic cues increase girls’ engagement in science. Psychol. Sci. 30, 455–466 (2019).
Greenwald, A. G., McGhee, D. E. & Schwartz, J. L. Measuring individual differences in implicit cognition: the implicit association test. J. Pers. Soc. Psychol. 74, 1464–1480 (1998).
Nosek, B. A., Banaji, M. R. & Greenwald, A. G. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dyn. Theory Res. Pract. 6, 101–115 (2002).
Payne, B. K., Vuletich, H. A. & Brown-Iannuzzi, J. L. Historical roots of implicit bias in slavery. Proc. Natl Acad. Sci. USA 116, 11693–11698 (2019).
Hehman, E., Calanchini, J., Flake, J. K. & Leitner, J. B. Establishing construct validity evidence for regional measures of explicit and implicit racial bias. J. Exp. Psychol. Gen. 148, 1022–1040 (2019).
Charlesworth, T. E. & Banaji, M. R. Patterns of implicit and explicit attitudes: I. Long-term change and stability from 2007 to 2016. Psychol. Sci. 30, 174–192 (2019).
Firth, J. R. Studies in Linguistic Analysis (Philological Society, 1957).
Landauer, T. K. & Dumais, S. T. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240 (1997).
Lund, K. & Burgess, C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Methods Instrum. Comput. 28, 203–208 (1996).
Lenci, A. Distributional semantics in linguistic and cognitive research. Ital. J. Linguist. 20, 1–31 (2008).
Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
Bhatia, S. The semantic representation of prejudice and stereotypes. Cognition 164, 46–60 (2017).
von der Malsburg, T., Poppels, T. & Levy, R. Implicit gender bias in linguistic descriptions for expected events: The cases of the 2016 US and 2017 UK election. Psychol. Sci. 31, 115–128 (2020).
Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).
Greenwald, A. G. An AI stereotype catcher. Science 356, 133–134 (2017).
Lupyan, G. & Lewis, M. From words-as-mappings to words-as-cues: the role of language in semantic knowledge. Lang. Cogn. Neurosci. 34, 1319–1337 (2017).
Greenwald, A. G., Nosek, B. A. & Banaji, M. R. Understanding and using the implicit association test: I. an improved scoring algorithm. J. Pers. Soc. Psychol. 85, 197 (2003).
Forscher, P. S. et al. A meta-analysis of procedures to change implicit measures. J. Pers. Soc. Psychol. 117, 522–559 (2019).
CIA. The CIA World Factbook 2017 (2016); https://www.cia.gov/library/publications/the-world-factbook/index.html
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv https://arxiv.org/abs/1301.3781 (2013).
van Paridon, J. & Thompson, B. subs2vec: word embeddings from subtitles in 55 languages. Preprint at OSF https://doi.org/10.31234/osf.io/fcrmy (2019).
Lison, P. & Tiedemann, J. OpenSubtitles2016: extracting large parallel corpora from movie and TV subtitles. In Proc. 10th International Conference on Language Resources and Evaluation (ELRA, 2016).
Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5, 135–146 (2017).
Hussey, I., et al. The Attitudes, Identities, and Individual differences (AIID) Study and Dataset https://osf.io/pcjwf/ (2019).
Falk, A. & Hermle, J. Relationship of gender differences in preferences to economic development and gender equality. Science 362, eaas9899 (2018).
Lane, K. A., Banaji, M. R., Nosek, B. A. & Greenwald, A. G. in Implicit Measures of Attitudes (eds. Wittenbrink, B. & Schwarz, N.) 59–102 (2007).
Fazio, R. H. & Olson, M. A. Implicit measures in social cognition research: their meaning and use. Annu. Rev. Psychol. 54, 297–327 (2003).
Payne, B. K., Vuletich, H. A. & Lundberg, K. B. The bias of crowds: how implicit bias bridges personal and systemic prejudice. Psychol. Inq. 28, 233–248 (2017).
Marian, V. & Kaushanskaya, M. Language context guides memory content. Psychon. Bull. Rev. 14, 925–933 (2007).
Athanasopoulos, P. Cognitive representation of colour in bilinguals: the case of Greek blues. Biling. Lang. Cogn. 12, 83–95 (2009).
Scott, G. G., Keitel, A., Becirspahic, M., Yao, B. & Sereno, S. C. The Glasgow norms: ratings of 5,500 words on nine scales. Behav. Res. Methods 51, 1258–1270 (2019).
Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. Bag of tricks for efficient text classification. Preprint at arXiv https://arxiv.org/abs/1607.01759 (2016).
Simons, G. F. & Fennig, C. D. (eds) Ethnologue: Languages of the World (SIL International, 2018).
Burnard, L. Users Reference Guide for the British National Corpus (Oxford Univ. Computing Services, 1995).
Davies M. The Corpus of Contemporary American English (2008); https://corpus.byu.edu/coca/
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
Misersky, J. et al. Norms on the gender perception of role nouns in Czech, English, French, German, Italian, Norwegian, and Slovak. Behav. Res. Methods 46, 841–871 (2014).
Haspelmath, M., Dryer, M. S., Gil, D. & Comrie, B. (eds) The World Atlas of Language Structures Online (Max Planck Institute for Evolutionary Anthropology, 2008); http://wals.info
The World Bank. World Development Indicators (2017); http://data.worldbank.org/indicator/NY.GDP.PCAP.CD
Acknowledgements
This work was supported by National Science Foundation grant 1734260 (Perception, Action and Cognition program). The funders had no role in study design, data collection and analysis, the decision to publish or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
M.L. and G.L. designed the research and wrote the manuscript. M.L. conducted the data analysis.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary handling editor: Aisha Bradshaw
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Sample size and demographic characteristics of Project Implicit data.
a, Number of participants by country after exclusions (US participants are excluded from the visualization because of the large sample size; N = 545,673). Our final sample included 657,335 participants from 39 countries (see Supplementary Information for exclusion criteria). b, Gender distribution of participants by country after exclusions. Across countries, there tended to be more female than male participants (mean proportion female = 0.64; s.d. = 0.06). c, Age distribution of participants by country after exclusions. Ranges correspond to 95% CIs. Red points show the median age by country.
Extended Data Fig. 2 Models predicting IAT effect size at the participant level.
a, Median country age predicts IAT effect size over and above participant age at the participant level: countries with older populations tend to have individuals with stronger implicit career–gender associations, even after controlling for participant age. The table presents an additive mixed-effects regression predicting IAT D-score at the participant level from participant age and median country age, controlling for participant sex and trial order. The model includes by-country random intercepts. b, The relationship between median country age and IAT effect size holds even after controlling for the percentage of women in STEM. The table presents an additive mixed-effects model predicting IAT D-score at the participant level from participant age, median country age and the percentage of women in STEM in each country, controlling for participant sex and trial order. The model includes by-country random intercepts.
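The paper's participant-level models were fit in R with lme4 (see the References). As an illustrative sketch only, an additive model with by-country random intercepts of the kind described in this caption can be expressed in Python with statsmodels' `mixedlm`; the data frame below is a synthetic stand-in and the column names are hypothetical, not those of the released dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the participant-level Project Implicit data.
# Column names are illustrative, not those of the released dataset.
rng = np.random.default_rng(0)
n, n_countries = 400, 20
country = rng.integers(0, n_countries, n)
median_country_age = rng.normal(35, 5, n_countries)
df = pd.DataFrame({
    "country": country,
    "participant_age": rng.normal(30, 10, n),
    "sex": rng.integers(0, 2, n),          # 0 = male, 1 = female
    "trial_order": rng.integers(0, 2, n),  # counterbalanced block order
    "median_country_age": median_country_age[country],
})
# D-score with small effects of both age predictors plus noise
df["d_score"] = (0.002 * df["participant_age"]
                 + 0.005 * df["median_country_age"]
                 + rng.normal(0, 0.4, n))

# Additive mixed-effects model with by-country random intercepts,
# mirroring the model structure described in the caption
model = smf.mixedlm(
    "d_score ~ participant_age + median_country_age + sex + trial_order",
    data=df, groups=df["country"])
result = model.fit()
print(result.summary())
```

Because median country age is constant within each country, it enters as a between-country fixed effect while the random intercepts absorb remaining country-level variation.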
Extended Data Fig. 3 Geographic distribution of IAT scores.
a, Residualized implicit career–gender association (IAT score) shown by country. IAT scores are residualized for participant age, gender and task order (N = 657,335). Larger values (blue) indicate a larger bias to associate men with the concept of career and women with the concept of family. Countries shown in white are those with insufficient data to estimate the country-level career–gender association. The inset shows IAT scores for European countries only. Note that although Hindi is identified as the most frequently spoken language in India, India is highly multilingual, and so Hindi embeddings may be a poor representation of the linguistic statistics for speakers in India as a group. b, Distribution of raw (unresidualized) implicit career–gender association (IAT D-score) across countries. All countries in our sample showed a tendency to associate men with career and women with family.
Extended Data Fig. 4 Replication of Caliskan et al. (2017) with our corpora.
We replicate the original set of findings of Caliskan, Bryson, and Narayanan (2017; CBN; ref. 21) using the English-trained versions of the models used in our main analyses (models trained on the Wikipedia and Subtitle corpora). For each model, we calculate an effect size for each of the 10 IAT types reported in CBN: flowers/insects–pleasant/unpleasant, instruments/weapons–pleasant/unpleasant, European-American/Afro-American–pleasant/unpleasant, male/female–career/family, math/arts–male/female, science/arts–male/female, mental-disease/physical-disease–permanent/temporary, and young/old–pleasant/unpleasant (labelled as Word-Embedding Association Tests (WEATs) 1–10 in CBN). We calculate the bias using the same effect size metric described in CBN: a standardized difference score of the relative similarity of the target words to the target attributes (that is, the relative similarity of male to career versus the relative similarity of female to career). This measure is analogous to the behavioural effect size measure, where larger values indicate larger bias. The figure shows the effect size measure derived from the English Wikipedia corpus (a) and the English Subtitle corpus (b) plotted against effect size estimates reported by CBN from two different models (trained on the Common Crawl and Google News corpora). Point colour corresponds to bias type, and point shape corresponds to the two CBN models. With the exception of biases related to race and age, effect sizes from our corpora are comparable to those reported by CBN. In particular, for the gender–career IAT (the bias relevant to our current purposes), we estimate the effect size to be 1.78 (Wikipedia)/1.65 (Subtitle), while CBN estimate it to be 1.81 (Common Crawl)/1.89 (Google News).
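The effect size metric described in this caption, a standardized difference of relative cosine similarities between target and attribute word sets, can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' or CBN's released code; note in particular that implementations differ on whether the pooled standard deviation uses the sample or population convention:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """Standardized WEAT-style effect size.

    X, Y: lists of target word vectors (e.g. male vs. female terms).
    A, B: lists of attribute word vectors (e.g. career vs. family terms).
    s(w) is the relative association of word w with A versus B; the
    effect size is the standardized difference in mean s(w) between the
    two target sets (analogous to Cohen's d). Sample s.d. is used here;
    other implementations may use the population convention.
    """
    def s(w):
        return (np.mean([cosine(w, a) for a in A])
                - np.mean([cosine(w, b) for b in B]))
    s_X = [s(x) for x in X]
    s_Y = [s(y) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)
```

With real embeddings, X and Y would hold the vectors for the IAT's male and female terms and A and B the career and family terms; a positive value then indicates a male–career/female–family association.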
Extended Data Fig. 5 Pairwise correlations partialling out the effect of median country age.
Partial correlations (Pearson’s r) for all measures in Studies 1b and 2, using language as the unit of analysis and controlling for median country age. 95% CIs are given in brackets, followed by the corresponding p value. Implicit and explicit male–career association measures are residualized for participant age, gender and task order. ‘Assoc.’ = association; ‘Lang.’ = language; ‘Subt.’/‘Wiki.’ = Subtitle/Wikipedia corpora; ‘Prop. Gendered Occup. Terms’ = proportion of occupation terms that are gendered; ‘Occup. Genderness’ = degree to which occupation terms in a language tend to be associated with a particular gender in the language statistics.
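A partial Pearson correlation controlling for a single covariate, as used in this table, can be computed by residualizing both variables on the covariate and correlating the residuals. The sketch below is a generic illustration, not the paper's analysis code:

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson partial correlation of x and y, controlling for z.

    Residualizes x and y on z (with an intercept) via least squares,
    then correlates the residuals; for a single covariate this is
    equivalent to the textbook partial-correlation formula.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]
```

Here x and y would be, for example, the language male–career association and the residualized IAT score per language, with z the median country age.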
Extended Data Fig. 6 Replication of Study 1b on Wikipedia corpus excluding translations.
Both the Subtitle and Wikipedia corpora likely contain some documents translated from other languages (for example, the Wikipedia article on ‘Paris’ is written in French and then translated into English). Parallel content across languages allows us to estimate the gender bias in language statistics while holding content constant across languages. Nevertheless, content may itself be a driver of gender bias (for example, one language may have more articles about male politicians than another). To understand the contribution of language-specific content to gender bias, we constructed a corpus of Wikipedia articles in each language that were originally written in the target language (that is, untranslated) and trained word embedding models on the corpus in each language (see Supplementary Methods for details). We then used these models to calculate by-language male–career association scores using the same procedure as in Study 1b. Using models trained on the untranslated corpora, we replicate the key finding from Study 1b, showing a positive correlation between the bias measured behaviourally with the IAT and the bias measured in language (r = 0.60; p = 0.002; N participants = 656,636). Notably, the effect size is somewhat larger than for the other two corpus types, presumably because additional bias is introduced by allowing the corpus content to vary across languages.
Extended Data Fig. 7 Models examining UK-US bias difference in AIID dataset (Study 1c).
a, The exact pre-registered analysis of Study 1c is presented. Pairwise correlations between all variables (language bias, behavioural bias and UK–US difference measures) are shown, averaging across estimates of language bias from the five model runs (N participants = 27,045). Error bars are 95% CIs. As stated in the pre-registration, the key test of our hypothesis is that the correlation between the UK–US linguistic difference (‘Language Bias Difference’) and the UK–US behavioural difference (‘Behavioural Bias Difference’) is greater than 0 (shown in red). The data are consistent with this prediction. The confirmatory dataset is shown on the right, with the smaller exploratory dataset on the left for reference. b, The full results of the mixed-effects model described in the main text are presented.
Extended Data Fig. 8 Models predicting implicit male–career association from the proportion of gender-distinct labels and the language career–gender association (Study 2).
We predict the magnitude of the implicit male–career association by language with an additive linear model. Predictors are the proportion of occupation terms that are gendered (‘Prop. Gendered Occup. Terms’) and the language male–career association as measured by word embeddings of the IAT words (‘Male–Career Assoc.’). Model coefficients are shown for two models using estimates of the language career–gender association from embedding models trained on the Subtitle (a) and Wikipedia (b) corpora. The linear models account for 40.63% (Subtitle) and 45.32% (Wikipedia) of the variance in implicit male–career association. ‘Subt.’/‘Wiki.’ = Subtitle/Wikipedia corpora.
Extended Data Fig. 9 Gender associations in language and other psychological measures.
Several recent studies (refs. 6,35) have presented novel theories to account for cases of structural inequality related to gender. Both of these studies argue that psychological differences play a causal role in the emergence of structural inequality. Here, we show that the degree of gender bias in language is correlated with these psychological differences at the country level, consistent with the idea that language experience could play a causal role in the emergence of psychological differences. a, Gender differences in preferences (ref. 35) (a composite score of ‘six fundamental preferences with regard to social and nonsocial domains: willingness to take risks; patience, which captures preferences over the intertemporal timing of rewards; altruism; trust; and positive and negative reciprocity, which capture the costly willingness to reward kind actions or to punish unkind actions, respectively’) as a function of the language male–career association measured in the Subtitle corpus. These two measures are correlated (r(25) = 0.48 [0.12, 0.73], p = 0.01): countries with greater gender differences in preferences also have greater gender bias present in their languages. We also find that per capita GDP (ref. 49) is correlated with the language male–career association measured in both corpora (Wikipedia: r(35) = 0.64 [0.40, 0.80], p < 0.0001; Subtitle: r(31) = 0.58 [0.29, 0.77], p < 0.001). However, the magnitude of the male–career association in the language spoken in a country predicts the magnitude of the male–career association measured via the behavioural IAT, controlling for both national GDP and median country age, in an additive mixed-effects model. b, Gender difference in STEM self-efficacy (ref. 6) (‘the sex difference in self-efficacy (boys − girls)’) as a function of the male–career association measured in the Subtitle corpus. These two measures are correlated (r(28) = 0.59 [0.30, 0.79], p < 0.001): countries with greater gender differences in self-efficacy also have greater gender bias present in their languages. Further, self-efficacy mediated the effect of language statistics on the percentage of women in STEM (path ab = −0.33, p = 0.01), suggesting that language statistics could be a critical causal factor underlying gender differences in STEM participation.
Supplementary information
Supplementary Information
Supplementary Methods, Supplementary Fig. 1 and Supplementary Tables 1–3.
Supplementary Table 1
Most frequent language spoken in each country in our sample.
Source data
Statistical source data are provided for Figs. 1–3 and Extended Data Figs. 1, 2, 4, 6, 7, 8 and 9.
Rights and permissions
About this article
Cite this article
Lewis, M., Lupyan, G. Gender stereotypes are reflected in the distributional structure of 25 languages. Nat Hum Behav 4, 1021–1028 (2020). https://doi.org/10.1038/s41562-020-0918-6
This article is cited by
- Large language models know how the personality of public figures is perceived by the general public. Scientific Reports (2024)
- Human languages with greater information density have higher communication speed but lower conversation breadth. Nature Human Behaviour (2024)
- A mentoring programme to spark girls’ interest in STEM. Nature Reviews Materials (2023)
- Caring or Competent? Apparent Prioritization of Childcare Over Work Affects Evaluations and Stereotyping of Fathers. Sex Roles (2023)
- Worth the Risk? Greater Acceptance of Instrumental Harm Befalling Men than Women. Archives of Sexual Behavior (2023)