Abstract
Fewer women than men pursue careers in science, technology, engineering and mathematics (STEM), despite girls outperforming boys at school in the relevant subjects. According to the ‘variability hypothesis’, this overrepresentation of males is driven by gender differences in variance; greater male variability leads to greater numbers of men who exceed the performance threshold. Here, we use recent meta-analytic advances to compare gender differences in academic grades from over 1.6 million students. In line with previous studies we find strong evidence for lower variation among girls than boys, and for higher average grades for girls. However, the gender differences in both mean and variance of grades are smaller in STEM than non-STEM subjects, suggesting that greater variability is insufficient to explain male overrepresentation in STEM. Simulations of these differences suggest the top 10% of a class contains equal numbers of girls and boys in STEM, but more girls in non-STEM subjects.
Introduction
A child entering school has endless answers to the question ‘what do you want to be when you grow up?’ By the end of school, these have narrowed to a set of career aspirations that are consistent with his or her self-concept (the way an individual perceives themselves, and believes they are perceived by others^{1}). If the child is a girl, then she is likely to graduate with career aspirations with lower earning potential than those of a male classmate^{2}. This phenomenon contributes to ‘occupational segregation’, and there are numerous incentives to reduce its prevalence. Schooling has a strong influence on the career aspirations of students^{3}, so addressing gender differences in the workforce requires that we understand how gender affects school achievement.
Self-concept is heavily influenced by school achievement^{1,4}, and high-performing students are more likely to pursue well-paid careers, such as science, technology, engineering and mathematics (STEM)-based jobs^{5}. Girls tend to earn higher school grades than boys, including in STEM subjects^{6}, so why does this advantage not transfer into the workforce? The variability hypothesis, also called the greater male variability hypothesis, has been used to explain this apparent contradiction^{7}. It is based on the tendency for males to show greater variability than females for psychological traits^{8} (and for other traits across multiple species^{9}), leading to relatively fewer females with exceptional ability^{10}. However, the gender gap in employment within many highly paid occupations exceeds gender differences in variability (e.g. some math-intensive occupations employ far fewer women than the proportion of girls who score in the top 1% of maths tests^{11}). Therefore, occupational segregation cannot be simply caused by fewer women having the requisite ability for high-status jobs.
Girls are susceptible to conforming to stereotypes (stereotype threat^{12}) in the traditionally male-dominated fields of STEM, and girls who try to succeed in these fields are hindered by backlash effects^{13}. STEM fields are high-paying, employ fewer women than men^{14,15}, and require a high level of mathematical ability^{16}. Evidence from standardised tests administered to children and adolescents indicates a greater gender difference in variation in performance in STEM subjects than other subjects^{17,18,19}, and an excess of males amongst the top-achieving students^{20,21,22}. Therefore, a girl who performs well at school may notice that a greater proportion of the students who do better than her in mathematics and science classes are male, when compared to the proportion in other subjects. This, when combined with stereotype threat and the risk of backlash for behaving against gender stereotypes^{13}, could deter girls from pursuing a STEM-related career. Based on this hypothesis, and assuming equivalency of gender differences for standardised tests and class grades, we present an illustration of the predicted grade distributions for female and male students in Fig. 1.
Gender differences in variability have been tested using scores on standardised tests^{19,23}, but we are unaware of any study describing gender differences in the variability of teacher-assigned grades. While there are moderate-to-strong correlations (sensu^{24}) between grades and test scores^{25,26,27,28}, there is also a stark gender difference. Girls tend to receive lower test scores relative to their school grades, whereas boys receive higher test scores relative to their school grades. There are multiple conjectures to explain this discrepancy in mean gender differences between tests and grades (e.g. on average, girls behave better, which gives them an advantage in grades, but they fare worse when tested on novel material that was not covered in class)^{29}. Regardless of the source of these differences, teacher-assigned grades are likely to affect students’ lives, and it is a reasonable conjecture that they have a greater impact on students’ academic self-concept than standardised test scores^{1}. Furthermore, grades are at least as good a predictor of success at university (measured by grade point average and graduation rate) as test scores^{30,31}. Therefore, if gender differences in variability were impacting girls’ decisions to pursue STEM, we would expect to see these differences reflected in school grades.
Here, we present a systematic meta-analysis on the effect of gender on variance in academic achievement using teacher-assigned grades. While grades are a more subjective measurement than test scores, we also include data from university students, whose grades are less affected by teachers’ assessment of behaviour. While earlier meta-analyses have examined how mean academic achievement differs between the sexes^{6,32,33}, mean and variance differences should be examined together, as their magnitudes can be correlated (mean–variance relationship^{34}). Fortunately, a recently published method allows for a meta-analytic comparison of variances that takes into account any mean–variance relationship^{35}.
Based on the variability hypothesis, we expected female grades to be less variable than those of males. To test this hypothesis, we extended a previous meta-analysis by Voyer and Voyer^{6} on differences in the mean grades of students from ages 6 through to university. We used a more appropriate effect size to compare means, and another effect size to compare variances (Methods). We found that grades for female students were less variable than male grades. Then, focusing on school students (a relatively unbiased sample compared to university students), we found that: (1) the gender difference in variability has not changed noticeably over the last 80 years (1931–2013); (2) gender differences in grade variability are already present in childhood, and do not increase during adolescence; and (3) gender differences in grade variance were smaller for STEM than non-STEM subjects, contrary to our expectations shown in Fig. 1.
Results
Description of dataset
Our dataset contained 346 effect sizes extracted from 227 studies (Supplementary Data 1), representing 820,158 female and 826,629 male students. Fifty-two percent of the effect sizes were for ‘global’ grades (i.e. GPA), 26% were for STEM (mathematics and science), 19% for non-STEM (language, humanities, social science) and 3% for miscellaneous subjects. North American data dominated the dataset, with 70% of the effect sizes. Within the North American sample, 24% of studies were on a racially diverse cohort of students, 23% were on majority White/Caucasian students, 9% were on majority Black/African American students, 1% were on majority Hispanic/Latino students, and 43% of studies did not provide information on the racial composition of students. In total, 62% of the effect sizes came from school students (247,582 girls and 253,073 boys), and the remainder from university students. The original grades were awarded on several different grading scales (Supplementary Figs. 1 and 2).
Gender differences in variability
Overall, girls had significantly higher grades than boys by 6.3% (natural logarithm of response ratio, lnRR_{overall}(mean): 0.061, 95% confidence interval, CI: 0.052 to 0.070), with 10.8% less variation among girls than among boys (natural logarithm of coefficient of variation ratio, lnCVR_{overall}(variance): −0.114, CI: −0.133 to −0.095) (Supplementary Table 2; Fig. 2). The gender differences in mean grades were significantly larger at school than at university by 2.7% (lnRR_{school–uni diff}: −0.028, CI: −0.044 to −0.011; Supplementary Table 3). The gender differences in variation were also larger at school than at university, but the difference of 4.2% was non-significant (lnCVR_{school−uni diff}: 0.041, CI: 0.002 to 0.080; Supplementary Table 3). To test for moderating factors, we only used the school data in subsequent analyses. We excluded university students because there is self-selection among students in terms of who applies for (and is then accepted at) a university. This selection process makes undergraduates and postgraduates unrepresentative of the general population. The results from analyses for the whole dataset and for the university subset are provided in Supplementary Tables 2–10, 12, and 15–25 (the university subset also had small sample sizes for STEM and non-STEM subjects, making results from moderator analyses sensitive to outlier studies).
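The percentages quoted alongside the log-ratio estimates follow from exponentiating the estimate. The short Python sketch below shows that back-transformation for the overall estimates reported above; the function name is illustrative and is not taken from the authors' analysis code.

```python
import math

def log_ratio_to_percent(log_ratio):
    """Convert a log ratio (lnRR or lnCVR) into a percentage difference."""
    return (math.exp(log_ratio) - 1.0) * 100.0

# Overall estimates reported in the text (girls relative to boys).
lnrr_overall = 0.061    # gender difference in mean grades
lncvr_overall = -0.114  # gender difference in coefficient of variation

mean_diff = log_ratio_to_percent(lnrr_overall)
cv_diff = log_ratio_to_percent(lncvr_overall)

print(f"Mean grades: {mean_diff:+.1f}%")
print(f"Variability: {cv_diff:+.1f}%")
```

A positive result means a higher value for girls than boys, and a negative result a lower one.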
Moderating effects of study year and student age
The higher mean and lower variability of girls’ than boys’ grades have not changed significantly over the past eight decades (Supplementary Table 4, Supplementary Fig. 8A: lnRR_{study year scaled (slope)}: 0.019, CI: −0.017 to 0.055; Supplementary Table 4; Supplementary Fig. 8D: lnCVR_{study year scaled (slope)}: −0.029, CI: −0.083 to 0.025). Within genders, variability in grades showed a non-significant trend towards decreasing over time, but significantly more so for girls than boys (Supplementary Table 5, Supplementary Fig. 8G: natural logarithm of the coefficient of variation, lnCV_{study year boys–girls (slope diff)}: 0.032, CI: 0.004 to 0.060). Student age did not affect the gap between girls’ and boys’ mean grades or the gender difference in grade variability (Supplementary Fig. 9, Supplementary Table 6). Within genders, variability in grades showed a non-significant tendency to decrease as students aged (Supplementary Table 7, Supplementary Fig. 9G: lnCV_{student age boys–girls (slope)}: 0.010, CI: −0.067 to 0.087), and to decrease faster for boys than girls (Supplementary Table 7, Supplementary Fig. 9G: lnCV_{student age boys−girls (slope diff)}: −0.035, CI: −0.062 to −0.007).
Moderating effects of subject type: STEM versus non-STEM
Girls’ significant advantage of 7.8% in mean grades in non-STEM was more than double their 3.1% advantage in STEM (Fig. 2a, Supplementary Table 8: non-STEM: lnRR_{non-STEM}: 0.075, CI: 0.049 to 0.102; STEM: lnRR_{STEM}: 0.031, CI: 0.011 to 0.051; the difference: lnRR_{non-STEM–STEM diff}: −0.044, CI: −0.065 to −0.024). Variation in grades among girls was significantly lower than that among boys in every subject type, but the sexes were more similar in STEM than non-STEM subjects (Fig. 2b, Supplementary Table 9; STEM: 7.6% less variable grades; lnCVR_{STEM}: −0.079, CI: −0.115 to −0.043; non-STEM: 13.3% less variable grades; lnCVR_{non-STEM}: −0.149, CI: −0.199 to −0.099; the difference: lnCVR_{non-STEM–STEM diff}: 0.070, CI: 0.028 to 0.111). The greater gender similarity in variability in STEM was due to girls’ grades being significantly more variable in STEM than non-STEM subjects (Fig. 2c, Supplementary Table 10, lnCV_{girls STEM–non-STEM diff}: −0.101, CI: −0.170 to −0.033). In contrast, the variability of boys’ grades did not differ significantly between STEM and non-STEM subjects (Fig. 2c, Supplementary Table 10, lnCV_{boys STEM–non-STEM diff}: −0.030, CI: −0.102 to 0.042).
The small values of all meta-analytic estimates of gender differences in means and variances imply a large overlap in the grade distributions between the two sexes. The simulated distributions of girls’ and boys’ grades in Fig. 3 show the distributions of grades overlap more in STEM (94.2%) than non-STEM (88.2%) subjects. For example, within the top 10% of the distribution the gender ratio is even for STEM, and slightly female-skewed for non-STEM. Results of additional analyses are presented in Supplementary Tables 13–25.
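To illustrate how small differences in mean and variability translate into overlapping distributions, the sketch below turns the reported lnRR and lnCVR estimates into a pair of grade distributions and computes their overlap and the share of girls in the top 10%. It is not the authors' simulation (which is described in the Supplementary Methods). It assumes normally distributed grades, equal numbers of girls and boys, and a hypothetical baseline for boys (mean 100, coefficient of variation 0.20), so its output depends on those choices and should not be expected to reproduce the values reported from Fig. 3.

```python
import math
from statistics import NormalDist

def girls_distribution(boys, lnrr, lncvr):
    """Girls' grade distribution implied by lnRR and lnCVR relative to boys."""
    girls_mean = boys.mean * math.exp(lnrr)
    girls_cv = (boys.stdev / boys.mean) * math.exp(lncvr)
    return NormalDist(girls_mean, girls_mean * girls_cv)

def girls_share_of_top(girls, boys, top=0.10):
    """Share of girls above the cutoff that admits the top `top` fraction
    of a class made up of equal numbers of girls and boys."""
    def upper_tail(cutoff):
        return 0.5 * (1 - girls.cdf(cutoff)) + 0.5 * (1 - boys.cdf(cutoff))

    spread = 10 * max(girls.stdev, boys.stdev)
    lo = min(girls.mean, boys.mean) - spread
    hi = max(girls.mean, boys.mean) + spread
    for _ in range(200):  # bisection on the class-wide cutoff
        mid = (lo + hi) / 2
        if upper_tail(mid) > top:
            lo = mid  # too many students above the cutoff: raise it
        else:
            hi = mid
    cutoff = (lo + hi) / 2
    girls_tail = 1 - girls.cdf(cutoff)
    boys_tail = 1 - boys.cdf(cutoff)
    return girls_tail / (girls_tail + boys_tail)

# ASSUMPTION: hypothetical baseline for boys (mean 100, CV 0.20).
boys = NormalDist(100, 20)
stem_girls = girls_distribution(boys, lnrr=0.031, lncvr=-0.079)
nonstem_girls = girls_distribution(boys, lnrr=0.075, lncvr=-0.149)

stem_overlap = stem_girls.overlap(boys)
nonstem_overlap = nonstem_girls.overlap(boys)
stem_share = girls_share_of_top(stem_girls, boys)
nonstem_share = girls_share_of_top(nonstem_girls, boys)

print(f"STEM:     overlap {stem_overlap:.3f}, girls in top 10%: {stem_share:.3f}")
print(f"non-STEM: overlap {nonstem_overlap:.3f}, girls in top 10%: {nonstem_share:.3f}")
```

Changing the assumed coefficient of variation for boys shifts both quantities, which is one reason a simulation of this kind needs to be anchored to the observed grading scales.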
Discussion
Our overall result was consistent with elements of the variability hypothesis: female students’ grades were less variable than those of male students, but in contrast to expectations, the greatest difference in variability occurred in non-STEM subjects. Average female grades were also higher than those of males, corroborating the findings of Voyer and Voyer^{6} (Fig. 2). Gender differences in the grade variability of school pupils were unaffected by their age, weakly affected by the year of study, and most strongly affected by whether or not the subject was STEM.
From grade one onward, we found that girls’ grades were less variable than those of boys. Across the last 80 years, the variability in school grades has slightly decreased for both boys and girls (albeit slightly faster for girls). This decline might reflect increased student performance^{36}, or greater reluctance to fail students, i.e. grade inflation^{37}. These scenarios assume that there is a ceiling effect on grades, whereby variance is reduced because weaker students are shifted upwards, whereas the highest performing students are bumped up against the ‘ceiling’ of the highest possible grade awarded on the grading scale. Although we do not see strong evidence for a ceiling effect in our dataset (Supplementary Fig. 5), below we discuss how the ceiling effect could underestimate the magnitude of gender differences in variability.
Contrary to our expectations (Fig. 1), and those of many others^{10}, the gender difference in variability was smaller for STEM than non-STEM subjects (Fig. 2). When the small gender gap in grade variability is combined with the small gender difference in mean grades, it indicates that in STEM subjects, the distributions of girls’ and boys’ grades are more similar than in non-STEM subjects (Fig. 3). One possible explanation is that boys are more affected by the ceiling effect in STEM than non-STEM. For example, if a grading scale cannot distinguish between students in the top 1% or top 0.1%, and if there exists a male skew in the top 0.1% only in STEM but not in non-STEM, then gender differences in variance would be underestimated in STEM. Wai et al.^{22} tried to get around this ceiling effect by analysing seventh-grade test scores explicitly designed to differentiate between exceptional students. They found a female:male ratio of 0.25 in the top 1% of students in STEM subjects, which is more imbalanced than our data suggest (Fig. 3c). While this finding is intriguing, it should be noted that STEM careers are not restricted to the exceptionally talented (although fields that subscribe to the belief that talent is important for success tend to employ fewer women^{38}). Therefore, while our data do not preclude a gender gap among the exceptionally talented, they nevertheless indicate a practical similarity in girls’ and boys’ academic achievements, which are likely to provide an imperfect but valid measure of the ability to pursue STEM (Fig. 3).
Because students’ grades impact their academic self-concept and predict their future educational attainment (e.g. refs. ^{1,5}), we might predict roughly equal participation of men and women in STEM careers. However, the equivalence of girls’ and boys’ performance in STEM subjects in school does not translate into equivalent participation in STEM later in life. Is this because grades are not measuring the abilities required to succeed in STEM? Or does the relative advantage girls have over boys in non-STEM subjects at school lead them to rationally favour career choices with fewer competitors? We consider each of these questions in turn.
We analysed school grades, where girls show a well-established advantage over boys^{25}, whereas most previous tests of gender differences in variability have focussed on test scores^{18,19,23}. To explore whether the smaller variability difference in STEM compared to non-STEM is confined to school grades, we performed a supplementary analysis of a large international dataset of standardised test scores of 15-year-olds (see Supplementary Note 2 for details). This supplementary analysis found gender differences in variance that were consistent across subjects; girls’ test scores were more consistent than boys’, with equivalent gender differences in non-STEM and STEM subjects (Supplementary Fig. 11). However, girls only showed a mean advantage in non-STEM. Therefore, it appears that the mean differences between test scores and grades are caused by shifts in the position of girls’ and boys’ distributions, rather than changes in the shape of distributions in STEM compared to non-STEM (girls’ distributions of both grades and test scores are narrower than boys’ distributions, but the difference is not more pronounced in STEM). If girls perceive they have fewer competitors in non-STEM subjects because, on average, fewer boys perform better than girls, this might lead to a preference for non-STEM over STEM careers^{39,40}.
Gender differences in expectations of success can arise due to backlash effects against individuals who defy the stereotype of their gender, and/or due to gender differences in ‘abilities tilt’ (having comparatively high ability in one discipline compared to another). Women in male-dominated pursuits, including STEM, face a paradox: if they conform to gender stereotypes, they might be perceived as less competent, but if they defy gender stereotypes and perform ‘like a man’, then their progress can be halted by ‘backlash’ from both men and women^{13,41}. Furthermore, analyses of test scores have revealed that girls are more likely than boys to show an abilities tilt in the direction favouring non-STEM subjects (i.e. receive higher scores in non-STEM compared to STEM)^{42}. Our data are consistent with girls showing an abilities tilt in the direction of non-STEM subjects, although we cannot compare individual student grades (Supplementary Table 11). Intriguingly, there is evidence that balanced high-achieving students (who possess the potential to succeed in disparate fields) prefer non-STEM careers^{43}, and that girls are more likely to be balanced than boys, at least among high achievers^{44}. A female skew towards balanced abilities could be a manifestation of girls showing lower levels of between-discipline variability (i.e. greater consistency across disciplines). Gender differences in between-discipline variability, rather than within-discipline variability, are an interesting avenue for future research.
A girl’s answer to the question of ‘what do you want to be when you grow up?’ will be shaped by her own beliefs about gender, and the collective beliefs of the society she is raised in^{45}. While our results support the variability hypothesis, we have shown that the magnitude of the gender gap in STEM grades is small, and only becomes male-skewed at the very top of the distribution (Fig. 3). Therefore, by the time a girl graduates, she is just as likely as a boy to have earned high enough grades to pursue a career in STEM. When she evaluates her options, however, the STEM path is trod by more male competitors than non-STEM, and presents additional internal and external threats due to her own and society’s gendered beliefs (stereotype threat and backlash effects). To increase recruitment of girls into STEM, this path should be made more attractive for them. A future study could estimate how male-skewed we would expect STEM careers to be based solely on gender differences in academic achievement, by quantifying the academic grades of current STEM employees. Our study focussed on gender differences in academic achievement, but understanding gender differences in any trait would be improved by simultaneously comparing gender differences in mean and in variability.
Methods
Literature search and study selection
We performed a systematic literature search following guidelines from PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses^{46}). The PRISMA flow diagram depicting our search and screening process is shown in Fig. 4. We broadly followed the search protocol used by Voyer and Voyer^{6}. We searched three databases for articles published between August 2011 and May 2015: ERIC, SCOPUS and ISI Web of Science. We did not use the PsycINFO or PsycARTICLES databases used by Voyer and Voyer^{6}, as they were malfunctioning at the time of our search. We searched for articles containing the term ‘school grade/s’, ‘school achievement/s’, ‘school mark/s’ or ‘grade point average/s’. The exact search strings used for each database and additional details of the literature search are provided in Supplementary Methods. While there was no clear signal of publication bias in the school subset (Supplementary Tables 12, 25), a limitation of our literature search is that we did not actively search for unpublished studies or theses.
Eligibility criteria
To be included for data extraction at the full-text screening phase, studies needed to present teacher-assigned grades or global GPA (grade point average, i.e. grades averaged across many subjects) for a cohort containing both male and female students. The students could be from grade one and above. These criteria excluded kindergarten and single-sex studies, and self-reported grades or test data. Because of sociocultural effects on gender differences, we required samples of students that took classes together; we therefore excluded online courses. We also excluded retrospective studies comparing adults that were not in the same study cohort. Where longitudinal data was reported, we included only the first year of data that met the inclusion criteria. In the case of studies that reported high school GPA for an undergraduate sample, we only included the university grades, if reported, and we deemed the high school grades ineligible. This is because the high school grades of groups of undergraduates do not come from the same cohort: they represent a subsample of students from disparate high schools, and only those students who performed well enough to attend university. When we identified studies that reported data from the same large database, we only included the study with the largest sample size, and excluded the rest to avoid pseudoreplication. The list of excluded studies, with reasons for exclusion, is presented in Supplementary Data 2.
Data extraction and coding
From the original papers, we extracted the sample sizes, means, and standard deviations for male and female academic grades. For the studies used by Voyer and Voyer^{6}, we attempted to contact authors if any of these data were missing. All contacted authors were also asked to provide any additional data (published or unpublished) they might have available. If we received no response after 1 month, we sent a follow-up email. Only unstandardised grade data was collected. When presented data was standardised, we contacted authors to request the corresponding unstandardised values. For the studies published after August 2011, we only contacted authors if variance data was missing. In total, data from authors was acquired for 15 studies, including two unpublished studies.
Moderator variables
In addition to the descriptive statistics for grades of males and females, we extracted a number of moderator variables, all of which are presented in Supplementary Table 1. We generally followed the variables used by Voyer and Voyer^{6} (e.g. racial composition), as well as recording additional information (e.g. age of students). An analysis of the moderating effect of racial composition on the gender gap in school grades is presented in Supplementary Note 1 and Supplementary Tables 1, 3. Continuous moderators were scaled and centred (resulting in a mean of 0 and a standard deviation of 1) prior to the analyses. We used multiple imputation to fill in missing values of study year and students’ mean age (details in Supplementary Methods).
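Scaling and centring a continuous moderator means subtracting its mean and dividing by its standard deviation. A minimal Python sketch is given below; the study years are hypothetical values chosen for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption rather than a detail stated in the text.

```python
from statistics import mean, stdev

def scale_and_centre(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # ASSUMPTION: sample SD with n - 1 denominator
    return [(value - centre) / spread for value in values]

# Hypothetical study years, for illustration only.
study_years = [1931, 1975, 1990, 2004, 2013]
scaled_years = scale_and_centre(study_years)

print([round(z, 2) for z in scaled_years])
```

After this transformation a slope is interpreted per standard deviation of the moderator rather than per year.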
Effect sizes
Using standardised effect sizes allowed us to combine original data collected on different grading scales. To test for differences in mean grades between genders, we used the natural logarithm response ratio (hereafter referred to as lnRR), and its corresponding sampling error variance \({{s}}_{{\mathrm{lnRR}}}^2\)^{47}:
\[ {\mathrm{lnRR}} = \ln \left( {\frac{{\bar x_{\mathrm{f}}}}{{\bar x_{\mathrm{m}}}}} \right) \]
\[ s_{{\mathrm{lnRR}}}^2 = \frac{{s_{\mathrm{f}}^2}}{{n_{\mathrm{f}}\bar x_{\mathrm{f}}^2}} + \frac{{s_{\mathrm{m}}^2}}{{n_{\mathrm{m}}\bar x_{\mathrm{m}}^2}} \]
where:
\(\bar x_{\mathrm{f}}\) and \(\bar x_{\mathrm{m}}\) = the mean grade of female and male students, respectively,
\(s_{\mathrm{f}}^2\) and \(s_{\mathrm{m}}^2\) = the variance in grades of female and male students, respectively,
n_{f} and n_{m} = the number of female and male students in each sample, respectively.
Positive values of lnRR imply greater mean grades for girls.
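As a worked illustration, the sketch below computes lnRR and its sampling error variance from group summary statistics. It assumes the standard form of the log response ratio: the log of the female-to-male ratio of means, with a sampling variance of s²/(n x̄²) summed over the two groups. The function name and the input values are hypothetical.

```python
import math

def ln_rr(mean_f, sd_f, n_f, mean_m, sd_m, n_m):
    """Log response ratio (female / male) and its sampling error variance."""
    effect = math.log(mean_f / mean_m)
    variance = sd_f**2 / (n_f * mean_f**2) + sd_m**2 / (n_m * mean_m**2)
    return effect, variance

# Hypothetical summary statistics on a 4-point grading scale.
lnrr, lnrr_var = ln_rr(mean_f=3.2, sd_f=0.6, n_f=120,
                       mean_m=3.0, sd_m=0.7, n_m=110)

print(f"lnRR = {lnrr:.4f}, sampling variance = {lnrr_var:.6f}")
```

Because both terms are ratios, the result is the same whether the grades were recorded on a 4-point or a 100-point scale, which is what allows studies with different grading scales to be pooled.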
We extended the literature search in Voyer and Voyer^{6} by 5 years, and our analysis of mean grades differed from theirs in two ways: (1) we included only studies where we could compare variances; and (2) we used lnRR instead of the standardised mean difference in performance (SMD or Hedges’ g^{24}; see Supplementary Equations 1–4). We chose to use lnRR because, unlike SMD, it is unaffected by differences in variance (standard deviation) between groups. However, for comparison with Voyer and Voyer’s^{6} results, we have repeated the lnRR analyses using SMD as the effect size. The results for both lnRR and SMD analyses, which are very similar to each other, are presented in Supplementary Fig. 4, and Supplementary Tables 2–4, 6, 8, 12, 13, 16, 19, 22, 25.
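To assess differences in variance of grades of boys and girls, we used the natural logarithm coefficient of variation ratio (lnCVR) and its associated sampling error variance \({{s}}_{{{\mathrm{lnCVR}}}}^2\)^{35}:
\[ {\mathrm{lnCVR}} = \ln \left( {\frac{{{\mathrm{CV}}_{\mathrm{f}}}}{{{\mathrm{CV}}_{\mathrm{m}}}}} \right) + \frac{1}{{2\left( {n_{\mathrm{f}} - 1} \right)}} - \frac{1}{{2\left( {n_{\mathrm{m}} - 1} \right)}} \]
\[ s_{{\mathrm{lnCVR}}}^2 = \frac{{s_{\mathrm{m}}^2}}{{n_{\mathrm{m}}\bar x_{\mathrm{m}}^2}} + \frac{1}{{2\left( {n_{\mathrm{m}} - 1} \right)}} - 2\rho _{\ln \bar x_C,\ln s_C}\sqrt {\frac{{s_{\mathrm{m}}^2}}{{n_{\mathrm{m}}\bar x_{\mathrm{m}}^2}}\frac{1}{{2\left( {n_{\mathrm{m}} - 1} \right)}}} + \frac{{s_{\mathrm{f}}^2}}{{n_{\mathrm{f}}\bar x_{\mathrm{f}}^2}} + \frac{1}{{2\left( {n_{\mathrm{f}} - 1} \right)}} - 2\rho _{\ln \bar x_E,\ln s_E}\sqrt {\frac{{s_{\mathrm{f}}^2}}{{n_{\mathrm{f}}\bar x_{\mathrm{f}}^2}}\frac{1}{{2\left( {n_{\mathrm{f}} - 1} \right)}}} \]
where:
CV_{f} and CV_{m} = the coefficient of variation \(\left( {\frac{s}{{\bar x}}} \right)\) for females and males, respectively,
\(\rho _{\ln \bar x_C,\ln s_C}\) and \(\rho _{\ln \bar x_E,\ln s_E}\) = the correlations between the logged means and standard deviations of the male and female students, respectively.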
All other notation is described above. Positive values of lnCVR imply greater variance in girls’ grades relative to boys’ grades. By dividing the female and male standard deviations by their respective means, we controlled for the effect of a proportional relationship (the mean–variance relationship) between the standard deviation and the mean. To test how the variance in grades has changed over time, we also computed the natural logarithm of the coefficient of variation (lnCV) for boys and girls separately, and its associated sampling error variance^{35}:
\[ {\mathrm{lnCV}} = \ln \left( {\frac{s}{{\bar x}}} \right) + \frac{1}{{2\left( {n - 1} \right)}} \]
\[ s_{{\mathrm{lnCV}}}^2 = \frac{{s^2}}{{n\bar x^2}} + \frac{1}{{2\left( {n - 1} \right)}} - 2\rho _{\ln \bar x,\ln s}\sqrt {\frac{{s^2}}{{n\bar x^2}}\frac{1}{{2\left( {n - 1} \right)}}} \]
All notation as described above. For the same mean, a more negative value of lnCV implies a smaller variance.
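The following sketch computes lnCV for a single group and lnCVR for a female–male comparison from summary statistics. It assumes the usual small-sample-corrected forms: lnCV is the log of s/x̄ plus 1/(2(n − 1)), and lnCVR is the difference between the female and male lnCV values, with sampling variances summed across groups. The correlation between logged means and logged standard deviations is normally estimated across studies; here it is a plain argument defaulting to 0, which is an assumption made only to keep the example short. Function names and inputs are hypothetical.

```python
import math

def ln_cv(mean, sd, n, rho=0.0):
    """Bias-corrected log coefficient of variation and its sampling variance.

    rho is the correlation between ln(mean) and ln(sd) across studies.
    ASSUMPTION: it defaults to 0 here purely for illustration.
    """
    effect = math.log(sd / mean) + 1 / (2 * (n - 1))
    var_mean = sd**2 / (n * mean**2)
    var_sd = 1 / (2 * (n - 1))
    variance = var_mean + var_sd - 2 * rho * math.sqrt(var_mean * var_sd)
    return effect, variance

def ln_cvr(mean_f, sd_f, n_f, mean_m, sd_m, n_m, rho_f=0.0, rho_m=0.0):
    """Log coefficient of variation ratio (female / male) and its variance."""
    lncv_f, var_f = ln_cv(mean_f, sd_f, n_f, rho_f)
    lncv_m, var_m = ln_cv(mean_m, sd_m, n_m, rho_m)
    return lncv_f - lncv_m, var_f + var_m

# Hypothetical summary statistics on a 4-point grading scale.
lncvr, lncvr_var = ln_cvr(mean_f=3.2, sd_f=0.6, n_f=120,
                          mean_m=3.0, sd_m=0.7, n_m=110)

print(f"lnCVR = {lncvr:.4f}, sampling variance = {lncvr_var:.6f}")
```

A negative lnCVR in this example means the girls' grades are less variable than the boys' once each standard deviation is expressed relative to its own mean.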
Statistical analyses
We performed our main analyses on lnCVR and lnRR, and their associated error terms, using the rma.mv function in the R (v.3.4.2) package metafor v.2.0-0^{48}. One-third of effect sizes were not independent, because they came from the same study and/or the same cohort of students. We therefore included cohort ID and comparison ID as random effects in each model (the levels of study ID overlapped too much with cohort ID to model both levels simultaneously; e.g. in the school data, 120 studies and 141 cohorts, respectively). We also modelled covariance between effect sizes, assuming that effect sizes from the same cohort had 0.5 correlations between grades in different subjects (recommended in ref. ^{49}) because sampling error variances among these effect sizes based on the same cohort are likely to be correlated. We added this covariance matrix as our sampling error variance matrix (V argument in the rma.mv function). In addition, to account for the two main types of non-independence in our data (hierarchical/nested and correlation/covariance structures), we used the robust function within the metafor package to generate fixed effects estimates and confidence intervals, based on robust variance estimation, from each rma.mv model. To test for the overall effect of gender on mean and variance in school grades, we constructed meta-analytical models with no fixed effects (i.e. meta-analytic model or intercept-only model). We tested whether the results were significantly different between school and university by including the ‘school or university’ categorical moderator in a meta-regression model on the whole dataset. We then ran separate meta-analytical models on the school and university data subsets to quantify respective heterogeneities (Supplementary Methods). To test whether the gender gap in school grades varied between subjects, we included subject type (STEM, non-STEM, Global, Other/NR) as a fixed effect in meta-regression analyses.
To test whether the gender difference in school grades has changed over historical time, or with student age, we included either study year or average student age as a fixed effect. To test whether the variance of either males or females has changed over historical time, or with student age, we used lnCV as the response variable, and the fixed effects of sex and study year, or sex and age, and their interactions. Point estimates from all statistical models were considered statistically significant when their CI did not span zero.
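The sampling error covariance structure described above can be sketched as follows: sampling variances sit on the diagonal, and effect sizes that share a cohort receive an off-diagonal covariance based on an assumed correlation of 0.5. This Python sketch only illustrates how such a matrix is laid out. It is not the authors' R code and does not call metafor; the function name, variances and cohort labels are hypothetical.

```python
import math

def sampling_covariance_matrix(variances, cohort_ids, rho=0.5):
    """Build a sampling error (co)variance matrix for a set of effect sizes.

    Diagonal entries are the sampling variances. Off-diagonal entries are
    rho * sqrt(v_i * v_j) when two effect sizes share a cohort, else 0.
    """
    size = len(variances)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                matrix[i][j] = variances[i]
            elif cohort_ids[i] == cohort_ids[j]:
                matrix[i][j] = rho * math.sqrt(variances[i] * variances[j])
    return matrix

# Hypothetical example: two effect sizes from cohort "A", one from cohort "B".
variances = [0.0004, 0.0009, 0.0016]
cohorts = ["A", "A", "B"]
V = sampling_covariance_matrix(variances, cohorts)

for row in V:
    print([round(value, 6) for value in row])
```

Effect sizes from different cohorts keep a covariance of zero, so the matrix is block-diagonal once rows are ordered by cohort.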
Robustness of results
There is a possibility of a bias in our results due to overreporting of positive findings in published studies, so we tested our data for publication bias using multilevel-model versions of funnel plots and Egger’s regression^{50,51}. We also performed alternative analyses of key components of our study to test whether our conclusions are robust. Overlaps of grade distributions were inferred using simulation methods. Details and results of these analyses are presented in Supplementary Methods and Supplementary Tables 15–18, 20–23.
Data availability
All data, code, and models that were used to generate results text, figures, and tables in the main text and supplementary information are available to download from dedicated repositories on the Open Science Framework^{52,53}.
References
 1.
Möller, J., Pohlmann, B., Köller, O. & Marsh, H. W. A metaanalytic path analysis of the internal/external frame of reference model of academic achievement and academic selfconcept. Rev. Educ. Res. 79, 1129–1167 (2009).
 2.
Mandel, H. The role of occupational attributes in gender earnings inequality, 19702010. Soc. Sci. Res. 55, 122–138 (2016).
 3.
Holmes, K., Gore, J., Smith, M. & Lloyd, A. An integrated analysis of school students’ aspirations for stem careers: which student and school factors are most predictive? Int. J. Sci. Math. Educ. 29, 1–21 (2017).
 4.
Marsh, H. W., Trautwein, U., Lüdtke, O., Köller, O. & Baumert, J. Academic selfconcept, interest, grades, and standardized test scores: reciprocal effects models of causal ordering. Child Dev. 76, 397–416 (2005).
 5.
French, M. T., Homer, J. F., Popovici, I. & Robins, P. K. What you do in high school matters: high school GPA, educational attainment, and labor market earnings as a young adult. East. Econ. J. 41, 370–386 (2015).
 6.
Voyer, D. & Voyer, S. D. Gender differences in scholastic achievement: a metaanalysis. Psychol. Bull. 140, 1174–1204 (2014).
 7.
Shields, S. A. The variability hypothesis: the history of a biological model of sex. Signs 7, 1–30 (1982).
 8.
Johnson, W., Carothers, A. & Deary, I. J. Sex differences in variability in general intelligence: a new look at the old question. Perspect. Psychol. Sci. 3, 518–531 (2008).
 9.
Reinhold, K. & Engqvist, L. The variability is in the sex chromosomes. Evolution 67, 3662–3668 (2013).
 10.
Halpern, D. F. et al. The science of sex differences in science and mathematics. Psychol. Sci. Public Interest 8, 1–51 (2007).
 11.
Wang, M. T. & Degol, J. L. Gender gap in science, technology, engineering, and mathematics (STEM): current knowledge, implications for practice, policy, and future directions. Educ. Psychol. Rev. 29, 119–140 (2017).
 12.
Spencer, S. J., Logel, C. & Davies, P. G. Stereotype threat. Annu. Rev. Psychol. 67, 415–437 (2016).
 13.
Rudman, L. A. & Phelan, J. E. Backlash effects for disconfirming gender stereotypes in organizations. Res. Organ. Behav. 28, 61–79 (2008).
 14.
OECD. STEM workers receive a significant earnings premium over other workers with the same level of education: private wage and salary, workers aged 25 and over. https://doi.org/10.1787/eco_surveysusa2012graph47en (2012).
 15.
Holman, L., Stuart-Fox, D. & Hauser, C. E. The gender gap in science: how long until women are equally represented? PLoS Biol. 16, e2004956 (2018).
 16.
Penner, A. M. Gender differences in extreme mathematical achievement: an international perspective on biological and social factors. Am. J. Sociol. 114, S138–S170 (2008).
 17.
Feingold, A. Gender differences in variability in intellectual abilities: a cross-cultural perspective. Sex Roles 30, 81–92 (1994).
 18.
Hedges, L. V. & Nowell, A. Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science 269, 41–45 (1995).
 19.
Reilly, D., Neumann, D. L. & Andrews, G. Sex differences in mathematics and science achievement: a meta-analysis of National Assessment of Educational Progress assessments. J. Educ. Psychol. 107, 645–662 (2015).
 20.
Cimpian, J. R., Lubienski, S. T., Timmer, J. D., Makowski, M. B. & Miller, E. K. Have gender gaps in math closed? Achievement, teacher perceptions, and learning behaviors across two ECLS-K cohorts. AERA Open 2, 1–19 (2016).
 21.
Lakin, J. M. Sex differences in reasoning abilities: surprising evidence that male–female ratios in the tails of the quantitative reasoning distribution have increased. Intelligence 41, 263–274 (2013).
 22.
Wai, J., Cacchio, M., Putallaz, M. & Makel, M. C. Sex differences in the right tail of cognitive abilities: a 30 year examination. Intelligence 38, 412–423 (2010).
 23.
Baye, A. & Monseur, C. Gender differences in variability and extreme scores in an international context. Large-scale Assess. Educ. 4, 541 (2016).
 24.
Cohen, J. Statistical Power Analysis for the Behavioral Sciences (Revised Edition) (Academic Press, New York, 1977).
 25.
Duckworth, A. L. & Seligman, M. E. P. Self-discipline gives girls the edge: gender in self-discipline, grades, and achievement test scores. J. Educ. Psychol. 98, 198–208 (2006).
 26.
McCandless, B. R., Roberts, A. & Starnes, T. Teachers’ marks, achievement test scores, and aptitude relations with respect to social class, race, and sex. J. Educ. Psychol. 63, 153–159 (1972).
 27.
Borghans, L., Golsteyn, B. H. H., Heckman, J. J. & Humphries, J. E. What grades and achievement tests measure. Proc. Natl Acad. Sci. USA 113, 13354–13359 (2016).
 28.
Zwick, R. & Green, J. G. New perspectives on the correlation of scholastic assessment test scores, high school grades, and socioeconomic factors. J. Educ. Meas. 44, 1–23 (2007).
 29.
Cornwell, C., Mustard, D. B. & Van Parys, J. Noncognitive skills and the gender disparities in test scores and teacher assessments: evidence from primary school. J. Hum. Resour. 48, 236–264 (2013).
 30.
Betts, J. R. & Morell, D. The determinants of undergraduate grade point average: the relative importance of family background, high school resources, and peer group effects. J. Hum. Resour. 34, 268–293 (1999).
 31.
Zhang, G., Anderson, T. J., Ohland, M. W. & Thorndyke, B. R. Identifying factors influencing engineering student graduation: a longitudinal and cross-institutional study. J. Eng. Educ. 93, 313–320 (2004).
 32.
Else-Quest, N. M., Hyde, J. S. & Linn, M. C. Cross-national patterns of gender differences in mathematics: a meta-analysis. Psychol. Bull. 136, 103–127 (2010).
 33.
Lindberg, S. M., Hyde, J. S., Petersen, J. L. & Linn, M. C. New trends in gender and mathematics performance: a meta-analysis. Psychol. Bull. 136, 1123–1135 (2010).
 34.
Taylor, L. R. Aggregation, variance and the mean. Nature 189, 732–735 (1961).
 35.
Nakagawa, S. et al. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol. Evol. 6, 143–152 (2015).
 36.
Trahan, L. H., Stuebing, K. K., Fletcher, J. M. & Hiscock, M. The Flynn effect: a meta-analysis. Psychol. Bull. 140, 1332–1360 (2014).
 37.
Lackey, L. W. & Lackey, W. J. Grade inflation: potential causes and solutions. Int. J. Eng. Educ. 22, 130–139 (2006).
 38.
Leslie, S.-J., Cimpian, A., Meyer, M. & Freeland, E. Expectations of brilliance underlie gender distributions across academic disciplines. Science 347, 262–265 (2015).
 39.
Niederle, M. & Vesterlund, L. Explaining the gender gap in math test scores: the role of competition. J. Econ. Perspect. 24, 129–144 (2010).
 40.
Gneezy, U. & Rustichini, A. Gender and competition at a young age. Am. Econ. Rev. 94, 377–381 (2004).
 41.
Rudman, L. A. & Fairchild, K. Reactions to counterstereotypic behavior: the role of backlash in cultural stereotype maintenance. J. Pers. Soc. Psychol. 87, 157–176 (2004).
 42.
Coyle, T. R., Snyder, A. C. & Richmond, M. C. Sex differences in ability tilt: support for investment theory. Intelligence 50, 209–220 (2015).
 43.
Wang, M. T., Eccles, J. S. & Kenny, S. Not lack of ability but more choice: individual and gender differences in choice of careers in science, technology, engineering, and mathematics. Psychol. Sci. 24, 770–775 (2013).
 44.
Valla, J. M. & Ceci, S. J. Breadth-based models of women’s underrepresentation in STEM fields: an integrative commentary on Schmidt (2011) and Nye et al. (2012). Perspect. Psychol. Sci. 9, 219–224 (2014).
 45.
Riegle-Crumb, C., King, B. & Moore, C. Do they stay or do they go? The switching decisions of individuals who enter gender atypical college majors. Sex Roles 74, 436–449 (2016).
 46.
Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G. PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J. Clin. Epidemiol. 62, 1006–1012 (2009).
 47.
Hedges, L. V., Gurevitch, J. & Curtis, P. S. The meta-analysis of response ratios in experimental ecology. Ecology 80, 1150–1156 (1999).
 48.
Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
 49.
Noble, D. W., Lagisz, M., O’Dea, R. E. & Nakagawa, S. Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses. Mol. Ecol. 26, 2410–2425 (2017).
 50.
Egger, M., Smith, G. D., Schneider, M. & Minder, C. Bias in meta-analysis detected by a simple, graphical test. BMJ 315, 629–634 (1997).
 51.
Nakagawa, S. & Santos, E. S. A. Methodological issues and advances in biological meta-analysis. Evol. Ecol. 26, 1253–1274 (2012).
 52.
O’Dea, R. E., Lagisz, M., Jennions, M. D. & Nakagawa, S. Data for “Gender differences in individual variation in academic grades fail to fit expected patterns for STEM”. Open Science Framework https://osf.io/efm9t (2018).
 53.
O’Dea, R. E., Lagisz, M., Jennions, M. D. & Nakagawa, S. Code for “Gender differences in individual variation in academic grades fail to fit expected patterns for STEM”. Open Science Framework https://osf.io/q68ae (2018).
Acknowledgements
We thank the following authors for kindly providing data used in the analysis: Dr. Stephen Borde, Dr. Christy Byrd, Dr. Christina Davies, Professor Rollande Deslandes, Professor Jean-Marc Dewaele, Professor Noor Azina Binti Ismail, Dr. Marianne Johnson, Dr. Amy Lutz, Dr. Amy Sibulkin, Dr. Helena Smrtnik Vitulić and Dr. Daniel Taylor. Sincere thanks to Dr. Khandis Blake, Dr. Daniel Noble, Dr. Joel Pick and Professor Cordelia Fine for providing constructive comments that greatly improved the manuscript.
Author information
Contributions
S.N. and M.D.J. conceived the study; R.E.O. and M.L. collected data; R.E.O., M.L. and S.N. conducted analyses. All authors contributed to interpretation of the results and writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
O’Dea, R.E., Lagisz, M., Jennions, M.D. et al. Gender differences in individual variation in academic grades fail to fit expected patterns for STEM. Nat Commun 9, 3777 (2018). https://doi.org/10.1038/s41467-018-06292-0