Inclusion unlocks the creative potential of gender diversity in teams

Several studies have highlighted the potential contribution of gender diversity to creativity, but have also noted challenges stemming from conflicts and a deficit of trust. Thus, we argue that gender diversity requires inclusion as well to yield increased collective creativity. We analyzed teams in 4011 video game projects, recording weighted network data from past collaborations. We developed four measures of inclusion, based on de-segregation, strong ties across genders, and the incorporation of women into the core of the team’s network. We measured creativity by the distinctiveness of game features compared to prior games. Our results show that gender diversity without inclusion does not contribute to creativity, while at maximal inclusion a one standard deviation change in diversity results in a .04–.09 standard deviation increase in creativity. On the flip side, at maximal inclusion but low diversity (when there is a ‘token’ female team member highly integrated in a male network) we see a negative impact on creativity. Considering the history of game projects in a developer firm, we see that adding diversity first and developing inclusion later can lead to higher diversity and inclusion, compared to the alternative of recruiting developers with already existing cross-gender ties. This suggests that developer firms should encourage building inclusive collaboration ties in-house.


Introduction
Groups with diverse members can be engines of creativity. Project teams (small collectives recruited for a defined task) are often used to address creative tasks, where a nonroutine solution is needed to solve a problem (Amabile 1983; Kozlowski and Bell 2013), and such teams boost returns to resources in innovative organizations (Cohen and Bailey 1997; Wuchty, Jones, and Uzzi 2007). There is evidence that teams possess collective intelligence beyond the mean or maximal individual intelligence of team members (Woolley et al. 2010). It has also often been demonstrated that the collective intelligence and creative capacity of teams is a function of their cognitive diversity (Horwitz and Horwitz 2007). When team members come from diverse demographic backgrounds and have diverse past experiences, they have a higher openness to divergent thinking (Levi 2017), and they are more willing to constructively challenge the status quo (Amabile et al. 1996).
Gender diversity specifically has been shown to boost collective intelligence (Woolley et al. 2010; Xie et al. 2020), and a low proportion of non-dominant genders dampens innovative potential in teams (Beede et al. 2011; Hofstra et al. 2020). Women, transgender, and gender-nonconforming people (TGNC) are under-represented in STEM fields, especially in computer science and software careers (Cheryan et al. 2017; Haverkamp et al. 2021), and even if they embark on a career in technology, they are less appreciated and successful, and are more likely to leave at various key stages compared to men (Clark Blickenstaff 2005; Vedres and Vasarhelyi 2019). It is important to analyze gender diversity in STEM teams to understand how diversity contributes to innovation when females are in the minority and often face discrimination (Brooke 2019).
Despite a general agreement about the promise of diversity for creativity, studies on how team diversity leads to an increase in team performance have not reached a clear consensus (Joshi and Roh 2009; van Knippenberg, De Dreu, and Homan 2004). It is clear that group creativity is not a simple function of individual creativity, but a complex interplay of compositional diversity, internal team structures, and the organizational-cultural environment of the team (Vedres and Stark 2010; Woodman, Sawyer, and Griffin 1993). On the one hand, diversity itself, while contributing to openness to creative solutions, can contribute to weakened team cohesion and heightened conflict (Webber and Donahue 2001). On the other hand, the right routines and communication structure within the team can multiply the power of diversity for innovation (Cohen and Bailey 1997). Thus we need to consider diversity together with inclusion to understand the potential of diversity for collective creativity (Milliken and Martins 1996).
To understand creativity in diverse collectives we need to pay attention to inclusion as well (De Dreu and West 2001; Ferdman and Deane 2014; Mor Barak et al. 2016). Diversity without inclusion can lead to mistrust and a breakdown of communications (Pelled 1996), preventing a true dialog where diverse approaches to the problem at hand can be explored (Nishii and Goncalo 2008; Pearsall, Ellis, and Evans 2008). The mere increase in the proportion of women in a field will not eliminate discrimination against them (Begeny et al. 2020). When diverse teams are well integrated, even if diversity results in conflicts, such conflicts can be beneficial to performance in complex, non-routine tasks (Jehn 1995). We argue that in teams with a discriminated minority (as is the case with gender in STEM), diversity without inclusion will not have a positive impact on collective creativity, as the various perspectives that diverse participants bring to the team would not have a chance to be contrasted and utilized.

Gender diversity and collaboration in video game development
We analyze teams in the video game industry from 1994 to 2009. Video games are today by far the largest entertainment industry: the industry overtook movies and music in terms of gross revenues in 2003, and by 2009 it had become larger than movies and music combined (Kooistra 2019). Video game development is a field that prizes creativity and distinctiveness (de Vaan, Vedres, and Stark 2015), but it is a male-dominated field, where only about 17% of developers were female in 2010, and about 20% are female today (Bailey, Miyata, and Yoshida 2021). The content of video games is decidedly male, as only about 13% of all characters depicted in video games are female (Williams et al. 2009). Fields with a low proportion of female participants feature strong prejudice and discrimination against women (Begeny et al. 2020); thus if we are able to show a creative advantage to gender diversity in a strongly male-dominated field, such as video games, it would serve as strong evidence for the power of gender diversity.
We measure weighted collaborative ties between developers as the number of prior joint participations in game development projects, following others who have analyzed collaboration in co-authorship (Newman 2001), movies (Lutter 2015), musicals, video game development (de Vaan et al. 2015), or jazz music (Phillips 2011).
These approaches take a bipartite graph of person-to-event affiliations (affiliations to papers published, movies, games, or albums released), and analyze the person-to-person projection, an undirected weighted graph, where $w_{ij} = \sum_k a_{ik} a_{jk}$, if $k$ is a shared affiliation for $i$ and $j$ at time $t-1$ that predates time $t$ of the focal event analyzed.
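The projection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `project_prior_ties`, the `(game_id, year, developers)` record layout, and the toy credits are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def project_prior_ties(credits, focal_year):
    """Weighted person-to-person projection of a person-to-game affiliation list.

    credits: iterable of (game_id, year, [developer, ...]) records (hypothetical layout).
    Only games released strictly before focal_year contribute to tie weights.
    """
    weights = defaultdict(int)
    for _game_id, year, developers in credits:
        if year >= focal_year:
            continue  # ignore the focal year and anything later
        # every pair of co-credited developers gains one unit of tie weight
        for a, b in combinations(sorted(set(developers)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy data, invented for illustration only.
credits = [
    ("g1", 1995, ["ann", "bob", "cid"]),
    ("g2", 1996, ["ann", "bob"]),
    ("g3", 1997, ["ann", "cid", "dee"]),
]
ties_1997 = project_prior_ties(credits, focal_year=1997)
print(ties_1997)
```

Keys are sorted name pairs, so each undirected tie is stored once. In the toy data, "dee" has no prior credits and therefore no ties when the 1997 game is the focal event.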

Measuring inclusion in diverse teams
Inclusion in a work team context can be defined as actively engaging with team members across differences (Ferdman and Deane 2014). We cannot speak of inclusion, when different team members are excluded from meaningful contact and collaboration, either by isolating individuals that are different, or allowing the team to fragment into homophilous subgroups. Inclusion is also absent when team members are fully assimilated, and their diversity becomes muted and irrelevant in collaborations (Shore et al. 2011).
There is no consensus about how inclusion should be measured. A wide range of measures has been proposed, including dimensions of individual or group experiences, leadership, norms and values (Ferdman and Deane 2014), influence on decisions and access to resources (Mor-Barak and Cherin 1998), a sense of belonging and authenticity (Jansen et al. 2014), and organizational support and tolerance towards uniqueness (Chung et al. 2020). These measures require reactive data collection techniques (like surveys or interview methods), and are not scalable to large observational data.
In this article we focus on the relational aspects of inclusion, and we rely on weighted graph measures of how well gender minority members in the team are connected into the collaboration network, as evidenced by past projects. We build on past research that developed related measures conceptualized as network heterogeneity (Reagans and Zuckerman 2001), or the co-presence of incumbency and network diversity (Guimera et al. 2005). We develop a range of measures for inclusion, from a minimal level (lack of segregation along gender) to a stronger level (the presence of gender minority in the network core of the team).
We define three dimensions of inclusion. First, we define inclusion as mixing: the lack of segregation along gender. Second, we define inclusion as bonding, where inclusion is stronger if the ties that connect across gender categories are of higher weight. And third, we define inclusion as incorporating, when gender minorities are included in the core of the team's network. Figure 1 illustrates these measures with examples of weighted collaboration graphs, where the number of nodes and the proportion of genders are kept constant.
Our first measure of inclusion is mixing: the lack of segmentation by attributes in the team. If team members are segregated, the team cannot benefit from exchanges across gender lines, because segregation limits the collaborative connections that would allow for negotiating diverse perspectives, essential for innovation (Northcraft et al. 1995). When gender underlies subgroup formation, team identity erodes, and the salience of gender identity increases, often to the extent of gender conflict (Carton and Cummings 2012). Network fragmentation along attributes can be measured by assortativity, the over-representation of ties within categories (Newman 2002).
Our second measure of inclusion is bonding, that captures the strength of ties across gender categories. Strong ties are seen to be vehicles of trust (Granovetter 1973), and they offer high-bandwidth interpersonal channels that are crucial in innovative contexts, when the information environment is complex, and updates frequently (Aral and Van Alstyne 2011). The stronger the ties in mixed gender dyads, the broader the social bandwidth a team can rely on to develop novel solutions.
Our third measure of inclusion is incorporating: the proportion of female team members in the core of the team's network. Being in the core opens access to informal leadership, and thus offers the opportunity for women to have a say in decision making (Shore et al. 2011). Women in leadership positions tend to encourage participation, and facilitate broader information sharing (Rosener 2011), and encourage innovation and risk-taking (Adams and Funk 2012). Our measure of combined inclusion is then the product of the three raw measures, representing co-occurrence of various forms of inclusion. (See Quantitative Measures in Materials and Methods for formulas of inclusion metrics.)

The impact of diversity and inclusion on creativity
Gender diversity in video game development is low (the proportion of females is .15 overall), even lower than the proportion of females in STEM and computer programming (Beckhusen 2016; United States Department of Labor 2015), which is around twenty percent. As shown in Figure 2, the female proportion of game developers slowly increased from .12 in 1994 to .18 in 2009. There is no comparable increase in inclusion, as the combined inclusion index hovers around the average of .06 without a significant trend.
As expected from the literature (Bear and Woolley 2011; Beede et al. 2011; Hofstra et al. 2020; Woolley et al. 2010; Xie et al. 2020), we find that gender diversity is positively related to creativity in video game projects: without considering inclusion, an increase in gender diversity means a slight increase in creativity. Considering games as units of analysis, a one standard deviation increase in gender diversity measured by the Blau diversity index results in a .09 (95% CI: .06; .12) standard deviation increase in creativity measured by game distinctiveness. At the firm level this relationship is slightly stronger, as a one standard deviation increase in developer firm average gender diversity results in a .13 (95% CI: .08; .18) standard deviation increase in average developer firm level distinctiveness. (See firm-level point estimates in Figure S2, panel d in Supplementary Materials.) Figure 3 shows point estimates of standardized gender diversity, forms of inclusion, and the interaction of gender diversity and inclusion. Once we include any of the three forms of inclusion in our model (or combined inclusion as the product of our three inclusion measures), we no longer see a positive main effect for gender diversity.
The signs and significance levels of these point estimates are comparable when we control for developer firm level heterogeneity (with random or fixed effects for developer firm intercepts), or when we aggregate the data to the developer firm level. In fact, the main effect for gender diversity becomes significantly negative at the developer firm level (point estimate of -.36, and 95% CI: -.68; -.05). This suggests that gender diversity without inclusion does not contribute to group creativity.
The main effect of inclusion is mostly negative: two of our three measures (mixing and bonding) show significant negative coefficients, both at the game level (mixing: -.07, 95% CI: -.12; -.01; bonding: -.13, 95% CI: -.21; -.06), and at the developer firm level as well (mixing: -.14, 95% CI: -.24; -.05; bonding: -.17, 95% CI: -.30; -.04). These estimates indicate that inclusion without gender diversity does not help creativity. Of course, inclusion is not interpretable for zero diversity; these estimates indicate that increasing inclusion for minimal levels of diversity will not help creativity. It is likely that high inclusion at low levels of diversity leads to assimilation and tokenism that was shown to nullify creative benefits to diversity (Shore et al. 2011).
Creativity in game development benefits from gender diversity with inclusion. Game developer teams should include female collaborators, and also integrate them with the rest of the team to boost creativity. The interaction of gender diversity and inclusion is positive and significant for all of our inclusion measures and shows a significant if moderate contribution to distinctiveness. At the game level, a one standard deviation change of gender diversity and inclusion jointly results in an increase of .04 to .06 standard deviations in game distinctiveness (in addition to the main effects of diversity and inclusion). At the developer firm level, a one standard deviation change of gender diversity and inclusion jointly results in increases of .04 to .09 standard deviations in firm level average game distinctiveness. Our point estimates suggest that teams should make sure female collaborators are included into team networks to be more creative, but teams with average gender diversity (at the mean, and in the interquartile range of gender diversity) will not see any significant difference between inclusion and lack of inclusion in terms of their creativity. We explore our predictions along varying levels of diversity, both for minimal and maximal inclusion, by keeping all controls constant at their means. Figure 4 shows the predictions at the developer firm level.
Across all three inclusion measures, and also their combined index, our predictions indicate that video game developer teams cannot increase their creativity at any level of gender diversity if female developers are not included (their inclusion measures equal zero). In the case of bonding and the combined index, this finding stays robust even if we relabel 50 percent of team members of unknown gender as males. (See Materials and Methods, Impact of Gender Robustness on Modelling for more details, and Figure S4 and Figure S2, panel d in Supplementary Materials for point estimates on relabeled data.) This suggests that adding "newbie" female team members without prior collaborative ties to male team members will not contribute to increased creativity in the first instance.
In contrast to predictions with minimal inclusion, we see that at maximal inclusion an increase in gender diversity leads to increase in creativity, measured by game distinctiveness. If a team moves from the lowest gender diversity in our dataset to the highest, while maintaining maximal inclusion, it can boost distinctiveness by 10% in the case of mixing, by 20% in the case of bonding, by 7% in the case of incorporating, and by 22% for combined inclusion. Our results indicate that a firm with only male developers would find it difficult to realize creativity benefits from adding female developers initially. Considering firms that include female developers for the first time after working with only male developers (there were 306 such firms in our dataset of the complete set of 1354 firms), we do not see any increase in creativity. The distinctiveness score of the first game with any females is even slightly (but not significantly) lower than the preceding game with males only. What our predictions indicate is that developer firms need to reach diversity that is higher than the top quartile to start seeing benefits from gender diversity, when female developers are also included in the team's network. Firms with low diversity would see inclusion decreasing the creativity of the team. This presents a barrier to seeing the incentives to boost diversity in video game developer firms. Nevertheless, several developer firms did successfully increase diversity and inclusion, and in the next section we attend to game histories of firms to understand the processes that can lead to higher levels of diversity and inclusion, despite the lack of early benefits.

Firm-level processes that lead to diversity and inclusion
How can firms boost diversity and inclusion? We turn to analyze histories of game developer firms, to understand if intervention in diversity, or intervention in inclusion is what leads to higher levels of diversity and inclusion at the end of their histories. In the first case, if boosting diversity is the key to advancing both diversity and inclusion, firms can add female developers to their teams, and then subsequently see an increase in inclusion, when female developers build ties to male developers in repeated game projects. In the second case, if firms can intervene by adding inclusion, the key is to hire subsets of developers with gender diversity and pre-existing ties between female and male team members. In this case, inclusion is not "home grown", but rather a function of clustered migration of individuals among firms. This process also occurs frequently, as was recently described as the "trojan horse" mechanism (Arvidsson, Collet, and Hedström 2021), driven by a sequence of clustered migration of individuals who have prior collaborative ties between them.
To capture the primary drivers of firm-level processes, we use transfer entropy, a measure that captures the amount of information that values in one time series carry about subsequent values of another time series (Barnett, Barrett, and Seth 2009; Schreiber 2000). Low transfer entropy means that a prior increase in the first process is not followed by an increase in the second process, while high transfer entropy means a strong directional coupling. We measure transfer entropy between two processes, the developer firm-level time series of diversity and inclusion, in both directions. We enter the resulting variables, transfer entropy $TE_{D \to I}$ and $TE_{I \to D}$, in an OLS model that predicts the final diversity and inclusion, and the trends of diversity and inclusion at the developer firm level. Figure 5 shows two examples, one where diversity predicts inclusion, and a second where inclusion predicts diversity.
Figure 6 shows point estimates predicting firm-level diversity and inclusion. We found that boosting diversity and inclusion seems to be a product of a diversity-driven process, where in a firm-level sequence of video game projects changes in diversity result in changes in inclusion. This process in practice can be conceptualized as the hiring of female developers regardless of their prior histories of collaboration with other team members, and subsequently adding inclusion by repeated collaborations between female and male developers, a form of "home grown" inclusion. The reverse direction of temporal influence between processes does not seem to lead to increased diversity or inclusion: when firms add inclusion first (and diversity subsequently follows from it), we should expect no measurable advantage in increased diversity or inclusion. In practice such a process would mean hiring dyads of female and male developers with pre-existing collaboration ties, which we could label as "acquired inclusion".
This indicates that developer firms should not hesitate to add novice female developers to their teams, even if they cannot expect immediate creativity benefits in the team with female developers without inclusion, as female developers would not yet have cross-gender collaborative ties. Female developers will accumulate collaborative ties, and thus achieve inclusion subsequently, and the team can expect to see a boost in creativity.
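To make the directional logic of transfer entropy concrete, the following Python sketch computes a plug-in estimate for two discrete series with a history length of one step. The estimator actually used in the study is not specified in this text, so the discretization, the history length, the function name `transfer_entropy`, and the toy series are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target, history length 1.

    Both inputs are equal-length sequences of discrete symbols; continuous
    series (e.g. diversity scores) would first need to be discretized.
    """
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (y_next, y_now, x_now)
    now_pairs = Counter(zip(target[:-1], source[:-1]))            # (y_now, x_now)
    own_pairs = Counter(zip(target[1:], target[:-1]))             # (y_next, y_now)
    own_now = Counter(target[:-1])                                # y_now
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / now_pairs[(y_now, x_now)]
        p_given_own = own_pairs[(y_next, y_now)] / own_now[y_now]
        te += p_joint * log2(p_given_both / p_given_own)
    return te


# Toy series: y copies x with a one-step lag, so x should inform y but not vice versa.
x = [0, 1, 0, 1, 0, 1, 0, 1, 0]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1]
te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
print(te_x_to_y, te_y_to_x)
```

In this toy case the forward value is positive and the reverse value is zero, which is the asymmetry that the firm-level variables are meant to capture. With short firm histories a plug-in estimate of this kind is noisy, so it should be read as an illustration of the concept only.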

Discussion
As others have already found, gender diversity in itself is a predictor of creativity (Bear and Woolley 2011; Beede et al. 2011; Hofstra et al. 2020; Xie et al. 2020). However, when we take inclusion also into account, the main effect of gender diversity on creativity is not significant, suggesting that diverse collaborators in a team also need to be included for the team to see creativity benefits. Gender diversity interacts with inclusion in a way that diversity without inclusion does not bring any advantages in creativity, regardless of the extent of diversity. The creative benefit of diversity increases only as much as inclusion increases.
Our results indicate that organizations should pay attention to inclusion as well, not only to diversity. There is a rich literature on inclusion stressing the importance of integrating employees and team members with diverse attributes (Chung et al. 2020; Ferdman and Deane 2014; Jansen et al. 2014; Mor-Barak and Cherin 1998), but systematic and large-scale measurement tools for inclusion, diversity, and creativity were not developed in conjunction. We operationalize inclusion using the network of past collaboration, developing three diverse metrics that all support the same conclusion: gender diversity without inclusion does not lead to benefits in creativity.
At the same time, we also see evidence for why tokenism, the minimal presence of a gender minority, is not effective. When a team adds one female member, one can expect no creativity benefits. In fact, when we observe developer firms with exclusively male teams in their history adding a female developer for the first time, there is a slight decrease in creativity.
This underscores prior findings about the limits of tokenism (Farh et al. 2020; Guldiken et al. 2019): when gender diversity is low, inclusion acts more as assimilation that silences the creative potential in diversity. The process by which organizations include diverse collaborators matters, especially as a developer firm looking to increase diversity from zero or very low levels would not see early benefits in creativity. We found evidence that it is more beneficial to first increase diversity, and then inclusion, rather than the other way around. Organizations should aim to recruit novice, unconnected female collaborators, and then increase inclusion by employing these novice female team members in repeated projects. The alternative approach of recruiting diverse team members who already have a history of cross-gender collaborations in prior projects does not lead to a sustained increase in diversity and inclusion. Organizations need to add and include a relatively high proportion of female developers (about 23%, significantly higher than the industry average of 19%) to start seeing creativity benefits. This is a likely contributor to the sustained marginalization of female developers in the field, reinforcing beliefs in the benefits of male-skewed team composition.

Limitations
Limitations of our study chiefly relate to the definition and measurement of diversity and collaboration. Gender identity is not binary; however, such personal information can only be analyzed if self-claimed gender identity is provided, therefore we could not incorporate nonbinary gender into this study. We are also aware of the limitations and the potential biases of name-based inferring methods, as such algorithms perform better on Western names (Karimi et al. 2016). To account for the potential bias that the presence of unknowns within teams implies, we performed robustness checks and found that even if 50 percent of unknowns are male, the positive interaction of gender diversity and inclusion persists in the case of the combined index and bonding, but mixing and incorporating are more sensitive to such bias.
Our measurement of past collaboration was restricted to collaborations within the population of games in our dataset, thus we have no data about past collaborations in game projects not in the database, or projects in other industries. To fully capture inclusion we would also need to have multiplex network data about communication and other relevant on-project relationships, as well as a subjective sense of acceptance. Our measure of diversity did not take dimensions beyond gender into account, while in collaborative settings complex intersectional diversity is at play.

Data
We collected data from the video game industry, relying on MobyGames.com. Our dataset contains 8,617 unique video games, with a list of each game's developer team, critics' reviews, and stylistic elements such as genres, perspective (e.g., first-person shooter, role-playing) and the platforms it can be played on (e.g., PlayStation, Nintendo Switch, etc.). We also record each game's developer studio, publishing house, and the year of the first release. The video game industry went through a major change with the rising popularity of mobile games in the early 2010s. As the industry became more competitive, and labour shortages hit the tech industry, companies stopped publishing the entire credit list of their project teams, probably to avoid offers sent to their employees from competitors. Therefore, our analysis covers games published between the 1980s and 2010.
Since our database goes back to the very beginnings of the video game industry, we can infer everyone's full career path; connecting unique user accounts with the games they had worked on in a consecutive order. It allows us to create team-level weighted networks for each video game: two team members are connected based on how many times they had worked on the same game.
For our analysis we only considered games that were published between 1993 and 2009, had fewer than 2,000 connections among team members, had at least one female team member, and for which the gender of less than 50% of team members could not be inferred. We excluded all re-released and mobile games. Since gender diversity is a key interest of our study, we had to exclude from our analysis all video games that did not list team members' full names and used only initials instead of first names. Our resulting database contains 4,011 video games. (For more details see Table S1, S2 in Supplementary Materials.)

Gender Inferring
Similarly to film credits, MobyGames lists each team member's full name and task in the production (imaging, scripting, design, music, etc.). To infer team members' gender, we relied on developers' full names, and adopted a commonly used first-name based gender inferring method (Vedres and Vasarhelyi 2019). Name-based gender inferring methods have been criticized for treating gender as a binary category, and for over-representing Western names (Karimi et al. 2016). The method that we selected is optimized for high precision, where names with a high probability of being unisex are labelled as unknowns. Our gender inferring yielded 19 percent female, 63 percent male and 18 percent unknowns. (For further details on the accuracy of gender inferring see Gender Inferring, Figure S1, and Table S3 in Supplementary Materials.)

Quantitative Measures
Dependent Variable: We measure creativity by adopting de Vaan et al.'s distinctiveness metric, which compares the combination of each game's stylistic elements to all games released in the preceding five years and computes an average distance (1 - cosine similarity) between them (de Vaan et al. 2015). Since we do not know the exact publication date of every game, we did not compare games published within the same year, to avoid ambiguity in temporal order.
Cosine distance $d_{i,j}$ is calculated 1) by comparing the vector of stylistic elements of each game $i$ with that of every other game $j$, as follows: $d_{i,j} = 1 - \frac{\sum_k g_{ik} g_{jk}}{\sqrt{\sum_k g_{ik}^2}\,\sqrt{\sum_k g_{jk}^2}}$, where $g_{ik}$ is $1/K$ if a given stylistic element $k$ was used in game $i$ and 0 otherwise; that is, the resulting cosine similarity is subtracted from 1. 2) Finally, we normalize these game-pair distances by averaging over all games $(1, 2, \ldots, J)$ published in the preceding 5 years, as follows: $D_i = \langle d_{i,j} \rangle_{j = 1, \ldots, J;\ j \neq i}$.
Independent Variables - Gender Diversity and Team cohesion metrics: Our core interest is how teams' gender diversity and inclusion predict creativity and success in the video game industry.
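As a rough illustration of the distinctiveness calculation, the Python sketch below represents each game's stylistic elements as a set and averages cosine distances to games released in the five preceding years. Because cosine similarity is unaffected by rescaling a vector, binary sets give the same result as vectors with constant nonzero entries. The function names, the `{game_id: (year, elements)}` layout, and the toy games are assumptions made for the example, and every game is assumed to have at least one stylistic element.

```python
from math import sqrt


def cosine_distance(elements_a, elements_b):
    """1 - cosine similarity between two non-empty sets of stylistic elements."""
    shared = len(elements_a & elements_b)
    return 1.0 - shared / (sqrt(len(elements_a)) * sqrt(len(elements_b)))


def distinctiveness(focal_id, games, window=5):
    """Mean cosine distance of a game from games released in the prior `window` years.

    games: {game_id: (release_year, set_of_elements)} (hypothetical layout).
    Games released in the same year as the focal game are not compared.
    Returns None when there is no earlier game inside the window.
    """
    year, elements = games[focal_id]
    prior = [e for (y, e) in games.values() if year - window <= y < year]
    if not prior:
        return None
    return sum(cosine_distance(elements, e) for e in prior) / len(prior)


# Toy catalogue, invented for illustration only.
games = {
    "a": (1995, {"rpg", "3d"}),
    "b": (1996, {"rpg", "3d"}),
    "c": (1998, {"racing", "2d"}),
    "d": (1998, {"rpg", "2d"}),
}
score_c = distinctiveness("c", games)
score_d = distinctiveness("d", games)
print(score_c, score_d)
```

Game "c" shares no element with the earlier games and so receives the maximal score, while "d" shares one of two elements with each earlier game and scores about half of that.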

Gender Diversity: Blau's index
We use Blau's Heterogeneity Index as our measure of diversity. It is calculated as $B = 1 - \sum_i p_i^2$, where $p_i$ is the ratio of group members in category $i$ (male or female). Therefore, when the female-male ratio is 50-50 percent the Blau Index reaches its maximum (which is .5 for two categories before any rescaling), and when a team is composed of only one gender group it is 0.
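A short Python sketch of the index follows. The function name `blau_index` and the label strings are assumptions for the example, and team members of unknown gender are assumed to have been removed before the call. Note that the unnormalized index peaks at 0.5 for two categories.

```python
from collections import Counter


def blau_index(labels):
    """Blau heterogeneity: 1 minus the sum of squared category shares."""
    counts = Counter(labels)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


balanced = blau_index(["f", "m"])             # 50-50 team
skewed = blau_index(["f", "m", "m", "m"])     # one woman in a team of four
single = blau_index(["m", "m", "m"])          # single-gender team
print(balanced, skewed, single)
```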
We measured inclusion in four ways by using network-based segregation metrics:

Mixing as reversed assortativity
The Assortativity Coefficient developed by Mark Newman (Newman 2003) measures the similarity of connections in the graph with respect to a given attribute. It has been widely used to measure homophily in various (social) networks, such as sexual contacts and marriage matching (Girvan and Newman 2002; Newman 2003), demographics on Facebook (Traud, Mucha, and Porter 2012), book recommendation networks (Bucur 2019) or the research interests of scientists who follow each other on Twitter (Ke, Ahn, and Sugimoto 2017). For a categorical attribute such as gender, the Assortativity Coefficient $r$ is analogous to a Pearson correlation of the attribute between pairs of connected nodes, formally $r = \frac{\mathrm{Tr}(M) - \lVert M^2 \rVert}{1 - \lVert M^2 \rVert}$, where $M$ is the mixing matrix (joint probability) of the two genders, $\mathrm{Tr}(M)$ is the trace (sum of elements in the diagonal) of matrix $M$, and $\lVert M^2 \rVert$ is the sum of all elements of $M^2$. Negative values of $r$ indicate disassortative mixing, with the minimum ($r=-1$ for two categories) reached when the network is perfectly disassortative, meaning that every edge connects a node to a different type; $r=0$ indicates no assortative mixing; and $r=1$ means perfect assortativity, when the network is fully segregated, such that nodes of type $i$ do not connect to nodes of type $j$. We quantify reversed assortativity by subtracting $r$ from one and normalizing the result. Large values of Mixing mean high inclusion (team members are mixed by gender), and low values indicate gender segregation.
One of the beneficial attributes of assortativity coefficient while measuring segregation is that this metric is insensitive to the number of isolated nodes within the network (Bojanowski and Corten 2014). Because our collaboration networks are based on previous shared collaborations we have a higher number of isolated group members, which we should not consider while analyzing the network structure.
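The assortativity calculation can be sketched in plain Python from a weighted edge list. The mixing matrix here is weighted by tie strength, and the final rescaling `(1 - r) / 2` is only one plausible reading of "subtracting from one and normalizing"; both choices, along with the function names and toy networks, are assumptions rather than the study's exact definition. Isolated nodes never appear in the edge list and so do not affect the result.

```python
def gender_assortativity(edges, gender):
    """Attribute assortativity r from weighted undirected edges.

    edges: {(u, v): weight}; gender: {node: label}. Returns None when r is
    undefined (all tie weight falls within a single category).
    """
    labels = sorted({gender[n] for pair in edges for n in pair})
    total = 2.0 * sum(edges.values())
    m = {a: {b: 0.0 for b in labels} for a in labels}
    for (u, v), w in edges.items():
        # each undirected edge contributes to both ordered cells
        m[gender[u]][gender[v]] += w / total
        m[gender[v]][gender[u]] += w / total
    trace = sum(m[a][a] for a in labels)
    row = {a: sum(m[a].values()) for a in labels}
    col = {a: sum(m[b][a] for b in labels) for a in labels}
    sum_sq = sum(row[a] * col[a] for a in labels)  # equals the sum of all elements of M^2
    if 1.0 - sum_sq < 1e-12:
        return None
    return (trace - sum_sq) / (1.0 - sum_sq)


def mixing(edges, gender):
    """Assumed rescaling of reversed assortativity to the [0, 1] range."""
    r = gender_assortativity(edges, gender)
    return None if r is None else (1.0 - r) / 2.0


gender = {"f1": "f", "f2": "f", "m1": "m", "m2": "m"}
cross_only = {("f1", "m1"): 2, ("f1", "m2"): 1}
segregated = {("f1", "f2"): 1, ("m1", "m2"): 1}
print(mixing(cross_only, gender), mixing(segregated, gender))
```

The two toy teams mark the extremes: ties that only cross gender lines give r = -1 and maximal mixing, while ties that stay within gender give r = 1 and zero mixing.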

Bonding as the ratio of weighted cross-gender ties
More frequent shared project experience indicates a more intense relationship among team members, which can be a proxy for higher inclusion. Women have been shown to thrive and feel more included in workplaces where they could develop stronger ties (Timberlake 2005). Stronger ties were also shown to be beneficial for transferring complex knowledge (Hansen 1999; Nahapiet and Ghoshal 1997) and solving complex problems (de Montjoye et al. 2015). Therefore, our second metric quantifies gendered inclusion as the total number of times men and women previously worked together, divided by the total shared working experience of the team: $\text{Bonding} = \frac{\sum w_{fm}}{\sum w_{ij}}$, where $\sum w_{fm}$ is the sum of weights that connect different gender groups, and $\sum w_{ij}$ is the sum of all weights within the network.
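The ratio of cross-gender tie weight can be written directly from this description. In the Python sketch below, the function name `bonding` and the toy team are assumptions, every node is assumed to carry a male or female label, and a team without any prior ties is assigned a value of zero by convention.

```python
def bonding(edges, gender):
    """Share of total tie weight that connects team members of different gender.

    edges: {(u, v): weight}; gender: {node: label}.
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0  # assumed convention for teams with no prior collaboration
    cross = sum(w for (u, v), w in edges.items() if gender[u] != gender[v])
    return cross / total


gender = {"f1": "f", "m1": "m", "m2": "m"}
edges = {("f1", "m1"): 3, ("m1", "m2"): 1}
team_bonding = bonding(edges, gender)
print(team_bonding)
```

Here three of the four units of prior collaboration cross gender lines, so the toy team's bonding is 0.75.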

Incorporating as the ratio of women in the graph center
Our third inclusion metric captures how central women's positions are within the team network, specifically the ratio of women within the collaboration network's center. The network center is defined as the Jordan center of a graph, which is the set of nodes whose eccentricity equals the graph's radius. The eccentricity $\epsilon(v)$ of a node $v$ measures how far it is from the furthest node in the graph, formally $\epsilon(v) = \max_{u \in V} d(v, u)$. The radius of a graph is the minimum eccentricity of any node, formally $r = \min_{v \in V} \epsilon(v) = \min_{v \in V} \max_{u \in V} d(v, u)$. To determine which team members belong to the center, we used the center distance measure of the NetworkX package in Python 3. Finally, we take the natural logarithm of the ratio of women in the center to ensure a better distribution; the metric is therefore calculated as $Incorporating = \ln\left(\frac{W_{\in C}}{W}\right)$, where $W_{\in C}$ is the number of women in the center and $W$ is the number of women in the team.
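The Jordan center can be illustrated with a plain breadth-first search. This is a minimal sketch under stated assumptions: distances are unweighted hop counts, the graph is connected (for example, the largest connected component of a team network), and the adjacency-dict format and function names are hypothetical. The sketch returns the raw ratio of women in the center; the logarithm described above would be applied afterwards and is undefined when no woman is in the center.

```python
from collections import deque


def eccentricities(adj):
    """Eccentricity of every node, using unweighted shortest-path distances."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        ecc[source] = max(dist.values())
    return ecc


def jordan_center(adj):
    """Set of nodes whose eccentricity equals the radius of the graph."""
    ecc = eccentricities(adj)
    radius = min(ecc.values())
    return {v for v, e in ecc.items() if e == radius}


def women_in_center_ratio(adj, gender):
    """Number of women in the center divided by the number of women in the team."""
    center = jordan_center(adj)
    women_total = sum(1 for v in adj if gender[v] == "F")
    women_center = sum(1 for v in center if gender[v] == "F")
    return women_center / women_total
```

In a path of five developers, only the middle node is in the center, so a team whose two women sit at the middle and at one end has a ratio of 1/2.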

Combined inclusion
Our fourth measure of inclusion is the combination of the first three, as a product of the three measures of inclusion: mixing, bonding, and incorporating.

Modelling Distinctiveness
Our dependent variable that captures creativity, "distinctiveness", is normally distributed, so we use OLS regression models to estimate the impact of diversity and inclusion on it. We ran separate models for each inclusion metric: mixing, bonding, incorporating, and combined inclusion. To compare models, we normalize our dependent and independent variables by their minimum values.
In each model we controlled for multiple attributes that can provide an alternative explanation for teams' creativity. Team size is measured as the number of team members involved in the game production. Larger teams are typically assembled by more established developer firms, and are therefore more likely to have bigger networks and employ more women. Ratio of center measures the percentage of the team network that belongs to the center. Flatter organizations (with a higher ratio of center) are more likely to be democratic and to allow minorities to share their ideas. Number of newbies measures the number of team members with no experience in game development (based on our database). We also accounted for the number of star developers, those who have been awarded a Game Developers Choice Award. Game tenure captures the experience level of a team, measured as the average number of games team members have produced prior to the year of production of the given game. Single-firm production is a dummy variable, which is 1 if the publisher and the developer company are the same entity, and 0 otherwise. We controlled for the platforms the game was developed for, because certain genres and platforms can be more popular than others. We also controlled for temporal trends with the year of release, and for the number of countries the game was released in.
To account for factors affecting the distinctiveness of produced games that are not directly measurable, we provide three alternative modelling approaches: 1) game-level analysis with random effects for developer firm, 2) game-level analysis with fixed effects for developer firm, and 3) developer firm level aggregated analysis (see Figure S2).
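The role of the interaction between diversity and inclusion in these models can be illustrated with a minimal sketch. It fits distinctiveness on diversity, inclusion, and their product by ordinary least squares, without the control variables, random effects, or fixed effects described above. The helper names are hypothetical, and the small linear solver only keeps the example self-contained; a statistics package would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ols_interaction(diversity, inclusion, y):
    """OLS coefficients [intercept, diversity, inclusion, diversity*inclusion]."""
    rows = [[1.0, d, i, d * i] for d, i in zip(diversity, inclusion)]
    k = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


def marginal_effect(coefs, inclusion_level):
    """Effect of a one-unit change in diversity at a given inclusion level."""
    return coefs[1] + coefs[3] * inclusion_level
```

With an interaction term, the effect of diversity is not a single number: it equals the diversity coefficient plus the interaction coefficient multiplied by the inclusion level, which is why the estimated effect of diversity can be near zero at low inclusion and positive at high inclusion.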

Impact of Gender Robustness on Modelling
To account for the potential bias that inferred gender could introduce to our results, we adopted several robustness checks. Although the precision of our name-based gender inferring method was nearly perfect for men and women, we found much lower precision for unknowns (50%). Although we excluded teams with more than 50% of unknowns from our analysis, team members with unknown gender can still bias our results. Statistics on female representation in the video game industry indicate that most of the unknowns are likely to be male. Therefore, we randomly select 25 or 50 percent of team members of unknown gender in each game, re-label them as male, and re-calculate all diversity and inclusion metrics. We repeat this process 100 times and take the average of the resulting inclusion metrics for each game. (Figure S3 in the SI shows the distribution of the newly calculated gender diversity and inclusion metrics with 25 and 50 percent of relabelled data compared to the original data.) Then we rerun the game-level OLS models to predict games' distinctiveness based on the 25 and 50 percent relabelled diversity and inclusion metrics. The interaction between gender diversity and bonding stays significant even if 50 percent of team members of unknown gender are labelled as male. Similarly to bonding, combined inclusion's interaction with gender diversity is robust to gender relabelling, while mixing and incorporating lose their significance if at least 25 percent of unknowns turn out to be male.
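The relabelling procedure can be sketched as follows. The label "U" for unknown gender, the function names, and the `metric(edges, gender)` callable interface are assumptions for illustration; any of the diversity or inclusion metrics could be passed in as `metric`.

```python
import random


def relabel_unknowns(gender, share, rng):
    """Return a copy of gender with a random share of "U" labels set to "M"."""
    unknowns = [node for node, g in gender.items() if g == "U"]
    # Rounding the number of relabelled members is an assumption of this sketch.
    k = round(share * len(unknowns))
    chosen = set(rng.sample(unknowns, k))
    return {node: ("M" if node in chosen else g) for node, g in gender.items()}


def averaged_metric(metric, edges, gender, share, repeats=100, seed=0):
    """Average a metric over repeated random relabellings of unknowns."""
    rng = random.Random(seed)
    values = [
        metric(edges, relabel_unknowns(gender, share, rng))
        for _ in range(repeats)
    ]
    return sum(values) / len(values)
```

Averaging over repetitions removes the dependence on which particular unknowns happened to be relabelled in a single draw.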
(See the estimated coefficients of diversity and inclusion at Figure S4, SI in Supplementary Materials.)

Time series analysis
We filtered our data to include games from developer firms that had at least four games in the dataset. This resulted in a dataset with 2418 games from 308 developer firms, filtered from the original dataset of 4011 games from 1354 firms. Distributions of key variables (creativity, diversity, and combined inclusion) in the filtered dataset did not differ from the full dataset (with Kolmogorov-Smirnov test p-values of .99, .65, and .83 respectively), and the means of these variables were not significantly different either (with Wilcoxon rank sum test p-values of .57, .22, and .83 respectively). We recorded the diversity and combined inclusion scores for these games, and we calculated transfer entropy from diversity to inclusion, and from inclusion to diversity, as $T_{X \to Y} = H(Y_t \mid Y_{t-1:t-L}) - H(Y_t \mid Y_{t-1:t-L}, X_{t-1:t-L})$, where $H(X)$ is the Shannon entropy of $X$.
Since the time resolution of the publication date for games is annual, we had several games within a developer firm that were from the same year. For these games with tied dates we used random sorting and re-calculated the transfer entropies. We used 500 random sortings of ties for all temporal sequences. We then calculated the mean transfer entropy score of these 500 sequences for each developer firm's game sequence.
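The transfer entropy calculation with random tie-breaking can be sketched as follows. This is an illustration under assumptions that are not specified above: the scores are already discretized into symbols (for example "low" and "high"), the history length is one game, and all function names and the input format are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def transfer_entropy(x, y):
    """Transfer entropy from x to y with history length 1.

    Uses H(A | B) = H(A, B) - H(B) for both conditional entropies.
    """
    y_now, y_past, x_past = y[1:], y[:-1], x[:-1]
    h_given_own_past = entropy(list(zip(y_now, y_past))) - entropy(y_past)
    h_given_both = (entropy(list(zip(y_now, y_past, x_past)))
                    - entropy(list(zip(y_past, x_past))))
    return h_given_own_past - h_given_both


def mean_te_over_tie_orderings(games, n_orderings=500, seed=0):
    """Mean transfer entropy over random orderings of games from the same year.

    games: list of (year, source_symbol, target_symbol) tuples for one firm
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_orderings):
        # Sort by year; a random secondary key shuffles games with tied years.
        ordered = sorted(games, key=lambda game: (game[0], rng.random()))
        x = [game[1] for game in ordered]
        y = [game[2] for game in ordered]
        scores.append(transfer_entropy(x, y))
    return sum(scores) / len(scores)
```

Swapping the source and target columns gives the transfer entropy in the opposite direction, so the two directions (diversity to inclusion, and inclusion to diversity) can be compared for each firm.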

Gender Inferring
To quantify the error in our algorithm, we randomly selected 200 team members from each of the three groups whose gender was inferred as male, female, or unknown, and manually inferred their gender. Since these people are professionals, we looked up their profiles on LinkedIn and labelled them based on their pictures, bio, and recommendations (if the text used gendered language, e.g., "he" or "she"). If we could not find any LinkedIn user with the same name, or if it was not possible to decide from the picture or the text whether the given individual is male or female, we labelled them as unknown.
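Given such a manually labelled baseline, the inferred labels can be scored separately for each gender label with precision, recall, and F-score. The following is a minimal sketch with hypothetical names, using the standard definitions: precision is the share of contributors assigned a label who carry that label in the baseline, and recall is the share of contributors with that baseline label who were assigned it.

```python
def precision_recall_f(predicted, actual, label):
    """Precision, recall, and F-score of one label against a manual baseline.

    predicted: inferred labels, e.g. "M", "F", or "U"
    actual:    manually assigned labels, in the same order
    """
    true_pos = sum(1 for p, a in zip(predicted, actual)
                   if p == label and a == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    actual_pos = sum(1 for a in actual if a == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```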
Furthermore, we ran another widely used gender inferring method, the Gender Guesser Python package, on our database to compare how well our method performed. Figure S1 shows the accuracy of our gender inferring method and the Gender Guesser Python package in comparison with the manually created baseline. Precision measures how many contributors were assigned to the correct gender label according to our baseline. Recall measures how many of the contributors were correctly identified. F-score takes the harmonic average of these two metrics. Our method has a higher precision for females and males, but lower recall for males. Table S3 shows that our method is also better at identifying unknowns than the default Gender Guesser Python package, but worse at identifying male contributors.

Figure S1: Gender Inferring Accuracy. Precision, Recall and F-score of our gender inferring method and the Gender Guesser Python package in comparison with the manually created baseline. Precision measures how many contributors were assigned to the correct gender label according to our manually created baseline. Recall measures how many of the contributors were correctly identified. F-score takes the harmonic average of these two metrics.

Figure S2: Point estimates of distinctiveness in four model specifications. Point estimates of distinctiveness with 95% CI for gender diversity, four variables of inclusion, and their interactions with gender diversity. Markers are numbered according to OLS models; coefficients are for one SD change in distinctiveness because of one SD change in independent variables. Panel a) shows the point estimates of different inclusion models based on the baseline game-level OLS models shown in the manuscript. Panel b) shows estimates for game-level OLS models with Random Effects to estimate the effect of game specific characteristics, and Panel c) with Fixed-Effects for firm belonging to account for firm-level specific effects. Point estimates from the models visualized in Panel d) come from firm-level aggregated data, where each variable is the average of all games produced by a given firm.

Figure S3: Distribution of gender swapped diversity and inclusion metrics. To assess the bias that the name-based gender inferring method can introduce to our independent variables, we randomly re-labelled 25 and 50 percent of team members of unknown gender as male in each game, and calculated Gender Diversity, Mixing, Bonding, Incorporating and Combined Inclusion. We repeated this process 100 times for each game and calculated the average of the resulting diversity and inclusion metrics. The blue histogram shows the original distribution, the orange distribution is based on 25 percent re-labelled data, and the green one is based on 50 percent. The distributions indicate that Mixing and Incorporating are more sensitive to re-labelling than Bonding and the Combined Index.

Figure S4: Point estimates of distinctiveness based on OLS models run on relabelled gender data. To assess the impact of unknowns in project teams, we re-calculated our diversity and inclusion metrics 100 times by randomly relabelling 25 and 50 percent of team members of unknown gender as male. Blue indicates the original point estimates of distinctiveness with 95% CI for gender diversity, yellow the estimates with 25 percent of unknowns relabelled as male, and green those with 50 percent.

Descriptive Statistics of original data collected from mobygames.com. We collected data from the video game industry, relying on MobyGames.com. Our dataset contains 8,617 unique video games, with a list of each game's developer teams, critics' reviews, and stylistic elements such as genres, perspective (e.g., first-person shooter, role-playing) and the platforms it can be played on (e.g., PlayStation, Nintendo Switch, etc.). We also record each game's developer studio, publishing house, and the year of the first release.
Filtering Criteria                            N
All games                                     8,617
Published between 1993 and 2009               7,931
1 < number of edges in the network < 2000     4,771
Ratio of unknowns in team < 0.5               4,654
Number of women in team network >= 1          4,011

Table S2: Applied filtering criteria on our game dataset. For our analysis we only considered games that were published between 1993 and 2009, had fewer than 2000 connections among team members, had at least one female team member, and in which the gender of more than 50% of team members could be inferred. We excluded all re-released and mobile games. Since gender diversity is a key interest of our study, we had to exclude from our analysis all video games that did not list team members' full names and used only initials instead of first names. Our resulting database contains 4,011 video games.

Table S4: Game-level models' efficiency gain and significance tests by model. For Model 1, the F-test is calculated against a baseline model that contains only control variables. Models 2 to 5 are compared with Model 1. A significant F-test (P < 0.05) means that the analysed model explains the variance of creativity statistically better than the baseline model. SSR: sum of squares of residuals in the models; SS_DIFF: difference in sum of squares compared to the baseline models; F: value of the statistic used to compare the SSR of two models.