Introduction

Complexity emerges in the evolving and self-organizing processes of many natural, social, technological and biological systems. The constituents of a complex system interact with each other and form complex evolving networks, where the constituents are nodes and their interaction relationships are links1,2,3,4,5,6. For many real networks, the link formation process follows either the global principle of popularity in which a node tends to link with high-degree nodes7,8, or the local principle of similarity in which a node tends to link with nodes having traits similar to its own9, or a tradeoff between them9.

In the sociological literature the local principle of similarity, i.e., the phenomenon that “birds of a feather flock together,” is known as homophily10. There is much empirical evidence indicating that individuals prefer to forge social ties with people whose traits, such as education, race, age and sex, are the same as their own11,12,13,14. Such homophilous behaviors are ubiquitous in social networks and have been well documented10,11,12,14,15,16,17,18. In addition, the similarity shared by individuals in a group is often a significant predictor of the group’s altruism level and its ability to cooperate19. The sociological literature argues that human societies tend to display two social systems: (i) homophilous, in which people seek out others who are similar, and (ii) heterophilous, in which people seek out others who are different20. Evidence of the actual existence of heterophilous societies is rare, however. One example comes from studies of how team formation in offline gangs and online games depends on the heterogeneity of agents’ attributes21.

In general, it has long been accepted that one of the most significant factors in increasing productivity in modern human societies has been the division of labor22. Thus we might assume that people in modern societies now prefer to forge links or collaborate with those who have complementary productive skills and that socioeconomic networks are becoming increasingly heterophilous, but no direct evidence of this has been documented. The availability of big data recorded from massively multiplayer online role-playing games (MMORPGs) enables us to test social and economic hypotheses and theories—such as this one—in large-scale virtual populations23 and gain a deeper understanding of our social and economic behaviors24,25,26,27,28,29,30,31,32.

In this work, we study the collaboration formation process of individuals with different professional skills. A mathematical model is proposed by assuming that individuals in socioeconomic systems choose collaborators that are of maximum utility. Based on the evolving collaboration networks of 124 virtual worlds in which the agents (virtual people) belong to three different professions possessing different skills, empirical analysis and model calibration unveil that the agents prefer to collaborate with others of different professions. We further construct two measures to quantify the degree of complementarity of virtual societies. We find that social complementarity positively correlates with economic output.

Results

A model of collaboration formation

Consider a society or community s on day t, whose size Ns is the number of s-agents. The number of (s, i)-agents is denoted by Ns,i, where i = 1, 2 and 3 stand for the three professions. Hence $N_s = \sum_{i=1}^{3} N_{s,i}$. The ratio of i-agents in society s is

$$ w_{s,i} = \frac{N_{s,i}}{N_s}. \tag{1} $$

The average number of j-collaborators of an (s, i)-agent is fs,ij. Hence, the average number of collaborators that an (s, i)-agent has is $f_{s,i} = \sum_{j=1}^{3} f_{s,ij}$. The average proportion of j-collaborators among all collaborators of an i-agent is

$$ q_{s,ij} = \frac{f_{s,ij}}{f_{s,i}}. \tag{2} $$

Note that qs,ii is the homophily index15,33. If i-agents have zero preference for collaborating with j-agents, we have qs,ij = ws,j. If i-agents prefer to collaborate with j-agents, we have qs,ij > ws,j. In this case the i-agents are homophilous when j = i and the i-agents are heterophilous when j ≠ i.
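To make these definitions concrete, here is a minimal Python sketch that computes ws,i, fs,ij and qs,ij for one society from an undirected collaborator list and compares each qs,ij with the no-preference baseline ws,j. It assumes that fs,ij is averaged over all (s, i)-agents (including any without collaborators) and that qs,ij = fs,ij / fs,i; the function name `collaboration_shares` and the toy agents are hypothetical.

```python
from collections import Counter, defaultdict

PROFESSIONS = (1, 2, 3)  # 1 = warrior, 2 = priest, 3 = mage


def collaboration_shares(profession, edges):
    """Compute w[i], f[i][j] and q[i][j] for one society.

    profession: dict mapping agent id -> profession (1, 2 or 3)
    edges: iterable of undirected collaborator pairs (a, b)
    """
    n = len(profession)
    counts = Counter(profession.values())
    w = {i: counts[i] / n for i in PROFESSIONS}

    # Total number of j-collaborators held by all i-agents; each undirected
    # link adds one collaborator to each endpoint.
    total = defaultdict(float)
    for a, b in edges:
        total[(profession[a], profession[b])] += 1
        total[(profession[b], profession[a])] += 1

    # f[i][j]: average number of j-collaborators per i-agent.
    f = {i: {j: (total[(i, j)] / counts[i] if counts[i] else 0.0) for j in PROFESSIONS}
         for i in PROFESSIONS}
    # q[i][j] = f[i][j] / f[i], where f[i] is the average total number of collaborators.
    q = {}
    for i in PROFESSIONS:
        f_i = sum(f[i].values())
        q[i] = {j: (f[i][j] / f_i if f_i else 0.0) for j in PROFESSIONS}
    return w, f, q


# Toy society (hypothetical): two warriors, one priest, one mage.
profession = {"a1": 1, "a2": 1, "a3": 2, "a4": 3}
edges = [("a1", "a3"), ("a2", "a3"), ("a1", "a4"), ("a3", "a4")]
w, f, q = collaboration_shares(profession, edges)
for i in PROFESSIONS:
    for j in PROFESSIONS:
        rel = "above" if q[i][j] > w[j] else "below" if q[i][j] < w[j] else "equal to"
        print(f"q[{i}][{j}] = {q[i][j]:.3f} is {rel} w[{j}] = {w[j]:.3f}")
```

In this toy society the warriors have no warrior collaborators (q11 = 0 < w1 = 0.5), the pattern the text calls heterophilous.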

An agent seeks collaborators when she/he finds it difficult to complete a task alone. If there is no collaboration preference, the proportion of (s, j)-collaborators that an (s, i)-agent has is identical to the proportion of j-agents in the group, that is, qs,ij = ws,j. Hence the number of (s, j)-collaborators of an (s, i)-agent is $f_{s,i} w_{s,j}$. However, in a society with a division of labor, the choice of collaborators has a significant influence on the completion of the task, and it is better to have collaborators with complementary skills. Therefore, the number and skill configuration (or distribution) of an agent’s collaborators are the main determinants of her utility. We assume that, for an (s, i)-agent, there is an optimal configuration of collaborators with different skills, $\{\gamma_{ij} w_{s,j}\}_{j=1,2,3}$, where the preference coefficients γij are independent of society s. If the skill configuration in the collaborator list of an agent is optimal, her utility reaches its maximum. If the skill configuration deviates from that optimum, her utility is reduced. In other words, the utility of an (s, i)-agent increases as her/his actual number fs,ij of (s, j)-collaborators approaches the optimal value, and reaches its maximum when her/his collaborator configuration is optimal, such that $q_{s,ij} = \gamma_{ij} w_{s,j}$, i.e., $f_{s,ij} = \gamma_{ij} w_{s,j} f_{s,i}$. According to the law of diminishing marginal utility, we have β < 1. Therefore, the utility function of an (s, i)-agent is

where

in which γij is the preference of (s, i)-agents for (s, j)-agents, and α > 0 because the second term in Eq. (3) quantifies the decrease in utility, which is proportional to the deviation of the actual configuration from the optimal configuration. If i-agents have no preference for j-agents, such that qs,ij = ws,j for all societies, we have γij = 1. If i-agents prefer j-agents, we have γij > 1. If i-agents prefer not to collaborate with j-agents, we have γij < 1. For {i, j, k} = {1, 2, 3}, if γij > γik, then i-agents prefer j-agents over k-agents. To maintain a collaboration network of size fs,i, the (s, i)-agent incurs a cost proportional to fs,i, following ref. 12:

According to the above model, the overall utility in the decision-making process is

By maximizing Ds,i(fs,i), we can estimate the parameters γij (see Materials and Methods).

Empirical analysis

Figure 1A shows the collaboration networks on day t = 15 of a group of 27 agents randomly chosen from a virtual society, filtered by three intimacy thresholds Ic = 0, 100 and 2000. There are 12 warriors, 5 priests and 10 mages. If i-agents are homophilous (neutral, heterophilous) in their collaboration-forging process, the proportion of links between i-agents is greater than (equal to, less than) the square of the proportion of i-agents (0.1975 for warriors, 0.0343 for priests and 0.1372 for mages). For Ic = 0, there are 77 links, including 15 intra-warrior links, 4 intra-priest links and 4 intra-mage links. The proportions of intra-profession links are 0.1948 for warriors, 0.0519 for priests and 0.0519 for mages. For Ic = 100, there are 48 links, including 8 intra-warrior links, 1 intra-priest link and 1 intra-mage link. The proportions of intra-profession links are 0.1667 for warriors, 0.0208 for priests and 0.0208 for mages. For Ic = 2000, there are 15 links, including only one intra-warrior link and no intra-priest or intra-mage links. The proportions of intra-profession links are 0.0667 for warriors and 0 for priests and mages. Hence, the agents in Fig. 1A are heterophilous, except for priests when Ic = 0. We will show below that heterophily is not specific to these 27 agents but is a general feature of the virtual societies.
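The arithmetic above can be checked with a few lines of Python. The counts are the ones quoted for Fig. 1A; the random-mixing baseline is the squared share of each profession, and the verdict compares it with the observed share of intra-profession links. Variable and function names are illustrative only.

```python
# Counts read off Fig. 1A in the text (27 agents on day t = 15).
n_agents = {"warrior": 12, "priest": 5, "mage": 10}
total_agents = sum(n_agents.values())
# Share of intra-i links expected under random mixing: (N_i / N)^2.
baseline = {p: (n / total_agents) ** 2 for p, n in n_agents.items()}

# Ic -> (total links, intra-profession links per profession)
links = {
    0: (77, {"warrior": 15, "priest": 4, "mage": 4}),
    100: (48, {"warrior": 8, "priest": 1, "mage": 1}),
    2000: (15, {"warrior": 1, "priest": 0, "mage": 0}),
}


def verdict(observed, expected):
    """Classify a profession by comparing observed and random-mixing link shares."""
    if observed > expected:
        return "homophilous"
    if observed < expected:
        return "heterophilous"
    return "neutral"


results = {}
for ic, (m, intra) in links.items():
    for prof, k in intra.items():
        share = k / m
        results[(ic, prof)] = verdict(share, baseline[prof])
        print(f"Ic={ic:>4} {prof:<7} {share:.4f} vs {baseline[prof]:.4f} -> {results[(ic, prof)]}")
```

Only the priests at Ic = 0 (0.0519 against 0.0343) come out homophilous, matching the conclusion drawn in the text.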

Figure 1

Empirical evidence of heterophily in the socioeconomic networks of virtual societies on a typical day t = 15.

Warriors, priests and mages are marked respectively in cyan, red and blue. (A) Networks of 27 agents randomly chosen from a virtual society filtered by three intimacy thresholds Ic = 0, 100 and 2000 (top to bottom). (B) Dependence of qs,ij on relative size ws,j for all virtual societies for Ic = 100. In each plot, there are three well isolated clusters. For most societies, qs,ij > ws,j when i ≠ j and qs,ij < ws,j when i = j. (C) Dependence of preference measure Ps,ij on relative size ws,j for all societies for Ic = 100. There are also three well separated clusters in each plot. For most societies, Ps,ij > 0 when i ≠ j and Ps,ij < 0 when i = j. (D) Evolution of the averaged preference measure Ps,ij over all virtual societies for Ic = 100. The preference measures are roughly persistent.

Figure 1B shows that when t = 15 and Ic = 100 most virtual societies have qs,ij > ws,j when i ≠ j, but qs,ij < ws,j when i = j. Such heterophilous patterns are observed for other values of t and Ic as well (see Fig. S1).

Similar to the inbreeding homophily index15,33, we define the collaboration preference index to be

$$ P_{s,ij} = \frac{q_{s,ij} - w_{s,j}}{1 - w_{s,j}}. $$

Note that Ps,ii is the inbreeding homophily value15,33. If i-agents have no preference to collaborate with j-agents, we have Ps,ij = 0. If i-agents prefer to collaborate with j-agents, we have Ps,ij > 0. In the latter case, the i-agents are homophilous when j = i and heterophilous when j ≠ i. Empirical results show that for most virtual societies Ps,ij > 0 when i ≠ j, but Ps,ij < 0 when i = j (Fig. 1C and Fig. S2). Thus in socioeconomic networks the agents are heterophilous.
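A minimal Python sketch of the preference index, assuming it takes the inbreeding-homophily form Ps,ij = (qs,ij − ws,j)/(1 − ws,j), which is zero without preference and positive when qs,ij > ws,j. The shares below are made-up numbers for one hypothetical society, not data from the study.

```python
PROFESSIONS = (1, 2, 3)  # 1 = warrior, 2 = priest, 3 = mage


def preference_index(q_ij: float, w_j: float) -> float:
    """Collaboration preference index, ASSUMED to take the inbreeding-homophily
    form (q_ij - w_j) / (1 - w_j): zero without preference, positive when q_ij > w_j."""
    if not 0.0 <= w_j < 1.0:
        raise ValueError("w_j must lie in [0, 1)")
    return (q_ij - w_j) / (1.0 - w_j)


# Hypothetical profession shares and warrior collaborator proportions.
w = {1: 0.44, 2: 0.19, 3: 0.37}
q_warrior = {1: 0.30, 2: 0.30, 3: 0.40}
P_warrior = {j: preference_index(q_warrior[j], w[j]) for j in PROFESSIONS}
print({j: round(v, 4) for j, v in P_warrior.items()})
```

With these numbers P11 is negative while P12 and P13 are positive, the heterophilous sign pattern reported for most societies.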

Figure 1D shows the evolution of preference values Pij averaged over all societies on the same day for Ic = 100. Although these curves exhibit mild trends, it is evident that the heterophilous feature is persistent as the virtual societies develop (see also Fig. S3).

Quantifying collaboration preference

To calibrate the model, we follow and further develop the econometric method presented in ref. 12 (see Materials and Methods). We obtain the values of γij for each intimacy threshold Ic on each day t. Figure 2A shows the evolution of preference coefficients γij for socioeconomic networks using the intimacy threshold Ic = 100 and Fig. 2B shows the average preference coefficients over all days. More results are given in Fig. S4 and Fig. S5 for Ic = 0, 1, 10, 500, 1000 and 2000. The F-tests presented in Materials and Methods show that all the results are significant at the 0.1% level (see SI Tables).

Figure 2

Preference coefficients γij for socioeconomic networks with the intimacy threshold being Ic = 100.

(A) Daily evolution of the nine preference coefficients γij with i, j = 1, 2, 3. The color of a point (t, γij) is determined by j: cyan, red and blue for j = 1, 2 and 3, respectively. The nine points for a given t were determined simultaneously in one calibration. (B) Box plots of γij shown in (A).

All the estimated values of the γii coefficients are less than 1, while all the γij values for i ≠ j are greater than 1. This indicates that the agents seek not same-profession agents but different-profession agents, and are thus heterophilous. In most cases, especially when the intimacy threshold Ic is not large, the γij(Ic, t) values show no trend as the virtual worlds evolve. When Ic is large, however, we observe an increasing trend in γ13(Ic, t) for Ic = 1000 and 2000, in γ23(Ic, t) for Ic = 1000 and 2000, and in γ32(Ic, t) for Ic = 500, 1000 and 2000 (Fig. S4). We also find that the preference coefficients may change as Ic increases (Fig. S4 and Fig. S5). For warriors, γ11 and γ13 decrease, while γ12 increases. For priests, γ21 increases, γ22 exhibits no evident trend, while γ23 decreases. For mages, γ31 increases, γ32 decreases, while γ33 increases for large Ic values.

There are also intriguing patterns of relative collaboration preference, as quantified by γij − γik, where i, j and k correspond to the three professions (Fig. 2B and Fig. S5). On average, warriors prefer priests over mages; this relative preference strengthens as Ic increases but weakens slightly as t increases for large Ic values. Priests prefer mages over warriors when Ic is small and prefer warriors over mages when Ic is large. For large Ic, priests’ relative preference for warriors over mages decreases over time t. Mages prefer priests over warriors when Ic is small and prefer warriors over priests when Ic is large. For large Ic, mages’ relative preference for warriors over priests also decreases over time t.
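The reading of γij − γik can be illustrated with a small Python helper that checks the heterophily pattern (γii < 1 and γij > 1 for i ≠ j) and reports, for each profession, which of the other two it prefers. The coefficient matrix is hypothetical, chosen only to mimic the small-Ic pattern described above; it is not an estimate from the data.

```python
NAMES = ("warrior", "priest", "mage")


def summarize(gamma):
    """gamma[i][j]: preference of profession i for profession j (0-based indices).

    Returns (heterophilous, relative) where relative maps each profession to the
    other profession it prefers, or None when gamma_ij == gamma_ik.
    """
    heterophilous = all(
        (gamma[i][j] < 1) if i == j else (gamma[i][j] > 1)
        for i in range(3) for j in range(3)
    )
    relative = {}
    for i in range(3):
        j, k = (x for x in range(3) if x != i)
        diff = gamma[i][j] - gamma[i][k]  # gamma_ij - gamma_ik
        if diff > 0:
            relative[NAMES[i]] = NAMES[j]
        elif diff < 0:
            relative[NAMES[i]] = NAMES[k]
        else:
            relative[NAMES[i]] = None  # no relative preference
    return heterophilous, relative


# Hypothetical coefficients mimicking the small-Ic pattern.
gamma = [
    [0.60, 1.40, 1.20],
    [1.10, 0.70, 1.30],
    [1.15, 1.25, 0.65],
]
is_heterophilous, prefers = summarize(gamma)
print(is_heterophilous, prefers)
```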

Group complementarity and economic output

To assess the economic implications of heterophilous preference in socioeconomic networks, we investigate the relationship between the complementarity of professions and economic performance. Consider the socioeconomic network of a virtual society s with intimacy threshold Ic on day t. Economic production involves virtual money and goods, which are converted to a standardized currency (see Materials and Methods). For each member agent a of this network, we calculate her production output in the week from t − 6 to t, denoted Ys,a(t). The economic performance of the agents in the network is defined as the output per capita,

$$ Y_s = \frac{1}{n_s} \sum_{a} Y_{s,a}(t), $$

where ns is the number of member agents in the network and the sum runs over them.
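As a sketch of this output measure, the Python below sums an agent’s daily output over the window from t − 6 to t and averages over the network’s members. It assumes daily outputs are already converted to Xingbi and stored per day number; the data layout and function names are hypothetical.

```python
def weekly_output(daily_output, t):
    """Y_{s,a}(t): sum of one agent's daily output (in Xingbi) over days t-6, ..., t."""
    return sum(daily_output.get(day, 0.0) for day in range(t - 6, t + 1))


def output_per_capita(members, daily_by_agent, t):
    """Mean weekly output over the member agents of one network."""
    if not members:
        raise ValueError("network has no member agents")
    return sum(weekly_output(daily_by_agent.get(a, {}), t) for a in members) / len(members)


# Hypothetical daily outputs keyed by day number.
daily_by_agent = {"a": {10: 5.0, 14: 3.0, 16: 1.0}, "b": {15: 2.0}}
y = output_per_capita(["a", "b"], daily_by_agent, t=15)
print(y)  # day 16 falls outside the window ending on t = 15
```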

One measure of profession complementarity can be defined as the sum of preference measures between the three types of agents,

Alternatively, we can measure complementarity by determining how much the real collaborator configuration qs,ij deviates from the optimal collaborator configuration γijws,j (see Materials and Methods). The lower the deviation, the higher the degree of complementarity. Thus, we have

To make these results comparable across different virtual worlds, we investigate the relative quantities between the two societies in the same world, lg(P2k−1/P2k), lg(C2k−1/C2k) and lg(Y2k−1/Y2k), rather than focusing on each society separately. Both measures of complementarity correlate strongly with the relative economic output when t and Ic are not large (Fig. 3A–F, Fig. S7 and Fig. S8). During the first few days (small t), most agents strive to reach higher levels by completing specific tasks that generate little economic output. Other agents attempt to obtain high intimacy levels by killing monsters in locations unrelated to economic output. In both cases the agents seek to form complementary collaboration networks, but their activities are not focused on economic output. As a virtual world develops, the number of active agents increases, reaches a maximum at time tmax and then decays (Fig. 3G). When the activity level of a virtual world decreases, the agents’ focus moves away from production and the collaboration structure becomes increasingly unrelated to economic activities. This is consistent with the observation that the distribution of tmax (Fig. 3H) resembles the range of days over which complementarity and economic output are significantly correlated.
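The paired comparison can be sketched in Python: for each world k, take lg of the ratio between societies 2k − 1 and 2k, then correlate the resulting series. Here `P` and `Y` are hypothetical inputs standing in for any positive complementarity measure and per-capita output; the Pearson coefficient is computed directly so the sketch needs only the standard library.

```python
import math


def paired_log_ratios(values):
    """values[s] for s = 1, ..., 2K; societies 2k-1 and 2k share world k.
    Returns [lg(values[2k-1] / values[2k]) for k = 1, ..., K]."""
    n_worlds = len(values) // 2
    return [math.log10(values[2 * k - 1] / values[2 * k]) for k in range(1, n_worlds + 1)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical complementarity and output values for three worlds (six societies).
P = {1: 0.4, 2: 0.2, 3: 0.3, 4: 0.3, 5: 0.2, 6: 0.4}
Y = {1: 120.0, 2: 60.0, 3: 90.0, 4: 90.0, 5: 50.0, 6: 100.0}
rho = pearson(paired_log_ratios(P), paired_log_ratios(Y))
print(round(rho, 6))
```

Reproducing the significance masks of Fig. 3C–F would also require p-values; `scipy.stats.pearsonr` returns the coefficient together with a two-sided p-value.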

Figure 3

Relation between complementarity of collaboration network and economic output.

(A) Examples of correlations between lg(P2k−1/P2k) and lg(Y2k−1/Y2k). (B) Examples of correlations between lg(C2k−1/C2k) and lg(Y2k−1/Y2k). (C) The p-value of the correlation between lg(P2k−1/P2k) and lg(Y2k−1/Y2k) for different values of Ic and t (in days). A given cell (t, Ic) is colored red or yellow if the correlation is significant at the 0.001 or the 0.01 level, respectively; otherwise, it is colored green. (D) The p-value of the correlation between lg(C2k−1/C2k) and lg(Y2k−1/Y2k). (E) Correlation coefficient ρ between lg(P2k−1/P2k) and lg(Y2k−1/Y2k) for different values of Ic and t. The correlation coefficient is set to zero if the correlation is insignificant at the 0.01 level. (F) Correlation coefficient ρ between lg(C2k−1/C2k) and lg(Y2k−1/Y2k). (G) Evolution of the number of active agents in different virtual worlds. (H) Histogram of tmax, the day on which a virtual world reaches its historical maximum number of active agents.

Discussion

Overwhelming empirical evidence has shown that most social networks are homophilous. The probability that two nodes will connect is higher if they share similar traits. Our analysis of virtual worlds in which division of labor is operative demonstrates the important role of complementarity. In those socioeconomic networks individuals have the motivation to cooperate, and in the formation of the network individuals exhibit a heterophilous preference for those with complementary productive skills. Although mapping human behavior in virtual worlds to real-world human behavior is a subtle process34, we believe that the two share an intrinsic commonality because agents in virtual worlds are, in fact, controlled by real-world people. In particular, agents consciously form teams to accomplish tasks more successfully and effectively. More generally, growing evidence shows significant similarities in the behaviors of online agents and real-world humans23,35,36,37,38,39,40,41,42,43.

In reality, human preferences are multidimensional, spanning many traits13. The situation in virtual societies is somewhat different. Indeed, the way people interact with each other has changed significantly, particularly due to the impact of the Internet. Today, people can meet online in virtual worlds instead of physically getting together to dine, drink and talk to forge ties. In virtual societies personal traits become less important, while an agent’s professional skill is the dominant trait. As in most MMORPGs, the game is set up so that a party requires different roles to function optimally. In this sense, the main result of this paper may primarily reflect the design decisions of the game developers. On the other hand, however, such a setup mimics real human society, in which people have different and diverse skills, giving rise to the division of labor22. Hence, the results documented in this work have general significance.

The economic model proposed in this work is different from the one in ref. 12. The essential difference is in the assumption of the utility function. The choice of the utility function may have a significant impact on the outcome of the model. We calibrated the original model in ref. 12, and the parameter estimates suggested homophilous behavior, which is inconsistent with the empirical results presented in Fig. 1. We have also used a modified method of model calibration. Moreover, our model allows us to determine not only whether i-agents are homophilous or heterophilous but also the preference of one type of agent for any other type. Hence, our model is more general and can be applied to other systems.

The relationship between social networks and economic output has been studied previously. It has been found, for example, that the diversity of individual relationships within a community strongly correlates with the economic development of the community44 and is directly associated with higher productivity for both individuals and the community45,46. Because, to date, detailed real data at the population level of societies have been unavailable, this correspondence between professional skill and economic performance has not been quantified. Here we have begun to fill this data gap and also to highlight the usefulness of virtual worlds in carrying out research in economics and sociology23. One potential implication of our findings is that if a team leader or a firm manager recruits new members according to the complementarity of their skills, the team’s productivity may increase and the firm’s economic well-being may improve.

Materials and Methods

Data description

We use a large database recorded from K = 124 servers of a popular MMORPG in China to uncover the patterns characterizing virtual socioeconomic networks. In the virtual world residing on each server there are two opposing camps or societies. Each agent can choose collaborators, and a measure of closeness called intimacy is assigned to each collaboration link. When two collaborators in the same society collaborate to accomplish a task, their intimacy level increases. Two agents from different societies can also collaborate, but their intimacy level remains zero. Hence the social networks of the two camps are essentially separate. We can regard the two camps as two societies, giving us S = 248 virtual societies. For convenience, s = 2k − 1 and s = 2k denote the two societies in virtual world k. Two agents are defined as collaborators if they are on each other’s collaborator lists and their intimacy is no less than Ic. We consider many temporal collaboration networks. On day t in virtual society s, the collaboration network with threshold Ic is the network in which the intimacies of all edges are no less than Ic; it can be disconnected (Fig. 1A).
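A minimal sketch of how a thresholded network for one society could be assembled from intimacy records. The record layout (agent_a, agent_b, intimacy) and the `society_of` mapping are assumptions about how such data might be stored; the filter keeps edges whose intimacy is no less than Ic and drops cross-camp links, so the result can be disconnected.

```python
def threshold_network(links, society_of, s, ic):
    """Adjacency sets of society s's collaboration network at threshold ic.

    links: iterable of (agent_a, agent_b, intimacy) records (hypothetical layout)
    society_of: dict mapping agent -> society index
    """
    adj = {}
    for a, b, intimacy in links:
        if society_of[a] != s or society_of[b] != s:
            continue  # cross-camp links carry zero intimacy and belong to neither network
        if intimacy < ic:
            continue  # keep only edges whose intimacy is no less than ic
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical records: agent "x" belongs to the other camp.
links = [("a", "b", 150.0), ("b", "c", 20.0), ("a", "x", 0.0), ("c", "d", 2500.0)]
society_of = {"a": 1, "b": 1, "c": 1, "d": 1, "x": 2}
net_100 = threshold_network(links, society_of, s=1, ic=100)
print(net_100)  # two components: a-b and c-d
```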

The virtual societies contain many different types of tasks, designed for agents at all levels. At some levels, the system asks agents to kill given numbers of different types of monsters. At other levels, agents are asked to deliver something to a specific NPC (non-player character), and so forth. These tasks are usually not easy for the agents concerned. However, they can ask their collaborators for help, form a team and fulfill the tasks together. Agents can also form teams to kill monsters and produce items. All these collaborations increase the intimacy of the collaborating agents in the same team.

In each society there are three professions, warrior, priest and mage, denoted by the subscripts 1, 2 and 3, respectively. For simplicity, we introduce the following notation. An s-agent is an agent belonging to society s. An i-agent is an agent having profession i. Similarly, an i-collaborator is a collaborator having profession i. An (s, i)-agent or (s, i)-collaborator is an i-agent or i-collaborator in society s.

Model calibration

An (s, i)-agent solves the following decision-making problem of how many collaborators to have

It follows that

Note that the γij values are affected only by the professions and remain the same for different societies. This enables us to estimate the parameters.

The solution (12) denotes the average behavior (decision) of all agents having the same profession in a given society. If we consider an arbitrary agent a, we must add a noise term12,

which means that the “realized” number of collaborators agent a has is the sum of a universal (or systemic) term and an idiosyncratic error term. The error term is assumed to have mean 0 and variance σ2. Note that this assumption implies that the error variance is the same for every agent, regardless of profession.

We denote by Ns the size of society s and by ws,i the fraction of i-agents in society s. Hence the number of i-agents in society s is Nsws,i, and the expected aggregate number of collaborators that i-agents have in society s is Nsws,ifs,i. According to Eq. (13), we have

where Es,i has mean 0 and variance (Nsws,iσ)2 = (Nsws,i)2σ2.

It follows that, for i ≠ j,

Following ref. 12, we obtain an error for society s:

According to Eq. (17), we find that the mean of Ψs,i,j is 0 and the variance is ϕs,ijσ2, where

Thus the normalized variable $\Psi_{s,i,j}/\sqrt{\phi_{s,ij}}$ has mean 0 and variance σ2 for any society s. The sum of squared errors over all societies in the sample is

which is independent of Ns as expected. However, is dependent on ws,i, which is consistent with the setup of our model but different from the model in ref. 12. Thus the total sum of the squared errors is

One can see that the quantities as, bs and cs may be society-specific; they do not enter the final objective function used for model calibration.

For each pair of Ic and t, a society is excluded from model calibration if fewer than 500 of its agents have at least one collaborator, to ensure that ϵa has enough realizations. Varying this threshold around 500 yields the same results. In addition, if fewer than 50 societies remain for a calibration, we do not calibrate the model, because the model has 10 parameters.
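The two exclusion rules translate directly into a filter. This Python sketch assumes each network is stored as an adjacency dictionary (agent mapped to a set of collaborators) for one (Ic, t) pair; the thresholds 500 and 50 are the ones stated above, and the names are hypothetical.

```python
MIN_ACTIVE_AGENTS = 500  # agents with at least one collaborator
MIN_SOCIETIES = 50       # minimum number of societies per calibration


def societies_for_calibration(networks):
    """networks: dict society -> adjacency dict (agent -> set of collaborators).

    Returns the retained societies, or None when too few remain to calibrate."""
    kept = [
        s for s, adj in networks.items()
        if sum(1 for nbrs in adj.values() if nbrs) >= MIN_ACTIVE_AGENTS
    ]
    return kept if len(kept) >= MIN_SOCIETIES else None


# Synthetic check (hypothetical data): 50 societies just meet the cut-off,
# and one small society is dropped.
big = {f"agent{k}": {"peer"} for k in range(500)}
networks = {s: big for s in range(50)}
networks[99] = {"agent0": {"peer"}}
print(len(societies_for_calibration(networks)))
```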

To minimize Q2, we adopt the taboo search algorithm47. The solution space is restricted to 0 ≤ γij ≤ 2 for i = 1, 2, 3 and j = 1, 2, 3. Because there are 10 free parameters, reaching the global minimum is not easy. We thus perform a taboo search in each cell of a 9-dimensional lattice of 2^9 = 512 cells, in which each γij is constrained to either 0 ≤ γij ≤ 1 or 1 ≤ γij ≤ 2. The parameters in the cell yielding the minimum of Q2 over all cells are taken as the solution. The normality assumption for the fitting errors has been verified by QQ-plots (Fig. S7), which supports the setup of the model. We note that partitioning the solution space into this 9-dimensional lattice of 2^9 cells is very important. If we perform the taboo search directly, the resulting Q2 value is significantly larger and the three preference curves γij(t) for each i are not well separated around γij = 1 (cf. Fig. 2, Fig. S4 and Fig. S5).
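The lattice partition can be sketched in Python. This is a minimal sketch under stated assumptions: `toy_q2` is a made-up quadratic stand-in for Q2 (whose full expression is not reproduced here), only the nine γij are searched, and the grid step 0.05, iteration budget and tabu tenure are illustrative choices rather than values from the study. Each of the 2^9 = 512 cells fixes every γij to either [0, 1] or [1, 2]; a basic tabu search (move to the best non-tabu neighbour, even uphill, while remembering recently visited points) runs inside each cell, and the best cell wins.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def tabu_search(objective: Callable[[Point], float],
                lower: Sequence[float], upper: Sequence[float],
                step: float = 0.05, iters: int = 60,
                tenure: int = 20) -> Tuple[Point, float]:
    """Basic tabu search on a grid of spacing `step` inside the box [lower, upper]."""
    x: Point = tuple(round((lo + hi) / 2, 10) for lo, hi in zip(lower, upper))
    best_x, best_f = x, objective(x)
    tabu: List[Point] = [x]  # recently visited points that may not be revisited
    for _ in range(iters):
        neighbours = []
        for d in range(len(x)):
            for delta in (-step, step):
                y = list(x)
                y[d] = round(min(max(y[d] + delta, lower[d]), upper[d]), 10)
                cand = tuple(y)
                if cand not in tabu:
                    neighbours.append(cand)
        if not neighbours:
            break
        x = min(neighbours, key=objective)  # best admissible move, even if uphill
        tabu.append(x)
        if len(tabu) > tenure:
            tabu.pop(0)  # forget the oldest tabu point
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def search_all_cells(objective: Callable[[Point], float], dim: int = 9) -> Tuple[Point, float]:
    """One tabu search per lattice cell; each coordinate lies in [0, 1] or [1, 2]."""
    best: Tuple[Point, float] = ((), float("inf"))
    for bits in itertools.product((0, 1), repeat=dim):  # 2**dim cells
        lower = [float(b) for b in bits]
        upper = [b + 1.0 for b in bits]
        x, fx = tabu_search(objective, lower, upper)
        if fx < best[1]:
            best = (x, fx)
    return best


# Hypothetical stand-in for Q2: a quadratic bowl around made-up coefficients
# (gamma_11, gamma_12, ..., gamma_33 in row-major order).
TRUE_GAMMA = (0.8, 1.3, 1.2, 1.4, 0.9, 1.1, 1.2, 1.3, 0.7)


def toy_q2(g: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(g, TRUE_GAMMA))


gamma_hat, q2_min = search_all_cells(toy_q2)
print([round(v, 3) for v in gamma_hat], q2_min)
```

Because each search stays inside one cell, every coefficient is confined to a fixed side of γij = 1 during that search, and the per-cell minima can be compared directly.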

Significance tests

To test whether the preference coefficient γij of i-agents for j-agents differs significantly from the no-preference case, we perform F-tests using the null hypothesis

$$ H_0:\ \gamma_{ij} = 1. $$

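Following ref. 12, the F-statistic is

$$ F = \frac{(\mathrm{SSR}_{\mathrm{con}} - \mathrm{SSR}_{\mathrm{uncon}})/(p_{\mathrm{uncon}} - p_{\mathrm{con}})}{\mathrm{SSR}_{\mathrm{uncon}}/(n - p_{\mathrm{uncon}})}, $$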

where SSR is the sum of squared residuals of the best-fit calibration, p is the number of model parameters, n is the number of observations, while the subscript “con” indicates the constrained model under the null hypothesis and the subscript “uncon” the unconstrained model.
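For reference, the standard nested-model F-statistic built from these quantities can be computed as below. The residual sums and n are hypothetical; constraining one γij to 1 removes one parameter from the 10-parameter model.

```python
def f_statistic(ssr_con, ssr_uncon, p_con, p_uncon, n):
    """Nested-model F-statistic comparing a constrained fit (e.g. gamma_ij fixed
    at 1) with the unconstrained fit."""
    df_num = p_uncon - p_con
    df_den = n - p_uncon
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need p_con < p_uncon < n")
    return ((ssr_con - ssr_uncon) / df_num) / (ssr_uncon / df_den)


# Hypothetical residual sums: fixing one of the 10 parameters leaves 9.
F = f_statistic(ssr_con=12.0, ssr_uncon=10.0, p_con=9, p_uncon=10, n=110)
print(F)
```

The matching p-value is the upper tail of the F distribution with (p_uncon − p_con, n − p_uncon) degrees of freedom, for example `scipy.stats.f.sf(F, df_num, df_den)`.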

Economic output of individuals

There are two virtual currencies, Xingbi and Jinbi. Xingbi cannot be produced by an agent’s activity and can only be bought from the system, at an approximately stable exchange rate against the Chinese currency Renminbi. Xingbi is thus a universal currency across different virtual worlds. Jinbi, on the other hand, is produced by the economic activities of the agents. There is a built-in exchange platform in each virtual world on which agents can exchange Xingbi and Jinbi. This platform provides a real-time exchange rate from Jinbi to Xingbi.

An agent can produce virtual items (e.g., weapons, clothes and medicines) and a limited amount of the virtual currency Jinbi. We convert the produced items and Jinbi to Xingbi to obtain the real economic output of each agent on each day. There is a marketplace in each virtual world in which agents can sell their items that are priced in Xingbi or Jinbi. The price of an item is determined by the average price of all the trades in the marketplace on a given day. Each produced item can thus be measured in Xingbi.
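A sketch of the conversion to Xingbi, assuming each trade is recorded with its price and currency, that Jinbi prices are converted at the day’s exchange rate before averaging, and that an item’s daily price is the plain mean over that day’s trades. The exchange rate, item names and trades below are invented for illustration.

```python
from statistics import mean


def item_price_xingbi(trades, jinbi_to_xingbi):
    """Average price (in Xingbi) of one item over a day's marketplace trades.

    trades: list of (price, currency) with currency "xingbi" or "jinbi"."""
    return mean(p if cur == "xingbi" else p * jinbi_to_xingbi for p, cur in trades)


def agent_output_xingbi(produced_items, produced_jinbi, trades_by_item, jinbi_to_xingbi):
    """One agent's daily output in Xingbi: produced Jinbi plus produced items
    valued at the day's average marketplace price."""
    total = produced_jinbi * jinbi_to_xingbi
    for item, quantity in produced_items.items():
        total += quantity * item_price_xingbi(trades_by_item[item], jinbi_to_xingbi)
    return total


rate = 0.01  # hypothetical Xingbi per Jinbi on this day
trades = {"sword": [(5.0, "xingbi"), (700.0, "jinbi")], "potion": [(100.0, "jinbi")]}
out = agent_output_xingbi({"sword": 2, "potion": 3}, produced_jinbi=250.0,
                          trades_by_item=trades, jinbi_to_xingbi=rate)
print(out)
```

`statistics.mean` raises `StatisticsError` for an item with no trades on that day; how such items are valued is not specified here.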

Additional Information

How to cite this article: Xie, W.-J. et al. Skill complementarity enhances heterophily in collaboration networks. Sci. Rep. 6, 18727; doi: 10.1038/srep18727 (2016).