The education-chasing labor rush in China identified by a heterogeneous migration-network game

Despite persistent efforts to understand the motives and patterns of human migration, little is known about the microscopic mechanism that drives migration and its association with migrant types. To fill this gap, we develop a population game model in which migrants are allowed to be heterogeneous and decide interactively on their destinations; the resulting migration network emerges naturally as a Nash equilibrium and depends continuously on migrant features. We apply the model to Chinese labor migration data at the current and expected stages, aiming to quantify migration behavior and decision mode for different migrant groups and at different stages. We find that the type-specific migration network differs significantly across migrants of different age, income and education level, and also differs from the aggregated network at both stages. However, a deeper analysis of model performance suggests a different picture: the decision mechanism behind the “as-if” unstable migration behavior is stable, which also explains the relative invariance of low migration efficiency across settings. Finally, by classifying cities from the estimated game, we find that the richness of education resources is the most critical determinant of city attractiveness for migrants, which offers guidance to city managers in migration policy design.


Technical proof for the migration game model

Proof for the proposition in the Method section of the main manuscript
We present the proof for the following proposition stated in the subsection "Method-Fast Algorithm" of the main manuscript.
Proposition 0.1. Suppose $T$ and $g$ are bounded functions and $F$ is continuous. If we denote by $U_{vNM}(x_+, P)$ the von Neumann-Morgenstern expected utility of the mixed strategy profile $P$, and define $U(x_+, P)$ as the following: where $P(x_+, j)$ is the probability that player $x_+$ goes to destination $j$ under strategy $P$, then the following hold uniformly for all $P \in \mathcal{P}$:

Proof. It suffices to show that, as the number of players $N \to \infty$, for each destination $j$,

$\sum_{x'_+ \neq x_+} s_{x'_+, j}\, T(x_+, x'_+) - g(x_+, j) \to_p 0$ (3)

given that the pure strategy indicator $s_{x'_+, j} = 1$ has probability $P(x'_+, j)$. In (3), $\to_p$ stands for convergence in probability.
Since the player sets $X_N$ are randomly drawn from the feature space $\mathcal{P}$ under the default probability distribution $\mu$, (3) follows naturally from the following, provided that $F$ is continuous: By the randomness of the $x'_+$ and of $s_{x'_+, j}$ given $P(x'_+, j)$, the boundedness of $T$ and the fact that $P$ is uniformly bounded within $[0, 1]$, the convergence in (4) is just a consequence of the uniform law of large numbers. In addition, by the same logic, it is easy to verify the following convergence result:

$\sum_{x'_+ \neq x_+} P(x'_+, j)\, T(x_+, x'_+) - g(x_+, j) - F\left(\int_{\mathcal{P} \setminus \{x_+\}} P(x'_+, j)\, T(x_+, x'_+)\, d\mu - g(x_+, j)\right) \to_p 0$ (5)

Continuous migration game, existence of the Nash equilibrium and its asymptotic relationship with the discrete migration game

Note that the migration game stated in the main manuscript has a continuous version, which takes the entire feature space and the set of origin cities as the player set. Consequently, there is a continuum of players characterized by their features and origins in the continuous migration game, and the continuous game can be considered as the limit of an increasing sequence of finite population games with the number of players diverging to infinity. Formally, we can record the continuous migration game as the tuple $G_c(\mathcal{P} := C \times \mathbb{R}^p, C, \mu, U)$. Apart from the player set, $G_c$ is defined in exactly the same way as $G_N$ in the main manuscript, except for the following two modifications of the mixed strategies and the utility function:

A3'. Mixed strategy: players are allowed to take mixed strategies. The mixed strategy set is represented as the set of vector-valued functions $\mathcal{P} = \{P : C \times \mathbb{R}^p \to S_C\}$, where $S_C$ is the $(|C| - 1)$-dimensional simplex and $|C|$ is the cardinality of $C$. Then, for every destination $j \in C$, the $j$th coordinate projection $P_j(i, x)$ is the probability that a player $(i, x)$ chooses to migrate to $j$ under the mixed strategy $P$.
Without loss of generality, we assume that every $P \in \mathcal{P}$ is smooth up to a certain order with respect to $x \in \mathcal{P}$, which implies that two players who are similar in both origin and features make similar strategy choices to some extent.
A4'. Utility: denote by $U$ the utility function of players. It takes the following form for a given player $x_+ = (i, x)$ and strategy $P$: where $F$ is a continuous function; $T$ is a pairing function which, for a given player $x_+$, describes which group of competitors, namely $\{x'_+ \in C \times \mathcal{P} : T(x_+, x'_+) \neq 0\}$, is taken into account when making a decision; and $g$ is interpreted as the ideal population scale that the destination should have, which is private information for every player and can depend on the player's features in a certain manner.

As promised in the subsection "Method-Deal with unobserved migrants" of the main manuscript, we present the following proposition on the existence of a mixed-strategy Nash equilibrium for the continuous version of the population game in the main manuscript. The proposition also provides an asymptotic relationship between the large population game introduced in the main manuscript and its continuous version in the limit. One important implication is that, even if there exist unobserved migrants, statistical inference based merely on the observed sample is still asymptotically correct as long as the observed sample is sufficiently representative and sufficiently large. This implication provides a solid foundation for the empirical study of our paper.
Proposition 0.2. (1) Given a continuous migration game $G(\mathcal{P} := \mathbb{R}^p, C, \mu, U)$, suppose every mixed strategy $P$ is allowed to be an $L^2$ function from $C \times \mathbb{R}^p$ to $S_C \subset \mathbb{R}^{|C|}$, and define the best response set for $P$ as $\Sigma(P)$. Then the set of $L^2$ mixed strategies is compact and convex within the topological vector space $L^2(C \times \mathbb{R}^p, \mathbb{R}^{|C|}, \mu)$ with respect to the weak topology, and the set-valued map $\Sigma$ is upper hemicontinuous; therefore a Nash equilibrium exists as a fixed point of $\Sigma$.
(2) For every Nash equilibrium $P^E$ of the continuous game, there always exist a sequence of finite discrete games $G_N(X_N \subset C \times \mathbb{R}^p, C, \mu, U_N)$, $N = 1, 2, \ldots$, a sequence of positive numbers $\varepsilon_1, \varepsilon_2, \ldots$ decreasing to 0, and a sequence of strategies $\{P^{N,E} : N = 1, 2, \ldots\}$ such that, for each $N$, $P^{N,E}$ is an $\varepsilon_N$-Nash equilibrium of the game $G_N$ and $P^{N,E} \to P^E$ in Banach norm.
(3) If there exists a sequence of Nash equilibria $P^{N,E}$ associated with an increasing sequence of finite discrete games $G_N$ (i.e. the player set of $G_N$ is always a subset of that of $G_{N+1}$) such that the left-hand side of (2) converges, and the sequence $P^{N,E}$ satisfies the continuity condition that, for every player $x_{+,i}$ and every $\varepsilon > 0$, there exist a $\delta > 0$ and an $N$ such that then there exists a unique continuous Nash equilibrium $P^E$ that is the limit of the sequence $P^{N,E}$ in Banach norm.
Proof for Proposition 0.2. In this proof, we first show that the best response map $\Sigma$ is an upper hemicontinuous set-valued function with respect to the weak topology on $L^2(\mathcal{P}, \mathbb{R}^{|C|})$.
Notice that, by the definition of the weak topology and of upper hemicontinuity on an infinite-dimensional topological vector space, the map $\Sigma$ is upper hemicontinuous if the following condition holds.
Condition 0.3. For a finite sequence of vector-valued functions in $L^2(\mathcal{P}, \mathbb{R}^{|C|})$, denoted $\{l_1, \ldots, l_m\}$, and a sequence of positive numbers $(\varepsilon_1, \ldots, \varepsilon_m)$ such that there always exist another sequence $\{l'_1, \ldots, l'_{m'}\}$ and $(\varepsilon'_1, \ldots, \varepsilon'_{m'})$ such that, whenever a $P$ satisfies (7) with the $l_i, \varepsilon_i$ replaced by the $l'_i, \varepsilon'_i$, all $Q \in \Sigma(P)$ satisfy (7) as well.
To verify the above continuity condition, we first assume that, for all $Q \in \Sigma(P)$ and all $x_+ \in \mathcal{P}$, the support of $Q(x_+, \cdot)$ is not the entire $C$. Under this assumption and the condition that $\mu$ has compact support, it can easily be checked that there exists a small $\varepsilon > 0$ such that whenever the following holds min which implies that $\Sigma(P') \subset \Sigma(P)$. Then Condition 0.3 holds as long as we take $\varepsilon'_i \equiv \varepsilon/2$ and $l'_i = T(x_{+,i}, \cdot)$ for $i = 1, \ldots, m'$, with $\{x_{+,i} : i = 1, \ldots, m'\}$ satisfying for every Under the requirement on $T$ in Proposition 0.2, the sequence $\{x_{+,i} : i = 1, \ldots, m'\}$ always exists. This completes the proof of the upper hemicontinuity of $\Sigma$. The convexity and weak compactness of $\Sigma(P)$ are trivial. Consequently, by the Kakutani fixed point theorem on infinite-dimensional topological vector spaces, a fixed point exists for $\Sigma$, which verifies the existence of a Nash equilibrium for the continuous migration game.

Statement (2) is a direct consequence of Proposition 0.1 and the fact that the set of all smooth functions is dense in $L^2(\mathcal{P}, \mathbb{R}^{|C|})$ with respect to the norm topology.

For statement (3), the continuity condition guarantees the convergence of the sequence of Nash equilibria of the finite games $G_N$ to a continuous function, which is also contained in $L^2(\mathcal{P}, \mathbb{R}^{|C|})$ due to the compact support of $\mu$; finally, the limit function is a Nash equilibrium of the limiting continuous game because of the uniform convergence result in Proposition 0.1.
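
The law-of-large-numbers step behind Proposition 0.1 can be illustrated numerically. The sketch below is only an illustration under assumptions that are not taken from the paper: a one-dimensional feature drawn uniformly on [0, 1], a hypothetical bounded pairing function `T`, a hypothetical strategy `P_j` giving the probability of choosing destination j, and a 1/N normalisation of the sum over competitors. It compares the aggregate realised under pure strategies with its integral counterpart for growing N.

```python
import math
import random

def T(x, y):
    # Hypothetical bounded pairing function (assumption): closer features compete more.
    return math.exp(-abs(x - y))

def P_j(y):
    # Hypothetical probability that a player with feature y picks destination j (assumption).
    return 0.2 + 0.6 * y

def realised_aggregate(x, n, rng):
    # (1/n) * sum over n competitors of s * T(x, y), with s ~ Bernoulli(P_j(y)).
    total = 0.0
    for _ in range(n):
        y = rng.random()                          # competitor feature drawn from mu = U[0, 1]
        s = 1.0 if rng.random() < P_j(y) else 0.0  # realised pure strategy indicator
        total += s * T(x, y)
    return total / n

def integral_aggregate(x, grid=20000):
    # Midpoint-rule approximation of the integral of P_j(y) * T(x, y) with respect to mu.
    h = 1.0 / grid
    return sum(P_j((k + 0.5) * h) * T(x, (k + 0.5) * h) for k in range(grid)) * h

def max_gap(n, rng, xs=(0.1, 0.5, 0.9)):
    # Largest absolute deviation over a few reference players x.
    return max(abs(realised_aggregate(x, n, rng) - integral_aggregate(x)) for x in xs)

rng = random.Random(0)
gaps = {n: max_gap(n, rng) for n in (100, 10000, 200000)}
```

Under these assumptions the entries of `gaps` should shrink roughly at the rate 1/sqrt(N); applying a continuous `F` to both aggregates preserves the convergence.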

Statistical summary of the resume data sample
We present the sample distributions of 8 key quantitative variables of the resume dataset in Tables 1-8. The variables are education, age, gender/marriage status, monthly salary, the number of previous jobs, work experience (years), the number of words used to describe past work experience, and current job status. The resume data contain many other non-quantitative feature variables characterizing migrant types, such as the text-valued variables documenting self-evaluation, past working experience and education experience. These cannot be summarized statistically, so we adopt natural language processing techniques (such as the LDA method for topic mining) to pre-process the text-valued variables and extract 48 quantitative variables related to different latent topics. Together with the 9 key variables (gender and marriage status are separated into two variables) and the 20 industry dummy variables introduced below, they constitute the 77 migrant-level features used for fitting the migration game in the main manuscript. Although the topic variables capture some personal features of every migrant, they are not easily interpretable in the usual sense, so we do not summarize them here. Next, Zhaopin.com provides 53 industries in total to characterize migrants' working professions. Based on keyword matching, the 53 industries can be assigned to the 20 industries officially classified by the National Bureau of Statistics of China.
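
The keyword-matching step and the feature count described above can be sketched as follows. This is a minimal illustration only: the keyword table, the category labels and the helper names `to_official` and `industry_dummies` are hypothetical and are not the mapping actually used for the resume data; only the counts 48, 9, 20 and 77 come from the text.

```python
# Feature counts stated in the text: 48 topic variables, 9 key variables, 20 industry dummies.
N_TOPIC, N_KEY, N_INDUSTRY = 48, 9, 20
N_FEATURES = N_TOPIC + N_KEY + N_INDUSTRY

# Hypothetical keyword table (assumption): official category -> keywords to search for.
# The real table would cover all 20 official industries.
OFFICIAL_KEYWORDS = {
    "Manufacturing": ["manufactur", "machinery", "automobile"],
    "Finance": ["bank", "insurance", "securities"],
    "Education": ["education", "training", "school"],
    "Information technology": ["software", "internet", "it service"],
}

def to_official(raw_label, table=OFFICIAL_KEYWORDS, default="Unmatched"):
    """Return the first official category whose keyword occurs in raw_label."""
    text = raw_label.lower()
    for official, keywords in table.items():
        if any(keyword in text for keyword in keywords):
            return official
    return default

def industry_dummies(raw_label, table=OFFICIAL_KEYWORDS):
    """One-hot (dummy) encoding of the matched official category."""
    matched = to_official(raw_label, table)
    return [1 if official == matched else 0 for official in table]
```

For example, `to_official("Internet / E-commerce")` returns the hypothetical category "Information technology", while a label with no matching keyword falls back to "Unmatched" and yields an all-zero dummy vector.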
Table 9 presents the joint distribution of migrant education and working industry (calculated based on the 20 official industries) and compares the joint distribution extracted from the resume data sample with the 2015 census data provided by the National Bureau of Statistics of China. From all the above tables, it can be seen that the online job seekers in our sample represent a sub-population that has a higher education level and is concentrated more intensively in the eastern coastal area of China than the population as a whole. This bias is reasonable because seeking jobs online has a threshold: online job seekers must be used to computers and the internet in their daily life and must have stable internet access. Therefore, online job seekers tend to be relatively highly educated workers who stay in relatively developed regions of China (most workers in remote regions, such as Tibet and Qinghai Province, have very limited access to the internet, so they are naturally under-represented in our sample). Although our sample is biased, it at least represents an important sub-population, the online job seekers/migrants. With economic growth and the further development of the internet, this sub-population constitutes an increasingly influential portion of the labor supply in the entire Chinese labor market; concentrating on it is therefore meaningful for interpreting the future labor force migration pattern in China.

Type-specific migration networks and decentralization trend
The difference migration network, formed by entry-wise subtraction of the type-specific migration probability from the overall migration probability, is presented in Fig. 2 for three migrant groups: those who have an undergraduate degree, those with a monthly salary of 84,000, and those younger than 20 years old. These three types are selected because they deviate most significantly from the overall migration trend at stage i). Fig. 2a and 2b plot the difference migration network for the undergraduate group. They show that undergraduate migrants originating from "other cities" are more likely than average to migrate into central cities such as Beijing at stage i) and Shanghai at stage ii), both of which are top cities in China and are pointed to by the thick blue arrows in Fig. 2a and 2b, respectively. This observation implies that highly educated migrants have an extra preference for top cities, which agrees with the findings in refs. [1,2]. For high-income migrants with a monthly salary of 84,000, Fig. 2d shows a strong migration tendency at stage ii) from the top city, Shanghai, and several regional central cities, such as Shenyang, Yinchuan and Lanzhou, to the "other cities". This observation is distinctive: it neither appears for the other migrant types in Fig. 2 nor, as far as we know, is documented in existing studies. One possible explanation is the return migration trend [3,4] after the financial crisis in 2008, in which high-income migrants intend to move back to somewhere close to their home towns to pursue a better quality of life rather than keep accumulating material wealth in rich regions. Finally, as shown in Fig. 2e and 2f, migrants younger than 20 years old are more likely than average to migrate from the other cities to the city clusters centered at Guangzhou, Fuzhou, Changsha and so on, rather than to the top cities, e.g. Beijing and Shanghai. 
This observation holds for both migration stages, which might be because younger migrants are less capable of finding well-paid job opportunities and are thus less likely to survive in the top cities, which are also the most expensive cities [2,5].

For different education, income and age types, the difference migration networks are plotted for the types that deviate most significantly from the whole population at migration stage i) in terms of their type-specific migration network. Based on the discussion in the main manuscript, the three types are the undergraduate type for education, the type with a monthly salary of 84,000 for income and the 10-20-year-old type for age, respectively, all of which are significant at least at the 0.05 level. The difference network is computed by subtracting the type-specific network from the overall migration network. In a, c and e, the stage i) difference migration networks are plotted for undergraduate, 84,000-monthly-salary and 10-20-year-old migrants, respectively. In b, d and f, the stage ii) difference migration networks are plotted for the same three migrant types. In all the figures, the arrows always point toward the destination; the size, opacity and darkness of an arrow represent the absolute value of the link weight in the difference network; a red arrow marks a link on which the overall migration probability is greater than the type-specific migration probability, while a blue arrow marks the opposite case, in which the type-specific migration probability is greater than the overall probability.

For different education, income and age types, the difference migration networks are plotted for the types that deviate most significantly from the whole population at migration stage ii) in terms of their type-specific migration network. Based on the discussion in the main manuscript, the three types are the professional school degree type for education, the type with a monthly salary of 6,000 for income and the 30-40-year-old type for age, respectively, all of which are significant at least at the 0.05 level. The difference network is computed by subtracting the type-specific network from the overall migration network. In a, c and e, the stage i) difference migration networks are plotted for undergraduate, 84,000-monthly-salary and 10-20-year-old migrants, respectively. In b, d and f, the stage ii) difference migration networks are plotted for the same three migrant types. In all the figures, the arrows always point toward the destination; the size, opacity and darkness of an arrow represent the absolute value of the link weight in the difference network; a red arrow marks a link on which the overall migration probability is greater than the type-specific migration probability, while a blue arrow marks the opposite case, in which the type-specific migration probability is greater than the overall probability.

between the np and nn class of cities (city pairs) for a variety of age, education and income types. The smaller the p-value, the greater the confidence with which we reject the null hypothesis and accept the alternative hypothesis that the np class has many more hospital beds than the nn class. Because the pp class contains neither cities nor city pairs for both stage i) and stage ii) migration, the relevant radial plots are missing.
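
The difference networks discussed in this section follow a simple entry-wise rule, which can be sketched as below. The nested-dictionary representation, the function names and the probabilities are hypothetical and chosen only for illustration; the subtraction order (overall minus type-specific) and the colour convention follow the figure descriptions above.

```python
def difference_network(overall, type_specific):
    """Entry-wise difference: overall minus type-specific migration probability."""
    diff = {}
    for origin, row in overall.items():
        diff[origin] = {
            dest: p - type_specific.get(origin, {}).get(dest, 0.0)
            for dest, p in row.items()
        }
    return diff

def arrow_colour(weight):
    """Red if the overall probability is larger, blue if the type-specific one is larger."""
    if weight > 0:
        return "red"
    if weight < 0:
        return "blue"
    return None  # identical probabilities: no arrow

# Hypothetical migration probabilities from one origin (illustrative numbers only).
overall = {"Other cities": {"Beijing": 0.30, "Guangzhou": 0.70}}
undergraduate = {"Other cities": {"Beijing": 0.45, "Guangzhou": 0.55}}

diff = difference_network(overall, undergraduate)
colours = {dest: arrow_colour(w) for dest, w in diff["Other cities"].items()}
```

In this toy example the link toward Beijing is blue, because the hypothetical undergraduate group migrates there with a higher probability than the whole population; the arrow size would be proportional to the absolute value of the corresponding entry of `diff`.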

Hypothesis test for city features
During the classification, to reduce the influence of data noise, we trimmed very small values of $DE^1_{x,ij}$ and $DE^2_{x,ij}$: $DE^1_{x,ij}$ ($DE^2_{x,ij}$) is set to 0 when its absolute value is less than 0.001, and the resulting city pair $ij$ is then discarded as a noisy point and is not assigned to any of the four classes.
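
The trimming rule can be sketched as follows. Two points are assumptions made only for this illustration: that the four classes are labelled by the signs of the two quantities ("p" for positive, "n" for negative, with the first letter referring to $DE^1_{x,ij}$ and the second to $DE^2_{x,ij}$), and that a city pair is discarded as soon as either quantity is trimmed to 0. Only the threshold 0.001 is taken from the text.

```python
THRESHOLD = 0.001  # trimming threshold stated in the text

def trim(value, threshold=THRESHOLD):
    """Set values with absolute value below the threshold to 0."""
    return 0.0 if abs(value) < threshold else value

def classify_pair(de1, de2, threshold=THRESHOLD):
    """Return 'pp', 'pn', 'np' or 'nn', or None if the pair is discarded as noise."""
    d1, d2 = trim(de1, threshold), trim(de2, threshold)
    if d1 == 0.0 or d2 == 0.0:
        return None  # assumption: one trimmed component is enough to discard the pair
    first = "p" if d1 > 0 else "n"
    second = "p" if d2 > 0 else "n"
    return first + second
```

For instance, `classify_pair(-0.02, 0.3)` returns "np", whereas `classify_pair(0.0005, 0.3)` returns `None` because the first component falls below the threshold.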