Urbanization and Economic Complexity

Urbanization plays a crucial role in the economic development of every country. Yet the mutual relationship between a country's urbanization and its productive structure is far from understood. We analyzed the historical evolution of product exports for all countries using the World Trade Web (WTW) with respect to patterns of urbanization from 1995 to 2010. Using the evolving framework of economic complexity, we reveal that a country's economic development, in terms of its production and export of goods, is interwoven with the urbanization process during the early stages of its economic development and growth. In urbanized countries, by contrast, the reciprocal relation between economic growth and urbanization fades away in the later stages, and it becomes negligible for countries highly dependent on the export of resources, where urbanization is not linked to any structural economic transformation.


Introduction
It is an established fact that urbanization in developed countries is accompanied by economic growth and industrialization, which reinforce one another 1 . This historic pattern generates an expectation of a virtuous circle between economic growth and urbanization regardless of local conditions 2,3 . From classic urban economic theories 4,5 to the more recent scaling approach to cities 6,7 , the growth of urban population has routinely been used as a proxy for economic growth. This pattern has also been observed in rapidly developing countries such as China and India, but it cannot be considered a universal blueprint 8 , and deviations from this norm have not yet been fully explained.
In fact, as pointed out in several studies [9][10][11][12] , the increasing urbanization rate in persistently poor and non-industrialized countries poses an important dilemma for urban economic theory. Why, given the same rate of urbanization, does Asia contain a number of explosive economies while sub-Saharan Africa has seen very little growth? Moreover, in developed and advanced industrialized economies, is there a competitive advantage in continuing this urbanization process indefinitely?
There are several theories aimed at explaining urbanization processes. Some argue that rural poverty moves people to cities as was clearly the case in 19th century Europe and America 13 , driving the transformation from an agricultural to an industrial-service based economy 14,15 . Others argue that in the last decades there has been urban-biased public policy that has led to over-urbanization 12 .
The most intriguing approach, however, is rooted in the mutual indirect effects of the World Trade Web (WTW) on global urbanization 16,17 . The dominant idea is that in open economies, domestic communities (such as cities) can trade easily with other communities, boosting their exports while substituting industrialization for urbanization policy 18 . In simple terms, commodities can flow more freely using urban agglomerates as nodes in the trading networks between countries, generating the ever-present virtuous circle between economic growth and urbanization.
Starting from this theory, we analyze the WTW to explore the mutual relationship between the urbanization of countries and their economic production structure using the Economic Complexity (EC) framework. Economic Complexity 19-26 is a new and expanding field in economic analysis, which proposes "Fitness" and "Complexity" metrics to quantify the competitiveness of countries and the complexity of the products in a country's basket of exports. EC is based on a bipartite representation of the World Trade Web, where the set of world countries and the set of exported products form two distinct sets of nodes. Countries and products are connected to one another by imposing a threshold on their Revealed Comparative Advantage (RCA) 27 , which defines the criterion for the existence of a link.
The Fitness and Complexity algorithm is a kind of PageRank method applied to the WTW, where Fitness F_c is the quantity assigned to country c and Complexity Q_p is the quantity assigned to product p. The idea at the basis of the algorithm is that the countries with the highest fitness are those which are able to export the highest number of the most exclusive products, i.e. those with the highest complexity. The complexity of a product, in turn, is non-linearly related to the fitness of its exporters, so that products exported by low fitness countries have a low level of complexity and high complexity products are exported by high fitness countries only.
The Fitness metric is valuable in quantifying a country's productive structure and structural transformations which enable one to predict its future economic growth 23 . It correlates with the extent of economic equality 28 and it has been used to analyze the country's growth path to industrialization 29 .
In this work, we couple the WTW data with the urbanization level of more than 146 countries worldwide and analyze them between 1995 and 2010, thus capturing the fingerprint of urbanization on countries' productive systems through the lens of their exports. We notice that in rural economies, the increase in urban population fosters structural changes in industrial exports. It boosts the country's diversification, improving the country's fitness and allowing the export of more complex products. These economic transformations fade away in countries that already have a high level of urban population (more than 60%), where there is no relation between the urbanization process and the country's fitness.
Within the sub-Saharan countries, we capture those where the virtuous circle between economic growth and urbanization is fostering structural changes in those countries' productive systems. On the other hand within countries with economies based on raw materials, we assess the implementation of policy leading to urbanization that does not support any structural transformations of their basket of exports.

Economic Complexity and Urbanization
We represent the WTW as a bipartite network, i.e. by considering the set of world countries and the set of products as different entities and linking a given country to a given product if (and only if) the former exports the latter above a certain threshold on the so-called Revealed Comparative Advantage (RCA) 27 . RCA is a general criterion adopted to decide whether or not a country can be considered a producer of a particular product. It quantifies how relevant the export of a given product p is for the economy of a country c in relation to the global export of p by all countries (see Methods Section below).
The country's fitness and product's complexity are the result of a non-linear iterative map applied to the WTW matrix M 19, 20, 30 (See Methods below).
Through the algorithm's iterations, products exported by low fitness countries have a low level of complexity while high complexity products are exported by high fitness countries only. The composition of a country's export basket depends on its fitness. Fitness and Complexity are thus non-monetary indicators of an economy's development: the fitness represents a measure of the tangible and intangible assets and capabilities which drive the country's development, such as its political organization, history, geography, technology, services, and infrastructures 21 . Complexity, meanwhile, measures the capabilities that a country must own in order to produce and then export a given product.
Within this framework, we include the dimension of a country's degree of urbanization, defined as the percentage of the total population living in urban areas. Our aim is to quantify the link between a country's urbanization process and its exports as a proxy for its industrial economic system. To disentangle the relation between countries' productive systems and their urbanization, we have divided the set of countries in terms of their degree of urbanization, defined by the Urban Range, which is expressed in four quantiles [Q1, Q2, Q3, Q4] (see urban range division in Fig.1B top).
In the early 2000s, more urbanized countries [Q3,Q4] export a wide range of complex products such as textiles, heavy manufacturing products, and IT, while rural countries [Q1,Q2] export products that require a low level of sophistication, such as raw materials and agricultural products (Fig.1A).
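As a minimal sketch of the Urban Range division described above, the snippet below assigns countries to Q1..Q4 using the quartiles of the observed urban-share distribution. The function name `urban_range_quartile` and the example values are hypothetical, and splitting at the 25th, 50th and 75th percentiles is an assumption about how the four quantiles are defined.

```python
import numpy as np


def urban_range_quartile(urban_share):
    """Return a label 1..4 (Q1 = least urbanized) for each country.

    urban_share: urban population as a percentage of total population.
    Assumption: the four Urban Range quantiles are the quartiles of the
    observed distribution of urban shares.
    """
    urban_share = np.asarray(urban_share, dtype=float)
    # Boundaries at the 25th, 50th and 75th percentiles.
    edges = np.quantile(urban_share, [0.25, 0.5, 0.75])
    # searchsorted yields 0..3; shift to 1..4.
    return np.searchsorted(edges, urban_share, side="left") + 1


# Hypothetical urban shares (percent) for eight countries.
shares = [12.0, 18.5, 33.0, 41.2, 55.0, 63.7, 78.9, 98.0]
labels = urban_range_quartile(shares)
```

Labels 1 and 2 then correspond to the rural group [Q1,Q2] and labels 3 and 4 to the urbanized group [Q3,Q4].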
Highly urbanized countries maintain a similar distribution across the analysis years, with a long tail of low complexity products and a consistent increase in the number of high complexity products. On the other hand starting from 2005, we have noticed that rural countries change their export basket towards higher complexity products. This shift is shown by the cumulative distribution functions of the different Urban ranges that decrease their distance from one another over time (see Fig.1A inset) together with their median and peak distance.
We notice that countries within the higher quantiles of the Urban Range (Fig.1B) are the ones with higher fitness and higher diversification, whilst low urbanized countries have low diversification and fitness. Notable exceptions are countries with exports based on raw materials (e.g. Qatar, Kuwait, Gabon, Iraq, Libya). These countries reached higher levels of urbanization as a result of policy decisions 31 , while their exports are limited to a few products with low complexity.
The representation of the WTW in Fig. 1C shows country exports in 2010 rearranged by ranked fitness and complexity. The country exports' diversification is related to the urbanization level. Low urbanized countries are at the bottom of the matrix with low fitness and lower degree of diversification, whilst the urbanized countries, with the most advanced economies lie at the top, with a high degree of product diversification with different levels of complexity and high fitness.

Exports Diversification and Urbanization
It is known that low fitness countries have similar economies, with low degrees of diversification and high similarity with respect to their export baskets 20,32,33 , i.e. they produce and export few of the same low technology products. We captured a shift in the distribution of the exported products within the rural countries (Fig.1A). In particular, we noticed that rural countries start to produce and export more sophisticated products. In the EC literature, this transformation of productive systems is related to the development of new capabilities 22,34,35 .
Figure 1. A. There is a shift of lower urbanized countries towards the export of more complex products. B. Distribution of the Urban Range (percentage of the total population living in urban areas) of the 146 countries analyzed, and ranked country fitness vs. product export diversification: the highly diversified countries are the ones with higher fitness and high urbanization, while low urbanized countries are in the center bottom of the scatter plot, with some exceptions such as those with links to the oil countries. C. Matrix of the country exports in 2010, with countries and products reordered by fitness and complexity; the colored dots represent exported products according to the RCA threshold (Eq.2), and the color gradient follows the urban range definition.
Some questions emerge from this analysis: do rural countries evolve their productive systems in the same way? Do they continue to produce and export the same products? Is the pattern of economic development entangled with urbanization in the same fashion for each rural country?
We can indirectly measure the transformation of the productive systems by analyzing the evolution of the WTW topology. In particular, we can assess the changes in the similarity of countries' exports by studying how the abundance of network motifs evolves. A network motif is a particular pattern of interconnections occurring between the nodes of the network (i.e. between the countries and their products). In our case we are interested in the abundance of the similarity motif µ_sim in Fig.2B (motif 6 in 36 , or X motif 33 ): it quantifies the co-occurrence of any two countries as producers of the same couple of products as in Eq.7 (and, vice versa, the co-occurrence of any two products in the basket of the same couple of countries). This represents the simplest motifs 36 . To provide a benchmark and assess the statistical significance of µ_sim in the WTW, we use the Bipartite Configuration Model (BiCM) 32 as a null model. This framework is valuable in the analysis of the abundance of bipartite motifs 36 , enabling the detection of financial crisis effects on countries' export baskets 33 as well as of export similarities between countries with the same level of economic development 37 .
We generated 1000 matrices using the BiCM 32 (see Methods Section below) and we compare the observed abundances of the similarity motif (Eq.7) in the real network with the corresponding expected values in the null ensemble using the Z-score.
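The comparison against the null ensemble can be sketched as follows, assuming that the matrix of BiCM link probabilities is already available (see Methods). The function names are hypothetical, `statistic` stands for any network measure such as the similarity motif count, and the use of the population standard deviation in the Z-score is an assumption.

```python
import numpy as np


def z_score(observed, null_values):
    """Standardize an observed value against a sample of null values."""
    null_values = np.asarray(null_values, dtype=float)
    # Population standard deviation (ddof=0); an assumption of this sketch.
    return (observed - null_values.mean()) / null_values.std()


def sample_ensemble(link_prob, n_samples=1000, seed=0):
    """Draw binary matrices whose links are independent Bernoulli variables."""
    rng = np.random.default_rng(seed)
    link_prob = np.asarray(link_prob, dtype=float)
    draws = rng.random((n_samples,) + link_prob.shape)
    return (draws < link_prob).astype(int)


def ensemble_z_score(matrix, link_prob, statistic, n_samples=1000, seed=0):
    """Z-score of statistic(matrix) with respect to the sampled null ensemble."""
    samples = sample_ensemble(link_prob, n_samples, seed)
    null_values = [statistic(sample) for sample in samples]
    return z_score(statistic(np.asarray(matrix)), null_values)
```

A positive Z-score indicates that the statistic is more abundant in the observed network than expected under the null model. If every sampled matrix yields the same value, the standard deviation is zero and the Z-score is undefined.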
The whole WTW manifests a progressive increase in the abundance of similarity motifs with respect to the null case 33 (black line in Fig.2A). Highly urbanized countries show a similar trend of increasing similarity in their product exports. Rural economies are very similar to one another, with a higher abundance of the similarity motif than in the random case and hence a high Z-score. Interestingly, low urban range countries manifest the opposite trend, diversifying from each other over time. The export diversification trends of the low urbanized countries, coupled with the increasing complexity of the products exported, imply a nontrivial connection between urbanization and production capabilities. This measure outlines how rural economies follow different development patterns based on their production systems. The urbanization phenomenon, coupled with the capabilities already present in the country, enables the production of different sophisticated products depending on the environment.

Urbanization Growth and Country Fitness
The economic transformation of a rural country has an impact on its overall fitness value and on the competitiveness of its productive system. In this respect, the urbanization process is a key element in a country's development and its economic growth 31,38 . To assess the relation between a country's fitness and the urbanization process, we analyzed the Urban Range growth rate in relation to the growth rate of the country fitness ranking between 1995 and 2010, as we show in Fig.3. The country fitness ranking is the country's ordered position with respect to its fitness value in a given year. The growth rate of the country's fitness ranking is an easily understood tool to compare the transformations of a country's productive system with respect to its competitors. It has proven to be a reliable tool in quantifying a country's relative degree of competitiveness across different years, providing a more stable measurement than the raw fitness value 39 .
For each of the four Urban Range quantiles we find a linear relation between the urbanization rate and the Fitness ranking growth rate in Fig.3. Increasing urbanization within lowly urbanized countries is interwoven with increasing Fitness. Meanwhile, the effects are minimal in highly urbanized countries (Urban Range Q3,Q4). We validate the urbanization/fitness relation by analyzing a 25% quantile sliding window on the whole urbanization distribution, which we show in black in Fig.3B.
We notice that in many rural economies (for example Uganda, Nepal, and Somalia), the urbanization process affects or has been affected by structural changes in economic production. On the other hand, there are many countries (such as Ivory Coast, Paraguay, and Chad) where the urbanization process does not provide any improvement in the fitness ranking 40 .
The self-reinforcing mechanism between urbanization and fitness reaches a plateau within the urbanized countries (Q3,Q4), where urbanization does not affect, and has not been affected by, changes in fitness ranking. In this respect, the resource-exporting countries manifest a shift toward a negative relation between urbanization and fitness. In fact, in countries that are heavily dependent on resource exports, urbanization appears to be concentrated in cities where the economies consist primarily of non-tradeable services 41 .

Urban Fitness Trends
The process of urbanization is often entangled with a country's industrialization 11 . As countries develop, people move out of rural areas and agricultural activities into urban centers, where they engage in manufacturing products 42 which are more sophisticated and have higher complexity. This transformation is outlined by the increasing level of fitness of low urbanized countries that are involved in the urbanization process. To leverage this information and capture its trends, we define the country Urban Fitness F^urb_c(t) = F_c(t) · U_c(t); this is the value of the country fitness F_c weighted by the percentage of urban population U_c. We cluster the countries' Urban Fitness trends using the Louvain algorithm 43 , which is based on their correlation matrix.
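The Urban Fitness definition and the correlation matrix that feeds the clustering step can be sketched as below. The function names and the toy series are hypothetical; the community detection itself (Louvain) is not implemented here, and how the correlation matrix is turned into a weighted graph is left open because the text does not specify it.

```python
import numpy as np


def urban_fitness(fitness, urban_share):
    """F_urb[c, t] = F[c, t] * U[c, t] for countries c and years t."""
    fitness = np.asarray(fitness, dtype=float)
    urban_share = np.asarray(urban_share, dtype=float)
    return fitness * urban_share


def trend_correlation(series):
    """Pearson correlation between country time series (one row per country)."""
    return np.corrcoef(series)


# Toy data: two countries observed over three years (hypothetical values).
F = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
U = np.array([[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]])
F_urb = urban_fitness(F, U)
corr = trend_correlation(F_urb)
```

Expressing U_c as a fraction or as a percentage only rescales F^urb_c by a constant factor, so the correlation matrix is the same in both cases.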

Discussion
It is well known that urbanization provides several advantages for economies of scale and the division of labour, boosting productivity and competition. It eases access to the labor force and to the input materials of the production process, decreases the geographical distance between firms, reduces transaction costs, and fosters competition 44 . These urbanization advantages 45 , together with an appropriate bureaucratic environment 31 , investment in infrastructures 46 and the market structure of companies 47 , are some of the intangible attributes, the capabilities, that a country needs to drive economic growth and innovation 34 . We noticed that country Fitness, which reflects the production and export of goods, is interwoven with the urbanization process during the early stages of a country's economic development and growth. We show that the information carried by the WTW can provide a different perspective for analyzing the complex process of urbanization, shedding light on the relation between a country's exports, economic development and its urban growth.

World Trade Web
The dataset used in this work is the BACI (Base pour l'Analyse du Commerce International) World Trade Database. The data contain information on the trade of 200 different countries for more than 5000 different products, categorized according to the 6-digit code of the Harmonised System 2007. The products' sectors follow the UN categorization. We create a map between the two systems, converting the HS2007 codes into 2-digit ISIC revision 2 codes. We represent the trade relations between the 145 countries c ∈ [1, C] and the 1131 products p ∈ [1, P] over the years [1995, 2010] through a weighted bipartite matrix of dimension (C × P), where each entry m_{c,p} measures the export in US dollars. The framework of Economic Complexity [19][20][21][22] , based on the interaction between countries and products, is expressed by applying the Revealed Comparative Advantage (RCA) 27 threshold to the entries m_{c,p}. The entries of the biadjacency matrix M of the undirected bipartite network analyzed in this work are then defined so that a connection (country-product link) is established if and only if the corresponding RCA is relevant (over the threshold); otherwise it is ignored. Each row of M represents the export basket of a given country (its row sum being the diversification k_c), while each column represents the set of producers of a given product (its column sum being the ubiquity k_p) 48 .
The data for the urban population from 1995 to 2010 are available from the World Bank database.
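The construction of M from the export volumes can be sketched as below. The sketch assumes the standard Balassa form of RCA (a product's share in a country's exports divided by its share in world exports) and a threshold of 1, which is the conventional choice in the EC literature; neither is stated explicitly in the text above. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def rca(exports):
    """Balassa RCA for a (countries x products) matrix of export values.

    Assumes every country and every product has non-zero total exports.
    """
    exports = np.asarray(exports, dtype=float)
    country_total = exports.sum(axis=1, keepdims=True)  # total exports per country
    product_total = exports.sum(axis=0, keepdims=True)  # world exports per product
    world_total = exports.sum()
    return (exports / country_total) / (product_total / world_total)


def biadjacency(exports, threshold=1.0):
    """Binary matrix M: 1 where RCA reaches the threshold (assumed to be 1)."""
    return (rca(exports) >= threshold).astype(int)


# Toy export values in US dollars (hypothetical).
E = np.array([[8.0, 2.0], [2.0, 8.0]])
M = biadjacency(E)
diversification = M.sum(axis=1)  # number of products per country
ubiquity = M.sum(axis=0)         # number of exporters per product
```

The row and column sums of M give the diversification and ubiquity mentioned above.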

Fitness and Complexity
Fitness and Complexity are metrics for countries and products, computed on the binary bipartite matrix M of the WTW [19][20][21][22]24 . The basic idea of EC is to define a non-linear map through an iterative process which couples the Fitness of countries to the Complexity of products. At every step of the iteration, the Fitness F_c of a given country c is proportional to the sum of its exported products, weighted by their complexity parameter Q_p. The Fitness F_c of the generic country c and the Complexity Q_p of the generic product p at the n-th step of the iteration are defined by Eqs.(4), where the symbol ⟨·⟩ indicates the average taken over the proper set. The initial conditions are taken as F_c^(0) = Q_p^(0) = 1 ∀c ∈ N_c, ∀p ∈ N_p, where N_c and N_p are the numbers of countries and products respectively (the convergence of the algorithm described by Eqs.(4) depends on the shape of the matrix M, as discussed in 39 ).
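The iteration can be sketched as follows. The update rule used here is the form commonly published for the Fitness and Complexity algorithm (fitness as the complexity-weighted sum of exported products, complexity as the inverse of the sum of inverse exporter fitness, each normalized by its mean); that it coincides with Eqs.(4) of this work is an assumption. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def fitness_complexity(M, n_iter=100):
    """Iterate the Fitness-Complexity map on a binary matrix M.

    Assumes every country exports at least one product and every product
    has at least one exporter. For strongly nested matrices and many
    iterations some fitness values can underflow towards zero.
    """
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])  # F_c^(0) = 1
    Q = np.ones(M.shape[1])  # Q_p^(0) = 1
    for _ in range(n_iter):
        # Intermediate values computed from the previous step.
        F_tilde = M @ Q
        Q_tilde = 1.0 / (M.T @ (1.0 / F))
        # Normalize by the average over countries and over products.
        F = F_tilde / F_tilde.mean()
        Q = Q_tilde / Q_tilde.mean()
    return F, Q


# Nested toy matrix: country 0 exports everything, country 2 one product.
M_toy = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]])
F_toy, Q_toy = fitness_complexity(M_toy, n_iter=20)
```

In this toy case the most diversified country obtains the highest fitness, and the product exported only by that country obtains the highest complexity.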

Bipartite Configuration Model (BiCM)
The Bipartite Configuration Model (BiCM), as defined by 32,33 , is a null model of general applicability that generates a grand canonical ensemble of bipartite, undirected, binary networks in which the two layers, Countries and Products, have C and P nodes respectively. The ensemble generated by the BiCM constrains the number of connections of each node on both layers (in our case the diversification d_c and the ubiquity u_p) to match, on average, the observed one. Each network M in this ensemble is assigned a probability coefficient P(M | x, y), where x_c and y_p are the Lagrange multipliers associated with the constrained degrees.
Constraining the ensemble average values of the country and product degrees induces a probability that a link exists between country c and industry sector p, independently of the other links: p_{cp} = x_c y_p / (1 + x_c y_p).
The numerical values of the unknown parameters x and y have to be determined by solving the following system of C + P equations, which constrains the ensemble average values of the countries' diversifications and the products' ubiquities to match the real values: ⟨d_c⟩ = d*_c, c = 1 ... C, and ⟨u_p⟩ = u*_p, p = 1 ... P. Here {d*_c}, c = 1 ... C, and {u*_p}, p = 1 ... P, are the real degree sequences of countries and industry sectors respectively, and ⟨·⟩ represents the average of a given quantity over the ensemble measure defined by Eq.6, so that ⟨d_c⟩ = ∑_p p_{cp} and ⟨u_p⟩ = ∑_c p_{cp}. The parameters that satisfy the system are indicated with an asterisk, x* and y*.
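One simple way to solve this system numerically is a fixed-point iteration obtained by rewriting each constraint as x_c = d*_c / ∑_p [y_p / (1 + x_c y_p)], and analogously for y_p. The sketch below takes this route; it is not necessarily the solver used in this work, and the function name, starting guess and tolerances are assumptions.

```python
import numpy as np


def solve_bicm(M, tol=1e-10, max_iter=10000):
    """Return the matrix of BiCM link probabilities p_cp for a binary matrix M.

    Assumes no row or column of M is completely full, since such nodes
    would require an infinite Lagrange multiplier.
    """
    M = np.asarray(M, dtype=float)
    d = M.sum(axis=1)  # observed diversification of each country
    u = M.sum(axis=0)  # observed ubiquity of each product
    links = M.sum()
    x = d / np.sqrt(links)  # starting guess
    y = u / np.sqrt(links)
    for _ in range(max_iter):
        # Update x with the current y, then y with the updated x.
        x_new = d / (y[None, :] / (1.0 + np.outer(x, y))).sum(axis=1)
        y_new = u / (x_new[:, None] / (1.0 + np.outer(x_new, y))).sum(axis=0)
        change = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if change < tol:
            break
    xy = np.outer(x, y)
    return xy / (1.0 + xy)


# Toy biadjacency matrix (hypothetical): 4 countries, 5 products.
M_small = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
])
P_small = solve_bicm(M_small)
```

The row and column sums of the returned matrix are the expected diversifications and ubiquities, which can be compared with the observed ones to check that the constraints are satisfied.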

Similarity Motifs
In the present work we have sampled the grand canonical ensemble of binary, undirected, bipartite networks induced by the BiCM according to the probability coefficients P(M | x*, y*), and calculated the average and variance of the motif µ_sim, defined as b-motif6 in 36 .
The Similarity Motif represents the symmetric and complete connections between two countries c, c' and two industry sectors p, p'. The number of similarity motifs (Eq.7) is computed from Z, the matrix of dimension (C × C) that represents the projection of M over the countries. Each entry Z_{cc'} counts the number of industry sectors in common between the countries c and c', and is defined as Z_{cc'} = ∑_{p=1}^{P} M_{cp} M_{c'p} = (M M^T)_{cc'}. This motif represents the co-occurrence of two products in two countries' export baskets within the bipartite matrix of the country exports. For the accuracy of the BiCM in reproducing the value of µ_sim, see 32 .
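The motif count can be sketched from the definitions above. The closed form of Eq.7 is not reproduced in this text, so the sketch uses the direct combinatorial count implied by the definition: each unordered pair of countries sharing Z_{cc'} products contributes "Z_{cc'} choose 2" motifs. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def similarity_motifs(M):
    """Count complete 2-country, 2-product subgraphs in a binary matrix M."""
    M = np.asarray(M, dtype=np.int64)
    Z = M @ M.T  # Z[c, c'] = number of products exported by both c and c'
    # Visit each unordered pair of distinct countries once.
    rows, cols = np.triu_indices(Z.shape[0], k=1)
    shared = Z[rows, cols]
    # Choose 2 of the shared products for every country pair.
    return int((shared * (shared - 1) // 2).sum())


# Toy matrix (hypothetical): countries 0 and 1 share three products.
M_motif = np.array([[1, 1, 1], [1, 1, 1], [1, 0, 0]])
n_motifs = similarity_motifs(M_motif)
```

Because the motif is symmetric in the two layers, counting over pairs of products on the transposed matrix gives the same number.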