Introduction

It is an established fact that urbanization in developed countries is accompanied by economic growth and industrialization, which reinforce one another1. This historic pattern generates an expectation of a virtuous circle between economic growth and urbanization regardless of local conditions2,3. From classic urban economic theories4,5 to the more recent scaling approach to cities6,7, the growth of urban population has routinely been used as a proxy for economic growth. This pattern has also been observed in rapidly developing countries such as China and India, but it cannot be considered a universal blueprint8, since deviations from this norm have not yet been fully explained.

In fact, as pointed out in several studies9,10,11,12, the increasing urbanization rate in persistently poor and non-industrialized countries poses an important dilemma for urban economic theory. Why, given the same rate of urbanization, does Asia contain a number of explosive economies while sub-Saharan Africa has seen very little growth? Moreover, in developed and advanced industrialized economies, is there a competitive advantage in continuing this urbanization process indefinitely?

There are several theories aimed at explaining urbanization processes. Some argue that rural poverty moves people to cities as was clearly the case in nineteenth century Europe and America13, driving the transformation from an agricultural to an industrial-service based economy14,15. Others argue that in the last decades there has been urban-biased public policy that has led to over-urbanization12.

The most intriguing approach, however, is rooted in the mutual indirect effects of the World Trade Web (WTW) on global urbanization16,17. The dominant idea is that in open economies, domestic communities (such as cities) can trade easily with other communities, boosting their exports in substituting industrialization for urbanization policy18. In simple terms, commodities can flow more freely using urban agglomerates as nodes in the trading networks between countries, generating the ever-present virtuous circle between economic growth and urbanization.

Starting from this theory, we analyze the WTW to explore the mutual relationship between the urbanization of countries and their economic production structure using the Economic Complexity (EC) framework. Economic Complexity19,20,21,22,23,24,25,26,27 is a new and expanding field of economic analysis, which proposes “Fitness” and “Complexity” metrics to quantify the fitness or competitiveness of countries and the complexity of the products in a country’s basket of exports. EC is based on a bipartite representation of the World Trade Web in which the nodes are the set of world countries and the set of exported products, treated as different entities. Countries and products are connected to one another by imposing a threshold on their Revealed Comparative Advantage (RCA)28, which defines the criterion for the existence of a link.

The Fitness and Complexity algorithm is a kind of PageRank method applied to the WTW, where Fitness \(F_c\) is the quantity for country c, and Complexity \(Q_p\) is the quantity for product p. The idea at the basis of the algorithm is that the countries with the highest fitness are those which are able to export the highest number of the most exclusive products, i.e. those with the highest complexity. On the other hand, the complexity of a product is non-linearly related to the fitness of its exporters, so that products exported by low fitness countries have a low level of complexity and high complexity products are exported by high fitness countries only.

The Fitness metric is valuable in quantifying a country’s productive structure and structural transformations which enable one to predict its future economic growth23. It correlates with the extent of economic equality29 and it has been used to analyze the country’s growth path to industrialization30.

In this work, we use a data-driven approach, borrowing tools recently introduced by statistical physics and network science to improve our understanding of the complex dynamics of human societies, with the aim of finding innovative insights31 linking the urbanization process with the evolution of international trade.

We couple the WTW data with the urbanization level of 144 countries worldwide and analyze them between 1995 and 2010, thus capturing the fingerprint of urbanization on countries’ productive systems through the lens of their exports. We notice that in rural economies, the increase in urban population fosters structural changes in industrial exports. It boosts the country’s diversification, improving the country’s fitness and allowing the export of more complex products. These economic transformations fade away in countries that already have a high level of urban population (more than \(60\%\)), where there is no relation between the urbanization process and the country’s fitness.

Within the sub-Saharan countries, we capture those where the virtuous circle between economic growth and urbanization is fostering structural changes in those countries’ productive systems. On the other hand within countries with economies based on raw materials, we assess the implementation of policy leading to urbanization that does not support any structural transformations of their basket of exports.

Results

Economic complexity and urbanization

We represent the WTW as a bipartite network, i.e. by considering the set of world countries and the set of products as different entities and linking a given country to a given product if (and only if) the former exports the latter above a certain threshold (the so-called Revealed Comparative Advantage, RCA)28. RCA is a general criterion adopted in order to decide whether or not a country can be considered a producer of a particular product. It quantifies how relevant the export of a given product p is for the economy of a country c in relation to the global export of p by all countries (see “Methods” section).

The country’s fitness and product’s complexity are the result of a non-linear iterative map applied to the WTW matrix M19,20,32 (See “Methods”).

Through the algorithm’s iterations, products exported by low fitness countries have a low level of complexity while high complexity products are exported by high fitness countries only. The composition of a country’s export basket depends on its fitness. Fitness and Complexity are thus non-monetary indicators of an economy’s development: the fitness represents a measure of the tangible and intangible assets and capabilities which drive the country’s development, such as its political organization, history, geography, technology, services, and infrastructures21. Meanwhile, complexity measures the capabilities that a country must own in order to produce and then export a given product.

Within this framework, we include the dimension of a country’s degree of urbanization, defined as the percentage of the total population living in urban areas. Our aim is to quantify the link between a country’s urbanization process and its exports as a proxy for its industrial economic system. To disentangle the relation between countries’ productive systems and their urbanization, we have divided the set of countries in terms of their degree of urbanization, defined by the Urban Range, which is expressed in four quantiles [Q1, Q2, Q3, Q4] (see the Urban Range division in Fig. 1B, top).
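As a minimal sketch of the quantile split described above, assuming the four Urban Range groups are quartiles of the urban-population share (the cut-point method and the helper name `urban_range_labels` are illustrative choices, not taken from the text):

```python
import bisect
import statistics


def urban_range_labels(urban_share):
    """Assign each country to Q1..Q4 by quartile of its urban share (%)."""
    # Three cut points split the observed distribution into four groups.
    cuts = statistics.quantiles(urban_share, n=4)
    # bisect_right counts how many cut points lie at or below the value.
    return [f"Q{bisect.bisect_right(cuts, u) + 1}" for u in urban_share]


# Hypothetical urban-population shares (%) for eight countries.
shares = [10, 20, 30, 40, 50, 60, 70, 80]
labels = urban_range_labels(shares)
```

With real data the groups need not be equally populated if several countries share the same urbanization value.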

Figure 1

(A) Distribution of exported products complexity by urbanization level over 2000–2010. There is a shift of less urbanized countries towards the export of more complex products. (B, top) Distribution of the Urban Range (percentage of the total population living in urban areas) of the 144 countries analyzed. (B) Ranked country fitness versus product export diversification: the highly diversified countries are the ones with higher fitness and high urbanization, while low-urbanized countries are in the bottom center of the scatter plot, with some exceptions, such as those with links to the oil countries. (C) Matrix of the country exports in 2010, with countries and products reordered by fitness and complexity; each colored dot represents an exported product according to the RCA threshold of Eq. (2), and the color gradient follows the Urban Range definition.

In the early 2000s, more urbanized countries [Q3,Q4] export a wide range of complex products, such as textiles, heavy manufacturing goods, and IT, while rural countries [Q1,Q2] export products that require a low level of sophistication, such as raw materials and agricultural products (Fig. 1A).

Highly urbanized countries maintain a similar distribution across the analysis years, with a long tail of low complexity products and a consistent increase in the number of high complexity products. On the other hand starting from 2005, we have noticed that rural countries change their export basket towards higher complexity products. This shift is shown by the cumulative distribution functions of the different Urban ranges that decrease their distance from one another over time (see Fig. 1A inset) together with their median and peak distance.

We notice that countries within the higher quantiles of the Urban Range (Fig. 1B) are the ones with higher fitness and higher diversification, whilst low-urbanized countries have low diversification and fitness. Notable exceptions are countries with exports based on raw materials (e.g. Qatar, Kuwait, Gabon, Iraq, Libya). These countries reached higher levels of urbanization as a result of policy decisions33, while their exports are limited to a few products with low complexity.

The representation of the WTW in Fig. 1C shows country exports in 2010 rearranged by ranked fitness and complexity. The country exports’ diversification is related to the urbanization level. Low urbanized countries are at the bottom of the matrix with low fitness and lower degree of diversification, whilst the urbanized countries, with the most advanced economies lie at the top, with a high degree of product diversification with different levels of complexity and high fitness.

Exports diversification and urbanization

It is known that low fitness countries have similar economies, with low degrees of diversification and high similarity in their export baskets20,34,35, i.e. they produce and export a few of the same low technology products. We captured a shift in the distribution of the exported products within the rural countries (Fig. 1A). In particular, we noticed that rural countries start to produce and export more sophisticated products. In the EC literature, this transformation of the productive system is related to the development of new capabilities22,36,37.

Some questions emerge from this analysis: do the rural countries evolve their productive systems in the same way? Do they continue to produce and export the same products? Is the pattern of economic development entangled with urbanization in the same fashion for each rural country?

We can indirectly measure the transformation of the productive systems by analyzing the evolution of the WTW topology38. In particular, we can assess the changes in countries’ export similarities by studying the evolution of the abundance of network motifs39. A network motif is a particular pattern of interconnections occurring between the nodes of the network (i.e. between the countries and their products). In our case we are interested in the abundance of the similarity motif \(\mu _{sim}\) in Fig. 2B (motif 6 of ref. 40, or X-motif35): it quantifies the co-occurrence of any two countries as producers of the same couple of products, as in Eq. (7) (and, vice versa, the co-occurrence of any two products in the basket of the same couple of countries). This is the simplest motif40 that can quantify the similarity of countries’ export diversification while maintaining a pairwise correlation between the products exported. Two economies with a fixed number of exported products are diversifying if the value of \(\mu _{sim}\) decreases, while their production similarity increases for high values of \(\mu _{sim}\).

Figure 2

(A) Z-score of the export similarity motif by country groups with different Urban Ranges and the Z-score of the whole WTW in black. (B) Similarity motif as the co-occurrence of any two countries as producers of the same couple of products35,40.

To provide a benchmark and assess the statistical significance of \(\mu _{sim}\) in the WTW, we use the Bipartite Configuration Model (BiCM)34 as a null model. This framework is valuable in the analysis of the abundance of bipartite motifs40, enabling us to detect the effects of financial crises on a country’s export basket35 as well as export similarities between countries with the same level of economic development41.

We generated 1000 matrices using the BiCM34 (see “Methods” section) and compared the observed abundance of the similarity motif (Eq. 7) in the real network with the corresponding expected value in the null ensemble using the Z-score.
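For reference, the Z-score used in this kind of comparison conventionally takes the form below, where \(\langle \mu _{sim}\rangle\) and \(\sigma _{\mu _{sim}}\) denote the mean and standard deviation of the motif abundance over the null ensemble (this notation is an assumption introduced here for illustration and does not appear in the text):

$$\begin{aligned} z=\frac{\mu _{sim}^{\text{obs}}-\langle \mu _{sim}\rangle }{\sigma _{\mu _{sim}}} \end{aligned}$$

A positive \(z\) indicates that the similarity motif is more abundant in the observed network than expected under the null model, and a negative \(z\) that it is less abundant.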

The whole WTW manifests a progressive increase in the abundance of similarity motifs with respect to the null case35 (black line in Fig. 2A). Highly urbanized countries show a similar trend of increasing similarity in their product exports. This measure implies that rural economies are very similar to one another: their similarity motif is more abundant than in the random case, giving a high Z-score. Interestingly, low Urban Range countries manifest the opposite trend, diversifying from one another. The export diversification trend of the low-urbanized countries, coupled with the increasing complexity of the products exported, implies a nontrivial connection between urbanization and production capabilities. This measure outlines how rural economies follow different development patterns based on their production systems. The urbanization phenomenon, coupled with the capabilities already present in the country, enables the production of different sophisticated products depending on the environment.

Urbanization growth and country fitness

The economic transformation of a rural country has an impact on its overall fitness value and on the competitiveness of its productive system. In this respect, the urbanization process is a key element in a country’s development and its economic growth33,42. To assess the relation between a country’s fitness and the urbanization process, we analyzed the Urban Range growth rate in relation to the growth rate of the country fitness ranking between 1995 and 2010, as we show in Fig. 3. The country fitness ranking is the country’s ordered position with respect to its fitness value in a given year. The growth rate of the fitness ranking is an easily understood tool to compare the transformations of a country’s productive system with respect to its competitors. It has proven to be a reliable tool for quantifying a country’s relative degree of competitiveness across different years, providing a more stable measurement than the raw fitness value43.

Figure 3

(A) The Fitness Ranking Growth Rate versus Urbanization Growth Rate. The effect of urbanization growth on the transformation of the economic systems (or vice versa) is more relevant in low-urbanized countries. The dashed lines represent the 95% Confidence Interval (CI) of the linear regression. (B) Slope coefficient of a sliding window across \(25\%\) of the countries (corresponding to 36 countries) of the fitness ranking growth rate versus urban population growth rate. The error bar corresponds to the fit’s \(95\%\) confidence interval. The colors follow the Urban Range scheme.

For each of the four Urban Range quantiles we find a linear relation between the urbanization rate and the Fitness ranking growth rate in Fig. 3B. Increasing urbanization within low-urbanized countries is interwoven with increasing Fitness. Meanwhile, the effects are minimal in highly urbanized countries (Urban Range Q3,Q4). We validate the urbanization/fitness relation by analyzing a \(25\%\) quantile sliding window over the whole urbanization distribution, which we show in black in Fig. 3B.
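The sliding-window analysis can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the growth rate is taken to be the relative change between the first and last year, the slope is an ordinary least-squares fit, and the function names and toy numbers are hypothetical. The confidence intervals shown in the figure are not computed here.

```python
def growth_rate(first, last):
    """Relative change between two years (an assumed definition)."""
    return (last - first) / first


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def sliding_slopes(urban_level, urban_growth, rank_growth, frac=0.25):
    """Slope of rank growth vs urban growth in windows of countries
    ordered by urbanization level; each window holds `frac` of them."""
    order = sorted(range(len(urban_level)), key=lambda i: urban_level[i])
    width = max(2, round(frac * len(order)))
    slopes = []
    for start in range(len(order) - width + 1):
        idx = order[start:start + width]
        slopes.append(ols_slope([urban_growth[i] for i in idx],
                                [rank_growth[i] for i in idx]))
    return slopes


# Toy data: eight hypothetical countries, four rural and four urbanized.
urban_level = [15, 20, 25, 30, 65, 70, 80, 90]
urban_growth = [0.30, 0.28, 0.22, 0.20, 0.10, 0.08, 0.03, 0.01]
rank_growth = [0.60, 0.56, 0.44, 0.40, 0.0, 0.0, 0.0, 0.0]
slopes = sliding_slopes(urban_level, urban_growth, rank_growth, frac=0.5)
```

With 144 countries and `frac=0.25` each window contains 36 countries, matching the window size quoted in the caption of Fig. 3.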

We notice that in many rural economies, the urbanization process affects or has been affected by structural changes in economic production (examples are countries such as Uganda, Nepal, and Somalia). On the other hand, there are many countries (such as Ivory Coast, Paraguay, and Chad) where the urbanization process does not provide any improvement in the fitness ranking44.

The self-reinforcing mechanism between urbanization and fitness reaches a plateau within the urbanized countries (Q3,Q4), where urbanization does not affect, and has not been affected by, changes in fitness ranking. In this respect, the resource-exporting countries manifest a shift toward a negative relation between urbanization and fitness. In fact, in countries that are heavily dependent on resource exports, urbanization appears to be concentrated in cities where the economies consist primarily of non-tradeable services45. To support our result, we repeat the same analysis replacing the Fitness Ranking metric with the Fitness, the Gross Domestic Product (GDP) and the GDP Ranking, respectively (see “Methods” section: Urban Range vs Fitness and GDP). We do not find any evidence of a relation between the other three metrics and the urbanization rate.

Urban fitness trends

The process of urbanization is often entangled with a country’s industrialization11. As countries develop, people move out of rural areas and agricultural activities into urban centers, where they engage in manufacturing products46 which are more sophisticated with higher complexity. This transformation is outlined by the increasing level of fitness of low urbanized countries that are involved in the urbanization process. To leverage this information and capture its trends, we define the country Urban Fitness \(F_{c}^{\text{ urb }}(t)=F_c(t)*U_c(t)\); this is the value of country fitness \(F_c\) weighted by the percentage of urban population \(U_c\).

Figure 4

(A) Clusters of normalized Urban Fitness trends. (B) Correlation matrix of the countries’ Urban Fitness trends, clustered with the Louvain algorithm. (C) Geographical distribution of the clusters. The map in this figure was created using the software QGIS.

We cluster the countries’ Urban Fitness trends using the Louvain algorithm47 applied to their correlation matrix, shown in Fig. 4B. Three clusters emerge with high correlations, disentangling the non-trivial geographical relations we show in Fig. 4A–C.
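A minimal sketch of how the input to this clustering could be assembled: the Urban Fitness trend of each country and the correlation matrix between trends. The min-max normalization, the use of the Pearson coefficient, and the toy series are assumptions made for illustration; the community detection step itself (Louvain) is not reproduced here.

```python
def urban_fitness(fitness, urban_share):
    """F_c^urb(t) = F_c(t) * U_c(t) for one country's yearly series."""
    return [f * u for f, u in zip(fitness, urban_share)]


def min_max(series):
    """Rescale a trend to [0, 1] (an assumed normalization)."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical three-year series: (fitness, urban share as a fraction).
data = {
    "A": ([1.0, 2.0, 3.0], [0.2, 0.3, 0.4]),
    "B": ([2.0, 2.0, 2.0], [0.5, 0.6, 0.7]),
    "C": ([3.0, 2.0, 1.0], [0.9, 0.9, 0.9]),
}
trends = {c: min_max(urban_fitness(f, u)) for c, (f, u) in data.items()}
names = list(trends)
corr = [[pearson(trends[a], trends[b]) for b in names] for a in names]
```

In this toy case countries A and B share a rising trend and are strongly correlated, while C is anti-correlated with both. A perfectly flat trend would need special handling, since both the normalization and the correlation divide by its spread.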

In Fig. 4A, countries with a clear urbanization trend (in orange) are the ones with a stable increase in fitness ranking. Meanwhile, the blue cluster contains developed countries, where urbanization does not provide any new input to economic development, and resource-dependent countries, where urbanization is not only led by deep structural economic change. These results are in agreement with the Urban Range study in Fig. 3, which shows a weak effect of urbanization on country fitness, implying that above a given level of urbanization other factors have a more important role in economic development and growth. Finally, the third cluster (in red) contains the countries without any clear trend, which are thus uncategorized.

Discussion

It is well known that urbanization provides several advantages for economies of scale and the division of labour, boosting productivity and competition. It eases access to the labor force and to input materials for the production process, and it decreases the geographical distance between firms, reducing transaction costs and fostering competition48. These urbanization advantages49, together with the appropriate bureaucratic environment33, investment in infrastructures50 and companies’ market structure51, are some of the intangible attributes, the capabilities, that a country needs to drive economic growth and innovation36. We noticed that country Fitness, which reflects the production and export of goods, is interwoven with the urbanization process during the early stages of a country’s economic development and growth. We show that the information carried by the WTW can provide a different perspective on the complex process of urbanization, shedding light on the relation between a country’s exports, economic development and its urban growth.

Methods

Data

World trade web

The dataset used in this work is the BACI (Base pour l’Analyse du Commerce International) World Trade Database (Gaulier, S. Baci: International trade database at the product-level http://www.cepii.fr/CEPII/fr/publications/wp/abstract.asp?NoDoc=2726 Date of access: 18/01/2021). The data contain information on the trade of 200 different countries for more than 5000 different products, categorized according to the 6-digit code of the Harmonised System 2007 (http://www.wcoomd.org/ Date of access: 18/01/2021). The products’ sectors follow the UN categorization (http://unstats.un.org/unsd/cr/registry/regcst.asp?Cl=8 Date of access: 18/01/2021). We create a map between the two systems, converting the HS2007 codes into the ISIC revision 2 codes at 2 digits (http://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeConcordances.html#FromISIC Date of access: 18/01/2021). We represent the trade relations between the 144 countries \(c\in [1,C]\) and the 1131 products \(p\in [1,P]\) over the years [1995, 2010] through the bipartite matrix \({\tilde{M}}\) with dimension \((C\times P)\), where each entry \({\tilde{m}}_{cp}\) measures the export in US dollars. The Economic Complexity framework19,20,21,22, based on the interaction between countries and products, is expressed by applying the Revealed Comparative Advantage (RCA)28 threshold to the entries \({\tilde{m}}_{cp}\):

$$\begin{aligned} {\text{ RCA}}_{cp}=\frac{\frac{{\tilde{m}}_{cp}}{\sum _{p'=1}^{P}{\tilde{m}}_{cp'}}}{\frac{\sum _{c'=1}^{C}{\tilde{m}}_{c'p}}{\sum _{c'=1}^{C}\sum _{p'=1}^{P}{\tilde{m}}_{c'p'}}} \end{aligned}$$
(1)

Finally, we define the entries of the biadjacency matrix M of the undirected bipartite network analyzed in this work as:

$$\begin{aligned} {\left\{ \begin{array}{ll} m_{cp}=1 &{} {\text {when}}\, {\text{ RCA}}_{cp}\ge 1\\ m_{cp}=0 &{} {\text {otherwise}} \end{array}\right. } \end{aligned}$$
(2)

This indicates that the connection (country-product link) is established if and only if the corresponding RCA is relevant (at or above the threshold); otherwise it is ignored. Each row of M represents the export basket of a given country (its row sum is the diversification \(k_c\)), while each column represents the subset of producers of a given product (its column sum is the ubiquity \(k_p\))52.

$$\begin{aligned} k_c=\sum _p m_{cp}\quad k_p=\sum _c m_{cp} \end{aligned}$$
(3)
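Equations (1)–(3) can be illustrated with a short NumPy sketch. The export values below are invented for the example, the function names are hypothetical, and the code assumes every country and every product has non-zero total exports (otherwise the ratios are undefined).

```python
import numpy as np


def rca(exports):
    """Eq. (1): share of product p in country c's exports, divided by
    the share of product p in world exports."""
    exports = np.asarray(exports, dtype=float)
    country_share = exports / exports.sum(axis=1, keepdims=True)
    world_share = exports.sum(axis=0) / exports.sum()
    return country_share / world_share


def binarize(exports, threshold=1.0):
    """Eq. (2): m_cp = 1 when RCA_cp >= threshold, else 0."""
    return (rca(exports) >= threshold).astype(int)


# Toy export matrix in US dollars: 3 countries (rows) x 3 products (columns).
exports = np.array([[60.0, 30.0, 10.0],
                    [10.0, 50.0, 40.0],
                    [5.0, 5.0, 90.0]])
M = binarize(exports)
k_c = M.sum(axis=1)  # Eq. (3): diversification of each country
k_p = M.sum(axis=0)  # Eq. (3): ubiquity of each product
```

Here the first country is linked to two products and the other two countries to one product each, so the first country is the most diversified and the second product the most ubiquitous.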

Urbanization

The data of the urban population from 1995 to 2010 are available at the World Bank database (https://data.worldbank.org/ Date of access: 18/01/2021).

Fitness and complexity

Fitness and Complexity are metrics for countries and products, respectively, computed from the binary bipartite matrix M of the WTW19,20,21,22,24. The basic idea of EC is to define a non-linear map through an iterative process which couples the Fitness of countries to the Complexity of products. At every step of the iteration, the Fitness \(F_c\) of a given country c is proportional to the sum of its exported products, weighted by their complexity \(Q_p\). In particular, the Fitness \(F_c\) of the generic country c and the Complexity \(Q_p\) of the generic product p at the \(n\)-th step of the iteration are defined as:

$$\begin{aligned} \left\{ \begin{array}{c} {\tilde{F}}^{(n)}_c=\sum _p m_{cp} Q^{(n-1)}_p\\ \\ {\tilde{Q}}^{(n)}_p=\dfrac{1}{\sum _c m_{cp} \frac{1}{F^{(n-1)}_c}} \end{array} \right. \rightarrow \left\{ \begin{array}{c} F^{(n)}_c=\dfrac{{\tilde{F}}^{(n)}_c}{\langle {\tilde{F}}^{(n)}_c\rangle }\\ \\ Q^{(n)}_p=\dfrac{{\tilde{Q}}^{(n)}_p}{\langle {\tilde{Q}}^{(n)}_p\rangle } \end{array} \right. , \end{aligned}$$
(4)

where the symbols \(\langle \cdot \rangle\) indicate the average taken over the proper set. The initial conditions are taken as \(F_c^{(0)}=Q_p^{(0)}=1\,\,\forall c\in [1,N_c],\,\forall p\in [1,N_p]\), where \(N_c\) and \(N_p\) are the numbers of countries and products, respectively (the convergence of the algorithm described by Eq. (4) depends on the shape of the matrix M, as discussed in ref. 43).
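The iteration of Eq. (4) can be sketched in a few lines of NumPy. The fixed number of iterations and the toy matrix are assumptions for illustration; no convergence check is performed, and the sketch assumes every country exports at least one product and every product has at least one exporter.

```python
import numpy as np


def fitness_complexity(M, n_iter=50):
    """Iterate the map of Eq. (4) starting from F = Q = 1."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])
    Q = np.ones(M.shape[1])
    for _ in range(n_iter):
        # Both updates use the values from the previous step.
        F_tilde = M @ Q                     # sum_p m_cp Q_p
        Q_tilde = 1.0 / (M.T @ (1.0 / F))   # 1 / sum_c m_cp / F_c
        # Normalize each vector by its mean at every step.
        F = F_tilde / F_tilde.mean()
        Q = Q_tilde / Q_tilde.mean()
    return F, Q


# Toy nested matrix: country 0 exports every product, country 2 only
# the most ubiquitous one (product 0).
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])
F, Q = fitness_complexity(M)
```

In this toy case the most diversified country ends up with the highest fitness, and the product exported only by that country has the highest complexity, in line with the qualitative description of the algorithm.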

Bipartite configuration model (BiCM)

The Bipartite Configuration Model (BiCM), as defined in refs. 34,35, is a null model of general applicability that generates a grand canonical ensemble of bipartite, undirected, binary networks in which the two layers, countries and products, have C and P nodes respectively. The ensemble generated by the BiCM constrains the number of connections of each node on both layers (in our case the country degree \(d_c\) and the product degree \(u_p\), i.e. the diversification \(k_c\) and ubiquity \(k_p\) of Eq. (3)) to match, on average, the observed one. Each network \({\mathbf {M}}\) in this ensemble is assigned a probability coefficient:

$$\begin{aligned} P({\mathbf {M}}|{\mathbf {x}}, {\mathbf {y}})=\prod _cx_c^{d_c({\mathbf {M}})}\prod _py_p^{u_p({\mathbf {M}})}\prod _{c, p}(1+x_cy_p)^{-1}, \end{aligned}$$
(5)

where \(x_c\) and \(y_p\) are the Lagrange multipliers associated with the constrained degrees.

Constraining the ensemble average values of the country and product degrees induces a probability that a link exists between country c and product p, independently of the other links:

$$\begin{aligned} p_{cp}=\frac{x_cy_p}{1+x_cy_p}. \end{aligned}$$
(6)

The numerical values of the unknown parameters \({\mathbf {x}}\) and \({\mathbf {y}}\) have to be determined by solving the following system of \(C+P\) equations, which constrains the ensemble average values of countries diversification and products ubiquities to match the real values, \(\langle d_c\rangle =d_c^*,\,c=1\dots C\) and \(\langle u_p\rangle =u_p^*,\,p=1\dots P\).

Here \(\{d_c^*\}_{c=1}^C\) and \(\{u_p^*\}_{p=1}^P\) are the observed degree sequences of countries and products respectively, and \(\langle \cdot \rangle\) represents the ensemble average of a given quantity over the ensemble measure defined by Eq. (6), so that \(\langle d_c\rangle =\sum _p p_{cp}\) and \(\langle u_p\rangle =\sum _c p_{cp}\). The parameters that satisfy the system are likewise indicated with an asterisk, \({\mathbf {x}}^*\) and \({\mathbf {y}}^*\).
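One possible way to solve this system numerically and to sample the ensemble is sketched below. The fixed-point iteration, the starting guess and the function names are choices made for this illustration and are not necessarily the solver used in the analysis; the sketch assumes that no node has zero or full degree, in which case the parameters would diverge.

```python
import numpy as np


def solve_bicm(d, u, n_iter=5000):
    """Fixed-point iteration for x, y so that expected degrees match d, u."""
    d = np.asarray(d, dtype=float)
    u = np.asarray(u, dtype=float)
    links = d.sum()
    x = d / np.sqrt(links)  # simple starting guess (an assumption)
    y = u / np.sqrt(links)
    for _ in range(n_iter):
        # <d_c> = sum_p x_c y_p / (1 + x_c y_p) = d_c, rearranged for x_c
        x = d / (y[None, :] / (1.0 + np.outer(x, y))).sum(axis=1)
        # <u_p> = sum_c x_c y_p / (1 + x_c y_p) = u_p, rearranged for y_p
        y = u / (x[:, None] / (1.0 + np.outer(x, y))).sum(axis=0)
    return x, y


def link_probabilities(x, y):
    """Eq. (6): p_cp = x_c y_p / (1 + x_c y_p)."""
    xy = np.outer(x, y)
    return xy / (1.0 + xy)


def sample_ensemble(p, n_samples, seed=0):
    """Draw binary matrices with independent links of probability p_cp."""
    rng = np.random.default_rng(seed)
    return (rng.random((n_samples,) + p.shape) < p).astype(int)


# Toy biadjacency matrix: 4 countries x 4 products.
M = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 0]])
d_obs, u_obs = M.sum(axis=1), M.sum(axis=0)
x, y = solve_bicm(d_obs, u_obs)
p = link_probabilities(x, y)
samples = sample_ensemble(p, n_samples=1000)
```

The row and column sums of `p` reproduce the observed degrees, while each sampled matrix matches them only on average, which is the defining property of the grand canonical ensemble.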

Similarity motifs

In the present work we sampled the grand canonical ensemble of binary, undirected, bipartite networks induced by the BiCM, according to the probability coefficients \(P({\mathbf {M}}|{\mathbf {x}}^*, {\mathbf {y}}^*)\), and calculated the average and variance of the motif \(\mu _{\text{ sim }}\), defined as b-motif 6 in ref. 40.

The Similarity Motif represents the symmetric and complete connections between two countries \(c,c'\) and two products \(p,p'\). The number of similarity motifs is:

$$\begin{aligned} \mu _{\text{ sim }}=\frac{1}{4}\sum _{c=1}^C\sum _{c'=1}^C{\mathcal {Z}}_{cc'}({\mathcal {Z}}_{cc'}-1)-\frac{1}{4}\sum _{c=1}^C d_c(d_c-1) \end{aligned}$$
(7)

where \({\mathcal {Z}}\) is the matrix of dimension \((C\times C)\) that represents the projection of M onto the countries. Each entry \({\mathcal {Z}}_{cc'}\) counts the number of products in common between the countries c and \(c'\), and is defined as \({\mathcal {Z}}_{cc'}=\sum _{p=1}^P m_{cp}m_{c'p}=(MM^T)_{cc'}\).

This motif represents the co-occurrence of two products in the export baskets of two countries within the bipartite matrix of country exports. For the accuracy of the BiCM in reproducing the value of \(\mu _{sim}\), see ref. 34.
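A minimal sketch of the motif count of Eq. (7) and of its comparison with a null ensemble. The toy matrix and the null values are invented for the example, and the use of the population standard deviation is an assumption of this illustration.

```python
import numpy as np


def similarity_motifs(M):
    """Eq. (7): number of pairs of countries sharing a pair of products."""
    M = np.asarray(M, dtype=np.int64)
    Z = M @ M.T            # Z_cc' = products shared by countries c and c'
    d = M.sum(axis=1)      # the diagonal of Z equals the country degrees
    return (np.sum(Z * (Z - 1)) - np.sum(d * (d - 1))) / 4


def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in units of
    the null (population) standard deviation."""
    null_values = np.asarray(null_values, dtype=float)
    return (observed - null_values.mean()) / null_values.std()


# Toy matrix: countries 0 and 1 share two products, as do countries 1 and 2.
M = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
mu_obs = similarity_motifs(M)

# Hypothetical motif counts measured on matrices drawn from a null model.
mu_null = [1.0, 2.0, 3.0]
z = z_score(5.0, mu_null)
```

Counting the motif through the projection matrix avoids looping over all quadruples of nodes, so the same function can be applied to each of the sampled null matrices.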

Figure 5

Slope coefficient of a sliding window across \(25\%\) of the countries (corresponding to 36 countries) of, respectively, (A) the Fitness Growth Rate, (B) the GDP Growth Rate and (C) the GDP Ranking Growth Rate versus Urban Population Growth Rate. The error bar corresponds to the fit’s \(95\%\) confidence interval. The colors follow the Urban Range scheme.

Urban range versus fitness and GDP

To validate our analysis of the relation between the country’s fitness and the urbanization process we analyzed the urbanization growth rate in relation to the growth rate of three different metrics: the country Fitness (Fig. 5A), country GDP (Fig. 5B), and country GDP ranking (Fig. 5C) between 1995 and 2010.

We study the variation of the slope coefficient of a sliding window across \(25\%\) of the countries’ Urban Range for the three metrics above. Neither of the two metrics derived from GDP yields statistically significant results. Although the growth rate of fitness manifests a linear relation with the urbanization growth rate (Fig. 5A), with \(R^2=0.53\), as in Fig. 3B, we notice that the fitness ranking is a more reliable tool than the raw fitness value43. The fitness ranking provides a more stable metric across each sliding window.