Growth dynamics and complexity of national economies in the global trade network

We explore the quantitative nexus among economic growth of a country, diversity and specialization of its productions, and evolution in time of its basket of exports. To this purpose we set up a dynamic model and construct economic complexity measures based on panel data concerning up to 1238 exports of 223 countries for 21 years. Key statistical features pertaining to the distribution of resources in the different exports of each country reveal essential in both cases. The parameters entering the evolution model, combined with counterfactual analyses of synthetic simulations, give novel insight into cooperative effects among different productions and prospects of growth of each economy. The complexity features emerging from the analysis of dynamics are usefully compared with gross domestic product per capita (GDPpc) and with an original measure of the efficiency of the economic systems. This measure, whose construction starts from an estimate of bare diversity in terms of Shannon’s entropy function, is made fully consistent with the degree of specialization of the products. Comparisons of this measure with the model parameters allow clear distinctions, from multiple perspectives, among developed, emerging, underdeveloped and risky economies.


Introduction
Hereby we provide information on the sources used to extract the data for each single country. We also illustrate the integration method used for the stochastic differential equations and present details of the calibration procedure followed in order to determine the values of the parameters that best fit the data. Additional results and plots illustrating properties of the entropic measures are reported in the last section.

S1.1 Export Data
The data used for quantifying the products exported by each country are extracted from the BACI database 1 . This database, redacted by the CEPII Research Institute, is in turn built upon the COMTRADE database (freely accessible at the UN COMTRADE website 2 ), and consists in a revision of the latter, in which the authors check and rectify the amounts declared by the exporting and importing countries to make them consistent with each other. All the prices are expressed in terms of current US dollars (thus, justifying the insertion of an inflation term in the dynamic model). The BACII database is a 4 dimensional panel that spans the following axes: 1. Time: data are reported on an annual basis (starting from 1995) and are constantly updated with the latest data available.
At the time we started performing our analysis the database covered a 21 years range, with the latest data being those of 2015.
2. Product: The products are classified according to a 6 digit code (the Harmonized System 2007 3 ) which consists in roughly 5000 different categories of products.
3. Exporting country: about 223 countries are present in the database. Data are organized at a country-to-country level, so that one can know how much of a single product has been exported from any country to any another. 4. Same for importing countries.

S1.2 Gross Domestic Product
The values of the Gross Domestic Product Per Capita (GDP pc ), have been extracted from a database 4 of the United Nations Statistics Division (UNSD). This database spans the period 1970-2015 and quantities are expressed in current US dollars (same as for the BACII database).

S1.3 Consumer Price Index
Finally the Consumer Price Index (CPI) values were given by the Organisation for Economic Co-operation and Development (OECD) 5 . In table S1 we report the values used in our analysis, which correspond to a weighted aggregate at a global level of all the CPIs of every country present in the OECD databases.

S2 Data aggregation
The Harmonized System classification allows us to organize the data in less specific categories by simply aggregating products sharing the most significant digits. Our analysis was performed over 1238 "macro" categories built with 4 digit data for exports of each country. Another aggregation has been carried out over the importing countries axis, in order to obtain the Z c p,n representing the total amount of product p exported by country c in the year n. The entropic measures construction (see section S6) was carried out on the resulting dataset. In order to apply our dynamic model we had to perform a further selection in the Z c p,n panel. With data coarse-grained to a 4 digit level, it may still happen that some entries become zero. In principle this should indicate that a certain country stops suddenly to export a product. However, such occurrence is more likely related to an intrinsic problem of missing information in the original database: the data we work on are aggregated, so, if a product disappears, it means mean that the country has ceased to produce a whole macro category of products in just one year, which is unlikely to happen. Moreover these zeroes appear mainly when dealing with underdeveloped countries. So, this suggests that the problem lies in the redaction of the database itself. Having a Z c p,n = 0 is an issue because it produces a singularity when inserted in Eq. S1. So, when this happens, we need to remove the entire time-series associated with that product. We decided to proceed with the calibration only for those countries who kept more than 80% of their exports after these removals. So, the dynamic model was applied to 131 countries in total.

S3 Dynamic model
We rewrite our stochastic system of differential equations (SDE) for the Z c p (t)'s in the following form: where η c is a colored zero-mean Gaussian noise. In the following sections we provide an exhaustive description of the dynamics described by this system of equations in order to give a better understanding of the meaning of the parameters and of the model ingredients.

S3.1 Correlation matrix
In the basket of a country, this matrix plays the important role of quantifying the correlations between the increments of each pair of exported products. The first step in building such matrix is to evaluate the yearly logarithmic returns which we standardize referring to averages over all the years (indicated by A n = Σ n A n /T for a quantity A n ). where T = 20 represents the last available year for the returns (2015 in this case). As we will see in the following sections, the correlation matrix will play an important role in the construction of both the matrix ruling the transfers between different productions and the colored noise.

S3.2 Matrix J c pp
The matrix J c pp is constructed exploiting the correlation matrix as a sort of "inverse distance" (see Timbergen's gravity law 6 ): the shorter this "distance", the larger the rate at which the transfer of resources between two products occurs. In the structure of the transfer term of the equations , the gravity law is present also because the transfer term is proportional to both Z c p and z c p . Indeed, we write: where G c is a coupling multiplicative constant that regulates the magnitude of these transfer processes, and z c p is the fraction of total export for product p averaged the last five years: This is the quantity used to discuss the ranking in value of exports in the main text. The transition matrix A c , with elements A c pp = J c pp for p = p and A c pp ≡ − ∑ p =p J c p p , establishes and maintains the observed ranking of the various Z c p (t)'s once we choose J c pp ∝ z c p . Indeed, with such a choice, z c p is in the kernel of A c and, in the absence of noise terms, the solutions of Eq. S1 tend to a long-time attractor in which the Z c p 's are in the kernel of A c itself 7 and are thus proportional to the z c p 's.

S3.3 Noise
The noise that defines the stochastic dynamics of Eq. S1 needs to take into account that fluctuations in the growth conditions are correlated both in time and between different products. The former correlations require to define η c as a colored noise with characteristic time τ c and variance σ c , while the latter will imply that we can't use a Kronecker delta for the correlation among noises for different products. Indeed, fluctuations affecting a certain product will be felt by other products that are "spatially" close. For these reasons, we define η c as a colored Gaussian noise with mean and correlation τ c represents a typical opportunity/crisis period duration 8 , while σ c regulates the magnitude of the fluctuations. Here the averages indicated by . have the usual meaning in stochastic differential calculus. c c pp appears in the noise correlator because the noise is directly responsible for the return correlations recorded historically.

S3.4 Drift
The term µ c (t) represents a deterministic drift, accounting for the average percentual input of resources in the export production in a given country in the absence of mutual influences. The time dependence comes from the inclusion of the inflationary effects to which the exports are exposed every year (as explained in section S1) Thus, we write in order to separate the constant, average contribution to the driftμ c from the inflationary one I(t), which can be read as an yearly step-wise function whose values are taken from the OECD 5 and reported in table S1.

S4 Calibration procedure
In this section we will drop the · c to avoid redundancy in the notations. The calibration procedure we follow is similar to that presented in ref. 9 , with the main difference lying in the presence of the new background noise σ 0 . Dividing Eq. S1 by Z p (t) and integrating in the time interval [n 1 , n 2 ] (n 1 and n 2 integers), we obtain f p (n 1 , n 2 ) = Gg p (n 1 , n 2 ) +μ + 1  Figure S1. An offset noise term characterize the networks of underdeveloped countries. Here we report the quantity n 2 Var [ f p (0, n) − Gg p (0, n)] evaluated from the data of a driving country in the economical scene (USA, panel (a)) and for an underdeveloped country (Zambia, panel (b)). It is evident that an offset term arises for the latter country, justifying the introduction of the additional parameter σ 0 .
where we introduced the functions which are estimated approximately by sums in place of integrals because of the discrete nature of the data at our disposal. We immediately see that Eq. S10 establishes a linear relation between such functions, with G representing the slope. Therefore we perform a robust linear regression of the scatter plot f vs. g to estimate the value of G. The intercept obtained with the regression is a first estimate ofμ (the stochastic term here averages to 0), but a more accurate derivation can be performed after calibrating the parameters σ and τ. In order to do that, we first need to rearrange Eq. S10 by moving to the left side the Gg p term. We then evaluate the variance of the resulting equation. By setting n 2 = n and n 1 = 0 we obtain: When dealing with developed countries this relation is usually quite solid, but for the rest of the cases the empirical plots show that there is a missing offset in the equation (see Fig. S1). Such effect is imputable to the fact that the countries are not a closed system and are subject to what happens outside of their borders: the less a country is driving in the global economy, the more its trades will be affected by fluctuations of the countries it relates to. In order to correctly calibrate the parameters σ and τ of a country, we thus need to clean the data by introducing an offset noise term σ 0 , which in turn will be an indicator of the extra fluctuations the country's exports are exposed to. Eq. S13 becomes then: The exponential term decays quite fast (ex post analysis shows that τ c = 2.8 ± 0.2 y). So, by neglecting the first years terms, we are able to perform a linear regression: the slope m = 2σ 2 can be directly inverted to obtain the value of σ , while the intercept equation q = −2σ 2 τ + σ 2 0 needs to be coupled with another condition in order to determine the remaining two parameters. For this reason we integrate Eq. S15 in the full time range [0, T ] obtaining: where we disregarded the exponential term e −T /τ obtained after the integration. Inverting the system of equations finally yields We calibrate the last remaining parameterμ through repeated synthetic simulations (see below) of the system of equations deprived of the deterministic driftμ itself. In this case the average growth reads λ * T = h T (G) + ∑ T t=0 I t . Thus, one can find the value ofμ that reproduces correctly the growth from the historical data by difference: In Figs. S5 we report on the global maps the values of the parameters of the 131 countries on which we performed the calibration procedure, while in Figs. S6, S7, S8 and S9 we show a scatter plot on the GDP pc -S c,20 plane of the same parameters.

S5 Integration
The stochastic differential Eq. S1 involves a colored Gaussian noise that evolves according to where ρ ≡ e − dt/τ and ξ p (t) is a zero-mean Gaussian noise with correlation ξ p (t)ξ p (t ) = c pp δ (t − t ). In order to obtain a noise with this properties we perform the C = LDL T Cholesky decomposition (see for instance 10 ) of the matrix C ≡ c pp , which can be applied to any symmetric matrix and does not require the evaluation of any square roots in the diagonal terms (which may become a problem when dealing with large matrices whit relatively small entries). Thus, we are able to apply the decomposition to a vector of independent Gaussian noises ξ to generate ξ = LD 1/2 ξ . By substituting the correlated noise in Eq. S1 and discretizing we obtain where we introduced the Wiener processes ∆W p,t = √ ∆tξ p,t and the two coefficients With the assumption ∆t τ different integration prescriptions yield the same results: we chose a second order Runge-Kutta scheme for the Itô prescription, suited for systems of equations and characterized by both strong and weak convergence of order 1 11 . If we partition the time interval [0, T ] in L sub-intervals each of length ∆t, the integration approximation { w l } l=1···L estimated over such mesh is given by Here B 1,2 are two diagonal matrices containing the Wiener processes associated with the noise ∆W p,n ≡ √ ∆T (ξ p,n+1 − ξ p,n ), which have entries where S p,n are random variables that can assume values ±1 with equal probability p = 1/2. In all the numerical integrations performed we set ∆t = 0.01, satisfying the condition ∆t τ.
S5/S19 S6 Entropic measures of diversity and specialization

S6.1 Algorithm
To simplify notations here we omit the subscript n. Thus, our considerations will refer to data for a generic year. The starting point that led to the construction of the algorithm for the entropies presented in the main text, is the idea to use Shannon's entropy 12 as measure of the diversity in the basket of the exported products of a country 13 . Here the role of probabilities is played by the shares s c,p ≡ Z c p / ∑ p Z c p of the total export by the country in a certain year. Therefore a first, bare measure of diversity will be: The more a basket is equally partitioned among the products, the higher will be the value of this entropy. The ideal (unrealistic) case is therefore the one in which every product has a share 1/M, which gives the upper bound limit value log M to this entropy. On the other hand, when all the exports are concentrated in a restricted niche of products (like in oil-exporting countries) the entropy will be small, with lower bound value 0 corresponding to the case in which only one product is exported. We can define an analogous quantity for the products, in order to understand which products are more common, and therefore easy to produce, and which are more rare. The shares are evaluated in this case over the total global export of each product, as q c,p ≡ Z c p / ∑ c Z c p , Hence we define: so that in this way Q will be higher for the most specialized products and lower for the most common ones. The next step in creating the algorithm is to use S and Q respectively as a weight in valuating more refined entropies: the new weighted shares will therefore be: Using these shares as entries, we valuate the new entropic diversities S p . We then naturally iterate the procedure to further refine our measures. The algorithm at the k-th steps takes the form: which is the one presented in the main text.

S6.2 Convergence
In panel (a) of Fig. S2 we show the euclidean distance evaluated between consecutive iterations, which is defined as where S (k) and Q (k) are the vectors with components {S holds with coefficients α = −1.720 ± 0.001 and β = 3.12 ± 0.02, found by fitting the points obtained with 100 different random initial conditions (see panel (b) of Fig. S2). With this parameters we were therefore able to estimate the Lipschitz constant 14 , defined as which turns out to be q = e α = 0.1824 ± 0.0003 (S32) These estimates suggest to classify the algorithm as globally convergent with a linear rate of convergence (q < 1 means that the map associated with the algorithm is a contraction). In panel (c) of Fig. S2 we report the ratios of distances from the fixed points of consecutive iterations, and confront them with the value of the Lipschitz constant q obtained with the previous regression.

S6.3 Data presentation
In Fig. S3 we compare our indicators with the current state of the art in terms of complexity indicators, represented by the fitness F introduced in 15 . As we can clearly see from panels (a) and (b), we have an exceptionally high correlation between F and S: evaluating the Spearman correlator yearly we obtain on average a value of ρ s (S, F) t = 0.95, suggesting the existence of a monotonic relation between the two quantities. Such a strong correlation was to be expected, since the algorithm of Fitness has the main purpose of rewarding countries exporting specialized products and data show that countries exporting such products are characterized by high degree of diversification. So, it is not surprising that rewarding in first place number of exports and evenness of relative shares, as in our case, can lead to a substantial agreement. An exponential relation of the form F ∝ e αS is found to be quite plausible, with α assuming values close to 1. When comparing our measure of specialization of the products Q p with its analog computed with the Fitness algorithm, we still see a positive correlation but definitely lower ( ρ s (Q, Q * ) t = 0.42 on average). Following also hints in ref. 16 , we therefore need to further investigate this aspect to better understand how the two specialization measures are related. Fig. S4 is an enlarged version of Fig. 1b of the main text, which highlights the relation connecting the total export of a country with the Spearman correlation coefficient evaluated among the z c p of the country and the world. The palette used to color the points is representative of the entropic diversity S c,20 evaluated in the year 2015. A vertical gradient of the colors is clearly perceivable, indicating that countries with a basket more similar to that of the world have a higher entropic diversity. A S7/S19 lighter horizontal gradient is also present, indicating that, given a fixed value of correlation, countries that have a higher total export have in general a less balanced basket. In Fig.s S10, S11, S12 and S13 we show the 223 countries on the S-GDP pc plane for each year in the period between 1995-2015. As we can see, even if S c correlates positively with GDP pc , it is a complementary measure to it, which adds an additional non-trivial dimension that helps classifying a country's economic status. We are therefore able to distinguish countries in developed (top right corner), emerging (bottom right), underdeveloped (bottom left) and economically risky (top left). S10/S19 Figure S5. Parameters of the model for the 131 calibrated countries: total fluctuations a country is exposed to (σ tot ), the yearly average transfer rate J pp , the average deterministic drift deprived of the inflation rateμ c and the growth caused by cooperativity in a country's network h c 20 . In the last three maps we present the contributes to the growth equation λ 20 = h 20 +μ + I t t normalized by the overall growth λ 20 itself. S11/S19 S19/S19