Entropic measure unveils country competitiveness and product specialization in the World trade web

We show how the Shannon entropy function can be used as a basis to set up complexity measures weighting the economic efficiency of countries and the specialization of products beyond bare diversification. This entropy function guarantees the existence of a fixed point, which is rapidly reached by an iterative scheme converging to our self-consistent measures. If products are partitioned into larger categories, our approach naturally allows the country competitiveness measure to be decomposed into inter-sectorial and intra-sectorial contributions. Besides outlining the technical features and advantages of the method, we describe a wide range of results arising from the analysis of the obtained rankings, and we benchmark these observations against those established with other economic parameters. These comparisons allow countries and products to be partitioned into various main typologies, with well-revealed characterizing features. Our methods have wide applicability to general problems of ranking in bipartite networks.

irrespective of the specific export or exporter they refer to. This corresponds to the standard use of the entropy function in the above mentioned fields of application. Distinction among productions and countries is then obtained here by reweighting self-consistently, in terms of the searched measures, the amounts corresponding to products and countries when evaluating the entropy functions. Such self-consistency can be achieved by running an iterative procedure in which bare entropies enter as starting inputs. The final measures searched are then obtained as fixed points of the iterative scheme. While iteration to reach self-consistency is common to all methods proposed so far to obtain complexity measures, in our case its realization within a scheme dictated by the Shannon function guarantees the existence of a fixed point. This feature, and the fact that the iterations reveal the character of contractions rapidly converging to the fixed point, represents an advantage with respect to previous alternative approaches.
The fact that our measure of country competitiveness is expressed as a Shannon entropy function of the weighted percentage shares of the exports in the basket also offers the unique possibility to decompose the same measure into intra- and inter-sectorial contributions, once the shares are summed to represent different sectors of production. This is allowed by a key, characterizing property of the function 17 and opens the possibility to revisit, in the spirit of the economic complexity approach, some methods of analysis already in use in development economics 16.
Another virtue of our approach is its wide potential applicability, mainly due to the fact that it is based on the universally used Shannon function. It can indeed be used in any problem involving (weighted or unweighted) bipartite networks 21,22, when the searched rankings depend in a simple way on the number and strength of the links connecting the two layers of the network. Possible contexts of application range from ecology 23, e.g. the case of plant-pollinator 24 and predator-prey 25 networks, to collaboration networks 26 as well as gene sharing networks in microbial communities 27.
This article is organized as follows. In the "Data and bare entropic measures" section we describe how the data on the exports are used for setting up our bipartite network and provide empirical evidence in favor of the use of the Shannon function. The "Iterative scheme" section is devoted to the exposition of the iterative algorithm, while the "Results" section presents the main results. The "Coarse-grained analysis, and intra-vs inter-sectorial contributions" section illustrates an example of intra- and inter-sectorial analysis. Conclusions are in the "Conclusions" section.

Data and bare entropic measures
The data we use for the analysis are extracted from the BACI database 28, which is a refined version of the freely accessible COMTRADE database 29. It contains export data on a country-to-country level covering 21 years between 1995 and 2015. The total number of countries contained in the database is $N_c = 223$, while products are classified in 5016 categories according to the Harmonized System 2007 (HS07) 30, which consists of a 6-digit code of hierarchical nomenclature. Every digit represents a category by which the good is classified, and this category becomes more specific as the number of digits increases. This hierarchical structure makes it natural to coarse-grain the data at different levels, by aggregating products sharing the most significant digits. Not being limited by problems of convergence, our analysis makes use of the full information contained in the original dataset by considering all the 6 digits that characterize the HS07 classification, with a total of $N_p = 5016$ different product categories. It is precisely the aggregation represented by the four-digit classification with respect to the six-digit one that we will consider below in order to illustrate the decomposition into inter- and intra-sectorial contributions.
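The coarse-graining by leading digits described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function name `coarse_grain`, the dictionary layout and the toy codes and values are hypothetical and are not taken from the BACI files. Codes are kept as strings so that leading zeros are preserved.

```python
from collections import defaultdict

def coarse_grain(exports_by_code, digits=4):
    """Sum export values over all codes sharing the same leading `digits` digits."""
    totals = defaultdict(float)
    for code, value in exports_by_code.items():
        totals[code[:digits]] += value
    return dict(totals)

# Hypothetical export values (arbitrary units) keyed by made-up 6-digit codes.
toy_exports = {"270900": 5.0, "271011": 2.0, "271019": 1.0, "880240": 4.0}
toy_4digit = coarse_grain(toy_exports)
print(toy_4digit)
```

The same function with `digits=2` would give the next, coarser level of the hierarchy.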
A preliminary aggregation of the data is carried out over the importing countries, in order to obtain the biadjacency matrix $X_{cp}^{(n)}$, representing the total amount of product $p$ exported by a country $c$ in a given year $n$ (Fig. 1). Since the analysis does not mix data from different years, from now on we will drop the year index $n$ and specify the year considered in the text.

The starting point for the construction of our measure is a bare Shannon entropy of the nodes of each layer. Given a set of probabilities $\{p_i\}_{i=1\ldots N}$, $\sum_i p_i = 1$, the Shannon function is defined as

$$H = -\sum_{i=1}^{N} p_i \log p_i . \tag{1}$$

If we think of the $p_i$'s as relative occupations of a collection of $N$ states, Eq. (1) expresses the diversity of the corresponding distribution. Indeed, $H$ increases both with the total number of states or entries, $N$, and with the evenness of the distribution of the $p_i$'s. For any $N > 1$, $H$ is bounded in the interval $[0, \log(N)]$, with the maximum value being reached in the case of exact equipartition $p_i = 1/N$ and the minimum when the occupation is concentrated in one single state.
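The bounds just stated are easy to check numerically. The following minimal sketch uses the natural logarithm and the usual convention $0 \log 0 = 0$; the function name and the three toy distributions are illustrative choices of ours.

```python
import math

def shannon_entropy(probs):
    """H = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # exact equipartition over N = 4 states
peaked = [1.0, 0.0, 0.0, 0.0]        # occupation concentrated in a single state
skewed = [0.7, 0.1, 0.1, 0.1]        # uneven occupation of all four states

for name, dist in [("uniform", uniform), ("peaked", peaked), ("skewed", skewed)]:
    print(name, shannon_entropy(dist))
```

The uniform case reaches the upper bound $\log(4)$, the peaked case gives zero, and the skewed case falls in between, although it occupies the same number of states as the uniform one.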
In the context of the countries-products bipartite network the shares of products in the basket of a given country c can be interpreted as a set of probabilities which altogether provide a measure of the diversification of a country's exports 5 . Something similar holds also for the products: defining for each country a share of an exported product with respect to the worldwide export of that same product p one can represent somehow the ubiquity of that export 5,20 . Similar uses of the Shannon function to estimate diversification are not new in fields like development economics 15,16 .
So, the shares $\xi^0_{cp} = X_{cp} / \sum_{p'} X_{cp'}$ of the products $p$ in the basket of country $c$ can be plugged into Eq. (1), providing us with a "bare" entropic indicator of productive diversification:

$$H^0_c = -\sum_{p} \xi^0_{cp} \log \xi^0_{cp} . \tag{2}$$

In this sum there are of course terms which are zero, corresponding to products which do not appear in the basket of the considered country. It is the number of nonzero terms which corresponds to the $N$-dependence we were discussing in connection with Eq. (1).

Figure 2 (caption fragment): [...] 31). (e) Median of the distribution of the shares vs. total export of a country, colored with respect to the number of exported products. The colored areas show the locations of the four groups of countries whose histograms are reported in (a-d). In the inset we highlight the countries exporting less than 1000 products defining a "poverty trap" area. The red line is a linear regression performed over such countries in log-log scale.

In Fig. 2a-d the histograms of the basket shares are reported for four groups of countries. The different total numbers of products exported by the various countries are given by the areas under the corresponding histograms. Up to differences in these numbers, the shapes of the histograms are rather similar within each group. Therefore, throughout the rest of the paper, USA, IRQ, PRT and ALB will serve as representative examples for countries characterized by these differently structured economies. We also notice that fully and moderately developed countries like USA and PRT present a relatively narrow peak rather close to the equipartition value of the full range of products $1/N_p$, while both IRQ and ALB show a broader peak, with IRQ's peak being much further away from this equipartition value. Moreover, while for example PRT and IRQ share a very similar total export $X_c = \sum_p X_{cp} \sim 5 \times 10^{10}\,\$$, we can clearly see how the distributions are very different: not only is the number of exported products (area of the histogram) much larger for PRT, but also the distribution of basket shares in the case of IRQ is extremely uneven.
This is due to the overwhelming dominance of the few oil-related exports (highlighted with a blue arrow in Fig. 2b), whose shares total 98%. One parameter that also captures and partly quantifies the above considerations is the median of the distribution of the basket shares. With the due caution connected to the fact that the median also depends on the number of products exported by the country, and thus cannot be naively compared to the world equipartition value $1/N_p$, this parameter provides an insight complementary to that of the total export. In Fig. 2e we show how this quantity relates to the total export $X_c$ for each of the 223 countries of the dataset, again in the year 2015. The color scheme used in Fig. 2e reflects the number of products exported by each country. It is remarkable how all the countries exporting fewer than 1000 products are found to lie on a line, exhibiting a very high correlation between the overall export of a country and the median of its basket shares. Such behavior defines a sort of "poverty trap", on which lie the countries whose export depends only on a blind exploitation of the natural resources at their disposal. Note that, in spite of its relatively high total export, Iraq also lies on this line because of its complete dependence on oil exports. Countries that managed to improve their overall economic wealth had to increase both the number of products and the diversification of the total production. This strategy allowed them to detach from the "poverty trap" line in a direction which brought them closer to the ideal equipartition value $1/N_p$.
The above considerations point to basket shares distributions as a key ingredient for the construction of a measure for the productive efficiency of the countries. At the same time, the Shannon entropy function introduced in Eq. (2) qualifies as a natural candidate to represent a single value quantifier of all the "information" contained in such distributions.
Similarly, one can think of an analogous argument also for the layer of the products 5. For any given product $p$ we can define a ubiquity measure that takes into account both the number of countries that are able to export such a product and how evenly its offer on the global market is distributed. To do so, we define an export share $\zeta^0_{cp} = X_{cp} / \sum_{c'} X_{c'p}$, which is normalized with respect to the overall export of that same product at a global level. The bare entropic indicator for the ubiquity of products will therefore be:

$$H^0_p = -\sum_{c} \zeta^0_{cp} \log \zeta^0_{cp} . \tag{3}$$

These quantifiers will be at the basis of our measure constructions.
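As an illustration of the two bare indicators, the sketch below computes a row-wise entropy (diversification of each country) and a column-wise entropy (ubiquity of each product) from a small export matrix. The matrix values, the function names and the list-of-lists layout (rows are countries, columns are products) are assumptions made for the example, not the actual data.

```python
import math

def entropy_of_weights(weights):
    """Normalize nonnegative weights to shares and return their Shannon entropy."""
    total = sum(weights)
    if total == 0:
        return 0.0  # assumption of this sketch: an empty row or column contributes no diversity
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

def bare_entropies(X):
    """Return (H0_c, H0_p): bare diversification per country, bare ubiquity per product."""
    H0_c = [entropy_of_weights(row) for row in X]
    H0_p = [entropy_of_weights(col) for col in zip(*X)]
    return H0_c, H0_p

# Toy biadjacency matrix with hypothetical values.
X_toy = [
    [1.0, 1.0, 1.0, 1.0],  # evenly diversified exporter
    [0.0, 0.0, 0.0, 8.0],  # single-product exporter
    [0.0, 0.0, 2.0, 2.0],  # two products with equal shares
]
H0_c, H0_p = bare_entropies(X_toy)
print(H0_c)
print(H0_p)
```

In this toy case the second country has zero bare entropy despite having the largest total export, which mirrors the contrast between total export and diversification discussed above.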

Iterative scheme
The entropic bare measures of Eqs. (2) and (3), although already rather sound, do not make full use of the information at our disposal. For instance, the formulation of $H^0_c$ is indifferent to interchanges among products, meaning that swapping the exports of two products would leave the entropy unaltered. Analogously, swapping the exports of two countries for the same product $p$ would preserve the associated ubiquity $H^0_p$. Nevertheless, each product (and country) should enter differently in the evaluation of the measures. This is also required by the basic assumptions of the economic complexity approach, which through a comparative analysis aims at distinguishing among products and among producing countries. In general, one would intuitively expect "important" products to weigh more in a country's wealth indicator. Such importance should be determined by how many developed countries are exporting these products. We can introduce such dependencies by reweighting the shares entering Eqs. (2) and (3) with weights self-consistently related to diversities and ubiquities. Such self-consistency can be imposed by constructing an iterative algorithm in which at each step we evaluate finer measures of diversity and ubiquity by using weights determined by the same quantities evaluated at the previous step. A general formulation of such an algorithm reads:

$$H^{(k)}_c = -\sum_{p} \xi^{(k)}_{cp} \log \xi^{(k)}_{cp}, \qquad H^{(k)}_p = -\sum_{c} \zeta^{(k)}_{cp} \log \zeta^{(k)}_{cp}, \tag{4}$$

with shares at the $k$-th step defined as

$$\xi^{(k)}_{cp} = \frac{X_{cp}\, f\big(H^{(k-1)}_p\big)}{\sum_{p'} X_{cp'}\, f\big(H^{(k-1)}_{p'}\big)}, \qquad \zeta^{(k)}_{cp} = \frac{X_{cp}\, g\big(H^{(k-1)}_c\big)}{\sum_{c'} X_{c'p}\, g\big(H^{(k-1)}_{c'}\big)}. \tag{5}$$

Here $f$ and $g$ are two functions that take as arguments $H_p$ and $H_c$, respectively. Inspecting Eqs. (4) and (5) one can easily appreciate the role played by the weights introduced to modify the bare shares in the Shannon function. For the sake of argument and to simplify notation, let us look at the $f$'s and the $g$'s in Eq. (5) as independent variables with labels referring to the corresponding product $p$ ($f(H_p) \to f_p$) and country $c$ ($g(H_c) \to g_c$). Then, let us consider for example a country with a very special basket having one single dominant product $p$ ($X_{cp} \simeq 1$) and all the other products $p' \neq p$ with $X_{cp'} \simeq 0$. Due to the shape of the function $-x \log(x)$ in the interval $[0, 1]$, an increase of $f_p$ would lead to a decrease of $H$. So, an unbalanced basket dominated by a single export becomes even less efficient if that export is further valued. Thus, the logic at the basis of using weighted shares is such as to allow distinction among products and to favor an optimal balance among all productions at the same time.
The choice of the functions $f$ and $g$ in Eq. (4) is clearly not unique, and in Ref. 5 a particular form of self-consistency was preliminarily proposed. In any case the choice is dictated by the aims of the analysis. Thus, it needs to reflect the fact that the more ubiquitous a product is, the less it should contribute to the diversity of the basket of a country. Also, the more a country is wealthy and developed, the less it should count in determining the ubiquity of a product. Therefore we need to define two functions $f$ and $g$ that invert the concepts of ubiquity and diversity, respectively. Thanks to the boundedness of the Shannon entropy function, a simple and convenient way to achieve such inversion is through the following linear relations:

$$f(H_p) = \log(N_c) - H_p, \qquad g(H_c) = \log(N_p) - H_c . \tag{6}$$

These positive weights, with their continuity, grant the existence of a fixed point 32 and, in fact, the stability of the iterative algorithm.
Indeed, the scheme of Eq. (4) can be seen as a map $\phi$ of a closed, compact set into itself:

$$\phi : [0, \log N_p]^{N_c} \times [0, \log N_c]^{N_p} \to [0, \log N_p]^{N_c} \times [0, \log N_c]^{N_p} . \tag{7}$$

Such a peculiar feature is guaranteed by the boundedness of Shannon's entropy function, which (together with the continuity of $\phi$) allows us to apply Brouwer's fixed point theorem 32 to ultimately prove the existence of a fixed point $\{H_c, H_p\}_{c=1\ldots N_c,\; p=1\ldots N_p}$ for the map $\phi$. In Fig. 3a we show how such a fixed point is reached exponentially fast by iterating numerically the scheme of Eq. (4) and evaluating the Euclidean distance between two consecutive steps, defined for every $k \in \mathbb{N}$ as:

$$d^{(k)} = \sqrt{\sum_{c} \left(H^{(k)}_c - H^{(k-1)}_c\right)^2 + \sum_{p} \left(H^{(k)}_p - H^{(k-1)}_p\right)^2} . \tag{8}$$

Numerically we also tested the uniqueness of such a fixed point (to check whether the algorithm is globally convergent) by iterating the scheme starting from different randomly chosen initial conditions. We evaluated at each step the distance with respect to the previously obtained fixed point, defined as

$$D^{(k)} = \sqrt{\sum_{c} \left(H^{(k)}_c - H_c\right)^2 + \sum_{p} \left(H^{(k)}_p - H_p\right)^2} , \tag{9}$$

and found for it an exponential decay as $k \to \infty$: $D^{(k)} \propto e^{\alpha k}$ with $\alpha = -1.52$ (Fig. 3b). With such a parameter we were therefore able to estimate the Lipschitz constant 33, defined as $q = \lim_{k\to\infty} D^{(k+1)}/D^{(k)} = 0.22$ (see Fig. 3c). These estimates allow us to classify the algorithm as globally convergent with a linear rate of convergence ($q < 1$ means that the map associated with the algorithm is a contraction).
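As a quick consistency check of the two figures quoted above, using only the stated values $\alpha = -1.52$ and $q = 0.22$: for a purely exponential decay the ratio of consecutive distances does not depend on $k$,

$$\frac{D^{(k+1)}}{D^{(k)}} = \frac{e^{\alpha (k+1)}}{e^{\alpha k}} = e^{\alpha} = e^{-1.52} \approx 0.22 ,$$

so the fitted decay rate and the estimated contraction factor agree. In practical terms, each iteration reduces the distance from the fixed point by a factor of roughly 4.5.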
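The scheme therefore always ultimately converges to the same fixed point for the entropies of countries and ubiquities of products, for which the following consistency relations hold:

$$H_c = -\sum_{p} \xi_{cp} \log \xi_{cp}, \qquad H_p = -\sum_{c} \zeta_{cp} \log \zeta_{cp}, \tag{10}$$

where we introduced the weighted shares normalized with respect to a country's export and to a product's global export, respectively:

$$\xi_{cp} = \frac{X_{cp}\, f(H_p)}{\sum_{p'} X_{cp'}\, f(H_{p'})}, \qquad \zeta_{cp} = \frac{X_{cp}\, g(H_c)}{\sum_{c'} X_{c'p}\, g(H_{c'})}. \tag{11}$$

The iteration procedure, through its fixed point, solves the mathematical problem of providing the solution of the self-consistency conditions in Eqs. (10) and (11).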
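The iteration can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the linear weights $f(H_p) = \log(N_c) - H_p$ and $g(H_c) = \log(N_p) - H_c$, starts from the bare entropies, updates both layers from the previous step, and stops when the Euclidean distance between consecutive steps drops below a tolerance. All names, the toy matrices and the values of `tol` and `max_iter` are hypothetical. Convergence on an arbitrary small matrix is not guaranteed by this sketch, so the loop is capped.

```python
import math

def entropy(shares):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(s * math.log(s) for s in shares if s > 0)

def normalize(weights):
    """Turn nonnegative weights into shares summing to one."""
    total = sum(weights)
    if total <= 0:
        # Degenerate case (all weights vanish); not handled in this sketch.
        raise ValueError("all weights vanish; shares are undefined")
    return [w / total for w in weights]

def iterate_entropies(X, tol=1e-12, max_iter=500):
    """Return (H_c, H_p, steps) after iterating the weighted-entropy scheme."""
    n_c, n_p = len(X), len(X[0])
    # Step 0: bare entropies, i.e. unweighted shares.
    H_c = [entropy(normalize(row)) for row in X]
    H_p = [entropy(normalize(col)) for col in zip(*X)]
    steps = 0
    for steps in range(1, max_iter + 1):
        f = [math.log(n_c) - h for h in H_p]  # assumed weight of each product
        g = [math.log(n_p) - h for h in H_c]  # assumed weight of each country
        new_H_c = [entropy(normalize([x * w for x, w in zip(row, f)])) for row in X]
        new_H_p = [entropy(normalize([x * w for x, w in zip(col, g)])) for col in zip(*X)]
        # Euclidean distance between two consecutive steps.
        dist = math.sqrt(sum((a - b) ** 2
                             for a, b in zip(new_H_c + new_H_p, H_c + H_p)))
        H_c, H_p = new_H_c, new_H_p
        if dist < tol:
            break
    return H_c, H_p, steps

# Toy matrices with hypothetical values (rows: countries, columns: products).
X_diag = [[1.0, 0.0], [0.0, 1.0]]
X_toy = [[4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 0.0, 5.0], [0.0, 0.0, 1.0, 6.0]]
print(iterate_entropies(X_diag))
print(iterate_entropies(X_toy))
```

For `X_diag` every country exports a single, exclusive product, so all entropies vanish and the bare values are already a fixed point. A matrix in which every entry is equal is a degenerate input for this sketch, since all weights vanish and `normalize` raises an error.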
From an economic point of view, the iteration scheme can be interpreted as an attempt to progressively establish the weights of the factors playing a role in the production network of a good. Trying to establish the relevance of all the intermediate steps that lead to the realization of a product proves to be an extremely challenging task. Production chains are indeed the result of multiple complex interactions, often involving products realized across different countries at a global scale. Our approach naturally overcomes the limitations induced by analyzing a single national value chain: through exports, all the national productivity chains are effectively interconnected with one another, merging into a global value chain. One therefore needs to take into account such connections, keeping in mind that every country is nevertheless required to offer a highly diversified basket of products in order to retain an adequate self-sustainability as well as the ability to compete as a driving force in the global economy. The iterative scheme endogenously captures the intrinsic relevance of both countries and products, establishing weights for these structural ingredients of the global value chain.

Results
In this section, we present some of the main results emerging from the analysis performed with our algorithm.
In Fig. 4a, we report on the horizontal axis our fixed point entropic measures $H_c$ of the 223 countries for the year 2015. On the vertical axis is the total export of each country in the same year, and the colors vary according to the total number of exports. Countries with maximal entropic efficiency are in fact those with the highest total export value, in the top right corner. Not surprisingly, also in view of what we anticipated in the second section (Fig. 2b), the countries in this corner are also those with the largest numbers of productions. Here we can identify in particular the four countries pointed out in Fig. 2a-d as representative of four different types of economy. While USA and PRT are in the top right corner, although with different levels of total export and number of products, IRQ is in the top left corner, among the countries which are based almost exclusively on the export of natural resources. ALB clearly differs from IRQ with its higher entropic measure, largely due to the relatively broader distribution of shares in the basket. Notably, this entropic measure successfully captures the distance from the "poverty trap" highlighted in Fig. 2e and here (Fig. 4a) represented by the region in which the countries with fewer than 1000 exported products lie. The countries in this trap are now spread over a rather large region colored in red. The entropic measure also provides a better estimator of country efficiency than the [...]

The fixed point ubiquities $H_p$ of the products are shown in Fig. 4b, where they are reported on the horizontal axis. The year of reference is again 2015. On the vertical axis is reported the worldwide amount of export of each product, and the scale of colors is in accordance with the number of exporting countries. The horizontal and vertical lines simply represent median values of the corresponding quantities. In this way one partitions the products into four categories: in the top right corner fall the products with large ubiquity and with large volume of global export.
These products are generally exported by a large number of countries, which is often also responsible for the large value of the global exports. Products like crude and non-crude oils also fall in this category. The products in the top left corner are those with low ubiquity and include computer processor units (CPUs) and aircraft. In this case, while the amount of global export is often very large, the number of exporting countries is small, as appropriate for very specialized productions. In the bottom right corner there are exports with high ubiquity, exported in variable but moderate total amounts by a number of countries definitely lower than that of the nations contributing to the upper right corner. Finally, in the lower left corner there are products which are very marginal in the global economy. These products have low ubiquity, small total amount of exports, and few exporting countries.
It is of particular interest to find out how specific countries are positioned with their exports compared to the global situation represented in Fig. 4b. In Fig. 5a-d we do this comparison for USA, PRT, IRQ and ALB. For instance, in the case of USA we reproduce Fig. 4b with all products in black, coloring in blue only those that are actually exported by the USA. Remarkably, we see that almost all points are blue, meaning that the USA are exporting almost all products present in the global basket. To quantitatively appreciate how the exports of the USA are divided in the four sectors identified in Fig. 4b, we report (in blue) the medians pertaining to only the blue dots. The percentages reported in each sector give a measure of the overlap between USA export basket and global basket in that sector. For the USA the only percentage lower than 100% is in the lower left corner.
Also for PRT all the sectors are rather well occupied, but one can already appreciate a slight shift to the right of the horizontal median, indicating an average higher ubiquity of exports with respect to the USA. In the case of IRQ and ALB we see instead a very considerable shift of the medians and, especially in the case of IRQ, considerably low percentages, indicating that these economies are definitely far from reproducing the global export basket. The shifts indicate that such countries are forced to export products that are not only on average more ubiquitous, but also more dominant in the world aggregated basket of exports. This is not surprising, since the export aggregated at a global level exactly coincides with the worldwide import. This means that poorly developed countries, lacking the possibility to export elaborate products, concentrate their efforts on goods that are relatively easy to produce and highly demanded in the global panorama. A peculiarity that emerges from this analysis is how developed countries seem to export all kinds of products, including those with extremely high ubiquity. The least developed countries, on the other hand, lack an enormous percentage of such products. This suggests that a country, in order to increase its development, cannot aim to just produce the most complex and/or heavily exported products on the market. There are indeed articulated constraints linked to the possibility of producing such goods. Such constraints in many cases even remain hidden (being related to the so-called "intangibles"). This analysis sheds light on how the lack of apparently irrelevant products in the basket might heavily impact the overall development of a country down the line. In other words, a developed country can hardly refrain from keeping its exports as diverse as possible, including ubiquitous and poorly demanded products.
It is interesting to compare our rankings with those one obtains with the approach of Ref. 2. Here the rankings of that approach were obtained by use of the full 6-digit code of the HS07 product classification. At this level of resolution the fitness method can certainly be expected to suffer from convergence problems 11, which in fact were already encountered at the four-digit level 10. Instability issues of the algorithm force one to cut the number of iterations somewhat arbitrarily in this case. The problem arises from the implementation of nonlinearity in the algorithm 10, which leads some products' measures (therein called complexity) to progressively approach the limiting value 0. Such behavior denotes an instability of the scheme, extremely marked for the complexities but less visible for the fitness measure of countries. Indeed, evaluating the Spearman correlator 34 between the countries' fitness $F_c$ resulting after one hundred iterations and our diversity $H_c$ of Eq. (10) yields the rather high value $\rho_S\{H_c, F_c\} = 0.94$ (6-digit code of the HS07 classification, year 2015). The Spearman correlator between products' complexity $Q_p$ and our ubiquity $H_p$ of Eq. (10) yields $\rho_S\{H_p, Q_p\} = -0.32$. The minus sign comes from the fact that ubiquity is conceptually an inverse of complexity. The remarkable distance of the obtained correlation from the value $-1$ indicates that the two approaches give rather different rankings for the products.
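For readers who wish to reproduce this kind of rank comparison, the Spearman coefficient can be sketched as below. The sketch assumes that there are no tied values, in which case the coefficient reduces to $1 - 6\sum_i d_i^2 / [n(n^2-1)]$, with $d_i$ the difference between the two ranks of item $i$. The input values are made up for illustration and are unrelated to the rankings of the paper.

```python
def ranks(values):
    """Rank of each entry (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman rank correlation for two equally long sequences without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical scores of four items under two different measures.
measure_a = [0.1, 0.4, 0.2, 0.9]
same_order = [1.0, 8.0, 3.0, 20.0]   # same ranking as measure_a
reversed_order = [3.0, 1.0, 2.0, 0.0]  # exactly inverted ranking
print(spearman(measure_a, same_order), spearman(measure_a, reversed_order))
```

With tied values the ranks would have to be averaged within each tie and the Pearson correlation of the ranks used instead; that case is not covered here.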

Coarse-grained analysis, and intra-vs inter-sectorial contributions
The entropy function satisfies a special summation rule when clustering states 15,17. The HS07 classification of products 30 is intrinsically structured as a multilayered nomenclature, where the leftmost digits are shared by products belonging to a more general macro-category. So far our analysis was carried out at the finest level of detail offered by the dataset, consisting of 6 numerical digits identifying a total of 5016 product categories. According to the convention of the HS07 classification, one can group together products sharing the same 4 leftmost digits, resulting in a total of 1241 macro-categories $P$. Referring to the fixed point shares $\xi_{cp}$ of Eq. (10), we introduce the coarse-grained shares $\xi_{cP} = \sum_{p \in P} \xi_{cp}$, for which one can evaluate a coarse-grained Shannon entropy:

$$H^{CG}_c = -\sum_{P} \xi_{cP} \log \xi_{cP} . \tag{12}$$

One can easily see that the inequality $H_c \geq H^{CG}_c$ holds, with the equality holding only in the case in which for every coarse-grained category $P$ there is one and only one fine-grained category $p$. The difference between the two entropies is indeed related to the average intra-sectorial entropy of a macro-category $P$, defined as

$$H_{cP} = -\sum_{p \in P} \frac{\xi_{cp}}{\xi_{cP}} \log \frac{\xi_{cp}}{\xi_{cP}} . \tag{13}$$

Straightforward calculations show that the difference between the two level entropies is the following weighted average:

$$\bar{H}_c \equiv H_c - H^{CG}_c = \sum_{P} \xi_{cP}\, H_{cP} . \tag{14}$$

This summation rule allows us to regard the coarse-grained entropy of Eq. (12) as the inter-sectorial and $\bar{H}_c$ in Eq. (14) as the intra-sectorial contribution to the total entropic measure of country $c$. As we see in Fig. 6a, the intra-sectorial contribution is more substantial for countries with high entropic measure, indicating the importance of organization within sectors. In Fig. 6b, we also report the total export of each country as a function of $\bar{H}_c$. One sees that the countries with the largest export, both in terms of total amount and in terms of number of products, are those with the highest intra-sectorial contribution. Germany (DEU) reaches the top of the list in this case. At the same time, most of the countries belonging to the "poverty trap" show a low $\bar{H}_c$, which is another indication of the scarce inner organization of their economic structure.
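The summation rule can be verified numerically on a toy basket. In the sketch below the total entropy of the fine-grained shares is compared with the sum of the coarse-grained (inter-sectorial) entropy and the share-weighted average of the entropies inside each sector (intra-sectorial part). Product labels, sector labels and share values are hypothetical.

```python
import math

def entropy(shares):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(s * math.log(s) for s in shares if s > 0)

def decompose(shares, sector_of):
    """Return (H_inter, H_intra) for shares keyed by product and a product -> sector map."""
    sector_share = {}
    for product, share in shares.items():
        sector = sector_of[product]
        sector_share[sector] = sector_share.get(sector, 0.0) + share
    H_inter = entropy(sector_share.values())
    H_intra = 0.0
    for sector, total in sector_share.items():
        if total > 0:
            within = [s / total for p, s in shares.items() if sector_of[p] == sector]
            H_intra += total * entropy(within)
    return H_inter, H_intra

# Toy basket: four products grouped into two sectors.
toy_shares = {"a1": 0.4, "a2": 0.1, "b1": 0.25, "b2": 0.25}
toy_sectors = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
H_total = entropy(toy_shares.values())
H_inter, H_intra = decompose(toy_shares, toy_sectors)
print(H_total, H_inter + H_intra)
```

Here both sectors hold half of the basket, so the inter-sectorial part equals $\log 2$, while the intra-sectorial part is dominated by the evenly split sector B.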
One can of course perform intra- and inter-sectorial decompositions also with reference to larger sectors of the economies. Such decompositions were already considered in the economic literature with reference to bare entropic indicators 16. This opens novel perspectives of analysis within the complexity approach.

Conclusions
In this report we showed how the Shannon entropy function can be used to develop a consistent and rapidly convergent method for the evaluation of economic complexity measures. In view of the universally accepted meaning of the Shannon function, our construction gives concrete support to the expectation that diversity can be assumed as a basic ingredient of such measures. From a mathematical and numerical point of view, the success of our approach relies on the continuity and boundedness of this function, which guarantees both the existence of and the convergence to a fixed point in our iterative scheme. Here our method was proven to be effective on a weighted bipartite network between two layers composed of 5016 (products) and 223 (countries) nodes, and with the link weights spanning values across more than 9 decades. The proven stability of our algorithm therefore acquires additional significance in the perspective of more general applications, giving confidence that consistent and meaningful results can be obtained also outside the particular context and database considered here.
Results show that, using our refined entropic measure, it is possible to establish a meaningful and unambiguous ranking of countries, with the most developed ones also scoring the highest entropic measures. On the other hand, countries that have a poorly developed economy or simply rely on the exploitation of their raw materials are characterized by low values of our entropic index. In a similar fashion, products can be ranked according to their ubiquity. The more ubiquitous a product is, the more it is exported (or imported) in the global market. Developed countries characterize themselves as exporters of products in the whole range of ubiquities.
Another key mathematical property is the possibility to decompose the function into parts which can be ascribed to intra- and inter-sectorial contributions 17,18,20. This gives the possibility to quantitatively analyze the complexity measure of countries in terms of the interplay among different categories in which one may wish to partition the production, and in terms of the importance of each individual category. It emerges that developed countries are the ones displaying the largest intra-sectorial entropies.
While the analysis presented here referred only to a particular year of the dataset, a comparison of the results for different years opens the way to an analysis of the evolution dynamics of the complexity measures. This would allow one to infer further information both on the different growth potential of each country and on the evolution of the hierarchy of importance of various products in the global economy. A preliminary analysis along these lines 5,20 has already highlighted a strong bond connecting the entropic diversity of a country and the hierarchy of products to a dynamical model regulating the yearly evolution of single exports for every country. For the time evolution of the basket compositions, the present authors have recently developed a statistical mechanics model 4-6, calibrated on the same database used here, which clarifies the non-equilibrium character of its dynamics. The combined use of this model with the entropic complexity analysis offers the prospect of predictions going beyond standard regression analysis. As we showed in "Coarse-grained analysis, and intra-vs inter-sectorial contributions", the entropic measures allow one to keep track of the whole information of the most detailed data, without dispersing it along a coarse-graining process. The control of the effects of coarse-graining on the entropy production is indeed one major issue in the general context of statistical mechanics out of equilibrium 20,35,36.

Data availability
The data used in this work are extracted from the BACI database 28, which is a refined version of the freely accessible COMTRADE database 29 compiled by the United Nations. Reference tables of the Harmonized System 2007 nomenclature and ISO-3166 country codes can be found at Refs. 30 and 31, respectively.

Figure 6 (caption fragment): [...] 14) plotted against the overall export of a country $X_c$. Germany (DEU) emerges as the country with the highest intra-sectorial contribution, implying an extremely articulated structure of the export shares. The gradient of the color of every point is related to the overall number of exported products by each country in the 6-digit HS07 classification.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.