Abstract
Within the last fifteen years, network theory has been successfully applied both to natural sciences and to socioeconomic disciplines. In particular, bipartite networks have been recognized to provide a particularly insightful representation of many systems, ranging from mutualistic networks in ecology to trade networks in economy, whence the need of a pattern detectionoriented analysis in order to identify statisticallysignificant structural properties. Such an analysis rests upon the definition of suitable null models, i.e. upon the choice of the portion of network structure to be preserved while randomizing everything else. However, quite surprisingly, little work has been done so far to define null models for real bipartite networks. The aim of the present work is to fill this gap, extending a recentlyproposed method to randomize monopartite networks to bipartite networks. While the proposed formalism is perfectly general, we apply our method to the binary, undirected, bipartite representation of the World Trade Web, comparing the observed values of a number of structural quantities of interest with the expected ones, calculated via our randomization procedure. Interestingly, the behavior of the World Trade Web in this new representation is strongly different from the monopartite analogue, showing highly nontrivial patterns of selforganization.
Similar content being viewed by others
Introduction
In the last fifteen years network science has exploded, revealing a world composed by interconnected systems ubiquitously found both in natural sciences and in socioeconomic disciplines^{1,2,3}. Since the very beginning of network science, many different network representations have been adopted in order to study the particular system at hand^{4}. However, the class of networks represented by bipartite networks has been recognized to provide a particularly insightful representation of many different systems^{5}: ecological networks^{6}, trade networks^{7,8,9}, citations and collaboration networks^{10,11} represent only few examples.
One could thus expect a relevant amount of work aimed at identifying the statisticallyrelevant patterns observed in real bipartite networks, at least comparable to the mass of results obtained so far for monopartite networks^{12,13,14,15,16,17,18,19,20,21}: however, quite surprisingly, little work has been done so far to implement null models on real bipartite networks. Generally speaking, null models are statistical models used to make inference on a real system on the basis of partial information. The latter usually corresponds to some observable property of interest as the number of trade partners of a country, its exports and imports, the total exposure of a bank, etc. In particular, null models for bipartite networks being realdata rooted and showing the desirable features of general applicability and analytical character are currently missing. More in detail, the algorithms proposed so far show several limitations, ranging from being purely numerical (thus lacking the analytical character)^{6,22,23}, to assuming an a priori functional form either for the distribution of the quantities of interest^{6} or for the model parameters (thus not being real datarooted)^{24} or, lastly, using approximate analytical models^{25}. Moreover, almost all the aforementioned approaches are tailored on ecological networks, thus lacking the character of general applicability.
The lack of such models is, maybe, also due to the misconception that bipartite networks could be analysed by, firstly, projecting them on one of the layers and, secondly, analysing the projection with one of the models currently available for monopartite networks. As we will show in what follows, the monopartite and the bipartite representations enclose different kinds of information, irreducible to each other (in the most general case).
The aim of the present paper is to fill this gap, proposing a theoretical framework guaranteeing the three aforementioned properties. In order to do this, we extend a recentlyproposed method to randomize monopartite networks^{19} to bipartite networks. The method rests upon the sequential maximizations of Shannon entropy and the network likelihood function, a combination which has been proven to be rather effective both for detecting patterns and to reconstruct the structure of several realworld networks^{20,26,27,28,29,30}. To the best of our knowledge, the only other paper proposing a method satisfying the three requirements above is^{31}: we will comment on the differences with the one proposed here in the Discussion section.
While the proposed formalism is perfectly general, in this paper we apply our method to the binary, undirected, bipartite representation of the World Trade Web (hereafter WTW). We focused on this particular system precisely because of its popularity among network scientists, who have applied null models to all its possible representations^{26,27,32,33,34,35}, with the exception of the bipartite one. As we will show in what follows, representing the WTW as a bipartite network allows to gain a substantially new insight into an already deeply explored system.
The rest of the paper is organized as follows: Data section is devoted to the description of the dataset used for the present analysis, Methods section reports the detailed description of our method and Results section illustrates the results which are discussed in Discussion section, where conclusions are also drawn.
Data
The WTW can be represented in many different ways, depending on the level of information that we want to process. The most popular ones represent it via an adjacency matrix with nodes playing the role of worldcountries and links indicating the presence of (any kind) of trade exchange between them. This framework has been recently extended to analyse the WTW as a multiplex, where trade exchanges corresponding to different commodities are distinguished^{35,36}.
Here we represent the WTW as a bipartite network, i.e. by considering the set of worldcountries and the set of products as different entities and linking a given country to a given product if (and only if) the former exports the latter above a certain threshold (the socalled RCA^{8,9}). Applying the latter rises the probability that the exported commodity is actually produced by the exporting country. In this representation, any two countries (as well as any two products) cannot be directly linked (i.e. links connecting nodes of the same set are not allowed): thus, any two nodes of the same set can be still thought as “interacting” but only indirectly, via a connection with the same nodes of the other family. This way of representing the WTW allows us to analyze the global economy from a different perspective, by making the productivity relations between countries explicit (i.e. which country produces which product).
The dataset we have considered for the present analysis is the NBER database, collecting data for the 38 years 19632000^{37} and categorizing products according to the SITC revision 2 at fourdigits level. Data have been further processed, building upon the datamining procedure adopted in^{38}, to produce a dataset with 538 products across all years and a number of countries varying from 130 to 151.
Methods
The distinction between countries and products leads naturally to the definition of a biadjacency matrix, which will be indicated with . In the present paper we focus on the binary, undirected representation of the WTW: thus, the matrix entries will be either m_{cp} = 1, indicating that country exports an amount of product above the RCA threshold, or m_{cp} = 0, indicating that the production of by country is below the RCA threshold and, thus, has been ignored. As a consequence, each row represents the export basket of a given country, while each column represents the subset of producers of a given product. A pictorial representation of the WTW biadjacency matrix in the year 2000 is shown in Fig. 1, with the blue dotsrepresenting the ones and the white dots the zeros.
If we indicate with the total number of countries and with the total number of products, the total number of elements of the biadjacency matrix (i.e. its volume) is C ⋅ P , also representing the maximum observable number of connections. In fact, unlike the usual square representation, the problems arising from the presence of selfconnections are not encountered here. Moreover, the presence of two different subsets (also known as layers) induces a measure of “rectangularity” of our matrix M^{6}, i.e. , ranging in R ∈ [0,1), with values closer to 1 indicating a large asymmetry between the number of countries and the number of products and values closer to 0 indicating equivalence between the two layers cardinality (notice that the information on the sign would be based on the arbitrary choice of the layers ordering).
The definitions of other topological quantities of interest easily follow from the usual ones, as the number of links (i.e. the total number of connections)
and the connectance , measuring the percentage of observed connections. Fundamental properties are represented by the number of nodespecific connections, i.e. the degree of countries, also named diversification^{7,8,9}, measuring the number of products exported by each country.
and the degree of products, also named ubiquity^{7,8,9}, measuring the number of countries exporting each product.
Definitions (2) and (3) induce the notions of countries mean degree and products mean degree.
The last passage follows from noticing that .
In order to make the connections between nodes of the same family explicit, a bipartite network can be projected on its layers, thus recovering two traditional, monopartite representations. This operation can be straightforwardly implemented by considering the matrix products.
where M^{T} is the transpose of the biadjacency matrix M. While the dimensions of M are C × P, the dimensions of its transpose are P × C. This implies that results in a C × C matrix whose generic element , with c ≠ c′, counts the number of patterns of length two between countries c and c′. The generic, diagonal element is precisely the degree of country c. Similarly, Р results in a P × P matrix whose generic element Р_{pp′}, with p ≠ p′, counts the number of patterns of length two between products P and p′. As before, the generic, diagonal element is the degree of product P. Remarkably, the entries of matrices and Р have a clear macroeconomic interpreation: while counts the number of products shared by countries c and c′, Р_{pp′} counts the number of countries exporting both products P and p′.
Since nodes of the same layer cannot be directly linked, it is enough that a path of length two (i.e. the minimum allowed length) connects any two nodes of the same family to directly link them in the corresponding monopartite projection. Thus, by first applying the Heaviside stepfunction Θ[…] to matrices and Р elementwise (i.e. , where can be 0 or 1, if and respectively  and similarly for Р) and then subtracting the diagonal elements, the binary, adjacency matrices describing the two monopartite projections are recovered, i.e.
where I_{C} and I_{P} are the identity matrices having dimensions C × C and P × P respectively.
Topological measures for binary, undirected, bipartite networks
Several quantities have already been proposed to analyse bipartite networks^{6}. However, here we define different measures by extending some of the most used indicators in network theory, better capturing, in our opinion, the particular features of a given bipartite network′s structure.
a. Assortativity
The traditional definition of assortativity is intended to quantify the degrees correlations, by distinguishing the assortative behavior (signalling positive degrees correlations) from the disassortative behavior (signalling negative degrees correlations). When dealing with bipartite networks, we can measure such correlations both with respect to countries and with respect to products, by respectively defining the average nearest products ubiquity (or ANPU)
and the average nearest countries diversification (or ANCD) as
As in the monopartite case, assortativity is quantified by respectively scattering the ANPU and ANCD values versus the degree sequences and .
b. Complexity and fitness
As recently pointed out^{8,9}, countries and products can be assigned two purely networkbased quantities, known as fitness, F_{c} (to be assigned to countries) and complexity, Q_{p} (to be assigned to products), playing the role of nonmonetary indicators of the economy development and providing a highly nontrivial way to rank the worldcountries economic health (see also the Supplementary Information).
c. Motifs
The usual clustering coefficient, measuring the hierarchical structure of a monopartite network, cannot be defined for bipartite networks: in fact, since no odd cycles of any length can be observed in bipartite networks (precisely because links within the same layer are forbidden) triangles cannot be observed as well; similarly, the usual triangular motifs cannot be defined^{3,39}.
However, higherorder correlations between nodes can still be captured by defining a completely new class of motifs. The first examples we provide are the Vmotifs and the Λmotifs (see Fig. 2). The former count how many couples of countries export the same products, quantifying the productivities′ similarity; the latter count how many couples of products are in the basket of the same producer, providing a measure of products correlation. Remembering that , with c ≠ c′, counts the number of products exported by both c and c′, the total number of Vmotifs connecting any pair of countries is
and, remembering the analogous role of , the total number of Λmotifs connecting any pair of products is.
The last passages follow from noticing that each Vmotif (Λmotif) is constituted by a pair of links having the same product (country) as a common vertex. The number of countries competing on the same product, as well as the number of products in the same basket, can be further risen, leading to the following generalizations (with V2 ≡ V and Λ2 ≡ Λ):
Figure 2 shows an example of V3motifs and Λ3motifs. From defintions (12) it follows that V1 = Λ1 = L.
Higherorder correlations can be captured by allowing for a higher number of connected nodes in the same layers (see Xmotifs, Mmotifs and Wmotifs in the Supplementary Information). Remarkably, all the defined kinds of motifs:
can be compactly expressed in terms of products of biadjacency matrix entries;
can be defined for specific subsets of countries and products, thus allowing for a finer analysis of the production dynamics. For example, a measure of correlation of countries a and b production is given by the motif
may have an application also in the analysis of ecological networks, especially mutualistic networks (e.g. impollinatorsflowers): in fact, measures of cooccurrence can be directly applied to ecosystems to quantify the species’ competitiveness for the available resources.
In what follows we will focus on the Vn and Λn families (a more detailed discussion about all motifs is provided in the Supplementary Information).
d. Assortativity coefficient
Beside our definitons, we have also considered the assortativity measure proposed in^{40} and called r. The latter ranges in the domain r ∈ [−1,1], with r = 1 indicating the tendency of links to connect nodes with similar degrees and r = −1 indicating the tendency of links to connect nodes with different degrees.
e. Nestedness
On the basis of the two aforementioned measures F_{c} and Q_{p}, one can reorder the matrix rows and columns (i.e. countries and products) by, respectively, decreasing the fitness along rows (from top to bottom) and increasing the complexity along columns (from left to right), thus obtaining the triangular structure shown in Fig. 7. In order to quantify the shape of such a matrix, several measures have been recently proposed^{41,42,43,44}, under the common name of nestedness. Here we adopt the one proposed in^{41} (called NODF  see also the Supplementary Information). Notice that the measure of nestedness adopted here doesn’t depend on the rows and columns ordering criterion (in what follows we will adopt the one based on F_{c} and Q_{p} measures)^{8,9}.
Randomizing bipartite networks
In order to implement suitable null models to detect the statisticallyrelevant patterns of real bipartite networks, the lines of the method proposed in^{19} can be followed. In particular, an ensemble of binary, undirected, bipartite networks must be considered, in order to maximize Shannon entropy
under a given set of constraints ^{16,19}. Notice that the probability coefficient P(M) is assigned to every adjacency matrices in the esemble and the constraints are defined in terms of the entries of M. The result is the wellknown exponential distribution:
with the hamiltonian compactly expressing the imposed set of constraints, being the vector of Lagrange multipliers associated to the vector of constraints and being the normalization.
In the monopartite case, one of the most insightful null models has been proven to be the socalled Configuration Model (CM)^{12,14}. Let us now implement the bipartite extension of the CM (BiCM, in what follows), by constraining the degree sequence of the binary, undirected, bipartite WTW and analyzing the system beyond the information contained into it. Since now we have two different layers of nodes, the hamiltonian reads.
Now we can calculate the probability coefficient (14), associating a probability to each network in the ensemble on the basis of the specific degree sequences and :
the notation Π_{c,p} being equivalent to (see the Supplementary Information for the detailed calculations). The coefficient with and , is the ensemble probability of having a link between country c and product p, as .
Our null model provides the analytical expression of a network probability as a product over all the accessible C × P pairs of nodes. In other words, the BiCM interprets the links as independent random variables, thus defining a grandcanonical probability measure where links correlations are discarded. Notice also that no probability coefficients controlling for the presence of links between nodes in the same layer appear in the expression (16). This is a consequence of having considered an ensemble of bipartite networks as the support of our probability distribution: in so doing, the forbidden intralayer links are automatically excluded by the choice of the allowable configurations volume.
The probability distribution in (16) depends on C + P unknown parameters (i.e. the Lagrange multipliers), also called hidden variables^{13,24}. The recipe provided by statistical mechanics to estimate the hidden variables is summed up by the equations
However, no indication about the numerical value to be assigned to the ensemble average of constraints is provided. Thus, in order to estimate the hidden variables from data, let us first note that can be rewritten solely in terms of the observed constraints value, i.e. ^{19}. Then, let us consider the loglikelihood function :
The recipe provided by statistics to estimate the unknown parameters of a given probability distribution prescribes to maximize ^{19}. This means solving the system of C + P equations in C + P unknowns^{19}:
In what follows the vector of solutions satisfying the system (19), for given and as degree mean values, will be indicated as . Notice that the coefficients appearing at the second member of the system equations have the same functional form both for countries and products. This is a consequence of assigning only one Lagrange multiplier to each node but in such a way to distinguish the nodes in the first layer from the nodes in the second layer.
Expected topological measures for binary, undirected, bipartite networks
In the previous subsections several quantities of interest to be measured on binary, undirected, bipartite networks have been listed. In this subsection we will show how our method can be implemented to calculate their expected value (to be compared with the observed one) and the relative errors (to quantify the discrepancies) in order to assess up to what level our null model is able to explain the higherorder structure of the network.
Our method allows us to proceed in a twofold way. The first one is analytical. Using the linkspecific probability coefficients p_{cp} and the passages sketched in^{19}, we are able to analytically calculate both the expected value and the standard deviation of the (analyticallydefinable) quantities of the previous subsections. However, because of the impossibility to perform analytical evaluation of the average for some key quantities, we have adopted a different strategy: we have sampled the grancanonical ensemble of binary, undirected, bipartite networks induced by the BiCM according to the probability coefficients , measured the aforementioned properties on our sample and calculated the statistical moments, as average and standard deviation, of the generic quantity X as.
i.e. as sampling moments according to the sampling frequencies (N_{m} being the number of networks in the ensemble having biadjacency matrix equal to m). Since our method is unbiased^{19,21}], numerically sampling provides a faithful representation of the whole ensemble. We have also calculated the probability distribution (induced by ) of some of the properties of interest, in order to quantify the statistical significance of their observed value (via the zscore, for example).
Nevertheless, the analytical expressions of the expected value and standard deviation of the quantities explicitly defined in the previous subsections has been derived in the Supplementary Information.
Results
Let us first show our results on the temporal snapshot of the WTW corresponding to the year 2000. The number of nodes is C_{2000} = 151 and P_{2000} = 538, causing the R index to be (see section Methods). The high asymmetry of our network is also pointed out by the different mean degrees, and , indicating that countries are, on average, almost three times more connected than products. However, the connectance is : thus, our bipartite WTW is much sparser than its monopartite counterpart^{26}. Notice that our null model, constraining (on average) the degree sequence, exactly reproduces any network’s connectance by definition, spanning the domain of applicability of both the sparse and the dense network reconstruction algorithms.
Assortativity
Figure 3 shows the comparison between observed and expected values of our coefficients of assortativity. Having plotted VS d_{c} and VS u_{p}, we firstly observe that the bipartite WTW shows a disassortative behavior, signalled by a globally decreasing trend of our measures. More detailedly, two distinct behaviors seem to characterize as a function of d_{c}: while countries with low diversification are preferentially linked to products with high ubiquity (left side of panels 3a and 3b), countries with high diversification are linked to almost all products (right side of panels 3a and 3b). This is also reflected in the triangular structure of the matrix (see Fig. 1). For products, this distinction is less sharp (panels 3c and 3d): in fact, while highubiquity products are linked to almost all countries, lowubiquity products can be found connected to both high and lowdiversification countries.
As can be seen from Fig. 3, the BiCM captures the disassortative behavior of both and ; however, only part of the observed points lies within the ±2 standard deviations region. This means that the mechanism shaping the disassortative behavior of the WTW is not completely explained by our null model, signalling a nontrivial origin of the WTW degree correlations. What is strikingly surprising is the prediction based on the Random Graph model (BiRG): the corresponding trend is closer to the BiCM prediction than in the monopartite representation of the WTW^{26}. Moreover, since disassortativity is more pronounced in real data, our results indicate that the BiCM performs better than BiRG for small values of d_{c} and u_{p}, while the BiRG correctly capture their flat behavior at large d_{c} and u_{p} (i.e. for competitive countries and ubiquitous products, for which , ). This seems to indicate that the explanatory power of the degree sequence is far more limited in the bipartite representation than in the monopartite one and that additional information is required to improve the agreement between observations and predictions (even at the simplest level of binary, undirected networks).
Figure 4 extends our assortativity analysis to the entire dataset. In order to condensate the information of 38 scatter plots, we have computed the barycenter and sparseness of both the observed and expected clouds of points. In particular, we have calculated the arithmetic mean of both the observed values .
and , the expected values and and the corresponding confidence intervals (CI) at 95% level. As for the motifs, also and can be interpreted in macroeconomic terms. In fact, measures the countryspecific number of competitions, thus quantifying the (average) presence of a country on the global market. Further averaging over all countries provides a measure of the integration of worldcountries production.
What emerges is that the evolution of expected points closely follows the evolution of the observed ones, pointing out that the BiCM correctly describes the temporal trend of the assortativity measures. Notice that, even if observed points are systematically more concentrated on higher levels (as shown in panels 4a and 4c), the confidence intervals are still close enough to let us interpret the BiCM predictions as correct. Moreover, the constancy of the amplitude of the confidence intervals for both observed and expected ANPU values indicates that the corresponding clouds of points maintain the same sparseness across our 38 years dataset; on the other hand, the amplitude of the observed ANCD confidence intervals slightly reduces, indicating a shrinkage of the corresponding cloud of points (compare panels 3b and 3d).
The temporal trends of and show interesting differences. In fact, while keeps increasing across the whole dataset, does not (and from 1975 starts decreasing). Since the countries mean degree keeps rising as well ( and ), the increasing trend is probably due to the birth of new links, indicating that while existing countries have enlarged their production, newborn countries have started theirs. The results seem also to be compatible with the picture of several “appealing” products behaving as hubs and attracting links, including the ones of the newborn countries which in turn, having a low degree, reduce the value of the .
Since ranges in the interval [0,C], the effect due to the varying number of countries can be washed away by further dividing it by C, (and thus normalizing it to the interval [0,1]). Remarkably, our index G_{C} can be now interpreted a “genuine” measure of globalization, not affected by any spurious effect. Very interestingly, the temporal trend of G_{C} after 1970 becomes now almost flat. This means that the WTW evolution does not actually affect the value of countries integration which organize in such a way to maintain the same value of G_{C}, irrespectively of the rising number of countries, their higher diversification, etc. This seems to confirm the stationary evolution of such network, recently pointed out^{45}. A similar reasoning leads us to interpret as a measure of products homogeneity.
We have also calculated the Pearson correlation coefficient between the vectors and (panel 4b) and between the vectors and (panel 4d), in order to quantify the agreement on the “shape” of the clouds of points. The correlation of the latter is lower than the correlation of the former: this is due to the shape of the empirical cloud of ANCD which is less linear than the empirical ANPU, thus worsening the agreement with the corresponding expectations (which show an almost perfectly linear trend).
Complexity and fitness
Complexity and fitness can be obtained only numerically, as the result of the convergence of the algorithm proposed in^{8,9,46}. Panels 5a and 5b show the comparison between observed and expected complexity (plotted VS ubiquity) for the years 1963 and 2000; panels 5c and 5d show the comparison between observed and expected fitness (plotted VS diversification) for the same years. Our null model capture both trends with a larger accuracy than in the measure of assortativity: notice how the expected trend under the BiCM reproduces the “beak” of the observed complexity in real data and the vast majority of the observed cloud lies within the ±2 standard deviations region.
Similarly, the expected trend of reconstructed fitness captures the different growth regimes of the observed fitness in the WTW data, showing few sparse points outside the same error region (clearly visible in the loglog plots of Fig. 5). The regime with lower slope (left side of panels 5e and 5f) represents the socalled “poverty trap”^{8,9}, i.e. the area populated by the group of countries with lowest fitness: notice how all such countries lie within the ±2 standard deviation region (or immediately outside). Similar considerations hold for all the remaining years, indicating a constant performance of our method across our 38years dataset.
The average trends in Fig. 5 are computed differently from those in Fig. 3: while the latter represent the nodespecific, ensemble averages and , the former represent averages taken over ranked nodes, ordered according to their complexity  panels a and b  and fitness  panels c and d. Generally speaking, ordering nodes on the basis of such procedure will produce a different ranking for different bipartite networks of the ensemble. Moreover, the ranking operation guarantees neither that the identity of ranked nodes remains the same (e.g. two different countries can be ranked first for two different networks), nor that the corresponding complexity and fitness maintain their value across our sample (i.e. the nodes ranked first will, in general, have different values of F_{c} and Q_{p}): this in turn implies that each ranked node degree may change as well (i.e. the nodes ranked first for different networks will, in general, have different degrees). From these considerations, the need of quantifying 1) the variation of any country diversification as a function of its fitness and 2) the variation of any product ubiquity as a function of its complexity follows. This is in line with the spirit of the research in^{8,9}: trying to establish a biunivocal relation both between ubiquity and complexity and between fitness and diversification, in order to unambiguously rank countries and products. This kind of analysis represents a highly nontrivial test bench of our model which appear to perform very well.
Motifs
The motifs analysis has been carried on by calculating two different quantities. The first one has been defined as
and named similarity: it quantifies the goodness of our prediction, measuring the difference between the observed and expected abundances. Beside similarity, we have also considered the traditional zscores^{3,28,39}, defined as the ratio of the difference between the observed and expected abundances and the corresponding standard deviation
with and m indicating the particular motif considered. Even if zscores have been recognized to be dependent on the network size ^{47} (at least for monopartite networks), our dataset collects matrices with very similar volume (R ∈ [0.56, 0.61]): thus, we can imagine this effect to be very small.
Notice that similarity and zscores provide complementary information: in particular, the latter measures the statistical significance of the agreement found by the former, accounting for the role of higherorder correlations not included in our constraints. Moreover, their ratio s_{m}/z_{m} = σ_{m}/〈N_{m}〉 coincides with the motifspecific coefficient of variation, quantifying to what extent the average sums up the relevant information encoded into the corresponding ensemble distribution. Naturally, as for the observed abundances, both s_{m} and z_{m} can be defined for specific subsets of nodes as well.
Figure 6 shows the analysis of the Vn and Λn motifs. First, we have sampled the Vmotifs and Λmotifs abundance on the ensemble, in order to verify their distribution (see the Supplementary Information): both follow a gaussian very closely. Since all our motifs are sums of (neither independent nor identically distributed) random variables, this may be seen as a consequence of the generalized Central Limit Theorem. zscores can be thus attributed the correct probabilistic meaning of (gaussian) standardized variables^{39,47} and choosing a threshold z_{0} for z allows the identification of significantly deviating patterns. In what follows we will choose z_{0} = ±1.65 as threshold values for the aggregated Vn and Λn families and z_{0} = ±2 for the subsetsspecific corresponding ones (see the Supplementary Information for a justification of such values). Naturally, if the observations were exactly reproduced by our null model, the zscores would be zero^{49}.
The evolution of both similarity and zscores across the years in our database point out that the Λn family is better reproduced than the Vn family (showing a similarity and a zscore closer to zero  see panels 6a and 6b). In particular, Vn zscores lie outside the boundary of the significance region, showing values lower than −1.65. This indicates that for the binary, bipartite representation of the WTW, the degree sequence is far more effective in reproducing the products correlations than the correlations between countries. In other words, we correctly capture the countries tendency to expand their production, which seems to coexist with a certain superposition of the countries baskets of products (see Mmotifs in the Supplementary Information). However, the BiCM overestimates the resemblance of the different baskets: as zscores indicate, worldcountries tend to form less Vmotifs than expected under our null model (further confirmed by the trend of Xmotifs and Wmotifs  see the Supplementary Information). Summing up, world countries show a clear tendency to diversify their production, at the same time avoiding to directly compete on the same products.
The comparison between similarity and zscore clarifies the role of average in characterizing the ensemble distribution of Vn and Λn families: the ratio s_{m}/z_{m} ≤ 0.1 justifies our interest in their ensemble average alone.
However, zscores of Vn and Λn families result in almost flat trends which allow us to draw only general conclusions on the WTW as a whole. The reason lies in the “aggregated” character of such motifs, not distinguishing between different subsets of countries or products. To be more precise, let us consider the temporal evolution of our motifs on specific subsets of nodes (see panels 6c and 6d): in particular, the Asian Tigers (South Korea, Singapore, Taiwan, Hong Kong), the BRICS countries (Brazil, USSR/Russia, India, China, South Africa), the european countries belonging to G7 (France, Italy, Germany, United Kingdom) and a number of easterneuropean countries (Hungary, Romania, Bulgaria, Poland, USSR/Russia) and let us calculate the temporal evolution of V4 and V5 motifs restricted to them. The european countries show a zscore almost constantly equal to 4, indicating a significant affinity which is maintained over time. An even stronger internal affinity is shown by the Asian Tigers to which China should be added (in fact, its addition to the group rises the zscore). On the other hand, BRICS countries show a very limited affinity^{8,9,48}: their trend becomes more and more consistent with the null model, to become negative in the recent years. The last two examples point out the limitations of the traditional economic classification (usually distinguishing China from Asian Tigers and gathering BRICS together), not capturing any actual economic likeness.
Easterneuropean countries, on the other hand, show a strong correlation before 1989, gradually declining as this topical year approaches. Interestingly enough, after 1989 such correlation doesn’t disappear, remaining statistically significant (and stabilizing around ): this seems to indicate a significant connection still persisting, having Russia replaced USSR as “reference” country. An additional test is provided by the random choice of four countries (Ghana, China, Mozambique, Austria): although close to zero, their trend is constantly negative. In fact, being Ghana and Mozambique lowdiversification countries, they will be linked only to highubiquity products, common to all countries (see Fig. 3): thus, their basket will be far more limited than China’s and Austria’s, limiting in turn their possibility to compete. The constantly negative sign indicates, in this case, the impossibility to compete.
This kind of analysis can be repeated for Λn motifs as well, allowing us to gain a substantial insight into the products correlations. Panels 6e and 6f show some examples. While the food sector we have considered shows a constantly high value of z, indicating the common origin of the chosen dairy products, the pink trend signals a nontrivial positive correlation between the sectors represented by worked aluminium artifacts, tractors and fruit. A possible explanation may rest upon the consideration that tractors are constituted by parts in aluminium to be, in turn, used to transport the picked fruit. Consistently, the last group of products (cheese, rods and locomotives) is characterized by the value .
Notice that while for some groups of nodes the first moment encloses great part of the relevant information (s_{m}/z_{m} ≤0.5), for other groups higherorder moments could provide additional, useful information (s_{m}/z_{m} ≃ 1), e.g. the distribution asymmetry. Interestingly, these circumstance are mostly encountered for countries and products, respectively.
Assortativity coefficient and nestedness
As for the Vn and Λn motifs, the assortativity coefficient has a gaussian ensemble distribution (see the Supplementary Information). Both the observed value r and its zscore signal that we are globally overestimating the network assortativity: more exactly, since our expected coefficient 〈r〉 is still negative, we are predicting a less disassortative network than observed (see Fig. 7). This is a consequence of our randomization procedure, distributing links between nodes more homogeneously (recall that, consistently, our predicted and show less steeply decreasing trends than the observed ones  see Fig. 3).
In order to better understand the concept of nestedness, let us explicitly draw a matrix from the BiCMinduced grandcanonical ensemble, ranking its rows and columns according to the F_{c} and Q_{p} measures^{8,9}. The result is shown in Fig. 7. Notice that nestedness cannot be simply reduced to the concept of “triangularity” of a matrix. In fact, even if the drawn matrix shows a more curved boundary than the observed one, both the nestedness ensemble distribution (see the Supplementary Information) and its zscore (Fig. 7) signal that our method reproduces it correctly.
We have also measured the nestedness along rows and the nestedness along columns separately (according to the definitions in^{41}. While the latter is reproduced and closely follows the trend of the global one, the former is, for a few years, significantly underestimated. This is nontrivially related to the way our null model redistributes Vmotifs and Λmotifs. However, as the bottom panel in Fig. 8 suggests, a role seems to be played by the asymmetry of our bipartite matrix as well: in other words, the higher cardinality of the products layer seems to induce a preferential filling of the rows, making them more homogenenous and lowering their expected nestedness.
It should be also noted that the ensemble coefficient of variation for both r and NODF show such a small value (s_{m}/z_{m} ≃ 10^{−2} for both, across our temporal dataset) that the ensemble average can be considered as the only moment carrying relevant information.
Discussion
In this paper we have both proposed a method to randomize binary, undirected, bipartite networks, by constraining essential network features as the total number of links and the nodes connectivity and tested it on a real system as the World Trade Web. While, on the one hand, specifying the degree sequence allows highly nontrivial properties like countries fitness, products complexity and the matrix nestedness to become reproduced across our whole dataset, on the other quantities like assortativity and motifs still elude a satisfactorily explanation.
This is even more surprising, when considering the high level of accuracy achieved by the CM predictions in the analysis of the monopartite representation of the WTW. Our findings suggest that analysing different representations of the same network can indeed convey additional information, as proved by the agreement between the observed assortativity and the expected one (see Fig. 3), lower than in the corresponding monopartite WTW^{26}. In words, the correlations between countries induced by their productivity relations, clearly displayed by the bipartite representation of the WTW, are only partially explained by the degree sequence, calling for a higher amount of information to achieve the same level of accuracy obtained for the monopartite representation (and analogously for products). Otherwise stated, representing the same system via different network models (even belonging to the same class of binary, undirected configurations) may strongly affect the effectiveness of the corresponding piece of information (as the nodes connectivity) in reproducing the observed structure.
Assortativity provides again the clearest example: as previously pointed out, the bipartite Configuration Model predicts trends quite similar to those expected under the bipartite Random Graph. To better quantify this difference, we have calculated the Shannon entropy (normalized to the total number of nodes pairs, i.e. the network volume) of the probability distributions induced by the BiRG and the BiCM:
where for the BiCM and for the BiRG (see Supplementary Information). Results are shown in the bottom panel of Fig. 9. As evident from the trends, while specifying the total number of links strongly reduces the uncertainty (as signalled by the low value of the connectance, reducing the ensemble entropy to half its maximum value), further specifying the degree sequence produces a less relevant effect one could expect on the basis of the well known, monopartite results^{26}. Comparing the analyses of degree correlations for the bipartite and the projected WTW (on both countries and products layers  top and middle panels of Fig. 9 for the year 2000), what emerges is quite impressive: while the CM prediction correctly overlaps to the observed trend, the RG predicts a flat trend completely missing the observed cloud of points (in line with the results already obtained for the monopartite representation^{26}. In terms of Shannon entropy, when passing from the RG to the CM the reduction of uncertainty on the observed, projected WTW amounts to 41%; for the bipartite WTW, this percentage reduces to only 16% (see Fig. 9). This findings clearly indicate a future extension of our work: constraining those quantities having a significant impact on nodes correlations, as Vmotifs, Λmotifs or nestedness, in order to define a more effective null model.
However, as the analysis of motifs reveals, the BiCM provides the right benchmark to highlight meaningful correlations between countries and products, representing a purely topological alternative to the traditional economic classification, whose limitations have been already pointed out^{8,9,36}. Remarkably, this kind of analysis can be repeated for different years, in order to monitor our system over time and detect significant temporal trends of the world economies coevolution.
We stress that our approach is grandcanonical and possible extensions of the method move in the same direction. The paper in^{31}, on the other hand, implements the microcanonical version of a monolayer regular random graph: as for monopartite networks, comparing the performance of the two available approaches represents a challenging, future research direction.
Future work moves towards the direction of extending the present framework to directed, as well as weighted, networks models, to test the robustness of our findings also for configurations beyond the binary, undirected ones.
Additional Information
How to cite this article: Saracco, F. et al. Randomizing bipartite networks: the case of the World Trade Web. Sci. Rep. 5, 10595; doi: 10.1038/srep10595 (2015).
References
Albert R. & Barabasi. A.L. Statistical Mechanics of Complex Networks. Rev. Mod. Phys. 74, 47–96 (2002).
Newman M. E. J. The structure and function of complex networks. SIAM Rev. 45, 167 (2003).
Caldarelli G. in Scalefree Networks. Complex Webs in Nature and Technology (Oxford University Press 2007).
Boccaletti S., Latora V., Moreno Y., Chavez M. & Hwang D.U. Complex Networks: Structure and Dynamics. Phys. Rep. 424, 175–308 (2006).
Guillaume J.L. & Latapy M. Bipartite structure of all complex networks. Inform. Proc. Lett. 90, 215–221 (2004).
Dormann C. F., Fründ J., Blüthgen N., Gruber B. Indices, Graphs and Null Models: Analyzing Bipartite Ecological Networks. Open Ecol. J. 2, 7–24 (2009).
Hidalgo C. & Hausmann R. The building blocks of economic complexity Proc. Nat. Acad. Sci. 26, 10570–10575 (2009).
Tacchella A., Cristelli M., Caldarelli G., Gabrielli A. & Pietronero L. A New Metrics for Countries’ Fitness and Products’ Complexity. Sci. Rep. 2, 723, doi:10.1038/srep00723 (2012) (Date of access: 13/03/2015).
Tacchella A., Cristelli M., Caldarelli G., Gabrielli A. & Pietronero L. Measuring the Intangibles: A Metrics for the Economic Complexity of Countries and Products. PLoS ONE 8, doi:10.1371/journal.pone.0070726 (2013) (Date of access: 13/03/2015).
Cimini G., Gabrielli A. & Sylos Labini F. The Scientific Competitiveness of Nations. PLoS ONE 9, doi: 10.1371/journal.pone.0113470 (2014) (Date of access: 13/03/2015).
Peltomäki M. & Alava M. Correlations in Bipartite Collaboration Networks. J. Stat. Mech. 2006, P01010 (2006).
Chung F. & Lu L. Connected Components in Random Graphs with Given Expected Degree Sequences. Ann. Comb. 6, 125–145 (2002).
Caldarelli G., Capocci A., De Los Rios P. & Muñoz M. ScaleFree Networks from Varying Vertex Intrinsic Fitness. Phys. Rev. Lett. 89, 258702 (2002).
Park J. & Newman M. E. J. The statistical mechanics of networks. Phys. Rev. E 70, 066117 (2004).
Serrano M. A., Boguna M. & PastorSatorras R. Correlations in weighted networks. Phys. Rev. E 74, 055101(R) (2006).
Garlaschelli D. & Loffredo M. I. Maximum likelihood: extracting unbiased information from complex networks. Phys. Rev. E 78, 015101(R) (2008).
Bianconi G. The entropy of network ensembles. Phys. Rev. E 79, 036114 (2009).
Fronczak A. in Encyclopedia of Social Network Analysis and Mining (SpringerVerlag 2014).
Squartini T. & Garlaschelli D. Analytical maximumlikelihood method to detect patterns in real networks, New. J. Phys. 13, 083001 (2011).
Mastrandrea R., Squartini T., Fagiolo G. & Garlaschelli D. Enhanced reconstruction of weighted networks from strengths and degrees. New J. Phys. 16, 043022 (2014).
Squartini T., Mastrandrea R. & Garlaschelli D. Unbiased sampling of network ensembles. New J. Phys. 17, 023052 (2015).
Dormann C. F., Gruber B. & Fründ J. Introducing the bipartite Package: Analysing Ecological Networks. R News 8, 8–11 (2008).
Strona G., Nappo D., Boccacci F., Fattorini S. & SanMiguelAyanz J. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat. Comm. 5, doi:10.1038/ncomms5114 (2014) (Date of access: 13/03/2015).
Kitsak M. & Krioukov D. Hidden Variables in Bipartite Networks. Phys. Rev. E 82, 026114 (2011).
Dormann C. F., Strauss R. A method for detectingmodules in quantitative bipartite networks. Methods Ecol. Evol. 5, 90–98 (2014).
Squartini T., Fagiolo G. & Garlaschelli D. Randomizing world trade. I. A binary network analysis. Phys. Rev. E 84, 046117 (2011).
Squartini T., Fagiolo G. & Garlaschelli D. Randomizing world trade. II. A weighted network analysis. Phys. Rev. E 84, 046118 (2011).
Squartini T. & Garlaschelli D. Triadic Motifs and Dyadic SelfOrganization in the World Trade Network. Lec. Notes Comp. Sci. 7166, 24–35 (2012).
Musmeci N., Battiston S., Puliga M. & Gabrielli A. Bootstrapping topology and systemic risk of complex network using the fitness model. J. Stat. Phys. 151, 720–734 (2013).
Caldarelli G., Chessa A., Pammolli F., Gabrielli A. & Puliga M. Reconstructing a credit network. Nat. Phys. 9, 125–126, doi:10.1038/nphys2580 (2013) (Date of access: 13/03/2015).
Tumminello M., Micciché S., Lillo F., Piilo J. & Mantegna R.N. Statistically Validated Networks in Bipartite Complex Systems. PLoS ONE 6, doi:10.1371/journal.pone.0017994 (2011) (Date of access: 13/03/2015).
Fronczak A. & Fronczak P. Statistical mechanics of the international trade network. Phys. Rev. E 85, 056113 (2012).
Serrano M. A. & Boguna M. Topology of the world trade web. Phys. Rev. E 68, 015101(R) (2003).
Fagiolo G., Reyes J. & Schiavo S. The evolution of the world trade web: a weightednetwork analysis. J. Evol. Econ. 20, 479–514 (2010).
Barigozzi M., Fagiolo G. & Garlaschelli D. Multinetwork of international trade: A commodityspecific analysis. Phys. Rev. E 81, 046104 (2010).
Mastrandrea R., Squartini T., Fagiolo G. & Garlaschelli D. Reconstructing the world trade multiplex: the role of intensive and extensive biases. Phys. Rev. E 90, 062804 (2014).
National Bureau of Economic Research dataset: http://www.nber.org/data/ (Date of access: 13/03/2015).
Feenstra R. C., Lipsey R. E., Deng H., Ma A. C. & Mo H. World Trade Flows: 19622000. National Bureau of Economic Research working paper 11040, doi:10.3386/w11040 (2005) (Date of access: 13/03/2015).
Milo R., ShenOrr S., Itzkovitz S., Kashtan N., Chklovskii D. & Alon U. Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002).
Newman M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
AlmeidaNeto M., Guimaraes P., Guimaraes P. R. Jr., Loyola R. D. & Ulrich W. A consistent metric for nestedness analysis in ecological systems: reconciling concept and measurement. Oikos 8, 1227–1239 (2008).
Bastolla U., Fortuna M. A., PascualGarcia A., Ferrera A., Luque B. & Bascompte J. The architecture of mutualistic networks minimizes competition and increases biodiversity. Nature 7241, 1018–1020 (2009).
Staniczenko P. P. A., Kopp J. & Allesina S. The ghost of nestedness in ecological networks. Nat. Comm. 4, doi: 10.1038/ncomms2422 (2013) (Date of access: 13/03/2015).
Jonhson S., DominguezGarcia V. & Munoz M. A. Factors Determining Nestedness in Complex Networks. PLoS ONE 8, doi:10.1371/journal.pone.0074025 (2013) (Date of access: 13/03/2015).
Squartini T. & Garlaschelli D. Stationarity, nonstationarity and early warning signals in economic networks. Journal of Complex Networks, doi: 10.1093/comnet/cnu012 (2014) (Date of access: 13/03/2015).
Pugliese E., Zaccaria A. & Pietronero L. On the convergence of the FitnessComplexity Algorithm. arXiv:1410.0249 (2014) (Date of access: 13/03/2015).
Milo R., Itzkovitz S., Kashtan N., Levitt R., ShenOrr S., Ayzenshtat I., Sheffer M. & Alon U. Superfamilies of evolved and designed networks. Science 303, 1538–1542 (2004).
Caldarelli G., Cristelli M., Gabrielli A., Pietronero L., Scala A. & Tacchella A. A Network Analysis of Countries’ Export Flows: Firm Grounds for the Building Blocks of the Economy. PLoS ONE 7, doi:10.1371/journal.pone.0047278 (2012) (Date of access: 13/03/2015).
Freund J. E. & Perles B. M. in Modern Elementary Statistics (PrenticeHall College Div 1996).
Acknowledgements
This work was supported by the EU project GROWTHCOM (611272) and the Italian PNR project CRISISLab. The authors thank Giulio Cimini, Matthieu Cristelli and Andrea Tacchella for useful discussions.
Author information
Authors and Affiliations
Contributions
F.S. and R.D.C. analysed the data and prepared all figures. A.G. wrote the article. T.S. planned the research and wrote the article. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Saracco, F., Di Clemente, R., Gabrielli, A. et al. Randomizing bipartite networks: the case of the World Trade Web. Sci Rep 5, 10595 (2015). https://doi.org/10.1038/srep10595
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/srep10595
This article is cited by

Network topology mapping of chemical compounds space
Scientific Reports (2024)

Alice and the Caterpillar: A more descriptive null model for assessing data mining results
Knowledge and Information Systems (2024)

Capability accumulation patterns across economic, innovation, and knowledgeproduction activities
Scientific Reports (2023)

Inflammatory bowel disease biomarkers revealed by the human gut microbiome network
Scientific Reports (2023)

Quantifying the complexity and similarity of chess openings using online chess community data
Scientific Reports (2023)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.