Randomizing bipartite networks: the case of the World Trade Web

Within the last fifteen years, network theory has been successfully applied both to natural sciences and to socioeconomic disciplines. In particular, bipartite networks have been recognized to provide a particularly insightful representation of many systems, ranging from mutualistic networks in ecology to trade networks in economy, whence the need of a pattern detection-oriented analysis in order to identify statistically-significant structural properties. Such an analysis rests upon the definition of suitable null models, i.e. upon the choice of the portion of network structure to be preserved while randomizing everything else. However, quite surprisingly, little work has been done so far to define null models for real bipartite networks. The aim of the present work is to fill this gap, extending a recently-proposed method to randomize monopartite networks to bipartite networks. While the proposed formalism is perfectly general, we apply our method to the binary, undirected, bipartite representation of the World Trade Web, comparing the observed values of a number of structural quantities of interest with the expected ones, calculated via our randomization procedure. Interestingly, the behavior of the World Trade Web in this new representation is strongly different from the monopartite analogue, showing highly non-trivial patterns of self-organization.

In the last fifteen years network science has exploded, revealing a world composed by interconnected systems ubiquitously found both in natural sciences and in socioeconomic disciplines [1][2][3] . Since the very beginning of network science, many different network representations have been adopted in order to study the particular system at hand 4 . However, the class of networks represented by bipartite networks has been recognized to provide a particularly insightful representation of many different systems 5 : ecological networks 6 , trade networks [7][8][9] , citations and collaboration networks 10,11 represent only few examples.
One could thus expect a relevant amount of work aimed at identifying the statistically-relevant patterns observed in real bipartite networks, at least comparable to the mass of results obtained so far for monopartite networks [12][13][14][15][16][17][18][19][20][21] : however, quite surprisingly, little work has been done so far to implement null models on real bipartite networks. Generally speaking, null models are statistical models used to make inference on a real system on the basis of partial information. The latter usually corresponds to some observable property of interest as the number of trade partners of a country, its exports and imports, the total exposure of a bank, etc. In particular, null models for bipartite networks being real-data rooted and showing the desirable features of general applicability and analytical character are currently missing. More in detail, the algorithms proposed so far show several limitations, ranging from being purely numerical (thus lacking the analytical character) 6,22,23 , to assuming an a priori functional form either for the distribution of the quantities of interest 6 or for the model parameters (thus not being real data-rooted) 24 or, lastly, using approximate analytical models 25 . Moreover, almost all the aforementioned approaches are tailored on ecological networks, thus lacking the character of general applicability.
The lack of such models is, maybe, also due to the misconception that bipartite networks could be analysed by, firstly, projecting them on one of the layers and, secondly, analysing the projection with one of the models currently available for monopartite networks. As we will show in what follows, the monopartite and the bipartite representations enclose different kinds of information, irreducible to each other (in the most general case).
The aim of the present paper is to fill this gap, proposing a theoretical framework guaranteeing the three aforementioned properties. In order to do this, we extend a recently-proposed method to randomize monopartite networks 19 to bipartite networks. The method rests upon the sequential maximizations of Shannon entropy and the network likelihood function, a combination which has been proven to be rather effective both for detecting patterns and to reconstruct the structure of several real-world networks 20,[26][27][28][29][30] . To the best of our knowledge, the only other paper proposing a method satisfying the three requirements above is 31 : we will comment on the differences with the one proposed here in the Discussion section.
While the proposed formalism is perfectly general, in this paper we apply our method to the binary, undirected, bipartite representation of the World Trade Web (hereafter WTW). We focused on this particular system precisely because of its popularity among network scientists, who have applied null models to all its possible representations 26,27,[32][33][34][35] , with the exception of the bipartite one. As we will show in what follows, representing the WTW as a bipartite network allows to gain a substantially new insight into an already deeply explored system.
The rest of the paper is organized as follows: Data section is devoted to the description of the dataset used for the present analysis, Methods section reports the detailed description of our method and Results section illustrates the results which are discussed in Discussion section, where conclusions are also drawn.

Data
The WTW can be represented in many different ways, depending on the level of information that we want to process. The most popular ones represent it via an adjacency matrix with nodes playing the role of world-countries and links indicating the presence of (any kind) of trade exchange between them. This framework has been recently extended to analyse the WTW as a multiplex, where trade exchanges corresponding to different commodities are distinguished 35,36 .
Here we represent the WTW as a bipartite network, i.e. by considering the set of world-countries and the set of products as different entities and linking a given country to a given product if (and only if) the former exports the latter above a certain threshold (the so-called RCA 8,9 ). Applying the latter rises the probability that the exported commodity is actually produced by the exporting country. In this representation, any two countries (as well as any two products) cannot be directly linked (i.e. links connecting nodes of the same set are not allowed): thus, any two nodes of the same set can be still thought as "interacting" but only indirectly, via a connection with the same nodes of the other family. This way of representing the WTW allows us to analyze the global economy from a different perspective, by making the productivity relations between countries explicit (i.e. which country produces which product).
The dataset we have considered for the present analysis is the NBER database, collecting data for the 38 years 1963-2000 37 and categorizing products according to the SITC revision 2 at four-digits level. Data have been further processed, building upon the data-mining procedure adopted in 38 , to produce a dataset with 538 products across all years and a number of countries varying from 130 to 151.

Methods
The distinction between countries and products leads naturally to the definition of a biadjacency matrix, which will be indicated with M. In the present paper we focus on the binary, undirected representation of the WTW: thus, the matrix entries will be either m cp = 1, indicating that country c exports an amount of product p above the RCA threshold, or m cp = 0, indicating that the production of p by country c is below the RCA threshold and, thus, has been ignored. As a consequence, each row represents the export basket of a given country, while each column represents the subset of producers of a given product. A pictorial representation of the WTW biadjacency matrix in the year 2000 is shown in Fig. 1, with the blue dotsrepresenting the ones and the white dots the zeros.
If we indicate with C the total number of countries and with P the total number of products, the total number of elements of the biadjacency matrix (i.e. its volume) is C ⋅ P , also representing the maximum observable number of connections. In fact, unlike the usual square representation, the problems arising from the presence of self-connections are not encountered here. Moreover, the presence of two different subsets (also known as layers) induces a measure of "rectangularity" of our matrix M 6 , i.e. R C P C P = − + , ranging in R ∈ [0,1), with values closer to 1 indicating a large asymmetry between the number of countries and the number of products and values closer to 0 indicating equivalence between the two layers cardinality (notice that the information on the sign would be based on the arbitrary choice of the layers ordering).
The definitions of other topological quantities of interest easily follow from the usual ones, as the number of links (i.e. the total number of connections)  . In order to make the connections between nodes of the same family explicit, a bipartite network can be projected on its layers, thus recovering two traditional, monopartite representations. This operation can be straightforwardly implemented by considering the matrix products.
where M T is the transpose of the biadjacency matrix M. While the dimensions of M are C × P, the dimensions of its transpose are P × C. This implies that  results in a C × C matrix whose generic element  cc′ , with c ≠ c′ , counts the number of patterns of length two between countries c and c′ . The generic, diagonal element  cc is precisely the degree of country c. Similarly, Р results in a P × P matrix whose generic element Р pp′ , with p ≠ p′ , counts the number of patterns of length two between products P and p′ . As before, the generic, diagonal element is the degree of product P. Remarkably, the entries of matrices  and Р have a clear macroeconomic interpreation: while  cc′ counts the number of products shared by countries c and c′ , Р pp′ counts the number of countries exporting both products P and p′ . Since nodes of the same layer cannot be directly linked, it is enough that a path of length two (i.e. the minimum allowed length) connects any two nodes of the same family to directly link them in the corresponding monopartite projection. where I C and I P are the identity matrices having dimensions C × C and P × P respectively.
Topological measures for binary, undirected, bipartite networks. Several quantities have already been proposed to analyse bipartite networks 6 . However, here we define different measures by extending some of the most used indicators in network theory, better capturing, in our opinion, the particular features of a given bipartite network′ s structure.
a. Assortativity. The traditional definition of assortativity is intended to quantify the degrees correlations, by distinguishing the assortative behavior (signalling positive degrees correlations) from the disassortative behavior (signalling negative degrees correlations). When dealing with bipartite networks, we can measure such correlations both with respect to countries and with respect to products, by respectively defining the average nearest products ubiquity (or ANPU) b. Complexity and fitness. As recently pointed out 8,9 , countries and products can be assigned two purely network-based quantities, known as fitness, F c (to be assigned to countries), and complexity, Q p (to be assigned to products), playing the role of non-monetary indicators of the economy development and providing a highly non-trivial way to rank the world-countries economic health (see also the Supplementary Information).
c. Motifs. The usual clustering coefficient, measuring the hierarchical structure of a monopartite network, cannot be defined for bipartite networks: in fact, since no odd cycles of any length can be observed in bipartite networks (precisely because links within the same layer are forbidden) triangles cannot be observed as well; similarly, the usual triangular motifs cannot be defined 3,39 . However, higher-order correlations between nodes can still be captured by defining a completely new class of motifs. The first examples we provide are the V-motifs and the Λ -motifs (see Fig. 2). The former count how many couples of countries export the same products, quantifying the productivities′ similarity; the latter count how many couples of products are in the basket of the same producer, providing a measure of products correlation. Remembering that  cc′ , with c ≠ c′ , counts the number of products exported by both c and c′ , the total number of V-motifs connecting any pair of countries is and, remembering the analogous role of , the total number of Λ -motifs connecting any pair of products is.
The last passages follow from noticing that each V-motif (Λ -motif) is constituted by a pair of links having the same product (country) as a common vertex. The number of countries competing on the same product, as well as the number of products in the same basket, can be further risen, leading to the following generalizations (with V2 ≡ V and Λ 2 ≡ Λ ): Figure 2 shows an example of V3-motifs and Λ 3-motifs. From defintions (12) it follows that V1 = Λ 1 = L.
Higher-order correlations can be captured by allowing for a higher number of connected nodes in the same layers (see X-motifs, M-motifs and W-motifs in the Supplementary Information). Remarkably, all the defined kinds of motifs: • can be compactly expressed in terms of products of biadjacency matrix entries; • can be defined for specific subsets of countries and products, thus allowing for a finer analysis of the production dynamics. For example, a measure of correlation of countries a and b production is given by the motif N C m m ; • may have an application also in the analysis of ecological networks, especially mutualistic networks (e.g. impollinators-flowers): in fact, measures of co-occurrence can be directly applied to ecosystems to quantify the species' competitiveness for the available resources.
In what follows we will focus on the Vn and Λ n families (a more detailed discussion about all motifs is provided in the Supplementary Information).
d. Assortativity coefficient. Beside our definitons, we have also considered the assortativity measure proposed in 40 and called r. The latter ranges in the domain r ∈ [− 1,1], with r = 1 indicating the tendency of links to connect nodes with similar degrees and r = − 1 indicating the tendency of links to connect nodes with different degrees. e. Nestedness. On the basis of the two aforementioned measures F c and Q p , one can reorder the matrix rows and columns (i.e. countries and products) by, respectively, decreasing the fitness along rows (from top to bottom) and increasing the complexity along columns (from left to right), thus obtaining the triangular structure shown in Fig. 7. In order to quantify the shape of such a matrix, several measures have been recently proposed [41][42][43][44] , under the common name of nestedness. Here we adopt the one proposed in 41 (called NODF -see also the Supplementary Information). Notice that the measure of nestedness adopted here doesn't depend on the rows and columns ordering criterion (in what follows we will adopt the one based on F c and Q p measures) 8,9 . Randomizing bipartite networks. In order to implement suitable null models to detect the statistically-relevant patterns of real bipartite networks, the lines of the method proposed in 19 can be followed. In particular, an ensemble  of binary, undirected, bipartite networks must be considered, in order to maximize Shannon entropy under a given set of constraints C M → ( ) 16,19 . Notice that the probability coefficient P(M) is assigned to every adjacency matrices in the esemble and the constraints are defined in terms of the entries of M. The result is the well-known exponential distribution: being the normalization.
In the monopartite case, one of the most insightful null models has been proven to be the so-called Configuration Model (CM) 12,14 . Let us now implement the bipartite extension of the CM (BiCM, in what follows), by constraining the degree sequence of the binary, undirected, bipartite WTW and analyzing the system beyond the information contained into it. Since now we have two different layers of nodes, the hamiltonian reads.
Now we can calculate the probability coefficient (14), associating a probability to each network in the ensemble on the basis of the specific degree sequences Our null model provides the analytical expression of a network probability as a product over all the accessible C × P pairs of nodes. In other words, the BiCM interprets the links as independent random variables, thus defining a grandcanonical probability measure where links correlations are discarded. Notice also that no probability coefficients controlling for the presence of links between nodes in the same layer appear in the expression (16). This is a consequence of having considered an ensemble of bipartite networks as the support of our probability distribution: in so doing, the forbidden intra-layer links are automatically excluded by the choice of the allowable configurations volume.
The probability distribution in (16) depends on C + P unknown parameters (i.e. the Lagrange multipliers), also called hidden variables 13,24 . The recipe provided by statistical mechanics to estimate the hidden variables is summed up by the equations However, no indication about the numerical value to be assigned to the ensemble average of constraints is provided. Thus, in order to estimate the hidden variables from data, let us first note that P M θ ( → ) can be rewritten solely in terms of the observed constraints value, i.e. P x Then, let us consider the log-likelihood function The recipe provided by statistics to estimate the unknown parameters of a given probability distribution prescribes to maximize  19 . This means solving the system  x y 19 : In what follows the vector of solutions satisfying the system (19), for given d M → ( ) and u M → ( ) as degree mean values, will be indicated as x y ( → , → ) ⁎ ⁎ . Notice that the coefficients appearing at the second member of the system equations have the same functional form both for countries and products. This is a consequence of assigning only one Lagrange multiplier to each node but in such a way to distinguish the nodes in the first layer from the nodes in the second layer.
Expected topological measures for binary, undirected, bipartite networks. In the previous subsections several quantities of interest to be measured on binary, undirected, bipartite networks have been listed. In this subsection we will show how our method can be implemented to calculate their expected value (to be compared with the observed one) and the relative errors (to quantify the discrepancies) in order to assess up to what level our null model is able to explain the higher-order structure of the network.
Our method allows us to proceed in a two-fold way. The first one is analytical. Using the link-specific probability coefficients p cp and the passages sketched in 19 , we are able to analytically calculate both the expected value and the standard deviation of the (analytically-definable) quantities of the previous subsections. However, because of the impossibility to perform analytical evaluation of the average for some key quantities, we have adopted a different strategy: we have sampled the grancanonical ensemble of binary, undirected, bipartite networks induced by the BiCM according to the probability coefficients ( ) =   (N m being the number of networks in the ensemble having biadjacency matrix equal to m). Since our method is unbiased 19,21 ], numerically sampling  provides a faithful representation of the whole ensemble. We have also calculated the probability distribution (induced by P M ( )  ) of some of the properties of interest, in order to quantify the statistical significance of their observed value (via the z-score, for example).
Nevertheless, the analytical expressions of the expected value and standard deviation of the quantities explicitly defined in the previous subsections has been derived in the Supplementary Information.

Results
Let us first show our results on the temporal snapshot of the WTW corresponding to the year 2000. The number of nodes is C 2000 = 151 and P 2000 = 538, causing the R index to be R0 0 56 .  (see section Methods). The high asymmetry of our network is also pointed out by the different mean degrees, d 70  and u 20  , indicating that countries are, on average, almost three times more connected than products. However, the connectance is c 0 13 2000 .


: thus, our bipartite WTW is much sparser than its monopartite counterpart 26 . Notice that our null model, constraining (on average) the degree sequence, exactly reproduces any network's connectance by definition, spanning the domain of applicability of both the sparse and the dense network reconstruction algorithms. Figure 3 shows the comparison between observed and expected values of our coefficients of assortativity. Having plotted u c nn VS d c and d p nn VS u p , we firstly observe that the bipartite WTW shows a disassortative behavior, signalled by a globally decreasing trend of our measures. More detailedly, two distinct behaviors seem to characterize u c nn as a function of d c : while countries with low diversification are preferentially linked to products with high ubiquity (left side of panels 3a and 3b), countries with high diversification are linked to almost all products (right side of panels 3a and 3b). This is also reflected in the triangular structure of the matrix (see Fig. 1). For products, this distinction is less sharp (panels 3c and 3d): in fact, while high-ubiquity products are linked to almost all countries, low-ubiquity products can be found connected to both high-and low-diversification countries.

Assortativity
As can be seen from Fig. 3, the BiCM captures the disassortative behavior of both u c nn and d p nn ; however, only part of the observed points lies within the ± 2 standard deviations region. This means that the mechanism shaping the disassortative behavior of the WTW is not completely explained by our null model, signalling a non-trivial origin of the WTW degree correlations. What is strikingly surprising is the prediction based on the Random Graph model (BiRG): the corresponding trend is closer to the BiCM prediction than in the monopartite representation of the WTW 26 . Moreover, since disassortativity is more pronounced in real data, our results indicate that the BiCM performs better than BiRG for small values of d c and u p , while the BiRG correctly capture their flat behavior at large d c and u p (i.e. for competitive countries and ubiquitous products, for which d L C p nn ). This seems to indicate that the explanatory power of the degree sequence is far more limited in the bipartite representation than in the monopartite one and that additional information is required to improve the agreement between observations and predictions (even at the simplest level of binary, undirected networks). What emerges is that the evolution of expected points closely follows the evolution of the observed ones, pointing out that the BiCM correctly describes the temporal trend of the assortativity measures. Notice that, even if observed points are systematically more concentrated on higher levels (as shown in panels 4a and 4c), the confidence intervals are still close enough to let us interpret the BiCM predictions Scientific RepoRts | 5:10595 | DOi: 10.1038/srep10595 as correct. Moreover, the constancy of the amplitude of the confidence intervals for both observed and expected ANPU values indicates that the corresponding clouds of points maintain the same sparseness across our 38 years dataset; on the other hand, the amplitude of the observed ANCD confidence intervals slightly reduces, indicating a shrinkage of the corresponding cloud of points (compare panels 3b and 3d).
The temporal trends of u nn and d nn show interesting differences. In fact, while u nn keeps increasing across the whole dataset, d nn does not (and from 1975 starts decreasing). Since the countries mean degree keeps rising as well (d 48 1963  and d 70 2000  ), the increasing trend is probably due to the birth of new links, indicating that while existing countries have enlarged their production, new-born countries have started theirs. The results seem also to be compatible with the picture of several "appealing" products behaving as hubs and attracting links, including the ones of the new-born countries which in turn, having a low degree, reduce the value of the d p nn .
Since u nn ranges in the interval [0,C], the effect due to the varying number of countries can be washed away by further dividing it by C, G u C C nn = / (and thus normalizing it to the interval [0,1]). Remarkably, our index G C can be now interpreted a "genuine" measure of globalization, not affected by any spurious effect. Very interestingly, the temporal trend of G C after 1970 becomes now almost flat. This means that the WTW evolution does not actually affect the value of countries integration which organize in such a way to maintain the same value of G C , irrespectively of the rising number of countries, their higher diversification, etc. This seems to confirm the stationary evolution of such network, recently pointed out 45   ment on the "shape" of the clouds of points. The correlation of the latter is lower than the correlation of the former: this is due to the shape of the empirical cloud of ANCD which is less linear than the empirical ANPU, thus worsening the agreement with the corresponding expectations (which show an almost perfectly linear trend).

Complexity and fitness
Complexity and fitness can be obtained only numerically, as the result of the convergence of the algorithm proposed in 8,9,46 . Panels 5a and 5b show the comparison between observed and expected complexity (plotted VS ubiquity) for the years 1963 and 2000; panels 5c and 5d show the comparison between observed and expected fitness (plotted VS diversification) for the same years. Our null model capture both trends with a larger accuracy than in the measure of assortativity: notice how the expected trend under the BiCM reproduces the "beak" of the observed complexity in real data and the vast majority of the observed cloud lies within the ± 2 standard deviations region.
Similarly, the expected trend of reconstructed fitness captures the different growth regimes of the observed fitness in the WTW data, showing few sparse points outside the same error region (clearly visible in the log-log plots of Fig. 5). The regime with lower slope (left side of panels 5e and 5f) represents the so-called "poverty trap" 8,9 , i.e. the area populated by the group of countries with lowest fitness: notice how all such countries lie within the ± 2 standard deviation region (or immediately outside). Similar considerations hold for all the remaining years, indicating a constant performance of our method across our 38-years dataset.
The average trends in Fig. 5 are computed differently from those in Fig. 3 , the former represent averages taken over ranked nodes, ordered according to their complexity -panels a and b -and fitness -panels c and d. Generally speaking, ordering nodes on the basis of such procedure will produce a different ranking for different bipartite networks of the ensemble. Moreover, the ranking operation guarantees neither that the identity of ranked nodes remains the same (e.g. two different countries can be ranked first for two different networks), nor that the corresponding complexity and fitness maintain their value across our sample (i.e. the nodes ranked first will, in general, have different values of F c and Q p ): this in turn implies that each ranked node degree may change as well (i.e. the nodes ranked first for different networks will, in general, have different degrees). From these considerations, the need of quantifying 1) the variation of any country diversification as a function of its fitness and 2) the variation of any product ubiquity as a function of its complexity follows. This is in line with the spirit of the research in 8,9 : trying to establish a biunivocal relation both between ubiquity and complexity and between fitness and diversification, in order to unambiguously rank countries and products. This kind of analysis represents a highly non-trivial test bench of our model which appear to perform very well.

Motifs.
The motifs analysis has been carried on by calculating two different quantities. The first one has been defined as Notice that similarity and z-scores provide complementary information: in particular, the latter measures the statistical significance of the agreement found by the former, accounting for the role of higher-order correlations not included in our constraints. Moreover, their ratio s m /z m = σ m /〈 N m 〉 coincides with the motif-specific coefficient of variation, quantifying to what extent the average sums up the relevant information encoded into the corresponding ensemble distribution. Naturally, as for the observed abundances, both s m and z m can be defined for specific subsets of nodes as well. Figure 6 shows the analysis of the Vn and Λ n motifs. First, we have sampled the V-motifs and Λ -motifs abundance on the ensemble, in order to verify their distribution (see the Supplementary Information): both follow a gaussian very closely. Since all our motifs are sums of (neither independent nor identically distributed) random variables, this may be seen as a consequence of the generalized Central Limit  Q p (a, b) and d c VS F c (c, d). Observed points are in blue; the black solid curves are BiCM-induced ensemble averages; the gray dashed curves indicate the ± 1 standard deviation region; the gray dash-dotted curves indicate the ± 2 standard deviations region. Colored areas represent the ensemble density of expected points (sampling 5000 matrices). Our null model seems to satisfactorily capture both trends. Panels (e, f) show the so-called "poverty trap", i.e. the group of countries with lowest fitness 8,9 . Notice how all such countries lie within the ± 2 standard deviation region (or immediately outside). Theorem. z-scores can be thus attributed the correct probabilistic meaning of (gaussian) standardized variables 39 , 47 and choosing a threshold z 0 for z allows the identification of significantly deviating patterns. In what follows we will choose z 0 = ± 1.65 as threshold values for the aggregated Vn and Λ n families and z 0 = ± 2 for the subsets-specific corresponding ones (see the Supplementary Information for a justification of such values). Naturally, if the observations were exactly reproduced by our null model, the z-scores would be zero 49 .
The evolution of both similarity and z-scores across the years in our database point out that the Λ n family is better reproduced than the Vn family (showing a similarity and a z-score closer to zero -see panels 6a and 6b). In particular, Vn z-scores lie outside the boundary of the significance region, showing values lower than − 1.65. This indicates that for the binary, bipartite representation of the WTW, the degree sequence is far more effective in reproducing the products correlations than the correlations between countries. In other words, we correctly capture the countries tendency to expand their production, which seems to co-exist with a certain superposition of the countries baskets of products (see M-motifs in the Supplementary Information). However, the BiCM overestimates the resemblance of the different baskets: as z-scores indicate, world-countries tend to form less V-motifs than expected under our null model (further confirmed by the trend of X-motifs and W-motifs -see the Supplementary Information). Summing up, world countries show a clear tendency to diversify their production, at the same time avoiding to directly compete on the same products.
The comparison between similarity and z-score clarifies the role of average in characterizing the ensemble distribution of Vn and Λ n families: the ratio s m /z m ≤ 0.1 justifies our interest in their ensemble average alone.
However, z-scores of Vn and Λ n families result in almost flat trends which allow us to draw only general conclusions on the WTW as a whole. The reason lies in the "aggregated" character of such motifs, not distinguishing between different subsets of countries or products. To be more precise, let us consider the temporal evolution of our motifs on specific subsets of nodes (see panels 6c and 6d): in particular, the Asian Tigers (South Korea, Singapore, Taiwan, Hong Kong), the BRICS countries (Brazil, USSR/ Russia, India, China, South Africa), the european countries belonging to G7 (France, Italy, Germany, United Kingdom) and a number of eastern-european countries (Hungary, Romania, Bulgaria, Poland, USSR/Russia) and let us calculate the temporal evolution of V4 and V5 motifs restricted to them. The european countries show a z-score almost constantly equal to 4, indicating a significant affinity which is maintained over time. An even stronger internal affinity is shown by the Asian Tigers to which China should be added (in fact, its addition to the group rises the z-score). On the other hand, BRICS countries show a very limited affinity 8,9,48 : their trend becomes more and more consistent with the null model, to become negative in the recent years. The last two examples point out the limitations of the traditional economic classification (usually distinguishing China from Asian Tigers and gathering BRICS together), not capturing any actual economic likeness.
Eastern-european countries, on the other hand, show a strong correlation before 1989, gradually declining as this topical year approaches. Interestingly enough, after 1989 such correlation doesn't disappear, remaining statistically significant (and stabilizing around z 2  ): this seems to indicate a significant connection still persisting, having Russia replaced USSR as "reference" country. An additional test is provided by the random choice of four countries (Ghana, China, Mozambique, Austria): although close to zero, their trend is constantly negative. In fact, being Ghana and Mozambique low-diversification countries, they will be linked only to high-ubiquity products, common to all countries (see Fig. 3): thus, their basket will be far more limited than China's and Austria's, limiting in turn their possibility to compete. The constantly negative sign indicates, in this case, the impossibility to compete.
This kind of analysis can be repeated for Λ n motifs as well, allowing us to gain a substantial insight into the products correlations. Panels 6e and 6f show some examples. While the food sector we have considered shows a constantly high value of z, indicating the common origin of the chosen dairy products, the pink trend signals a non-trivial positive correlation between the sectors represented by worked aluminium artifacts, tractors and fruit. A possible explanation may rest upon the consideration that tractors are constituted by parts in aluminium to be, in turn, used to transport the picked fruit. Consistently, the last group of products (cheese, rods and locomotives) is characterized by the value z 0  .
Notice that while for some groups of nodes the first moment encloses great part of the relevant information (s m /z m ≤ 0.5), for other groups higher-order moments could provide additional, useful information (s m /z m ≃ 1), e.g. the distribution asymmetry. Interestingly, these circumstance are mostly encountered for countries and products, respectively. Assortativity coefficient and nestedness. As for the Vn and Λ n motifs, the assortativity coefficient has a gaussian ensemble distribution (see the Supplementary Information). Both the observed value r and its z-score signal that we are globally overestimating the network assortativity: more exactly, since our expected coefficient 〈 r〉 is still negative, we are predicting a less disassortative network than observed (see Fig. 7). This is a consequence of our randomization procedure, distributing links between nodes more homogeneously (recall that, consistently, our predicted { } = show less steeply decreasing trends than the observed ones -see Fig. 3).
In order to better understand the concept of nestedness, let us explicitly draw a matrix from the BiCM-induced grandcanonical ensemble, ranking its rows and columns according to the F c and Q p measures 8,9 . The result is shown in Fig. 7. Notice that nestedness cannot be simply reduced to the concept of . Bottom panels: z-scores (panel e) and similarity (panel f) evolution of Λ nmotifs, restricted to subsets of products -"fruit and parts of plants", "aluminium and aluminium alloys", "road tractors" (.), "milk and cream", "butter", "cheese" (.), four randomly chosen products (.). Right column, panel f: similarity evolution across our database years of the same motifs. Our method correctly captures the countries tendency to expand their production (Λ n-motifs), even if the resemblance of the different baskets of products is overestimated (Vn-motifs). Moreover, our method identifies statistically significative correlations among subsets of countries and products.
"triangularity" of a matrix. In fact, even if the drawn matrix shows a more curved boundary than the observed one, both the nestedness ensemble distribution (see the Supplementary Information) and its z-score (Fig. 7) signal that our method reproduces it correctly.
We have also measured the nestedness along rows and the nestedness along columns separately (according to the definitions in 41 . While the latter is reproduced and closely follows the trend of the global one, the former is, for a few years, significantly underestimated. This is non-trivially related to the way our null model redistributes V-motifs and Λ -motifs. However, as the bottom panel in Fig. 8 suggests, a role seems to be played by the asymmetry of our bipartite matrix as well: in other words, the higher cardinality of the products layer seems to induce a preferential filling of the rows, making them more homogenenous and lowering their expected nestedness. It should be also noted that the ensemble coefficient of variation for both r and NODF show such a small value (s m /z m ≃ 10 −2 for both, across our temporal dataset) that the ensemble average can be considered as the only moment carrying relevant information.

Discussion
In this paper we have both proposed a method to randomize binary, undirected, bipartite networks, by constraining essential network features as the total number of links and the nodes connectivity, and tested it on a real system as the World Trade Web. While, on the one hand, specifying the degree sequence allows highly non-trivial properties like countries fitness, products complexity and the matrix nestedness to become reproduced across our whole dataset, on the other quantities like assortativity and motifs still elude a satisfactorily explanation. This is even more surprising, when considering the high level of accuracy achieved by the CM predictions in the analysis of the monopartite representation of the WTW. Our findings suggest that analysing different representations of the same network can indeed convey additional information, as proved by the agreement between the observed assortativity and the expected one (see Fig. 3), lower than in the corresponding monopartite WTW 26 . In words, the correlations between countries induced by their productivity relations, clearly displayed by the bipartite representation of the WTW, are only partially explained by the degree sequence, calling for a higher amount of information to achieve the same level of accuracy obtained for the monopartite representation (and analogously for products). Otherwise stated, representing the same system via different network models (even belonging to the same class of binary, undirected configurations) may strongly affect the effectiveness of the corresponding piece of information (as the nodes connectivity) in reproducing the observed structure.
Assortativity provides again the clearest example: as previously pointed out, the bipartite Configuration Model predicts trends quite similar to those expected under the bipartite Random Graph. To better quantify this difference, we have calculated the Shannon entropy (normalized to the total number of nodes pairs, i.e. the network volume) of the probability distributions induced by the BiRG and the BiCM: are shown in the bottom panel of Fig. 9. As evident from the trends, while specifying the total number of links strongly reduces the uncertainty (as signalled by the low value of the connectance, reducing the ensemble entropy to half its maximum value), further specifying the degree sequence produces a less relevant effect one could expect on the basis of the well known, monopartite results 26 . Comparing the analyses of degree correlations for the bipartite and the projected WTW (on both countries and products layers -top and middle panels of Fig. 9 for the year 2000), what emerges is quite impressive: while the CM prediction correctly overlaps to the observed trend, the RG predicts a flat trend completely missing the observed cloud of points (in line with the results already obtained for the monopartite representation 26 . In terms of Shannon entropy, when passing from the RG to the CM the reduction of uncertainty on the observed, projected WTW amounts to 41%; for the bipartite WTW, this percentage reduces to only 16% (see Fig. 9). This findings clearly indicate a future extension of our work: constraining those quantities having a significant impact on nodes correlations, as V-motifs, Λ -motifs or nestedness, in order to define a more effective null model. However, as the analysis of motifs reveals, the BiCM provides the right benchmark to highlight meaningful correlations between countries and products, representing a purely topological alternative to the traditional economic classification, whose limitations have been already pointed out 8,9,36 . Remarkably, this kind of analysis can be repeated for different years, in order to monitor our system over time and detect significant temporal trends of the world economies co-evolution.
We stress that our approach is grandcanonical and possible extensions of the method move in the same direction. The paper in 31 , on the other hand, implements the microcanonical version of a mono-layer regular random graph: as for monopartite networks, comparing the performance of the two available approaches represents a challenging, future research direction.
Future work moves towards the direction of extending the present framework to directed, as well as weighted, networks models, to test the robustness of our findings also for configurations beyond the binary, undirected ones.