## Introduction

Network theory provides a simplified representation of a variety of complex systems, i.e. systems composed by many elements whose mutual interactions create new and emergent behaviours. The network description, despite its simplification, allows to detect and measure collective patterns, independently of the nature of the underlying interactions1,2,3,4.

Amongst the quantities analysed in network theory, nestedness5 is one of the most elusive. It was originally observed in biogeography6,7,8 where less frequently observed species are assumed to occupy a niche of the habitats occupied by more ubiquitous species. In terms of the resulting ecological network, nestedness is loosely defined as the observation that the neighbours of nodes with a few connections (lower degree) are typically a subset of the neighbours of nodes with more connections (higher degree). Generalised as such, nestedness has been detected in other networks as well, e.g. in trade networks9,10,11, interbank networks10,12, social-media information networks13, and mutualistic ecological networks5,14. In previous works15,16, nestedness has been found to be highly correlated with the stability of the ecosystem under different types of disturbances and perturbations. The ubiquity and structural importance of nestedness naturally raises some fundamental questions regarding the possible mechanism generating nested patterns in real networks17,18. Actually, while the intuitive notion of nestedness is straightforward, its mathematical definition is not trivial and different metrics, focusing on different aspects, have been proposed. One of the most popular metrics is NODF (Nestedness measure based on Overlap and Decreasing Fill19), which considers the (normalised) overlap between pairs of nodes in the same layer of a bipartite network. Such a definition was later adjusted in order to increase its robustness5,15.

An alternative measure has been proposed by looking at certain spectral properties of the adjacency matrix of a bipartite network. Since it can be shown that, when the degree sequence is constrained on one of the two layers of the network, the spectral radius is maximum for the perfectly nested network20, the spectral radius itself was proposed as a measure of nestedness21 (in the following SNES, i.e. Spectral NEStedness).

Beside the quest for measures properly capturing the sense of nestedness, some researchers focus on disentangling the role of other network properties from the nestedness itself. In this sense, some early contributions focused on the comparison of the measurements with some null models, i.e. statistical models that display some properties of the real system, in order to have a tailored benchmark. Null models were used, for instance, to detect the effect of the degree sequence22,23,24,25. Actually, to properly define a null model, the approaches to follow can be, essentially, 2: 1) to impose constraints exactly or 2) to impose constraints on average, respectively microcanonically and canonically, according to the Statistical Physics jargon. Theoretical tools from statistical physics are not new for the analysis of ecological system: they are commonly used to investigate patterns in biological networks, targeting, from time to time, hierarchical systems26, bipartite structures27 and topological properties of scale-free networks28.

Regarding the former case, beside various approaches, the algorithm of Ref.29 only was shown to be ergodic30, i.e. to visit uniformly the phase space and, thus, to provide unbiased predictions. Instead, in the canonical approach, constraints are satisfied on average and, due to its derivation, is ergodic a priori4,31. The canonical approach allows for some noise in the data: indeed if there is some noise, an existing pollinator-plant interaction may not be detected and the microcanonical approach will not consider the real configuration among the possible ones, while the canonical one will. Using the canonical approach, Ref.32 compared the metric introduced in Ref.15 with a null model preserving the degree sequence and found that in most of the cases the degree sequence is responsible for the high value of the nestedness (actually, the null model implemented in Ref.32 is out the regime of validity). The recent contribution of Payrató-Borràs et al.33 came to similar conclusions, using an improved null model still preserving the degree sequence, but valid for any level of link density of the network, although using an approximated formula for the average of NODF in the ensembles; subsequently the same group enlarged their analysis to a wider number of nestedness metrics34.

As presented above, in the literature, several papers compared the measurements with various null models22,23,24,25,32,33,34,35, but rarely the issue of ergodicity was targeted. For the first time here, we investigate the differences of the micro- and canonical approach in discounting the degree sequence for the analysis of the nestedness. Let us remark that both the null models implemented are ergodic, i.e. they explore the phase space homogeneously, in order to have unbiased and unequivocal results.

In the present paper, we provide an example in which two null models ergodically discounting the same information, i.e. the degree sequence, display opposite correlations between two among the most used nestedness measures. Beside providing another example of the statistical ensemble inequivalence, the main message of our manuscript is that choosing to quantify the amount of nestedness is a subtle task that has to be carried out carefully. Indeed, only being aware of the behaviour and the peculiar properties of the various approaches and options permits to derive the right conclusions from the analyses: our paper provides the necessary knowledge to handle properly the study of the nestedness of a real system. In this sense, we do not provide any univocal indication on which nestedness definition measure should be used or on which is the correct way to discount the information encoded in the degree sequence: both the definitions analysed have their own sense and both the null models examined satisfy their own rationale. Nevertheless, it is crucial to know the properties of the various tools that one is handling in order to derive the proper conclusions from the nestedness analysis of a real system.

## Methods

A bipartite network is defined by two sets of nodes $$\text {L}$$ (of size $$N_\text {L}$$) and $$\Gamma$$ (of size $$N_\Gamma$$) called layers and by the prescription that connections are allowed only between the layers and not inside them. Thus, a bipartite network can be univocally described by its biadjacency matrix $$\mathbf{M }$$, i.e. an $$(N_\text {L}\times N_\Gamma )$$-matrix, whose entries $$m_{i\alpha }=1$$ if a link exists between $$i\in N_\text {L}$$ and $$\alpha \in \Gamma$$ and $$m_{i\alpha }=0$$ otherwise. We will call a network perfectly nested (PNN, Perfectly Nested Network) if for every pair of nodes i, j belonging to the same layer with degrees $$d_i$$, $$d_j$$, if $$d_i \le d_j$$ then all neighbours of i are also neighbours of j. This type of network is also called chain graph20 or double nested graph36. In the following we shall use the previous definitions for generic biadjacency matrices and related quantities, but we shall add an asterisk $$*$$ whenever considering quantities measured on real networks.

### Nestedness measures

#### NODF

One of the most popular measure of nestedness, namely the Nestedness as a measure of Overlap and Decreasing Fill (NODF) was introduced in 2008 by Almeida-Neto et al.19. Such measure is based on the overlap between the neighbourhoods of nodes with different degrees. Given a generic bipartite graph $$G_\text {Bi}$$, the NODF expression reads

\begin{aligned} {\text {NODF}}({\mathbf {M}}) = \frac{1}{K}\Bigg [\sum \limits _{i,j=1}^{N_\text {L}}\Bigg (\theta (k_i - k_j)\cdot \frac{\sum \nolimits _{\alpha =1}^{N_\Gamma } m_{i\alpha }m_{j\alpha }}{k_j}\Bigg ) + \sum \limits _{\alpha ,\beta =1}^{N_\Gamma }\Bigg (\theta (h_\alpha - h_\beta ) \cdot \frac{\sum \nolimits _{i=1}^{N_\text {L}} m_{i \alpha }m_{i \beta }}{h_\beta }\Bigg )\Bigg ] \end{aligned}
(1)

where $$K = \left[ {N_\text {L}(N_\text {L}-1) + N_\Gamma (N_\Gamma -1)} \right] /{2}$$ is a normalisation factor to let the measure go from 0 to 1, $$k_i$$ and $$h_\alpha$$ are respectively the degrees of node $$i\in \text {L}$$ and $$\alpha \in \Gamma$$, and $$\theta$$ is the Heaviside step function with the convention $$\theta (0) = 0$$. The step function ensures that the overlap is only counted when the degrees of the nodes are different and that the denominator is the minimum of the two vertices’ degrees.

#### Stable-NODF

Due to the instability of the previous measure, with respect to small fluctuations on the degrees of the nodes, another version was proposed in5. The difference relies in considering also the contributions coming from couples of nodes with equal degrees; we will call it stable-NODF or sNODF. It is calculated as

\begin{aligned} {\text {sNODF}}({\mathbf {M}}) = \frac{1}{K}\Bigg [\sum \limits _{i<j}^{N_\text {L}}\Bigg (\frac{\sum \nolimits _{\alpha =1}^{N_\Gamma } m_{i\alpha }m_{j\alpha }}{\min (k_i,k_j)}\Bigg ) + \sum \limits _{\alpha <\beta }^{N_\Gamma }\Bigg ( \frac{\sum \nolimits _{i=1}^{N_\text {L}} m_{i \alpha }m_{i \beta }}{\min (h_\alpha , h_\beta )}\Bigg )\Bigg ] \end{aligned}
(2)

where K is the same normalisation factor as in Eq. (1) and the denominator this time is the minimum between the two degrees, that in Eq. (1) was guaranteed by the theta step function.

#### Spectral nestedness

A recently proposed measure of nestedness21 considers the spectral radius of the network, i.e. the largest eigenvalue $$\lambda$$ of the adjacency matrix. We will thus call it spectral nestedness (SNES). The adjacency matrix $${\mathbf {A}}$$ of a bipartite network can be expressed in terms of the biadjacency matrix as

\begin{aligned} {\mathbf {A}}= \left( \begin{array}{c|c} {\mathbf {0}}_{N_\text {L}\times N_\text {L}} &{} {\mathbf {M}}\\ \hline {\mathbf {M}}^T &{} {\mathbf {0}}_{N_\Gamma \times N_\Gamma } \end{array} \right) , \end{aligned}

where $${\mathbf {M}}^T$$ is the transpose of the biadjacency matrix $${\mathbf {M}}$$ and $${\mathbf {0}}_{N\times N}$$ is a $$N\times N$$-matrix whose elements are all zeros. Note that the adjacency matrix of the network is symmetric, yielding all real eigenvalues. The definition is based on two main theoretical results:

• The bipartite network that has the maximum eigenvalue in the set of connected networks with given n nodes and L links is a perfectly nested network36;

• Among all bipartite networks with a given degree sequence on one of the two layers, the one that maximises the spectral radius is the PNN, defined at the beginning of the present section20.

#### Normalised spectral nestedness

The spectral radius, though, has a strong dependence on the size of the network and on its density. It is well known that the maximum eigenvalue of a bipartite network with L links is bounded from above by $$\sqrt{L}$$ and that the only network for which $$\lambda ({\mathbf {M}}) = \sqrt{L({\mathbf {M}})}$$ (if it exists) is a complete bipartite network20,36.

For this reason we decide to introduce nSNES, where we normalise the measure with the square root of the number of edges:

\begin{aligned} {\text {nSNES}}({\mathbf {M}})=\frac{{\text {SNES}}({\mathbf {M}})}{\sqrt{L({\mathbf {M}})}} = \frac{\lambda ({\mathbf {M}})}{\sqrt{L({\mathbf {M}})}} \, . \end{aligned}
(3)

Although the nSNES ranges from 0 to 1, the drawback of this normalisation is that a perfectly nested matrix that is not full will not have a perfect score of 1.

### Null models

In the present paper, we aim at understanding the role of the degree sequence in the formation of bipartite nested structures. Thus, we would need a sort of network benchmark with the same degree sequence, but otherwise maximally random. This approach has strong similarities with Statistical Mechanics: actually, the recipe is to build an ensemble and fix the node degrees on it. As in the standard Statistical Mechanics, those constraints can be imposed on average, as in the canonical construction4,31,37,38,39, or considering stricter constraints, as in the microcanonical formulation29,30. The two approaches are known to be non equivalent40,41,42,43,44,45,46 and indeed such non equivalence is going to be crucial in the following.

After the randomisation with the null models, our aim is to quantify the statistical significance of the measures by computing the z-scores of the measures, calculated as

\begin{aligned} z(X)=\dfrac{X-\langle X \rangle }{\sigma _X} \end{aligned}

where $$\sigma _X$$ is the standard deviation and X the considered quantity.

#### The canonical approach: the bipartite configuration model

The Bipartite Configuration Model (BiCM47) is the bipartite extension of the entropy based null model4,31,37,38,39. The strategy is inspired by work by Jaynes48, which derived the canonical ensemble of Statistical Mechanics from Information Theory principles. The recipe is pretty simple: first, define an ensemble of all possible physical configurations, and then maximise its Shannon entropy constraining the relevant information about the system (in the case of Information Theory, the energy): the result is exactly the canonical ensemble. The maximisation of the Shannon entropy represents the crucial step: it can be interpreted as assuming maximal ignorance about the the non constrained degrees of freedom of the system.

Following the same strategy, starting from a real network, we can define $${\mathscr {M}}$$ the ensemble of all possible biadjacency matrices with the same number of nodes (nodes represent the volume in Statistical Mechanics). The Shannon entropy associated to the ensemble is $$S=-\sum _{\mathbf{M }\in {\mathscr {M}}} P(\mathbf{M })\ln P(\mathbf{M })$$ and we can maximise it, constraining the degree sequence (Note that while in the Jaynes derivation of the Statistical Mechanics, the constraints, i.e. the energy, was a global one, the degree sequence represents a local one. Actually, the local constraint is responsible for the nonequivalence of the microcanonical and canonical ensembles.). The entropy maximisation leads to an exponential probability for a generic biadjacency matrix $$\mathbf{M }$$:

\begin{aligned} P({\mathbf {M}})=\frac{e^{-H(\vec {\theta },\,\vec {C}({\mathbf {M}})})}{Z(\vec {\theta })}, \end{aligned}
(4)

where $$\vec {C}({\mathbf {M}})$$ is the vector of constraints and $$\vec {\theta }$$ the associated Lagrangian multipliers37. At this level the formula 4 is just formal, in the sense that the value of the Lagrangian multipliers is unknown. At the end of the day, we want a “tailored” benchmark for our real network, i.e. something with the same degree sequence, but otherwise completely random. In this sense, it is natural to maximise the likelihood of the real network in order to get the value of $$\vec {\theta }$$38,39. If $$\vec {C}(\mathbf{M }^*)$$ is the value of $$\vec {C}$$ measured on the real network, the previous condition is equivalent to impose $$\langle \vec {C}(\vec {\theta })\rangle =\sum _{{\mathbf {M}}\in {\mathscr {M}}}P(\mathbf{M })\vec {C}(\mathbf{M })=\vec {C}(\mathbf{M }^*)$$.

The exact solution for the probability $$P(\mathbf{M })$$ can be factorised as the product of probabilities per possible link:

\begin{aligned} P({\mathbf {M}})=\prod _{i, \alpha }p_{i\alpha }^{m_{i\alpha }}(1-p_{i\alpha })^{1-m_{i\alpha }}, \end{aligned}
(5)

where $$p_{i\alpha }$$ is the probability of existence of the link connecting nodes i and $$\alpha$$. Let us remark that the factorisation (5) is possible only when the constraints are linear in the biadjacency matrix. For other nonlinear contraints, the probability per link may not be analytical and other methods are necessary to obtain the probability per graph (see for instance49).

In the case of the BiCM, $$p_{i\alpha }$$ is a function of $$x_i$$ and $$y_\alpha$$, which are simple reparametrisations of the Lagrange multipliers associated to the observed degrees ($$k_i$$ and $$h_\alpha$$ respectively):

\begin{aligned} p_{i\alpha }=\frac{x_iy_\alpha }{1+x_iy_\alpha }. \end{aligned}
(6)

Their numerical value is determined by solving the likelihood-maximisation equations:

\begin{aligned} \left\{ \begin{array}{ll} \langle k_i\rangle &{}=\sum _\alpha p_{i\alpha } = k_i^*,\,\, i=1\dots N_\text {L}\\ &{}\\ \langle h_\alpha \rangle &{}=\sum _i p_{i\alpha } = h_\alpha ^*,\,\, \alpha =1\dots N_\Gamma \end{array} \right. , \end{aligned}
(7)

$$k_i$$ and $$h_\alpha$$ being the degree of the node i and $$\alpha$$ respectively.

#### The microcanonical approach: the Curveball algorithm

The microcanonical approach, differently from the BiCM, keeps the degrees of all nodes in the system constant. In a sense, it has a stricter ensemble (just all configurations with the given degree sequence are allowed) and all allowed configurations have the same probability. Such approach is computationally costly since the probabilities of links in the system are not pairwise independent and the fastest way of spanning the ensemble of networks with a given degree sequence relies on swapping endpoints of links iteratively. In the present manuscript, the ensemble was sampled using the strategy of29.

We will refer to this model as Curveball, as in the original paper; in30 it was shown that such approach is ergodic.

## Results

In this section we are going to present the results of our analyses on artificial and real networks. To test the measures and models, we analyze a set of 40 pollination networks taken from the Web of Life dataset (www.web-of-life.es). They represent ecological mutualistic networks of plant-pollinators. From the set of all available pollination networks in Web of Life, we selected only the binary ones, in order to avoid issues regarding binarisation. All of the considered networks are generally of small size, the smallest being of only 20 nodes while the biggest one consists of 1500 nodes. The density of the networks varies between 0.01 and 0.5. For the sake of completeness we remark that only 24 out of 40 networks of our dataset are actually made of a single connected component, the other including few disconnected components with more than one node. In the following, we compare the various measures and state their significance respect to the various null models.

### Measure differences

First, in order to study the behaviours of the previous measures, we compare them on the above-mentioned dataset. Fig. 1 shows that indeed the normalised SNES is highly correlated with NODF (actually, it is not true for the non normalised version of the spectral nestedness, due to its dependence on the total number of link). In a sense we may think that indeed, while they differ in the philosophy, the two measures are capturing the same structure, as stated in21. After a detailed comparison with the appropriate null models, we will see that it is not the case.

### Degree sequence versus nestedness

The degree sequence of the network carries some information about the nestedness of the system, the extreme case being the Perfectly Nested Network (PNN in the following) one. Actually, in this case, the degree sequence identifies completely the network and both the micro- and the canonical ensembles are composed by a single network, i.e. the PNN one. This was already observed in50 for the microcanonical ensemble, but it is surprisingly true also for the canonical ensemble. While the technical details can be found in Section 2 in the Supplementary Information, a pictorial representation is shown in Fig. 2.

Thus in these cases the degree sequence captures the level of nestedness of the whole system, and the statistical significance of the measure loses any value. Actually, even when the network is close to a perfectly nested one, its configuration model ensembles contain a limited number of configurations, that furthermore are all highly nested networks. Thus, a real network may show a high value of the nestedness measure (whatever it is), which is, nevertheless, statistically non significant with respect to a null model discounting the degree sequence: actually in such a case the high value of the nestedness is already captured by the degree sequence. We will examine in more details the role of the null model in the following sections.

### Measure and models differences

The averages of the measures are systematically different when using a microcanonical model or a canonical one for several reasons. The first observation is that the variance in the degrees of the nodes generates a bias in all quantities that scale non-linearly in the number of links.

Actually, there is another issue generating a bias the canonical ensemble. Indeed, a network sampled from the ensemble can present some isolated nodes that do not contribute to the measurements of both NODF and SNES (and their modifications). Given the steep power law degree distribution of many of the considered networks, this will typically be the case. For further details, in Section 3 of the Supplementary Information we analyse the frequency of isolated nodes in generating the canonical ensemble. We will discuss this issue and how it generates a bias in greater detail focusing on each of the two measures in the following paragraphs.

#### NODF versus null models

The displacement of the NODF measures between the two ensembles is the result of multiple effects. The most evident bias is caused by the normalisation factor that is the denominator in Eqs. (1) and (2). A network sampled from the BiCM ensemble will have, on average, the same number of links of the original network, but many isolated nodes (see the Supplementary Information Section 3 for more details). This is due to the small link probabilities related to nodes of low degree in large networks, that sometimes give rise to an empty row or column in a sampled matrix. These nodes, therefore, do not contribute to the total NODF or sNODF, and one has to choose how to handle the normalisation factor in 1 and 2.

If one chooses to consider the number of connected nodes of the sampled network (as in the original definition of the NODF), this will generate a positive bias by having a comparable quantity divided by a lower denominator (the number of the connected nodes in each realisation can be only smaller than the value of the real network). We call this approach heterogeneous.

Otherwise, considering the normalisation factor of the original network will introduce contributions even from isolated nodes, thus altering the philosophy of the original definition. Moreover, such approach will introduce a bias in the opposite direction, dividing by a factor that is larger than what it should be if considering only the connected network. We call this approach homogeneous.

Both choices are equally admissible, depending on the interpretation of the comparisons one wants to follow. Personally, we think that the normalisation should not involve the isolated nodes, as in the original definition, i.e. we prefer the heterogeneous normalisation. For completeness, in the next subsections we will consider both of them. Interestingly enough their differences do not affect the conclusions.

On top of this, another effect to be considered is the presence of fluctuations in considering the degree sequence. As mentioned in the previous section, both ensembles contain only one configuration in the case of a perfectly nested degree sequence and their measures are trivially exactly the same. When the two ensembles separate for a non-perfectly nested matrix, the canonical ensemble produces some variance in the degrees of the nodes. This effect is not present in the microcanonical ensemble, where the degrees of all nodes are fixed deterministically. Such an effect has an impact on the NODF and in particular it provides new evidences regarding the non equivalence of the various ensembles.

#### SNES versus null models

We observe that the spectral radius is slightly overestimated in the canonical model. Our guess for this behaviour is that on average, out of two matrices with the same number of links, the one with the smallest number of nodes has the largest radius, so when a sample of the canonical model has an empty row or column it has, on average, a higher radius. Some evidences for this behaviour are given in the Supplementary Information Section 4. Still, we are not able to evaluate such discrepancy. Regarding the nSNES, the overestimation seems to be strongly increased by the normalisation factor. Although this is intuitive, since $$\langle \sqrt{L} \rangle < \sqrt{\langle L \rangle }$$ because of the non-zero variance of L, the fact that the spectral radius has an intrinsic dependence on L makes it hard to evaluate precisely.

#### The significance of the nestedness measures with respect to the various statistical ensembles

Bearing in mind all of the considerations of the previous paragraphs, we can interpret the z-scores of the table in Fig. 3. The four canonical NODF columns of the table refer to the z-scores of the two variants of NODF, with the two different normalisations with respect to the canonical ensemble. A representation of the differences among the models and measures is also given in Fig. 4.

In the case of the heterogeneous normalisation the z-scores are all negative, because of the overestimation of both NODF and sNODF in the ensemble. There are, though, important differences between the NODF and sNODF measures in some cases, which are mainly due to the presence of many nodes with the same degree.

For the homogeneous normalisation, there is still a certain agreement in the signs of the z-scores of NODF and sNODF, with the same caveat discussed above for the heterogeneous case. In opposition to the heterogeneous columns, the z-scores are positive in most of the networks analysed, in agreement with the discussion of “NODF versus null models” section). In this sense, it is striking that the choice of the normalisation factor may drive to opposite conclusions, regarding the statistical significance of the measure on the real network.

Then we have the columns of SNES: similarly to the heterogeneous normalised NODF, even in this case, the canonical null model has all negative z-scores, due to the slight overestimation of the SNES. The second-last column contains the microcanonical SNES z-scores.

As it can be observed from the matrix, there is no agreement between the column of the SNES and the sNODF columns in the microcanonical ensemble. Note that for the microcanonical null model, the z-scores of the nSNES were not reported since they correspond to the one of the SNES (the normalising factor cancels). A hint is given by the assortativity z-scores, whose Pearson correlation coefficient with the SNES scores is 0.84, while SNES and NODF anti-correlate with a score of − 0.88.

In order to investigate this difference, we generate a scatter plot of the realisations of the different ensembles (Fig. 5), plotting the NODF against the SNES of the sampled networks. The results are striking: the two measures are highly anti-correlated on the microcanonical ensemble, while this effect is hindered by the fluctuations in the canonical ensemble.

Using the other proposed measures, the results are always similar when comparing a NODF measure and a spectral nestedness measure. NODF and SNES are actually capturing different ways of being “nested”. This is easily seen on a synthetic very small network, of size $$8\times 9$$. We generate a sample from the microcanonical model and see how the matrices maximising NODF and SNES are made (Fig. 6).

Actually the NODF-maximising matrix has one of the smallest value of assortativity, while the one that maximises the SNES presents a big hub of the highest degree nodes and two smaller disconnected subgraphs, see Fig. 6. Roughly speaking, on the one hand, the SNES prefers networks in which highly connected nodes link to highly connected nodes, since they are sort of carrying the “mass” of the adjacency matrix (which is what the spectral radius is measuring). On the other hand the NODF, due to the denominator of its contributions, prefers to link poorly connected nodes with highly connected ones, thus focusing on disassortative configurations. Regarding the anti-correlation between the NODF (or similar definition) and the assortativity, other studies got to similar conclusions32,52; as far as we know, there were no evidences regarding the opposite behaviour of the SNES.

Let us underline that the (anti)correlation between the NODF or SNES and the assortativity is present only when discounting microcanonically the degree sequence: in real data, such correlation is not evident, as Fig. 7 shows. Otherwise stated, the (anti)correlation is present only when the contribution of the degree sequence is discounted. More details regarding the correlation between assortativity and the various nestedness metrics can be found in the Supplementary Information Section 5: Fig. D in the Supplementary Information shows examples for other networks in the Web of Life dataset.

## Discussion

While the abstract idea of nestedness in networks is quite straightforward, we saw that a mathematical definition capturing the degree of nestedness of a real system is much less trivial. As a consequence, while nested structures are ubiquitously observed across several networks, measuring the actual level of nestedness along with its statistical significance remains a challenging task.

In the present manuscript we investigated in details different metrics of nestedness in both real-world and synthetic networks. In particular, we mainly focused on two measures, NODF19 and SNES21, and some of their modifications5. When applied to real networks, these metrics go in the same direction, as they give positively correlated results.

We then moved to discount the contribution of the degree sequence to the different nestedness measures. Literally, according to the case of study and to the available information on our system we can create suitably chosen series of randomised copies of our graph (ensembles). This procedure allows us to use the machinery of Statistical Physics to assess the significance of our measurements.

Thus, for our aim, we can define null models preserving the degrees of nodes either as hard (microcanonical ensembles29) or as soft (canonical ensembles4,31) constraints. Otherwise stated, we are using the extensions of the microcanonical and canonical ensembles to complex networks in order to discount the information carried by the node degrees: the degree sequence is supposed to have an effect on the nestedness32,33, thus we want to focus on the information carried by the different metrics that cannot be explained by the degree sequence only. Let us remark that the null models implemented are ergodic, i.e. they explore the phase space uniformly.

First, we concentrated our attention on Perfectly Nested Networks (PNN). A PNN has a degree sequence that admits only a single network, i.e. the PNN itself, irrespective of whether the degrees are treated as hard or soft constraints: both the microcanonical and canonical ensembles of a PNN are composed by the PNN network only. Otherwise stated, there exist perfectly nested degree sequences and each of them defines univocally a single network, i.e. the PNN one. In the case of PNNs, thus, the value of the nestedness is completely due to the degree sequence only. But what happens when the network is not perfectly nested?

We compared the values of NODF and SNES measured on real networks with the expectations of, respectively, the microcanonical and canonical ensembles. As theoretically demonstrated in other studies40,41,42,43,44,45,46, the two ensembles are not equivalent, thus they should be characterised by different macroscopic properties. Literally, we found that the two families of nestedness metrics are negatively correlated when the microcanonical ensemble is used, while they are positively correlated in the canonical ensemble. Actually, the fluctuations of the canonical ensemble cover the real behaviours of NODF and SNES. Instead, once the degree sequence is fixed as a hard constraint, the level of nestedness is influenced by higher-order correlations between the degrees themselves, and in particular the assortativity of the network. Indeed, the two classes of measures of nestedness give different results in the microcanonical ensemble, when considering networks with different assortativity: NODF tends to give larger values of nestedness when the network is disassortative, while SNES tends to give larger values of nestedness when the network is assortative. Otherwise stated, we present an example in which the same information, i.e. the degree sequence, is ergodically discounted, but the sign of the measured correlations are opposite if compared with a microcanonical of with a canonical null model.

Thus, other than the choice of the measure, if checking for the statistical significance of the nestedness of a system, one should make a principled choice of the ensemble used as a null model in the analysis45,46. The microcanonical ensemble, which treats degrees as hard constraints, should be preferred if the observed degrees are error-free, i.e. if they are the actual values of the property to be kept fixed in the null hypothesis. If one suspects that the observed degrees are instead subject to some sort of error (e.g. measurement errors, incomplete data collection, poor sampling, etc.), then the microcanonical ensemble should be avoided, as it will give zero probability to the true (undistorted) configuration and to any configuration with the same degree sequence as the true configuration. In this case, we suggest to use the measures that present the smallest biases for fluctuating degrees, i.e. the SNES and the homogeneous sNODF or NODF.

Let us finally remark that our paper does not provide any indication on which is the nestedness metric that should be used, or on which is the right null model to be implemented in order to state the statistical significance of the nestedness measured. In a sense, each nestedness measure has contraindications, and every null model, even if discounting the same information, has its peculiar properties. In this sense, it is crucial to know exactly the behaviour of the ingredients we are handling. Our contribution is in highlighting odd behaviours, previously undetected, that, if not under control, can take to unjustified conclusions.

Nevertheless, in light of our results, the question remains on how to tackle the problem of the nestedness in real system, even after a proper and justified choice of the nestedness measure. An easy solution can be, once the null model has been chosen, to report for the chosen nestedness measure, both the average over the ensemble and the z-scores on the real network. The former value provides an evaluation of the nestedness as encoded by the degree sequence, the latter how significant is the observed nestedness, once the degree sequence is discounted.