Introduction

Bipartite or two-mode networks are composed of two types of nodes, which we call agents and artifacts, and edges between nodes of one type and nodes of the other type. These networks can be used to represent a wide range of phenomena and therefore are studied in a diverse range of disciplines. For example, natural selection unfolds as species (the agents) compete over sites (the artifacts), commerce is possible as traders exchange resources, scientific advances are reported as scholars write papers, and laws are adopted as legislators sponsor bills. Although bipartite networks are useful in their own right, they can also be useful for inferring unipartite (i.e., one-mode) networks that are difficult to measure directly. For example, while it may be difficult to directly survey politicians about their political alliances because they are busy and may have reasons to misrepresent their true alliances, it may be possible to infer political alliances from politicians’ co-sponsorship of legislation, which is readily observable1,2. A bipartite projection transforms a bipartite network into a unipartite co-occurrence network in which pairs of agents are connected by edges whose weights capture their number of shared artifacts3,4,5. For example, competitive interaction networks can be inferred from species’ co-occurrence in sites6, trade networks can be inferred from firm co-location7,8,9 or product co-exchange3, scholarly collaboration networks can be inferred from paper co-authorship10, and political alliance networks can be inferred from bill co-sponsorship1. Throughout the paper we use these applications to offer concrete examples; however, the models we discuss are general and can be applied to extract unipartite backbones in such diverse contexts as flavor11, misinformation12, text13, and genetic14 networks. Indeed, in principle any unipartite network can be represented as the projection of some bipartite network15,16,17.

Despite their promise, bipartite projections (i.e., co-occurrence networks) are challenging to analyze because they are typically dense and weighted, and because the edge weights do not necessarily capture the strength of the relationship between nodes18. As a result, it is often useful to analyze the backbone of a bipartite projection, which is an unweighted and typically sparser network that retains only the most ‘important’ edges. Although well-known methods exist for extracting the backbone of weighted networks that are not bipartite projections19,20, methods designed specifically for bipartite projections have recently been developed9,18,21,22. Among these methods, the fixed degree sequence model (FDSM) relies on an intuitive null model, but requires computationally expensive Monte Carlo simulations, making it impractical for extracting the backbone of large bipartite projections. Faster methods are available; however, relatively little is known about whether they yield backbones that are similar to those that would be obtained from using FDSM23, and therefore whether they offer computationally efficient alternatives. To offer guidance to researchers wishing to extract an FDSM-like backbone from a large bipartite projection, in this paper we consider four potential alternatives to FDSM: fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), and stochastic degree sequence model (SDSM).

The paper is organized in six sections. We begin by formally defining bipartite projections, backbones, and the five backbone models, presenting proofs of the probability mass functions for their respective edge weight distributions in the Supplementary Text S1. In study 1, we evaluate the accuracy and speed of different approaches for estimating cell-filling probabilities used by the SDSM. In study 2, we evaluate the statistical power of the SDSM relative to the FDSM. In study 3, we examine how degree distributions impact the similarity of backbones extracted using FDSM and each of the alternative models. In study 4, we examine the extent to which backbones extracted using different models accurately recover a known community structure. Finally, we conclude with recommendations for backbone model selection and opportunities for future model development.

Backbone extraction for bipartite projections

Preliminaries

A bipartite network captures connections between nodes of one type (agents) and nodes of a second type (artifacts). Throughout this section, we use the ecological case of Darwin’s Finches to provide a concrete example24,25. On his voyage to the Galapagos Islands on the H.M.S. Beagle, Darwin observed that only some species of finches lived on each island. These patterns can be represented as a bipartite network in which finch species (the agent nodes) are connected to the islands (the artifact nodes) where they are found26. A bipartite network can be represented as a binary matrix in which the agents are arrayed as rows, and the artifacts are arrayed as columns. We use \({\mathbf {B}}\) to denote a bipartite network’s representation as a matrix, where \(B_{ik}=1\) if agent i is connected to artifact k, and otherwise is 0. The sequence of row sums and the sequence of column sums of \({\mathbf {B}}\) are called the agent and artifact degree sequences, respectively. These sequences are among the bipartite network’s most significant features and are known to have implications for bipartite projections and backbones15,27,28. In the ecological case, the agent degree sequence captures the number of islands where each species is found, while the artifact degree sequence captures the number of species found on each island.

The projection of a bipartite network is a weighted unipartite co-occurrence network in which a pair of agents is connected by an edge with a weight equal to their number of shared artifacts. For example, the bipartite projection of Darwin’s finch network is a species co-occurrence network in which a pair of finch species is connected by an edge with a weight equal to the number of islands where they are both found. We use \({\mathbf {P}}\) to denote the matrix representation of a bipartite projection, which is computed as \({{\mathbf {B}}}{{\mathbf {B}}}^T\), where \({\mathbf {B}}^T\) indicates the transpose of \({\mathbf {B}}\). In a projection \({\mathbf {P}}\), \(P_{ij}\) indicates the number of times agents i and j were connected to the same artifact k in \({\mathbf {B}}\). The diagonal entries of \({\mathbf {P}}\), \(P_{ii}\), are equal to the agent degrees, but in practice are ignored.
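The projection described above can be sketched in a few lines. This is a minimal illustration only: the incidence matrix is hypothetical (it is not Darwin's actual finch data), the function name `project` is ours, and plain Python lists are used to keep the sketch free of dependencies.

```python
# Hypothetical incidence matrix B: 3 species (rows) x 4 islands (columns).
B = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

def project(B):
    """Return P = B B^T; P[i][j] counts the artifacts shared by agents i and j."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in B]
            for row_i in B]

P = project(B)

# Agent degrees are the row sums of B; artifact degrees are the column sums.
agent_degrees = [sum(row) for row in B]
artifact_degrees = [sum(col) for col in zip(*B)]

# The diagonal of P reproduces the agent degrees and is ignored in practice.
diagonal = [P[i][i] for i in range(len(P))]
```

Here species 0 and 1 share two islands, so `P[0][1]` is 2, while species 0 and 2 share none.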

The backbone of a bipartite projection is a binary representation of \({\mathbf {P}}\) that contains only the most ‘important’ or ‘significant’ edges. For example, the backbone of a species co-occurrence network connects pairs of species if they are found on a significant number of the same islands, which might be interpreted as evidence that the two species do not compete for resources and perhaps are symbiotic. We use \({\mathbf {P}}'\) to denote the matrix representation of the backbone of \({\mathbf {P}}\). Because multiple methods exist for deciding when an edge is significant and thus should be preserved in the backbone, we use \(\mathbf{P }^{'{\text {M}}}\) to denote a backbone extracted using method M. It is important to note that for a given bipartite projection, there is no ‘true’ backbone, but only backbones corresponding to specific backbone methods M. The backbone extracted using FDSM (i.e. \(\mathbf{P }^{'{\text{FDSM}}}\)) may be similar to or different from a backbone extracted using another method such as SDSM (i.e. \(\mathbf{P }^{'{\text {SDSM}}}\)), and these similarities and differences depend on the information that is considered by the respective methods when determining whether edges’ weights are significant. It is these similarities and differences that we explore in the four studies below.

Backbone extraction methods that were originally developed for non-projection weighted networks are often applied to weighted bipartite projections. One simple method preserves an edge in the backbone if its weight in the projection exceeds some global threshold T. However, when \(T = 0\), which is common, the backbone will be dense and have a high clustering coefficient because each artifact of degree d induces \(d(d-1)/2\) edges in the backbone29. Using \(T > 0\) can yield a sparser and less clustered backbone30,31,32, but still yields highly clustered networks in which low-degree nodes are excluded while high-degree nodes are preserved19. More sophisticated methods, including the disparity filter19 and likelihood filter20, aim to overcome these limitations of the global threshold method by using a different threshold for each edge based on a null model. However, all methods that were developed for non-projection weighted networks have the same shortcoming when applied to weighted bipartite projections: they ignore information about the artifacts, which is lost when generating the projection18. In the ecological case, the global threshold, disparity filter, and likelihood filter methods all decide whether two species should be connected in the backbone only by examining how many islands these two species are both found on, but do not consider the characteristics of those islands, including how many other species are found there, or even how many islands there are. Therefore, although these methods are promising for extracting the backbone from non-projection weighted networks, different methods are required for extracting the backbone from a bipartite projection.

Bipartite ensemble backbone models

Bipartite ensemble backbone models decide whether an edge’s observed weight \(P_{ij}\) is significantly large, and thus whether a corresponding edge should be included in the backbone, by comparing it to an ensemble of random bipartite networks. Let \({\mathscr {B}}\) be the set of all bipartite networks \(\mathbf {B^*}\) having the same number of agents and artifacts as \({\mathbf {B}}\). In the ecological case, \(\mathbf {B^*}\) might be viewed as representing a possible world containing the same species and islands, but in which the locations of species on islands are different, and likewise \({\mathscr {B}}\) is the set of all such possible worlds. The bipartite ensembles used in backbone models take a subset \({\mathscr{B}}^{\text{M}}\) of \({\mathscr {B}}\), subject to certain constraints M, and impose a probability distribution on it. In all models except the SDSM, the uniform probability distribution is imposed on \({\mathscr{B}}^{\text{M}}\), that is, each element of the ensemble is equally likely. The backbone is then extracted from the projection of \({\mathbf {B}}\) by using the distribution of edge weights arising from projections of members of the ensemble to evaluate their statistical significance.

We use \(P^*_{ij}\) to denote a random variable equal to \((\mathbf {B^*}\mathbf {B^*}^T)_{ij}\) for \(\mathbf {B^*}~\in ~{\mathscr {B}}^{\text {M}}\). That is, \(P^*_{ij}\) is the number of artifacts shared by i and j in a bipartite network randomly drawn from \({\mathscr {B}}^{\text {M}}\). In the ecological case, \(P^*_{ij}\) represents the number of islands that are home to both species i and j in a possible world, while the distribution of \(P^*_{ij}\) is the distribution of the number of islands shared by species i and j in all possible worlds.

Decisions about which edges should appear in a backbone extracted at the statistical significance level \(\alpha\) are made by comparing \(P_{ij}\) to \(P^*_{ij}\):

$$\begin{aligned} P_{ij}'= {\left\{ \begin{array}{ll} 1 &{} \quad {\text { if }} \Pr (P^*_{ij} \ge P_{ij}) < \frac{\alpha }{2},\\ 0 &{} \quad {\text {otherwise.}} \end{array}\right. } \end{aligned}$$

This test includes edge \(P'_{ij}\) in the backbone if its weight in the observed projection \(P_{ij}\) is uncommonly large compared to its weight in projections of members of the ensemble \(P^*_{ij}\). We use a two-tailed significance test in the studies below because, in principle, an edge’s weight in the observed projection could be uncommonly larger or uncommonly smaller than its weight in projections of members of the ensemble; however, a one-tailed test may also be used. In the ecological case, two species are connected in the backbone if their number of shared islands in the observed world is uncommonly large compared to their number of shared islands in all possible worlds.
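The decision rule can be sketched as follows, assuming the distribution of \(P^*_{ij}\) is approximated by a list of edge weights sampled from ensemble projections. The function names and the sample values are hypothetical and serve only to illustrate the rule.

```python
def upper_tail_p(observed, sampled_weights):
    """Estimate Pr(P*_ij >= P_ij) as the share of sampled weights >= observed."""
    return sum(w >= observed for w in sampled_weights) / len(sampled_weights)

def in_backbone(observed, sampled_weights, alpha=0.05):
    """Return 1 if the upper-tail p value is below alpha / 2, and 0 otherwise."""
    return 1 if upper_tail_p(observed, sampled_weights) < alpha / 2 else 0

# Hypothetical weights of one edge in 20 projections drawn from an ensemble.
samples = [0, 1, 1, 2, 1, 0, 2, 1, 3, 1, 2, 1, 0, 1, 2, 1, 1, 2, 0, 1]
```

With these samples, an observed weight of 4 is never matched or exceeded, so the edge is retained at \(\alpha = 0.05\). An observed weight of 3 is matched once in 20 samples, giving a p value of 0.05, which is not below 0.025, so that edge is dropped.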

There are many ways that \({\mathscr {B}}\) can be constrained33, with each set of constraints describing a particular ensemble \({\mathscr {B}}^{\text {M}}\), which is used in a particular ensemble backbone model M to yield a particular backbone \({\mathbf {P}}^{'M}\). In the case of ensembles used to extract the backbone of bipartite projections, our focus in this paper, two broad types of constraints are common23. First, ensembles can be distinguished by what they constrain: only the number of edges, the degrees of the agent nodes, the degrees of the artifact nodes, or the degrees of both the agent and artifact nodes. Second, ensembles can be distinguished by how they impose these constraints: the constraints can be satisfied exactly, or only on average. In statistical physics, ensembles that impose exact or ‘hard’ constraints are known as microcanonical, while ensembles that satisfy constraints on average or impose ‘soft’ constraints are known as canonical9.

Prior work on these ensembles generally adopts either a theoretical focus on the ensembles themselves, or an applied focus on the consequences of ensemble choice. In the theoretical literature, some (primarily mathematicians) have aimed to characterize the properties of ensembles, such as estimating the cardinality of the ensemble of matrices with fixed rows and columns (below, we call this ensemble \({\mathscr{B}}^{{\text{FDSM}}}\))34. Others (primarily physicists) have aimed to identify conditions under which ensembles are equivalent or non-equivalent, typically interpreting ensembles as representing thermodynamic systems35,36,37. In the applied literature, the focus is not on identifying fundamental properties of ensembles, but instead on understanding the implications of choosing a particular ensemble when detecting a particular pattern, such as nestedness38 or community structure23,27. The present work falls into this latter group: we are not directly concerned with identifying fundamental properties of ensembles, but instead with identifying the consequences of ensemble choice, with the ultimate goal of offering practical guidance to applied researchers wishing to extract the backbone of a bipartite projection.

In the remaining subsections below, we first describe the FDSM in terms of its ensemble. We then present four potential alternative backbone models whose ensembles differ only slightly from FDSM, in terms of either what they constrain or how they impose constraints. We then turn to exploring the consequences of choosing one of these alternatives over FDSM when extracting a backbone.

Fixed degree sequence model (FDSM)

In the fixed degree sequence model (FDSM), \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text{FDSM}}}\) are constrained to have the same agent and artifact degree sequences as \({\mathbf {B}}\). That is, FDSM constrains the degrees of both the agent and artifact nodes, and requires that these constraints are satisfied exactly, making it a tightly-constrained microcanonical ensemble. Adopting the FDSM implies, for example, that in all possible worlds a given species is found on exactly the same number of islands, and a given island is home to exactly the same number of species. The distribution of \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text{FDSM}}}\) is unknown, but can be approximated by uniformly sampling \(\mathbf {B^*}\) from \({\mathscr {B}}^{\text{FDSM}}\), constructing \(\mathbf {P^*}\), and saving the values \(P^*_{ij}\). In the studies below, we use 1000 samples of \(\mathbf {B^*}\) generated using the ‘curveball’ algorithm, which is among the fastest methods to sample \({\mathscr {B}}^{\text{FDSM}}\) uniformly at random39,40. The FDSM has been used to extract the backbone of bipartite projections of, for example, movies co-liked by viewers21 and conference panel co-participation by scholars41,42.
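The sampling step can be illustrated with a minimal sketch of a curveball-style trade, in which two agents exchange the artifacts that only one of them holds. This is not the implementation used in the studies: the function names, the number of trades, and the example matrix are all assumptions made for illustration.

```python
import random

def curveball_trade(rows, rng):
    """Perform one trade between two randomly chosen agents.

    rows[i] is the set of artifacts connected to agent i. Artifacts held by
    exactly one of the two agents are pooled, shuffled, and dealt back so that
    each agent keeps its original number of artifacts.
    """
    i, j = rng.sample(range(len(rows)), 2)
    shared = rows[i] & rows[j]
    only_i = rows[i] - shared
    only_j = rows[j] - shared
    pool = sorted(only_i | only_j)  # sorted so results depend only on the seed
    rng.shuffle(pool)
    k = len(only_i)
    rows[i] = shared | set(pool[:k])
    rows[j] = shared | set(pool[k:])

def sample_fdsm(B, n_trades, rng):
    """Return a randomized copy of B with the same row and column sums."""
    rows = [{k for k, v in enumerate(row) if v} for row in B]
    for _ in range(n_trades):
        curveball_trade(rows, rng)
    n_cols = len(B[0])
    return [[1 if k in row else 0 for k in range(n_cols)] for row in rows]

# Hypothetical bipartite network and an arbitrary number of trades.
B = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
B_star = sample_fdsm(B, n_trades=50, rng=random.Random(0))
```

Each trade leaves both degree sequences unchanged, because every pooled artifact is returned to exactly one of the two agents and shared artifacts stay with both. How many trades are needed before a sample is approximately uniform is not addressed by this sketch.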

The FDSM offers an intuitively appealing approach to extracting the backbone of bipartite projections because it fully controls for both bipartite degree sequences, which are known to be responsible for many of the projection’s structural characteristics15,16. However, because the distribution of \(P^*_{ij}\) must be computed via Monte Carlo sampling, it is computationally costly, making it impractical for all but relatively small bipartite projections. There are at least three distinct computational challenges. First, although the curveball algorithm is the fastest among existing methods for randomly sampling a bipartite graph with fixed degree sequences (i.e. for sampling \(\mathbf {B^*}\) from \({\mathscr {B}}^{\text{FDSM}}\)), it still can require several seconds per sample for large graphs. Second, once a \(\mathbf {B^*}\) has been sampled, constructing each \(\mathbf {P^*}\) requires matrix multiplication, which must be performed repeatedly and has complexity of at least \({\mathscr {O}}(n^{2.37})\)43. Finally, computing an edge’s p value (i.e. \(\Pr (P^*_{ij} \ge P_{ij})\)) with sufficient precision to achieve a specified familywise error rate that controls for Type-I error inflation due to multiple testing22 can require these sampling and multiplication steps to be performed a very large number of times (see Supplementary Text S2).

These computational challenges have led researchers to develop other backbone models3,9,18. Many such models exist; however, here we focus on identifying methods that yield backbones similar to what would be obtained using FDSM, and which thus may serve as computationally feasible alternatives to FDSM. Therefore, we consider only those models whose ensembles involve at least one of the two types of constraints imposed by FDSM. That is, we consider models that either (1) impose exact constraints, or (2) impose constraints on both the agent and artifact degrees.

Fixed fill model (FFM)

In the fixed fill model (FFM), \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text {FFM}}}\) are simply constrained to contain the same number of 1s as \({\mathbf {B}}\). That is, the FFM constrains only the number of edges, but requires that this constraint is satisfied exactly. Adopting the FFM implies, for example, that in all possible worlds only the total number of species-island pairs is fixed, but any given species may be found on a different number of islands and any given island may be home to a different number of species. The distribution of \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text {FFM}}}\) has not been described before, but is derived in Supplementary Text S1.1. We call it a Jacobi distribution because it is related to Jacobi polynomials.

Fixed row model (FRM)

In the fixed row model (FRM), \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text {FRM}}}\) are constrained to have the same agent degree sequence as \({\mathbf {B}}\), but have unconstrained artifact degree sequences. That is, the FRM constrains the degrees of the agent nodes, and requires that this constraint is satisfied exactly. A canonical variant of the FRM, the \(\hbox {BiPCM}_r\), also constrains the degrees of the agent nodes, but only requires this constraint to be satisfied on average; we do not consider it here because it involves neither of FDSM’s constraints9. Adopting the FRM for backbone extraction implies, for example, that in all possible worlds a given species is found on the same number of islands, but a given island may be home to a different number of species. The distribution of \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text {FRM}}}\) is hypergeometric (see Supplementary Text S1.2), and for this reason it is sometimes referred to as the hypergeometric model22,23,44. The FRM has been used to extract the backbone of bipartite projections of, for example, movies co-starring actors22, papers co-written by authors22, parties co-attended by women44, majority opinions joined by Supreme Court justices44, and microRNAs co-associated with diseases45.
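Because the FRM distribution is hypergeometric, an edge's p value can be computed exactly without sampling. The sketch below assumes the standard parameterization in which agent i's \(r_i\) artifacts are treated as 'successes' among all n artifacts, and agent j's \(r_j\) artifacts are drawn without replacement; the function name and arguments are ours.

```python
from math import comb

def frm_upper_tail(p_ij, r_i, r_j, n_artifacts):
    """Pr(P*_ij >= p_ij) when P*_ij is hypergeometric.

    Population size is n_artifacts, with r_i successes and r_j draws.
    """
    lowest = max(p_ij, 0, r_i + r_j - n_artifacts)  # smallest feasible overlap
    highest = min(r_i, r_j)                         # largest feasible overlap
    favorable = sum(
        comb(r_i, x) * comb(n_artifacts - r_i, r_j - x)
        for x in range(lowest, highest + 1)
    )
    return favorable / comb(n_artifacts, r_j)
```

For example, with four islands and two species that each occupy two islands, the chance that both islands coincide is 1/6, so `frm_upper_tail(2, 2, 2, 4)` returns approximately 0.167.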

Fixed column model (FCM)

In the fixed column model (FCM), \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text {FCM}}}\) are constrained to have the same artifact degree sequence as \({\mathbf {B}}\), but have unconstrained agent degree sequences. That is, the FCM constrains the degrees of the artifact nodes, and requires that this constraint is satisfied exactly. A canonical variant of the FCM, the \(\hbox {BiPCM}_c\), also constrains the degrees of the artifact nodes, but only requires this constraint to be satisfied on average; we do not consider it here because it involves neither of FDSM’s constraints9. Adopting the FCM for backbone extraction implies, for example, that in all possible worlds a given species may be found on a different number of islands, but a given island is home to the same number of species. The distribution of \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text {FCM}}}\) has not been described before, but is derived in Supplementary Text S1.3, where we show it is Poisson-binomial.

Stochastic degree sequence model (SDSM)

Finally, the stochastic degree sequence model (SDSM) takes \({\mathscr {B}}^{{\text {SDSM}}}\) to be all binary \(m \times n\) matrices, but also gives a process for generating these matrices with different probabilities. Each \(\mathbf {B^*}\) is generated by filling the cells \(B^*_{ik}\) with a 0 or 1 depending on the outcome of an independent Bernoulli trial with probability \(p^*_{ik}\). The distribution of the random variable \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text {SDSM}}}\) is Poisson-binomial with parameters which can be computed using the \(p^*_{ik}\) (see Supplementary Text S1.4)27,46. There are many ways to choose \(p^*_{ik}\), but in the studies below we choose \(p^*_{ik}\) so that it approximates \(\Pr (B^*_{ik} = 1)\) for \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text{FDSM}}}\). This choice of \(p^*_{ik}\) ensures that the SDSM constrains the degrees of both the agent and artifact nodes, but only requires these constraints to be satisfied on average. Adopting such a version of SDSM implies, for example, that in each possible world a given species may be found on many or few islands and a given island may be home to many or few species, but the average number of islands on which a given species lives in all possible worlds and the average number of species that live on a given island in all possible worlds match these values in the observed world. The SDSM has been used to extract the backbone of bipartite projections of, for example, legislators co-sponsoring bills1,18,47,48,49, zebrafish (Danio rerio) sharing operational taxonomic units50, countries sharing exports3, and genes expressed in genesets51.
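A Poisson-binomial upper tail can be computed exactly by convolving one Bernoulli trial at a time. The sketch below assumes that, because cells are filled independently, agents i and j share artifact k with probability \(p^*_{ik}\,p^*_{jk}\); the function names and example probabilities are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """Return [Pr(S = 0), ..., Pr(S = n)] for a sum S of independent Bernoullis."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for s, mass in enumerate(pmf):
            nxt[s] += mass * (1 - p)   # this trial fails
            nxt[s + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf

def sdsm_upper_tail(p_ij, probs_i, probs_j):
    """Pr(P*_ij >= p_ij) given the cell-filling probabilities of rows i and j."""
    shared = [a * b for a, b in zip(probs_i, probs_j)]
    return sum(poisson_binomial_pmf(shared)[p_ij:])

# Hypothetical cell-filling probabilities for two agents over two artifacts.
tail = sdsm_upper_tail(1, [0.5, 0.5], [1.0, 1.0])
```

In this example each artifact is shared with probability 0.5, so the probability of sharing at least one artifact is 0.75. The convolution takes on the order of \(n^2\) operations per edge for n artifacts; faster approximations exist but are outside the scope of this sketch.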

Study 1: Choosing cell-filling probabilities for the SDSM

The SDSM requires choosing \(p^*_{ik}\), which we want to approximate \(\Pr (B^*_{ik} = 1)\) for \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text{FDSM}}}\). There are three types of methods that might be used for doing so: arithmetic, general linear models, and entropy maximization. First, we can choose \(p^*_{ik} = (r_i~\times ~c_k)/f\), where \(r_i\) is the sum of entries in row i of \({\mathbf {B}}\), \(c_k\) is the sum of entries in column k of \({\mathbf {B}}\), and f is the sum of all entries in \({\mathbf {B}}\). When \(p^*_{ik}\) falls outside the [0, 1] range, it is simply truncated toward 0 or 1, respectively. This method has a long history in ecology25; we call it RCF because the value is chosen based on a row sum, a column sum, and the number of entries of \({\mathbf {B}}\) that are filled with a one, but elsewhere it has been called the ‘Chung-Lu method’52,53. Second, an estimate can be obtained by fitting a general linear model of the form:

$$\begin{aligned} B_{ik}&= \beta _0 + \beta _1r_i + \beta _2c_k + \epsilon {\text {, or}} \\ B_{ik}&= \beta _0 + \beta _1r_i + \beta _2c_k + \beta _3r_ic_k + \epsilon , \end{aligned}$$

where the \(\beta\)’s are estimated coefficients and \(\epsilon\) is an error term. If the model is treated as a linear regression and the coefficients are estimated using ordinary least squares, then the predicted value of \(B_{ik}\) is chosen for \(p^*_{ik}\), either truncating values outside the required [0, 1] range (linear probability model; LPM) or transforming them into the required range using a linear discriminant model (LDM)54. If the model is treated as a logistic regression and the coefficients are estimated using maximum likelihood, then the predicted probability that \(B_{ik} = 1\) is chosen for \(p^*_{ik}\). In prior work, the logistic regression approach has used a scobit or logit link function, with or without an interaction term (\(\beta _3\))1,18,47. Finally, an estimate can be obtained by entropy maximization methods, including the polytope method (Poly)27,55 or bipartite configuration model (BiCM)3,9,56. In this study, we evaluate the accuracy and speed of these methods for choosing \(p^*_{ik}\) that approximate \(\Pr (B^*_{ik} = 1)\) for \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text{FDSM}}}\).
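Of the methods listed above, the arithmetic RCF rule is the simplest to illustrate. The sketch below computes \(p^*_{ik} = (r_i \times c_k)/f\) and truncates values to the [0, 1] range; the function name and the example matrix are hypothetical.

```python
def rcf_probabilities(B):
    """Return p[i][k] = r_i * c_k / f, truncated to the [0, 1] range."""
    row_sums = [sum(row) for row in B]
    col_sums = [sum(col) for col in zip(*B)]
    fill = sum(row_sums)
    return [[min(1.0, max(0.0, r * c / fill)) for c in col_sums]
            for r in row_sums]

# Hypothetical network with degree sequences [3, 1, 1] and [3, 1, 1].
B = [[1, 1, 1], [1, 0, 0], [1, 0, 0]]
p = rcf_probabilities(B)
```

Here the untruncated value for the first cell is \(3 \times 3 / 5 = 1.8\), which is truncated to 1, illustrating why the truncation step is needed for skewed degree sequences.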

Methods

To evaluate accuracy, we begin by enumerating all the members of a small \({\mathscr {B}}^{\text{FDSM}}\). For example, given an agent degree sequence of [1, 1, 2] and an artifact degree sequence of [1, 1, 2], \({\mathscr {B}}^{\text{FDSM}}\) contains 5 members (see Table 1A). Second, from this complete enumeration, we compute the probabilities we wish \(p^*_{ik}\) to approximate (i.e., \(\Pr (B^*_{ik} = 1)\) for \(\mathbf {B^*}~\in ~{\mathscr {B}}^{{\text{FDSM}}}\), see Table 1B). Third, we compute \(p^*_{ik}\) using each of nine methods (see Table 1C for values obtained using the BiCM method). Finally, we quantify the accuracy with which \(p^*_{ik}\) approximates the desired probabilities using the mean absolute difference for all ik. In the example shown in Table 1, BiCM’s accuracy for these degree sequences is 0.028. That is, \(p^*_{ik}\) chosen using BiCM deviates from the desired probabilities by 0.028 on average. Because evaluating accuracy in this way requires enumerating all members of \({\mathscr {B}}^{\text{FDSM}}\), it is possible only for short degree sequences that define \({\mathscr {B}}^{\text{FDSM}}\) with small cardinality. We focus on degree sequences ranging in length from 2 to 5, which define 384 unique \({\mathscr {B}}^{\text{FDSM}}\) ranging in cardinality from 4 to 2040.
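The accuracy evaluation for the [1, 1, 2] example can be reproduced with a brute-force sketch. The enumeration follows the procedure described above. The BiCM step is an assumption of this sketch: it uses the maximum-entropy form \(p_{ik} = x_i y_k / (1 + x_i y_k)\) and a simple fixed-point iteration, which may differ from the solver used in the study. All function names are ours.

```python
from itertools import product

def enumerate_fdsm(row_sums, col_sums):
    """Brute-force every binary matrix with the given row and column sums."""
    m, n = len(row_sums), len(col_sums)
    members = []
    for cells in product((0, 1), repeat=m * n):
        M = [cells[i * n:(i + 1) * n] for i in range(m)]
        if ([sum(r) for r in M] == list(row_sums)
                and [sum(c) for c in zip(*M)] == list(col_sums)):
            members.append(M)
    return members

def true_probabilities(members):
    """Pr(B*_ik = 1) under the uniform distribution over the enumerated set."""
    m, n = len(members[0]), len(members[0][0])
    return [[sum(M[i][k] for M in members) / len(members) for k in range(n)]
            for i in range(m)]

def bicm_probabilities(row_sums, col_sums, tol=1e-12, max_iter=100_000):
    """Assumed BiCM solver: iterate until expected degrees match the targets."""
    scale = sum(row_sums) ** 0.5
    x = [r / scale for r in row_sums]
    y = [c / scale for c in col_sums]
    for _ in range(max_iter):
        new_x = [r / sum(yk / (1 + xi * yk) for yk in y)
                 for r, xi in zip(row_sums, x)]
        new_y = [c / sum(xi / (1 + xi * yk) for xi in new_x)
                 for c, yk in zip(col_sums, y)]
        change = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if change < tol:
            break
    return [[xi * yk / (1 + xi * yk) for yk in y] for xi in x]

def mean_abs_diff(P, Q):
    """Mean absolute difference between two equally sized matrices."""
    diffs = [abs(a - b) for rp, rq in zip(P, Q) for a, b in zip(rp, rq)]
    return sum(diffs) / len(diffs)

degrees = [1, 1, 2]
members = enumerate_fdsm(degrees, degrees)
target = true_probabilities(members)
accuracy = mean_abs_diff(bicm_probabilities(degrees, degrees), target)
```

For these degree sequences the enumeration yields 5 members, and the resulting mean absolute difference is approximately 0.028, in line with the value reported above. The enumeration visits all \(2^{mn}\) candidate matrices, so it is practical only for very short degree sequences.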

Table 1 SDSM probabilities given agent and artifact degree sequences [1,1,2].

After identifying each method’s accuracy, we evaluate the computational running time of the four most accurate methods by using them to choose \(p^*_{ik}\) for bipartite graphs defined by up to 1000 agents and up to 1000 artifacts, and thus requiring choosing up to 1,000,000 probabilities.

Results

Figure 1A shows the accuracy of each method’s computation of \(p^*_{ik}\). Each gray line plots the accuracy of each method for a single \({\mathscr {B}}^{\text{FDSM}}\), while the red line and shaded region plot the mean and 95% confidence interval of the accuracy of each method over all 384 \({\mathscr {B}}^{\text{FDSM}}\). We find that choosing \(p^*_{ik}\) using a logistic regression with an interaction term (i.e., Scobit-I and Logit-I) is on average least accurate1,18, while the two entropy maximization methods (BiCM and Poly) yield numerically equivalent results, which are on average most accurate3,27.

Figure 1B shows the number of seconds required to compute \(p^*_{ik}\) using a 2.3 GHz Intel i7 processor; lines illustrate the mean running time, while the shaded regions show the 95% confidence interval. Of the two most accurate methods, BiCM is several orders of magnitude faster than Polytope. When computing more than \(10^4\) probabilities, BiCM is also faster than the two slightly less accurate Logit and LDM methods. In the largest case we evaluated, computing \(10^6\) probabilities, BiCM took only about 0.026 seconds. Therefore, we use BiCM for choosing \(p^*_{ik}\) when extracting SDSM backbones in the remaining studies because it is both the most accurate and the fastest.

Figure 1

(A) Accuracy and (B) speed computing \(p^*_{ik}\) using different methods. Lines show means, while shaded regions show 95% confidence intervals.

Study 2: Statistical power of SDSM

Ensemble backbone models require the specification of a statistical significance level \(\alpha\), which determines how uncommonly large an observed edge weight \(P_{ij}\) must be when compared to edge weights \(P^*_{ij}\) arising from an ensemble in order for a corresponding edge to be included in the backbone. For a given model, smaller values of \(\alpha\) represent more stringent criteria for retaining edges, and therefore yield sparser backbones. Although FDSM and SDSM define their respective ensembles by constraining both agent and artifact degree sequences, and thus aim to yield similar backbones, a given \(\alpha\) does not necessarily represent the same level of stringency in these two models. Because the SDSM allows variation in the degree sequences of \(\mathbf {B^*}~\in ~{\mathscr {B}}^{\text {SDSM}}\), the distribution of \(P^*_{ij}\) is wider23,28. These wider distributions mean that the SDSM provides a more conservative test of edge weight significance than FDSM, or alternatively the SDSM has less statistical power to detect significant edges than FDSM.

A concrete example serves to illustrate this difference. In economic geography, it is common to study the world city network using a bipartite projection where two cities are linked to the extent that firms maintain locations in both cities. The Globalization and World Cities (GaWC) dataset has been widely used in this context, and takes the form of a bipartite network recording the presence or absence of 100 firms (artifacts) in 196 cities (agents) in the year 20007,28. In this bipartite network, the agent degrees are right-tailed because most cities contain only a few firms, while a few cities such as New York contain many. Likewise, the artifact degrees are also right-tailed because most firms maintain locations in only a few cities, while a few firms such as the accounting firm KPMG maintain locations in many.

Figure 2A illustrates the distribution of the Milan-Paris edge weight in projections arising from the \({\mathscr {B}}^{\text{FDSM}}\) and \({\mathscr {B}}^{\text {SDSM}}\) ensembles of which the observed bipartite network is a member (i.e., the random variable \(P^*_{ij}\)). These distributions allow a researcher to decide whether Milan and Paris’s observed number of co-located firms is significantly large, and therefore whether Milan and Paris should be connected in a world city network backbone. The SDSM distribution is wider than the FDSM distribution23,28, which has implications for whether the Milan-Paris edge will be included in a backbone extracted at a given significance level using each model. In the observed data, there are 26 firms co-located in Milan and Paris (i.e., \(P_{ij} = 26\)). The probability of observing an equal or larger edge weight in projections from the FDSM ensemble is 0.0033, which is less than \(\frac{0.05}{2}\), and therefore a Milan-Paris edge is deemed significant by the FDSM and is included in the FDSM backbone extracted at \(\alpha = 0.05\). In contrast, the probability of observing an equal or larger edge weight in projections from the SDSM ensemble is 0.0275, which is not less than \(\frac{0.05}{2}\), and therefore a Milan-Paris edge is not deemed significant by the SDSM and is not included in the SDSM backbone extracted at \(\alpha = 0.05\). For a given level of significance \(\alpha\), this difference in statistical power leads the SDSM backbone to be sparser than the FDSM backbone (density \(= 0.004\) vs. 0.012), and means that these two backbones are dissimilar (Jaccard \(= 0.36\)).

In this study, we investigate SDSM’s statistical power relative to FDSM, and specifically whether extracting an SDSM backbone using a more liberal (i.e., larger) \(\alpha\) makes it more similar to an FDSM backbone extracted at \(\alpha = 0.05\).

Methods

To evaluate SDSM’s statistical power and the effect of significance levels on the similarity of SDSM and FDSM backbones, we first extracted the FDSM backbone from the GaWC bipartite network at \(\alpha = 0.05\). We then extracted SDSM backbones from the GaWC bipartite network at \(0.01 \le \alpha \le 0.3\) in 0.001 increments, each time computing the Jaccard index (J) to measure the similarity between the SDSM and FDSM backbones. After comparing SDSM and FDSM backbones extracted from the empirical GaWC bipartite network, we repeat this process using 100 synthetic bipartite networks with the same dimensions (\(196 \times 100\)), density (0.08) and right-tailed agent and artifact degree distributions.
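The sweep over significance levels can be sketched as follows. This is a minimal illustration rather than the code used in the study: the edge labels and p values are hypothetical, the helper names are our own, and treating the Jaccard index of two empty backbones as 1 is an assumed convention.

```python
def backbone_edges(p_values, alpha):
    """Edges whose upper-tail p value falls below alpha / 2 (two-tailed rule)."""
    return {edge for edge, p in p_values.items() if p < alpha / 2}


def jaccard(a, b):
    """Jaccard index of two edge sets; two empty sets count as identical (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sweep_alpha(p_sdsm, reference, alphas):
    """Jaccard similarity of the SDSM backbone to a reference backbone at each alpha."""
    return [(alpha, jaccard(backbone_edges(p_sdsm, alpha), reference)) for alpha in alphas]


# Hypothetical upper-tail p values for three agent pairs under each model.
p_fdsm = {("A", "B"): 0.003, ("A", "C"): 0.020, ("B", "C"): 0.400}
p_sdsm = {("A", "B"): 0.010, ("A", "C"): 0.050, ("B", "C"): 0.600}

reference = backbone_edges(p_fdsm, alpha=0.05)              # FDSM backbone at alpha = 0.05
alphas = [round(0.01 + 0.001 * k, 3) for k in range(291)]   # 0.010, 0.011, ..., 0.300
results = sweep_alpha(p_sdsm, reference, alphas)

# max() returns the first (smallest) alpha that attains the highest similarity.
best_alpha, best_j = max(results, key=lambda pair: pair[1])
print(best_alpha, best_j)
```

In this toy case the SDSM backbone matches the reference only once \(\alpha\) is large enough to admit the second edge, which mirrors the pattern investigated in this study.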

Results

The green line in Fig. 2B shows the Jaccard similarity between an FDSM backbone extracted from the empirical GaWC network at \(\alpha = 0.05\) and SDSM backbones extracted at the significance levels shown on the x-axis. We find that an SDSM backbone achieves its maximum similarity to the FDSM backbone (\(J = 0.81\)) when it is extracted using the more liberal significance level of \(\alpha = 0.12\). Returning to the example in Fig. 2A, using this more liberal significance level would result in the Milan-Paris edge being deemed significant and included in the SDSM backbone because its SDSM p value of 0.0275 is less than \(\frac{0.12}{2} = 0.06\). Because this more liberal significance level results in the inclusion of additional edges, the new SDSM backbone extracted at \(\alpha = 0.12\) has a density (0.01) that is closer to that of the FDSM backbone extracted at \(\alpha = 0.05\) (0.012) than was the density of the SDSM backbone extracted at \(\alpha = 0.05\) (0.004).

Figure 2

Statistical power of SDSM. (A) Distribution of weights for the Milan-Paris edge in projections derived from FDSM and SDSM ensembles. (B) Similarity of an FDSM backbone extracted at \(\alpha = 0.05\) to SDSM backbones extracted at various \(\alpha\) from an empirical bipartite network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region = 10th to 90th percentile).

The purple line in Fig. 2B shows the mean Jaccard similarity between an FDSM backbone extracted using \(\alpha = 0.05\) and SDSM backbones extracted using \(0.01 \le \alpha \le 0.3\) from 100 bipartite networks generated to resemble the empirical GaWC network. The shaded purple region spans the 10th to 90th percentiles of the Jaccard similarities of these backbones. We find that these synthetic networks behave similarly to the empirical network. Specifically, SDSM and FDSM backbones extracted from a low-density \(196 \times 100\) bipartite network with right-tailed degree distributions achieve a maximum similarity of \(0.49< J < 0.76\) when the FDSM backbone is extracted using \(\alpha = 0.05\) and the SDSM backbone is extracted using \(\alpha = 0.14\). This is promising because it suggests that, given the characteristics of an empirical bipartite network, it may be possible to select a significance level for extracting a computationally-efficient SDSM backbone that closely resembles a computationally-infeasible FDSM backbone.

Study 3: Backbone similarity under varying degree distributions

Agent and artifact degree distributions are a key feature of a bipartite network, and are known to have implications for bipartite projections15,27,28. The FDSM is particularly appealing because it allows decisions about the significance of edges in a projection to be conditioned on both bipartite degree sequences, thereby taking into account these important features. However, because the computational requirements of the FDSM make it impractical for extracting the backbone from most bipartite projections, it is often necessary to use a different backbone model. In this study, we evaluate the similarity of an FDSM backbone and backbones extracted using more computationally efficient models. We perform this comparison for backbones extracted from bipartite networks characterized by five types of degree distributions: right-tailed, left-tailed, normal, constant, and uniform.

For the sake of concreteness, in this section we use the example of a bipartite network in which authors (agents) are linked to the papers they have written (artifacts). The projection of such a network yields a co-authorship network in which the edge weight between a pair of authors indicates their number of co-authored papers10. These edge weight values will depend heavily on the distribution of papers written by authors (i.e., the agent degree sequence), and on the distribution of authors on each paper (i.e., the artifact degree sequence). Different degree distributions describe different kinds of scholarly environments as shown in Table 2. The choice of a backbone model affects whether these distributions are considered, and in this example affects whether decisions about the significance of two authors’ number of co-authored papers consider the scholarly environment. The FDSM compares their observed number of co-authored papers to the number that might be observed in alternative realizations of the same environment, while other backbone models relax the extent to which the environment is held constant.
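To make the co-authorship example concrete, the following minimal Python sketch projects a small author-by-paper matrix and reports both degree sequences. The matrix is hypothetical, and setting the diagonal of the projection to zero is a choice made here for illustration, since an author's overlap with themselves is not an edge.

```python
def project(B):
    """Weighted projection: entry (i, j) counts the artifacts shared by agents i and j."""
    n = len(B)
    return [
        [sum(x * y for x, y in zip(B[i], B[j])) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical bipartite network: 3 authors (rows) by 4 papers (columns).
B = [
    [1, 1, 0, 1],  # author 0 wrote papers 0, 1 and 3
    [1, 1, 1, 0],  # author 1 wrote papers 0, 1 and 2
    [0, 0, 1, 0],  # author 2 wrote paper 2
]

P = project(B)
agent_degrees = [sum(row) for row in B]           # papers per author
artifact_degrees = [sum(col) for col in zip(*B)]  # authors per paper

print(P)                 # [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
print(agent_degrees)     # [3, 3, 1]
print(artifact_degrees)  # [2, 2, 2, 1]
```

Whether a weight of 2 between authors 0 and 1 is large depends on how prolific each author is and on how many authors each paper has, which is exactly the information carried by the two degree sequences.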

Table 2 Bipartite degree distributions, with examples in the context of a scholarly authorship bipartite network.

Methods

We evaluate similarities among the backbones extracted using different models by comparing backbones extracted from synthetic \(100 \times 100\) bipartite networks with a density of 0.1, and with a combination of agent and artifact degree distributions shown in Table 2. Following our example, these synthetic bipartite networks might represent a college of 100 faculty who collectively wrote 100 papers, in a particular type of scholarly environment where each individual had a 10% chance of being an author on each paper. After generating a bipartite network with a given size, density, and degree distributions, we extract five different backbones from the generated bipartite network, using the fixed fill model, fixed row model, fixed column model, stochastic degree sequence model, and fixed degree sequence model; in all cases we use \(\alpha = 0.05\). We compute the similarity of the first four backbones to the FDSM backbone using a Jaccard index, repeating this process 100 times for each of the 25 possible combinations of agent and artifact degree distributions.

Results

The heatmaps in Fig. 3 illustrate the similarity between an FDSM backbone and a backbone extracted using an alternative model. The rows of each heatmap correspond to different agent degree distributions, and the columns correspond to different artifact degree distributions, in the synthetic bipartite networks from which the backbones were extracted. The lightest patches identify conditions under which a given backbone model yields a backbone that is similar to what would be obtained using the computationally costly FDSM, while darker patches identify conditions under which these two backbones differ. We find that when agent degrees are constant (i.e., every agent has the same degree) and artifact degrees are constant or left-tailed, all backbone models yield the same backbone as FDSM (Mean \(J = 1\)). However, beyond this special case, which is likely to be rare in empirical data, similarity to FDSM-extracted backbones varies.

Figure 3

Jaccard similarity of a backbone extracted at \(\alpha = 0.05\) using the Fixed Degree Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Each cell represents the mean over 100 instances of a \(100 \times 100\) bipartite network with given agent and artifact degree distributions.

As expected, the similarity of backbones extracted using FRM and FDSM depends primarily on the distribution of artifact degrees, not agent degrees (see Fig. 3B). For example, for any agent degree distribution, these two models yield very different backbones when artifact degrees follow a right-tailed distribution (Mean \(J = 0.186\)), but very similar backbones when artifact degrees follow a normal distribution (Mean \(J = 0.863\)). This occurs because both models exactly control for agent degrees; however, FDSM also controls for artifact degrees, while FRM does not.
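The difference in what the two models hold constant can be illustrated with a minimal Python sketch of a single draw from a fixed-row ensemble. Shuffling each row independently is one simple way to sample a matrix with the observed row sums, and is an assumption of this sketch rather than the procedure used in the study; the matrix and function names are hypothetical.

```python
import random


def sample_fixed_row(B, rng):
    """One random matrix with the same row sums as B: shuffle each row independently."""
    return [rng.sample(row, len(row)) for row in B]


def row_sums(B):
    return [sum(row) for row in B]


def col_sums(B):
    return [sum(col) for col in zip(*B)]


# Hypothetical bipartite network with right-tailed artifact (column) degrees.
B = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]

rng = random.Random(0)
draw = sample_fixed_row(B, rng)

print(row_sums(B), row_sums(draw))  # agent degrees are identical by construction
print(col_sums(B), col_sums(draw))  # artifact degrees are free to change
```

A draw from the FDSM ensemble would additionally have to reproduce the column sums, which is why the two models diverge most when artifact degrees are highly uneven.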

A similar but rotated pattern emerges when considering the FCM: the similarity of backbones extracted using FCM and FDSM depends primarily on the distribution of agent degrees, not artifact degrees (see Fig. 3C). For any artifact degree distribution, these two models yield very different backbones when agent degrees follow a right-tailed or uniform distribution (Mean \(J = 0.084\)), but more similar backbones when agent degrees follow a left-tailed distribution or are constant (Mean \(J = 0.617\)). This occurs because both models exactly control for artifact degrees; however, FDSM also controls for agent degrees, while FCM does not. However, there is a notable exception to this general pattern: when artifact degrees follow a uniform distribution, FCM and FDSM always yield different backbones (Mean \(J = 0.151\)).

The conditions under which the FFM yields FDSM-similar backbones occur at the intersection of the conditions under which the FRM and FCM both yield FDSM-like backbones (see Fig. 3A). When artifact degrees follow a right-tailed distribution or the agent degrees follow a right-tailed or uniform distribution, then FFM and FDSM backbones differ (Mean \(J = 0.1\)). In contrast, for other combinations of degree distributions, FFM and FDSM backbones are more similar (Mean \(J = 0.724\)).

Finally, as expected based on the findings from study 2, we observe that the SDSM generally yields different backbones than FDSM when both are extracted at \(\alpha = 0.05\) (see Fig. 3D). Specifically, except in the narrow case where agent degrees are constant and artifact degrees are constant or left-tailed (Mean \(J = 1\)), SDSM and FDSM backbones exhibit only modest similarity (Mean \(J = 0.314\)). This lack of similarity occurs because SDSM offers a less statistically powerful (or more conservative) test of edges’ statistical significance than FDSM, and therefore retains fewer edges in the backbone. However, findings from study 2 also suggested that careful selection of the significance level used for extracting an SDSM backbone can yield results more similar to FDSM.

To explore this possibility, we expanded the analysis reported in Fig. 3D by extracting SDSM backbones at different significance levels \(\alpha\). We find that when a suitably more liberal (i.e., larger) significance level \(\alpha\) is used to extract an SDSM backbone, the resulting SDSM backbone is very similar to an FDSM backbone extracted at \(\alpha = 0.05\) (see Fig. 4A). Specifically, for backbones extracted from bipartite networks with any agent or artifact degree distributions, these two backbones tend to be very similar (Mean \(J = 0.865\)). This suggests that in principle the fast SDSM can be used to obtain a close approximation of a computationally-infeasible FDSM backbone from any bipartite network.

In practice, using SDSM to obtain an FDSM-like backbone requires selecting an \(\alpha\) value for the SDSM that corresponds to \(\alpha = 0.05\) in the FDSM. We observe that there are three distinct values of such an ‘optimal’ \(\alpha\) that depend on agent and artifact degree distributions (see Fig. 4B). First, when agent degrees are constant, a value only slightly higher than 0.05 (Mean \(= 0.062\), SD \(= 0.021\)) achieves the best approximation of an FDSM backbone. Second, when artifact degrees are constant, a value roughly double (Mean \(= 0.09\), SD \(= 0.022\)) achieves the best approximation of an FDSM backbone. Finally, when neither agent nor artifact degrees are constant, which is likely in most empirical bipartite networks, a value roughly 2.5 times larger (Mean \(= 0.13\), SD \(= 0.014\)) achieves the best approximation of an FDSM backbone. Although further work is needed to facilitate the a priori selection of an \(\alpha\) that allows an SDSM backbone to closely approximate an \(\hbox {FDSM}_{\alpha = 0.05}\) backbone, these results suggest that under the most common circumstances (i.e., when there is variation in degrees) \(\alpha \approx 0.13\) may be appropriate.
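The three cases above can be summarized as a simple lookup, sketched here in Python. This is only a restatement of the mean values reported in this study, not a validated selection rule; the function name is hypothetical, and checking the agent degrees first when both sequences are constant is an arbitrary choice of this sketch.

```python
def suggested_sdsm_alpha(agent_degrees, artifact_degrees):
    """Mean 'optimal' SDSM alpha reported above for approximating FDSM at alpha = 0.05."""
    if len(set(agent_degrees)) == 1:     # every agent has the same degree
        return 0.062
    if len(set(artifact_degrees)) == 1:  # every artifact has the same degree
        return 0.09
    return 0.13                          # both degree sequences vary


# Hypothetical degree sequences.
constant_agents = suggested_sdsm_alpha([3, 3, 3], [1, 2, 5])
constant_artifacts = suggested_sdsm_alpha([1, 2, 5], [2, 2, 2])
both_vary = suggested_sdsm_alpha([1, 2, 5], [1, 1, 4])
print(constant_agents, constant_artifacts, both_vary)  # 0.062 0.09 0.13
```

Because the reported standard deviations are non-negligible, a value returned by such a lookup is best treated as a starting point.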

Figure 4

(A) Given agent and artifact degree distributions, there exists a statistical significance level \(\alpha\) that maximizes the similarity between an SDSM backbone extracted at this level and an FDSM backbone extracted at \(\alpha = 0.05\), and (B) when used yields an SDSM backbone that is very similar to the corresponding FDSM backbone.

Study 4: Recovery of community structure

Studies 1–3 examine the backbones extracted from random bipartite networks; however, empirical bipartite networks are not random. Frequently they contain a block structure that implies a particular community structure in the bipartite projection. In this study, we evaluate the extent to which backbones extracted using different models reflect a known community structure that is encoded in the bipartite data from which they are extracted57. Recent work has shown that FDSM, FRM, SDSM, and BiPCM (a canonical variant of FRM) yield backbones with similar community structures23. Other work has shown that SDSM and FDSM backbones extracted from a bipartite network representing bill co-sponsorship in the 114th session of the US Senate more clearly captured the hypothesized partisan community structure than an FRM backbone27. We build on this prior work using synthetic data that is constructed to contain ground truth communities, which allows us to evaluate backbone models’ ability to recover true communities, and not simply similar or hypothesized ones.

Methods

We investigate the ability of backbones to recover a known community structure in three steps. First, we simulate a \(200 \times 1000\) bipartite network with a density of 0.1 and right-tailed agent and artifact degree distributions. We focus on a bipartite network with more artifacts than agents to ensure that these data contain sufficient information to encode potential community memberships. We focus on a bipartite network with right-tailed degree distributions because they are common in many empirical unipartite58 and bipartite networks1,11,28. This synthetic bipartite network could represent a legislative body composed of 200 legislators casting votes on 1000 bills, where any given legislator had a 10% chance of voting in favor of any given bill. The right-tailed degree distributions capture the fact that most legislators vote in favor of only a few bills, and that most bills receive the support of only a few legislators, which is typical of legislative bodies. The backbone of a projection of such a bipartite network would represent a network of collaboration or ideological alignment among legislators1.

Second, we incorporate evidence of communities in this bipartite network by randomly assigning each agent and each artifact to one of two groups. We then perform checkerboard swaps, which preserve the degree distributions, until a given fraction of edges W are within-group, connecting an agent and artifact from the same group59. Figure 5A provides graphical depictions of the matrices describing synthetic bipartite networks at two values of W. In each plot, the rows represent agents assigned to group A or B, the columns represent artifacts assigned to group A or B, and a cell is shaded black if the row agent is connected to the column artifact. When \(W = 0.5\), agents in a given group are equally likely to associate with artifacts in either group, placing \(\approx 0.5\) of the edges (i.e., shaded cells) in the diagonal blocks and \(\approx 0.5\) of the edges in the off-diagonal blocks. In contrast, when \(W = 0.8\), agents in a given group are much more likely to associate with artifacts from their own group than artifacts in the other group, placing \(\approx 0.8\) of the edges in the diagonal blocks and \(\approx 0.2\) of the edges in the off-diagonal blocks. Returning to our example, the groups could represent political parties: each legislator belongs to one of two parties (i.e., there are conservative and liberal legislators), and each bill advances the agenda of one of these parties (i.e., there are conservative and liberal bills). When \(W = 0.5\), a conservative legislator is equally likely to vote for conservative and liberal bills, while when \(W = 0.8\), a conservative legislator is four times more likely to vote for a conservative bill than a liberal bill.
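A checkerboard swap and the within-group fraction W can be sketched in Python as follows. The greedy acceptance rule (undo any swap that lowers W), the small \(8 \times 12\) network, and all function names are assumptions made for illustration; the text specifies only that swaps are performed until the target W is reached.

```python
import random


def within_group_fraction(B, agent_group, artifact_group):
    """W: share of edges that join an agent and an artifact from the same group."""
    edges = within = 0
    for i, row in enumerate(B):
        for k, cell in enumerate(row):
            if cell:
                edges += 1
                within += agent_group[i] == artifact_group[k]
    return within / edges if edges else 0.0


def try_checkerboard_swap(B, i, j, k, l):
    """Flip rows i, j and columns k, l in place if they form a 2x2 checkerboard."""
    if B[i][k] == B[j][l] and B[i][l] == B[j][k] and B[i][k] != B[i][l]:
        B[i][k], B[i][l] = B[i][l], B[i][k]
        B[j][k], B[j][l] = B[j][l], B[j][k]
        return True
    return False


def plant_communities(B, agent_group, artifact_group, target_w, rng, max_tries=5000):
    """Greedy sketch: keep swaps that do not lower W until target_w or max_tries."""
    n, m = len(B), len(B[0])
    for _ in range(max_tries):
        current = within_group_fraction(B, agent_group, artifact_group)
        if current >= target_w:
            break
        i, j = rng.sample(range(n), 2)
        k, l = rng.sample(range(m), 2)
        if try_checkerboard_swap(B, i, j, k, l):
            if within_group_fraction(B, agent_group, artifact_group) < current:
                try_checkerboard_swap(B, i, j, k, l)  # swapping again undoes the move
    return within_group_fraction(B, agent_group, artifact_group)


rng = random.Random(1)
n_agents, n_artifacts = 8, 12  # hypothetical toy dimensions
B = [[1 if rng.random() < 0.3 else 0 for _ in range(n_artifacts)] for _ in range(n_agents)]
agent_group = [rng.randrange(2) for _ in range(n_agents)]
artifact_group = [rng.randrange(2) for _ in range(n_artifacts)]

rows_before = [sum(row) for row in B]
cols_before = [sum(col) for col in zip(*B)]
initial_w = within_group_fraction(B, agent_group, artifact_group)
final_w = plant_communities(B, agent_group, artifact_group, target_w=0.8, rng=rng)
print(initial_w, final_w)
```

Because each swap exchanges the two 1s and two 0s of a \(2 \times 2\) submatrix, every agent and artifact keeps its degree while edges migrate between the diagonal and off-diagonal blocks.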

Finally, we extract a backbone from the bipartite network using a given model and compute the backbone’s modularity Q with respect to the agents’ group assignments60. If a backbone model is able to recover the community structure from evidence in the bipartite network, then we expect a positive association between W and Q. In the legislative example, if legislators are bipartisan in their voting patterns (i.e., \(W = 0.5\)), then legislators should not be clustered by party in the backbone (i.e., \(Q \approx 0\)). In contrast, if legislators are strongly partisan in their voting patterns (i.e., \(W = 0.8\)), then legislators should be clustered by party in the backbone (i.e., \(Q \gg 0\)).
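The modularity of a backbone with respect to fixed group labels can be computed with a short Python sketch. It assumes the standard Newman-Girvan definition for an unweighted, undirected network, \(Q = \sum_c \left[ \frac{e_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]\), where \(e_c\) is the number of edges inside group c, \(d_c\) is the total degree of group c, and m is the number of edges; the toy backbone and the function name are hypothetical.

```python
def modularity(edges, group):
    """Newman-Girvan modularity of an undirected, unweighted edge list for given labels."""
    m = len(edges)
    if m == 0:
        return 0.0
    within = {}  # e_c: edges with both endpoints in group c
    degree = {}  # d_c: summed degree of the nodes in group c
    for u, v in edges:
        degree[group[u]] = degree.get(group[u], 0) + 1
        degree[group[v]] = degree.get(group[v], 0) + 1
        if group[u] == group[v]:
            within[group[u]] = within.get(group[u], 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical backbone: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_party = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
mixed = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}

q_party = modularity(edges, by_party)  # 6/7 - 1/2, roughly 0.357
q_mixed = modularity(edges, mixed)
print(q_party, q_mixed)
```

Labels that match the two triangles give a clearly positive Q, whereas labels that cut across them do not, which is the contrast the W versus Q comparison relies on.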

We repeat these three steps 10 times for \(0.5 \le W \le 0.8\) in 0.05 increments. When evaluating the SDSM backbone, we consider both a backbone extracted using the conventional significance level of \(\alpha = 0.05\) and one extracted at the more liberal \(\alpha = 0.13\), which study 3 suggests yields a backbone similar to FDSM.

Figure 5

(A) Synthetic bipartite networks with varying levels of block structure, from which (B) backbones extracted using different models exhibit varying modularity.

Results

Figure 5B shows the modularity (y-axis; with respect to known community memberships) of backbones extracted using different models from bipartite networks containing different fractions of within-community edges (x-axis). Solid lines illustrate the mean modularity across 10 replications, while the shaded regions illustrate 95% confidence intervals. All six lines increase monotonically, confirming that all backbone models yield backbones that can recover a known community structure; however, there is notable variation among the models. As evidence of community structure grows stronger in the bipartite network, the modularity of backbones extracted using the FFM and FCM increases slowly, but even when the evidence of such a structure is quite strong (i.e., when \(W = 0.8\)) these backbones achieve average values of only \(Q = 0.15\) and 0.18, respectively. Backbones extracted using the FRM display a similar pattern, but achieve a statistically significantly higher average modularity value (\(Q = 0.39\)) when W is large.

Backbones extracted using FDSM and SDSM yield modularity values that are statistically significantly larger than those obtained from FFM, FRM, or FCM backbones, but that are not statistically significantly different from each other. That is, these backbone models are indistinguishable in their ability to recover the known community structure, and do so very well. As evidence of a community structure grows stronger in the bipartite network, the modularity of backbones extracted using these models rapidly increases. When the evidence of community structure is strong (i.e., when \(W = 0.8\)), these backbones have very high modularity (mean \(Q = 0.49\)). However, even when there is only modest evidence of community structure in the bipartite network (e.g., when \(W = 0.65\)), these backbones are still able to identify the community structure and have a distinctively high modularity (mean \(Q = 0.37\)).

These findings suggest that although all backbone models can yield backbones that recover a known community structure, SDSM and FDSM backbones are able to detect this structure more clearly and from a weaker signal.

Discussion

Bipartite networks can be used to represent a wide range of phenomena in the social and natural worlds including interspecies competition, global trade, scientific advances, and legislative deliberation. Likewise, projections of bipartite networks, which take the form of co-occurrence networks, can be useful for inferring unipartite networks whose edges would otherwise be difficult to measure directly. The fixed degree sequence model (FDSM) offers an appealing null model for making such inferences, but its computational complexity often makes it impractical. Several computationally simpler alternatives to FDSM have been proposed, including the fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), and stochastic degree sequence model (SDSM). In this paper we have systematically compared FDSM to each of these alternatives to evaluate aspects of their accuracy, speed, statistical power, backbone similarity, and ability to recover a known community structure.

In study 1, we examined several methods for choosing the probabilities used by the stochastic degree sequence model (SDSM), finding that the bipartite configuration model (BiCM) is both the fastest and most accurate. In study 2, we examined the statistical power of the SDSM relative to the fixed degree sequence model (FDSM), finding that the SDSM can be viewed as a statistically less powerful (or more conservative) variant of the FDSM. In study 3, we examined the similarity of an FDSM-extracted backbone to backbones extracted using other models, finding that the SDSM and FDSM extract very similar backbones from bipartite networks with a wide range of possible degree distributions when an appropriate significance level \(\alpha\) is chosen. Finally, in study 4, we examined the ability of backbones extracted using different models to recover a known community structure, finding that although all models yield a backbone that recovers the structure, SDSM and FDSM can detect a community structure more clearly and from a weaker signal.

Based on these findings, and with the goal of offering researchers some guidance in extracting the backbones of bipartite projections, we offer three recommendations. First, we recommend the stochastic degree sequence model (SDSM) for extracting the backbones of bipartite projections because it is fast, controls for both agent and artifact degree sequences, and yields modular backbones when the bipartite data contains even modest evidence of within-community clustering. Second, when the SDSM is used, we recommend that the cell-filling probabilities \(p^*_{ik}\) be chosen using the Bipartite Configuration Model (BiCM) because it is faster and more accurate than any other currently available method. Third, when an FDSM backbone extracted at the \(\alpha = 0.05\) significance level is desired but computationally infeasible, we recommend extracting an SDSM backbone at the \(\alpha = 0.13\) significance level, which we observe yields a very similar backbone when there is variation in the agent and artifact degree sequences. The models and options necessary to adopt these recommendations are implemented in the backbone package27 for R.

These findings and recommendations must be viewed in light of the fact that, due to the computational requirements of the FDSM and of extracting a large number of backbones across the four studies, these studies have relied on small synthetic bipartite networks ranging in size from \(3 \times 3\) (study 1) to \(200 \times 1000\) (study 4). However, in practice bipartite networks may be several orders of magnitude larger. For example, a bipartite network used to infer collaborations in the US House of Representatives includes 435 agents (representatives) and over 6000 artifacts (bills)1,55, while a bipartite network used to infer movie recommendations includes 17,770 agents (films) and nearly 500,000 artifacts (viewers)21. Future research should explore whether these findings extend to backbones extracted from such large bipartite networks. Limitations of existing backbone models also point to directions for future research. First, using the FDSM will generally be computationally infeasible in practice because the distribution of \(P^*_{ij}\) arising from \({\mathscr {B}}^{{\text{FDSM}}}\) must be estimated via numerical simulation. Identifying this distribution’s probability mass function, which is known for the other ensembles (see Supplementary Text S1), would facilitate the use of this otherwise attractive model. Second, all the ensemble models we have considered impose constraints on the degree sequences, but other types of constraints may also be useful. For example, in some contexts it may be necessary to constrain all members of an ensemble to contain a 0 in a particular cell (e.g., to represent that an author was not alive to co-author a paper, or a legislator was not present to co-sponsor a bill)61. These limitations and future directions notwithstanding, the results presented above provide a starting point for further development of backbone models, and provide applied researchers with some practical guidance on model selection.