A Bootstrap Method for Goodness of Fit and Model Selection with a Single Observed Network

Network models are applied in numerous domains where data arise from systems of interactions among pairs of actors. Both statistical and mechanistic network models are increasingly capable of capturing various dependencies among these actors. Yet, these dependencies pose statistical challenges for analyzing such data, especially when the data set comprises only a single observation of one network, often leading to intractable likelihoods regardless of the modeling paradigm and limiting the application of existing statistical methods for networks. We explore a subsampling bootstrap procedure to serve as the basis for goodness of fit and model selection with a single observed network that circumvents the intractability of such likelihoods. Our approach is based on flexible resampling distributions formed from the single observed network, allowing for more nuanced and higher dimensional comparisons than point estimates of quantities of interest. We include worked examples for model selection, with simulation, and assessment of goodness of fit, with duplication-divergence model fits for yeast (S.cerevisiae) protein-protein interaction data from the literature. The proposed approach produces a flexible resampling distribution that can be based on any network statistics of one’s choosing and can be employed for both statistical and mechanistic network models.


Materials and Methods
Subsampling scheme and resampling distributions. Each subsample of the bootstrap subsampling scheme of Bhattacharyya et al. 28 consists of a uniform node-wise subsample of all the nodes in the observed network G o (with node set V o and edge set E o ) and their induced subgraph, i.e., the nodes in the subsample and all edges between these nodes. For each subsample, one may compute any set of statistics to form a resampling distribution of these statistics. Although the subsamples will not have the same properties as the full network or a network of the same size as the subsample drawn from the true data generating mechanism 32 , they will still retain features of the true data generating mechanism since the subsampling does not change any between-edge or between-node dependence that influenced the formation of the network, despite adding a degree of "missingness" by removing elements correlated with those in the subsample. In comparison, should one generate draws from a particular fitted model to form a resampling distribution, the between-edge and between-node dependencies will be those specified by the fitted model; in this case, the generated networks will only be representative of the true data generating mechanism if the fitted model is the true model, which is a strong assumption and usually not verifiable in practice.
Because each subsample only consists of a subset of V o and E o , each subsample will be missing elements that are correlated with those that are included in the subsample. This must be taken into account whenever comparisons are made with a null/candidate model M c . One may be tempted to compare subsamples of G o with draws from M c of the same size as the subsamples. This should, however, be avoided, since there is a degree of "missingness" in the subsamples of G o that is not present in such draws from M c . Even if M c were the true model, this disparity could make the two behave differently. Instead, one should generate draws from M c of the same size as G o and then apply the same subsampling scheme to these draws. This ensures that both the subsamples of G o and those of M c have the same amount of "missingness" and are comparable. Should M c be representative of the true data generating mechanism, then the behavior of the two sets of subsamples, and of their corresponding resampling distributions of computed statistics, should be similar. The representativeness of the subsamples from G o , as well as their comparability with the subsamples from M c , forms the basis of our procedure. Even though we only consider uniform subsampling, the subsampling method is flexible and can be chosen to be representative of sampling in practice or for statistical and computational ease. The proposed bootstrap subsampling procedure is summarized in Fig. 1.
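The scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the edge-density statistic, and the G(n,p) generator standing in for both the "observed" network and the candidate model are all hypothetical choices made here for the example.

```python
import random

def induced_subgraph(edges, nodes):
    """Edges of the subgraph induced by `nodes` (both endpoints must be kept)."""
    keep = set(nodes)
    return {(u, v) for (u, v) in edges if u in keep and v in keep}

def uniform_node_subsample(nodes, edges, k, rng):
    """Draw k nodes uniformly without replacement; return them with their induced edges."""
    chosen = rng.sample(nodes, k)
    return chosen, induced_subgraph(edges, chosen)

def observed_resampling(nodes, edges, k, b, statistic, rng):
    """Statistic on b subsamples of ONE network (the role played by G_o)."""
    return [statistic(*uniform_node_subsample(nodes, edges, k, rng)) for _ in range(b)]

def model_resampling(draw_network, k, b, statistic, rng):
    """Statistic on one subsample from each of b independent full-size draws from M_c."""
    return [statistic(*uniform_node_subsample(*draw_network(rng), k, rng)) for _ in range(b)]

def edge_density(nodes, edges):
    """Hypothetical example statistic: fraction of possible edges present."""
    k = len(nodes)
    return len(edges) / (k * (k - 1) / 2)

def draw_gnp(n, p, rng):
    """Hypothetical stand-in generator: a plain G(n,p) graph as (node list, edge set)."""
    edges = {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}
    return list(range(n)), edges

rng = random.Random(0)
nodes, edges = draw_gnp(100, 0.1, rng)                 # stand-in for the observed network
F_o = observed_resampling(nodes, edges, 30, 50, edge_density, rng)
F_c = model_resampling(lambda r: draw_gnp(100, 0.1, r), 30, 50, edge_density, rng)
```

Note that the model draws have the same size as the observed network and are subsampled in the same way, so both resampling distributions carry the same amount of "missingness".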
In contrast to existing methods that also use draws from the fitted model to assess goodness of fit, this approach can lead to a richer comparison. For existing methods, after choosing the statistics for assessing goodness of fit, the given statistics are computed for G o and for a large number of draws from M c . The point estimates of these statistics for G o are then placed within the distributions of the same statistics over the draws from M c . Goodness of fit is assessed by the location of the point estimate from G o within the draws from M c . This can be done visually or by quantifying the proportion of the draws with values of the statistics deemed more extreme. With our approach, the two resampling distributions can be compared in many ways, such as by their location, spread, and shape. In addition, one can quantify the distance between the two distributions using, for example, the Kolmogorov-Smirnov (KS) statistic (which is also defined for discrete distributions 33 ) or the Kullback-Leibler divergence to order the fit of different candidate models.
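As an illustration of the distance-based comparison, the two-sample KS statistic is the largest absolute difference between the two empirical distribution functions. The sketch below is a plain standard-library version written for this example (the function name `ks_statistic` is our own); it evaluates both empirical distribution functions at every distinct observed value, so ties and discrete statistics such as triangle counts are handled directly.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max over x of |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The difference of two step functions can only change at observed values,
    # so it suffices to check each distinct value once.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d

same = ks_statistic([1, 2, 3], [1, 2, 3])          # identical samples
disjoint = ks_statistic([1, 2], [3, 4])            # no overlap at all
partial = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]) # partial overlap
```

A value near 0 indicates that the two resampling distributions nearly coincide, while a value of 1 indicates that they do not overlap.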
One point of interest and emphasis is that the subsamples from G o are all from a single network, while the subsamples from M c are subsamples of independent network realizations drawn from M c instead of subsamples from a single network drawn from M c . This scheme is proposed due to potential instability of single generated networks and the corresponding subsamples, since there can be a great deal of instability in the generated networks depending on the model, including the seed network used to grow networks specified by mechanistic models. In addition, the disparity between the two types of subsamples may depend on the proportion of the nodes in each subsample. Both of these points are important to the performance of the procedure and are further examined in the next two sections.
Stability under sampling. When sampling from the candidate model, one needs to take care to ensure that the draws from M c behave like the observed network even if the candidate model is the true model or an accurate model, and in turn, the subsamples of these draws behave like the subsamples of the observed network. In the worst case, such draws can look nothing like the observed network despite using a good candidate model, e.g., the draws could have highly varying degree distributions that look nothing like that of the observed network. This issue can be more prominently demonstrated in the context of some mechanistic network models.
Networks generated from mechanistic models are often grown from a small (relative to the final size of the network) seed network according to the model's generative mechanism until some stopping condition is reached, e.g., attaining a requisite number of nodes. There is research showing that the original seed network has no influence on the degree distribution in the limit, i.e., for a large number of nodes, for certain types of mechanistic network models 34,35 . While some data sets, such as online social networks, may be sufficiently large to reach this asymptotic regime, others, such as protein-protein interaction networks, may not be. Thus, when generating draws from candidate models for analysis of smaller networks, the original seed network can potentially have a great deal of influence. The seed network may be as simple as a single node or a complete graph of only three nodes, a bigger complete graph, or something more elaborate with more than one component. We briefly examine the effect of the seed network on the stability of the degree distribution of networks generated from the Erdős-Rényi and duplication-divergence models, which are frequently used to model protein-protein interaction networks.
Erdős-Rényi model. The Erdős-Rényi (ER) model 36 is a simple but rather unusual model in that it can be framed as both a mechanistic and a statistical model. In the ER model, the number of nodes n is fixed, and there are two variants of the model that determine how the edges are placed. In the first variant, the G(n,p) model, each of the C(n, 2) (n choose 2) possible edges is included in the graph independently with probability p, so the number of edges in the graph is binomial. In the other variant, the G(n,m) model, the number of edges in the graph m is also fixed. In this case, the random graph has a uniform distribution over all C(C(n, 2),m) possible graphs with n nodes and m edges.
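For reference, the two variants can be written out explicitly. Here N is shorthand introduced for this illustration (it is not part of the original notation) for the number of possible edges:

```latex
N = \binom{n}{2}, \qquad
\Pr\bigl(|E| = k\bigr) = \binom{N}{k}\, p^{k} (1-p)^{N-k},
\quad k = 0, 1, \dots, N
\quad \text{under } G(n,p),
\\[4pt]
\Pr(G = g) = \binom{N}{m}^{-1}
\quad \text{for every graph } g \text{ with } n \text{ nodes and } m \text{ edges, under } G(n,m).
```

For example, a plain G(n = 1000, p = 0.1) graph with no seed network has N = 499500 possible edges and an expected edge count of Np = 49950.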
The first variant can be easily framed as a mechanistic model. The network generation starts with a seed network of a single node. Then at each stage, a new node is added, and an edge between the new node and each existing node is added with probability p. This is done until there are n nodes in the network. Rather than starting with a seed network of a single node, networks can be generated according to the generative mechanism of the G(n,p) model initialized with a different seed network. Here, we generated G(n = 1000, p = 0.1) networks according to these rules, with complete graphs of 5, 8, 10, 20, 50, 100 nodes as the seed networks. We generated 50 networks of each size of the seed to evaluate the influence of the seed network on the stability of the degree distribution of the fully grown network.
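The growth mechanism described above can be sketched as follows. This is a minimal illustration, assuming that nodes are labeled 0 to n - 1 and that the seed is a complete graph on the first few labels; the function names are hypothetical and the small sizes in the usage lines are chosen only to keep the example fast.

```python
import random
from collections import Counter

def grow_gnp_from_seed(n, p, seed_size, rng):
    """Grow a network to n nodes from a complete seed graph using the G(n,p) rule."""
    # Seed network: complete graph on nodes 0 .. seed_size - 1.
    edges = {(u, v) for u in range(seed_size) for v in range(u + 1, seed_size)}
    # Each new node connects to every existing node independently with probability p.
    for new in range(seed_size, n):
        for old in range(new):
            if rng.random() < p:
                edges.add((old, new))
    return edges

def degree_distribution(n, edges):
    """Map each degree value to the number of nodes with that degree."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree)

rng = random.Random(0)
# Several realizations per seed size, to compare the spread of degree distributions.
realizations = {
    seed_size: [degree_distribution(200, grow_gnp_from_seed(200, 0.1, seed_size, rng))
                for _ in range(5)]
    for seed_size in (5, 20, 50)
}
```

Because every new edge is placed independently of the existing ones, the seed only contributes a fixed block of extra edges among the seed nodes, which is consistent with the stability reported for this model.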
The degree distribution of the 50 generated graphs at each size of the seed network are plotted in Fig. 2. While the shape of the degree distribution understandably changes as the complete graph used as the seed network gets bigger, the size of the seed network seems to have little influence on the stability of the degree distribution. All 50 networks, for each size of the seed network, have very similar degree distributions. The width of the "band" of the 50 distributions stacked on top of one another also looks to be mostly unchanging. This seems to indicate that the variability in the degree distribution is largely unaffected by the size of the seed network.
Duplication-divergence models. Duplication-divergence models are a popular class of models used for protein-protein interaction networks. Some examples include the duplication-mutation-complementation (DMC) 18 and duplication-mutation-random mutation (DMR) 17,37 models. Given a seed network, both DMC and DMR models grow the network according to their respective generative mechanisms until the requisite number of nodes, n, is reached. In both the DMC and DMR models, a new node is first added at the beginning of each step in network generation. An existing node is chosen uniformly at random for duplication, and an edge is then added between the new node and each neighbor of the chosen node. After this, the two models diverge. For DMC, for each neighbor of the chosen node, either the edge between the chosen node and the neighbor or the edge between the new node and the neighbor is selected at random and then removed with probability q mod . The step is concluded by adding an edge between the chosen node and the new node with probability q con . For DMR, each edge connected to the new node is removed independently with probability q del . The step concludes by adding an edge between the new node and any node existing at the start of step t with probability q new /n(t), where n(t) is the number of nodes in the network at the start of step t.
To assess the stability of the degree distribution, we generated 50 network realizations of 1000, 3000, 5000, 7000, 10000 nodes from both models with the seed network set as a complete graph with 5, 8, 10, 20, 50, 100 nodes. The parameters of the DMC model were set as q mod = 0.2 and q con = 0.1, while those of the DMR model were q del = 0.2 and q new = 0.1. The degree distribution for the 50 generated DMC networks for a subset of all combinations of the size of the seed network and the total number of nodes are plotted in Fig. 3; those for all combinations for both DMC and DMR models can be found in the Supplementary Information (Figs S1 and S2). A general trend in the plots is that the total number of nodes in the network has little to no influence on the stability of the degree distribution, while the size of the seed network has a great deal of influence, with stability increasing sharply with the size of the seed network, up to 50. For smaller seed networks (5 nodes), the shape and spread of the degree distributions vary wildly even for larger networks. For a modest increase in the size of the seed network (10 nodes), the shape and the spread of the degree distributions become more similar. Finally, for larger seed networks (20 or 50), the shape and spread of the degree distributions are quite uniform, and the width of the "band" of the 50 degree distributions stacked on top of one another also decreases. Clearly, the variability of the degree distribution depends greatly on the size of the seed network.
One important difference between the ER and DMC/DMR models is the dependence of the formation of new edges on existing ones. The instability in the degree distribution of networks generated from DMC/DMR models with small seed networks can be attributed to this dependence. While these two examples show the influence the seed network can potentially have in generating networks of modest size with mechanistic models, they raise the question of how one selects, in a principled way, a meaningful seed that leads to stable sampling while mimicking the behavior of the observed network. Hypothetically, if the observed network is indeed generated from an ER model and assuming the seed network and the parameter values are well chosen, then the generated networks should mostly appear similar to the observed network due to the low variability regardless of the size of the seed.
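A single growth step of each model can be sketched as below. This is our reading of the verbal description, not code from the cited papers: the adjacency-dictionary representation, the function names, the equal chance of choosing either edge in the DMC divergence step, and the interpretation that the DMR random-mutation step considers each existing node independently are all assumptions made for this illustration.

```python
import random

def complete_graph(k):
    """Seed network: adjacency sets of a complete graph on nodes 0 .. k - 1."""
    return {u: {v for v in range(k) if v != u} for u in range(k)}

def duplicate(adj, rng):
    """Shared first part of a step: add a new node and copy a random node's edges."""
    chosen = rng.choice(list(adj))
    new = len(adj)                      # nodes are labeled 0, 1, 2, ... in order of arrival
    adj[new] = set()
    for w in list(adj[chosen]):
        adj[new].add(w)
        adj[w].add(new)
    return chosen, new

def dmc_step(adj, q_mod, q_con, rng):
    chosen, new = duplicate(adj, rng)
    for w in list(adj[new]):            # the neighbors shared by chosen and new
        if rng.random() < q_mod:
            loser = chosen if rng.random() < 0.5 else new   # assumed 50/50 choice
            adj[loser].discard(w)
            adj[w].discard(loser)
    if rng.random() < q_con:
        adj[chosen].add(new)
        adj[new].add(chosen)

def dmr_step(adj, q_del, q_new, rng):
    existing = list(adj)                # nodes present at the start of the step
    n_t = len(existing)
    chosen, new = duplicate(adj, rng)
    for w in list(adj[new]):
        if rng.random() < q_del:
            adj[new].discard(w)
            adj[w].discard(new)
    for w in existing:                  # assumed: each existing node tried independently
        if rng.random() < q_new / n_t:
            adj[new].add(w)
            adj[w].add(new)

def grow(step, seed_size, n, rng, *params):
    """Grow from a complete seed until the network has n nodes."""
    adj = complete_graph(seed_size)
    while len(adj) < n:
        step(adj, *params, rng)
    return adj

rng = random.Random(1)
dmc_net = grow(dmc_step, 5, 200, rng, 0.2, 0.1)   # q_mod = 0.2, q_con = 0.1
dmr_net = grow(dmr_step, 5, 200, rng, 0.2, 0.1)   # q_del = 0.2, q_new = 0.1
```

In both steps the edges of the new node depend on the edges already present around the duplicated node, which is the dependence that distinguishes these models from the ER growth rule.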
On the other hand, should the observed network come from a DMC/DMR model and assuming well chosen parameter values, as well as an appropriate but small seed network, then the generated networks are unlikely to appear similar to the observed network due to the high variability with small seeds as demonstrated.
Portion of nodes to include in subsamples. The portion of nodes included in each subsample should not be so small that no characteristics of the observed network or candidate models are retained, but also not so big that the subsamples contain little variability. In one extreme, each subsample consists of just one node so that there is no structure within the induced subgraph, and in the other extreme, each subsample is simply the entire network. While the latter is of little concern when taking subsamples from independent draws from candidate models, it leaves no variability in the subsamples from a single observed network such that any resulting resampling distribution would simply be a point mass. We investigate what is an appropriate portion of nodes to include in each subsample through a detailed example with one particular model. The details can be found in the Supplementary Information.
In our example, we define the criterion for performance in terms of the expectation of the KS statistic (smaller values are better) between F 1 , the resampling distribution from the subsamples of a single network drawn from candidate model M c , and F c , the resampling distribution from subsamples of several independent networks drawn from M c , where each subsample comes from a different independent draw. This quantity is a measure of how closely F o , the resampling distribution from the subsamples of the observed network, matches F c when the observed network is truly generated by M c . If this expected KS statistic is small, then the discrepancy between F o and F c will be small when the model is correct. Additionally, this quantity being small implies that there is not much difference between using F 1 or F c for comparison with F o , so we would be better off opting for the stability of F c . Note that the computation time required for F c is greater than that for F 1 . Although not completely generalizable, our example suggests keeping the portion of nodes in the subsample low (<30% in this example) as long as enough features of the models can be retained.
Proposed use cases. A variety of statistical procedures can take advantage of this sampling scheme, and a few of them are detailed below. Before proposing the general framework for a few typical statistical procedures via the bootstrap subsampling procedure, we define the following notation for the rest of the section. The observed network will be referred to as G o with B o subsamples and corresponding induced subgraphs [...]
One distinct advantage of model selection through this bootstrap subsampling procedure is that it gives inherent evidence about uncertainty or confidence in the selected model as well as other candidate models. The proportion of subsamples of G o that are assigned to each model can be seen as evidence in favor of each candidate model, while the proportion of subsamples assigned to the model that forms the majority can be seen as confidence in the selected model. With algorithms like random forest, where the decision is based on plurality rule, this aspect of our approach does not add anything new. But with others, such as support vector machine or the Super Learner, that are not based on plurality rule, this approach offers a way to quantify uncertainty without the need to alter the learning algorithm itself.
Goodness of fit. To assess the goodness of fit for candidate models M 1 … M c , the procedure is similar to that of model selection. For a set of statistics S for assessing goodness of fit, one computes [...] Rather than training a learning algorithm based on [...] Assessment based on any one of these aspects may, however, lead to conflicting results, i.e., different models having the best fit depending on which aspect the comparison is based on, and it might be desirable to make comparisons through a more holistic measure. One solution to this is to compute a distance measure, such as the KS statistic or the Kullback-Leibler divergence, between [...] to quantify the fit of model i. This gives a single statistic that takes the entire distribution into account to quantify and to categorically order the fit of each candidate model. The KS test statistic and Kullback-Leibler divergence are typically computed in one dimension and can be used as is to compare the fit for each statistic individually. Should one instead wish to make a comparison based on all statistics S at the same time, one can use generalizations of these statistics [41][42][43] .
Comparison of multiple networks. If multiple networks are observed instead of a single network, and the goal is to assess how similar they are, then one can do so by building a resampling distribution from multiple networks. For the case of two observed networks with a set of statistics S for comparison and observed networks G o1 and G o2 , one can compute [...] Should there be more than two observed networks for comparison, then the distance measure statistics can once again be used to quantify all pairwise relative similarities between the observed networks.
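The evidence and confidence described above amount to tallying the model labels assigned to the individual subsamples of the observed network. A minimal sketch follows; it assumes that a trained classifier has already produced one label per subsample, and the function name and the example labels are hypothetical.

```python
from collections import Counter

def select_model(subsample_labels):
    """Tally per-subsample labels into (selected model, confidence, evidence per model)."""
    votes = Counter(subsample_labels)
    total = len(subsample_labels)
    # Evidence: the proportion of subsamples assigned to each candidate model.
    evidence = {model: count / total for model, count in votes.items()}
    # Confidence: the proportion assigned to the model with the most votes.
    selected, top_count = votes.most_common(1)[0]
    return selected, top_count / total, evidence

# Hypothetical labels for 100 subsamples of one observed network.
labels = ["model 1"] * 63 + ["model 2"] * 37
selected, confidence, evidence = select_model(labels)
```

The tally sits outside the learning algorithm, so it applies unchanged to any classifier that assigns one model to each subsample.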

Results
Simulation and empirical data. We use simulation studies as well as data from an empirical network to illustrate the use of the bootstrap subsampling procedure in some of the scenarios described in the previous section. The simulated data and all code can be found under the Supplementary Information, while the protein-protein interaction data can be downloaded from the database of interacting proteins (DIP) 44 website directly.

Model selection.
The simulation studies conducted for model selection consider instances of a variation on the aforementioned G(n,m) model that we introduced 40 . This variation generates random graphs with n nodes and m edges just as the G(n,m) model does, with edges added one at a time. At each step in network generation, a pair of unconnected nodes is selected at random, and the probability of adding an edge between the two is determined by the number of triangles it would close; the edge is then added with the given probability. This is repeated until there are m edges in the network. If the probability of adding an edge is fixed, then this is the G(n,m) model. Instead, we start with a base probability p 0 to add the edge. Should the edge close at least one triangle, the probability increases by p 1 . Should multiple triangles be closed by the edge, then the probability further increases by p Δ for each additional triangle closed.
In the simulation, we select between two instances of this model, both having p 0 = 0.3 and p 1 = 0.1. The difference comes in p Δ , with p Δ = 0 for model 1, while p Δ varies over 0.05, 0.03, 0.01, 0.005 for model 2. For a given choice of n and m, as p Δ decreases and gets closer to 0, the difference between the two models becomes more difficult to detect. The generated networks from both models consist of 100 nodes with edge count varying over 100, 500, 1000, 2000. This gives a total of 16 comparisons between the models, one for each combination of values of p Δ and m. For a given set of parameter values, the difference between the two models should be easier to detect as edge count increases, since the difference due to p Δ has more opportunities to manifest itself. The training data consists of a single subsample of 80 nodes for each of 10000 draws from each model. The test data consists of 1000 draws from each model (G o ), while the model selection is based on 100 subsamples of 80 nodes from each draw. Although 100 nodes seems few, it is already large enough for a network to give rise to a very large resampling distribution. Additionally, despite the simplicity of the model we are using, 100 nodes is large enough for the likelihood function to be intractable.
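The generative rule just described can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names are hypothetical, the probability is capped at 1 as a safeguard that the text does not mention, and unconnected pairs are drawn by rejection sampling, which assumes m is well below the number of possible edges.

```python
import random

def edge_probability(triangles_closed, p0, p1, p_delta):
    """Probability of adding an edge that would close the given number of triangles."""
    if triangles_closed == 0:
        return p0
    # p1 for the first triangle, p_delta for each additional one (capped at 1: assumption).
    return min(1.0, p0 + p1 + p_delta * (triangles_closed - 1))

def triangle_closing_graph(n, m, p0, p1, p_delta, rng):
    """Add edges one at a time until the graph on n nodes has m edges."""
    adj = [set() for _ in range(n)]
    n_edges = 0
    while n_edges < m:
        u, v = rng.sample(range(n), 2)
        if v in adj[u]:
            continue                      # already connected: draw another pair
        closed = len(adj[u] & adj[v])     # common neighbors = triangles the edge would close
        if rng.random() < edge_probability(closed, p0, p1, p_delta):
            adj[u].add(v)
            adj[v].add(u)
            n_edges += 1
    return adj

rng = random.Random(0)
model_1 = triangle_closing_graph(100, 500, 0.3, 0.1, 0.0, rng)    # p_delta = 0
model_2 = triangle_closing_graph(100, 500, 0.3, 0.1, 0.05, rng)   # p_delta = 0.05
```

Setting both p1 and p_delta to 0 makes the acceptance probability constant, which recovers the uniform G(n,m) behavior noted in the text.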
The model selection is through the Super Learner [38][39][40] , with support vector machine (ν-classification with ν = 0.5, radial kernel), random forest (N tree = 1000, min terminal node size = 1), and k-nearest neighbors (k = 10) as candidate algorithms, and average clustering coefficient, triangle count, and the three quartiles of the degree distribution as predictors. Note the parameters for the candidate algorithms are in parentheses. These statistics were chosen as predictors since the difference in p Δ directly affects formation of triangles, while the other statistics are influenced strongly by triangles. For each of the 100 [...] Fig. 4 and Table 1. Table 1 contains the proportion of test networks whose model was correctly classified by the Super Learner at each combination of p Δ and edge count. Unsurprisingly, the proportion decreases as p Δ decreases for a fixed edge count, and increases as edge count increases for a fixed p Δ . Figure 4 shows the histogram of the confidence for the correct model. When model 1 is the true model of the test network, this is the proportion of the 100 subsamples that were assigned model 1, and vice versa. When the proportion of correctly classified models is around 0.5, i.e., as good as a random guess, the confidence is symmetric and centered close to 0.5. When the proportion is higher than 0.5, the distribution of the confidence is shifted to the right, meaning that the two models are easier to tell apart. In addition, the further the histograms are shifted to the right, the greater the confidence in the correct model. The red vertical line indicates the median, which also moves to the right as the proportion increases and as the confidence concentrates further to the right. This behavior indicates that the confidence for the selected model from the bootstrap subsampling procedure quantifies well the degree of uncertainty in the selected model. Random forest feature importance of all five predictors can be found in the Supplementary Information (Fig. S3) to see the shifting role of the predictors in the different scenarios.
Figure 4. Histograms of the confidence score (proportion of subsamples assigned the correct model here rather than the majority) for p Δ from 0.05, 0.03, 0.01, 0.005, from left to right, and edge count from 100, 500, 1000, 2000, from top to bottom, with the red vertical lines representing the median. This shows that our proposed approach for model selection behaves as one would intuitively expect, i.e., greater differences between the models are more frequently classified correctly than smaller differences.
Goodness of fit. To display our method for assessment of goodness of fit, we examine the yeast (S.cerevisiae) protein-protein interaction network data from DIP 44 . This data set has been much examined in the literature, including using network models. There are two particular publications 45,46 that fit different duplication divergence models to two different previous versions of the yeast data set, with differing seed networks. Here we apply our method to compare the fit of the two different models on the most recent version of the data.
Both papers use the same duplication divergence model 17,37 , which we described as DMR earlier. However, the papers used different parameter values and different seed networks. The fit from Hormozdiari et al. 45 has parameter values p = 0.365 and r = 0.12, and the seed network contains 50 nodes. The seed network was constructed by highly connecting two cliques (complete graphs, in which an edge exists between every pair of nodes) of 7 nodes and 10 nodes, then connecting additional nodes to the cliques. To highly connect the cliques, each possible edge between nodes in different cliques (70 such edges) was added with probability 0.67. Then, another 33 nodes were attached to randomly chosen nodes from the two cliques. At each step of the network generation, if a singleton (a node not connected to any other node) was generated, it was immediately removed in their model. Note that the details for obtaining the seed network from Hormozdiari et al. 45 were somewhat incomplete, so this is our interpretation of the description of their seed network.
Figure 5. Resampling distributions of the clustering coefficient (panel a), triangle count (panel b), and degree assortativity (panel c) from independent draws from the two model fits (blue for Hormozdiari et al. 45 and red for Schweiger et al. 46 ) as well as the PPI network (black). In addition, there are two resampling distributions from a single draw from each of the two model fits (green for Hormozdiari et al. 45 and orange for Schweiger et al. 46 ). This figure gives a visual representation of the additional information provided by the goodness of fit approach as well as the difference from comparing point estimates with the distribution of the statistics from full networks as seen in Fig. 6.
On the other hand, the fit from Schweiger et al. 46 has parameter values p = 0.3 and r = 1.05. The authors use a smaller seed network of 40 nodes, generated with an inverse geometric model.
To generate this seed network, a coordinate x i in R d is generated for each of the 40 nodes, giving the set {x 1 … x 40 }. Then, each pair of nodes with distance ‖x i − x j ‖ greater than some threshold R is connected with an edge. Each dimension of the coordinates is independently generated from the standard normal distribution N(0, 1). In their fit, the seed network uses d = 2 and R = 1.5. Unlike Hormozdiari et al. 45 , Schweiger et al. 46 do not remove singletons as they are generated.
Both papers assessed the fit of their model by comparing certain aspects of the generated network to those of the yeast PPI network. In Hormozdiari et al. 45 , model fit was assessed via k-hop reachability (the number of distinct nodes reachable in ≤k edges), the distribution of particular subgraphs, such as triangles and stars, and some measures of centrality. Schweiger et al. 46 assess fit with the distribution of bicliques, i.e., subgraphs of two disjoint sets of nodes where every possible edge between the two sets exists. Here, we assess the fit of both models via our method with the average local clustering coefficient 16 , triangle count, and the degree assortativity 47 . The local clustering coefficient of a particular node is a measure of the extent to which its neighbors resemble a clique. Mathematically, this is computed as the number of edges between a node's neighbors divided by the maximum possible number of such edges. We use the average of the local clustering coefficient over all nodes in the network as a measure of local clustering that is also attributable to the network as a whole. We also consider the number of triangle subgraphs that appear in the network. Unlike Hormozdiari et al. 45 , which counts the total number of various subgraphs together, the count of triangles alone is a strictly global measure of clustering. Lastly, the degree assortativity of a network is a measure of how similar the degrees of nodes connected by an edge are. It is defined as the Pearson correlation of the degrees of nodes connected by an edge, so positively assorted networks have more edges between nodes of similar degrees, while negatively assorted networks have more edges between nodes of dissimilar degrees.
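The inverse geometric construction can be sketched directly from this description. The function name and return format below are our own choices for illustration, and Euclidean distance is assumed for the distance between coordinates.

```python
import math
import random

def inverse_geometric_graph(n, d, R, rng):
    """Connect pairs of nodes whose standard-normal coordinates are MORE than R apart."""
    coords = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Inverse rule: far-apart pairs are linked (assumed Euclidean distance).
            if math.dist(coords[i], coords[j]) > R:
                edges.add((i, j))
    return coords, edges

rng = random.Random(0)
coords, seed_edges = inverse_geometric_graph(40, 2, 1.5, rng)   # n = 40, d = 2, R = 1.5
```

With these settings a node near the origin tends to have few neighbors, while nodes far from the origin are connected to most of the others.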
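The three statistics can be computed from an adjacency-set representation as in the sketch below. This is an illustration written for this passage, not the authors' code; the function names are hypothetical, nodes with fewer than two neighbors are given a local clustering coefficient of 0 (a common convention that the text does not specify), and the assortativity is returned as NaN when all degrees are equal and the correlation is undefined.

```python
def local_clustering(adj, v):
    """Edges among v's neighbors divided by the maximum possible number of such edges."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # assumed convention for degree 0 or 1
    links = sum(len(adj[u] & nbrs) for u in nbrs) / 2   # each edge is seen from both ends
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def triangle_count(adj):
    # Each triangle is counted once for each of its 6 ordered adjacent pairs.
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of an edge (both orientations)."""
    xs = [len(adj[u]) for u in adj for v in adj[u]]
    ys = [len(adj[v]) for u in adj for v in adj[u]]
    mean = sum(xs) / len(xs)            # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else float("nan")

# A triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# A star: center 0 joined to leaves 1, 2, 3 (every edge links degree 3 to degree 1).
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

In the first example the local clustering coefficients are 1, 1, 1/3, and 0, so the average is 7/12, while the star is perfectly disassortative with a coefficient of -1.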
For the analysis, we consider the largest connected component (LCC) of the PPI network just as in Hormozdiari et al. 45 . The full network from the current version of the data contains 5176 nodes and 22977 edges, while the LCC contains 5106 (98.6%) nodes and 22935 (99.8%) edges. Networks drawn from each model contain the same number of nodes as the LCC, starting from their respective seed networks described above. Subsamples from the PPI network as well as networks drawn from each model contain 1550 nodes, roughly corresponding to 30%. This was the largest portion considered in our study of portion of nodes subsampled above.
The results of the data analysis are summarized in Fig. 5, where it is clear that the ordering of the fit of both models differs based on the network statistic of comparison. In accordance with earlier notation, for each statistic, we refer to the resampling distribution of the model of Hormozdiari et al. 45 as F c h and that of Schweiger et al. 46 as F c s , while that of the PPI network is referred to as F o . For clustering coefficient (Fig. 5a) [...] (Fig. 5b), the model of Schweiger et al. 46 seems to fit better, as the spread of F c s has a much bigger overlap with F o . The KS statistic between F o and F c s (0.6778) is also much smaller than that between F o and F c h (0.9018). Lastly, for degree assortativity (Fig. 5c), the model of Hormozdiari et al. 45 fits much better, as the spread of F c h overlaps with that of F o , and most of the spread of F c h is negative, just as for F o . On the other hand, F c s is entirely positive and has little overlap with F o . The KS statistic tells the same story, with 0.4373 for Hormozdiari et al. 45 and 0.9782 for Schweiger et al. 46 .
In Fig. 6, we plot the distribution of the same statistics from full network realizations drawn from the two models, as well as the point estimate from the full PPI network. We use L c h and L c s as the full network analogs to F c h and F c s , respectively, and S o to denote the point estimate for the full PPI network. For clustering coefficient, L c h and L c s look very similar, so this comparison would not lead to a different conclusion. For triangle count, L c s visually appears somewhat closer to S o than L c h . The spread of L c s also contains S o , albeit barely. However, L c s is also much more variable than L c h . In fact, the spread of L c s reaches farther than that of L c h on both ends. Based on L c h , L c s and S o , it is not obvious which model fits better, whereas our method gives a clear numerical ordering between the two models. For degree assortativity, the entirety of L c h is closer to S o than L c s , so, just as for clustering coefficient, this comparison would not lead to a different conclusion. Finally, since our method provides a joint distribution of the three statistics from each model as well as the PPI network, we are able to quantify overall fit that takes all three statistics into account jointly via a distance between the joint distributions (such as the multidimensional KS statistic as discussed earlier). This example demonstrates that considering the full resampling distributions, rather than point estimates as existing methods do, results in a more nuanced comparison of network models with empirical data.
Additionally, in Fig. 5, we plot the subsamples from two individual networks drawn from each model against the subsamples from independent networks drawn from each model. For each statistic, the spread and location of the two types of subsamples are similar, although triangle count shows a little more deviation than the other two since it is a sum rather than a mean. This is likely due to the rather large seeds (50 and 40 nodes) both models use as well as the rather small portion of nodes in each subsample (~30%), reflecting our observations in earlier sections.

Discussion
Network models are able to model increasingly complex dependencies that arise in network data. Yet this very dependency poses a statistical challenge, especially in the case of a single observed network. We propose a bootstrap subsampling procedure as a basis for statistical procedures in this setting that is based on a flexible resampling distribution built from the single observed network and demonstrate the procedure in both simulation and empirical test settings.