Hidden network generating rules from partially observed complex networks

Yang, Ruochen; Sala, Frederic; Bogdan, Paul

doi:10.1038/s42005-021-00701-5


Article
Open access
Published: 01 September 2021


Communications Physics volume 4, Article number: 199 (2021)


Abstract

Complex biological, neuroscience, geoscience and social networks exhibit heterogeneous self-similar higher order topological structures that are usually characterized as being multifractal in nature. However, describing their topological complexity through a compact mathematical description and deciphering their topological governing rules has remained elusive and prevented a comprehensive understanding of networks. To overcome this challenge, we propose a weighted multifractal graph model capable of capturing the underlying generating rules of complex systems and characterizing their node heterogeneity and pairwise interactions. To infer the generating measure with hidden information, we introduce a variational expectation maximization framework. We demonstrate the robustness of the network generator reconstruction as a function of model properties, especially in noisy and partially observed scenarios. The proposed network generator inference framework is able to reproduce network properties, differentiate varying structures in brain networks and chromosomal interactions, and detect topologically associating domain regions in conformation maps of the human genome.


Introduction

Mining the topological complexity of networks must go beyond estimating statistical network metrics (e.g., degree^1,2, clustering coefficient³, path-length distribution) or measuring the network’s geometric^4,5,6 properties. Instead, we must elucidate the hidden heterogeneous rules that govern the emergence and dynamics of complex networks. For instance, the interactions between brain regions or neurons (topological structures) generate cognitive functional states, but understanding the brain wiring mechanism and the rules related to cognitive processes remains a challenge in network neuroscience⁷. Furthermore, topological analysis of yeast chromatin maps reveals a transition from intra- to inter-chromosomal interactions when the yeast undergoes different growing states⁸, but fails to identify the network generators or rules corresponding to this transition. Moreover, multifractal topological analysis reveals that chromosomal interactions are bifractal⁹. While these multifractal network analyses can detect subtle conformational changes among complex network components (e.g., chromosomes in human stem cells), they fail to explain the emergence and evolution of networks, to identify their general set of hidden network generators, and to explain how small changes to these generators can lead to the observed or enhanced complex behavior. Beyond chromosomal interactions, various real networks have been shown to possess a hierarchically organized (self-similar) community structure that grows recursively and copies itself^{10,11,12,13,14,15}. For example, neuronal culture networks¹⁶ and protein interaction networks¹⁷ also exhibit complex multifractal behavior.

Multifractality has been studied as a topological feature with box-covering/box-counting methods^4,18. The scaling behavior is examined by a renormalization process that coarse-grains the network into boxes^6,19. However, commonly used approaches such as renormalization-group-inspired algorithms (e.g., box covering and sandbox) do not explain how the network emerges²⁰. Various graph models have been proposed to reproduce growing scale-free properties and multifractal degree distributions by repeatedly inheriting and adding nodes and connections in a self-similar way^21,22. Nevertheless, multifractality measured at the topological level cannot uncover the hidden community structure or the generating rules.

Rather than exploring and measuring the multifractality of various topological structures, we focus on identifying the underlying network generating functions and on developing a general mathematical framework, together with efficient algorithms, to mine the multifractality encoded in the node attributes and in the weighted interactions among nodes. The network generating function can provide a high-level, condensed description of complex systems. Uncovering the generating rules will enable us not only to generate synthetic graph structures with different topological properties, but also to reproduce and mine their topological complexity and heterogeneity. A probabilistic description of the generating function should also help to assess the validity of links in a noisy graph and apply to various scenarios. To the best of our knowledge, a robust and general framework for multifractal generating models, along with a comprehensive analysis, is lacking. Although multifractal network generators²³ can generate networks with multifractal properties and any given graph metrics, their simulated-annealing-based parameter estimation is not robust and the model is limited to binary graphs. The stochastic Kronecker graph model^10,24,25 can also capture self-similarity, but it requires the network size to be tied to the model size and neglects heterogeneity in node attributes. The multiplicative attribute graph model generalizes the two aforementioned models and characterizes node attributes in social networks^26,27. Although its formulation is general, the estimation algorithm targets only binary node attributes.

To address these research gaps and better understand the complexity and multifractality of real-world networks, one must address the following major challenges: (1) How can we construct a general multifractal network generating model that not only captures the observed multifractal characteristics but also provides mathematical tools for efficiently investigating and engineering their macroscopic properties? (2) How can we efficiently and correctly reconstruct the underlying network generator model? (3) Can we recover the weighted multifractal network generative model from incomplete (partial) observations and from noisy or adversarial data and influences? (4) Do such techniques scale up to real-world networks and enable us to study whether multifractality appears in real-world applications such as brain connectomes and chromosomal interactions?

To answer these questions, we propose a weighted multifractal graph model (WMGM) that is constructed recursively from generative measures of linking probabilities and is capable of capturing the multifractality and weighted heterogeneity of functional interactions. To distinguish between characterizing multifractality in the topological structure and in the network generating model, we use the terms functional level and model level for analyses of networks through their reconstructed generative model, and the term graph level for analyses of the properties of the network topology. To efficiently learn the parameters of the network generating model, we provide a rigorous variational inference framework capable of reconstructing the underlying multifractal network generator for partially observed networks. This inference method can handle networks of arbitrary size and any attribute cardinality, and it offers robust parameter estimation. We examine our approach on both synthetic and real-world networks. We show that the proposed model can characterize and reproduce many graph properties (e.g., degree, clustering coefficient, weight distribution). We demonstrate the efficiency and robustness of the proposed model and inference method against incomplete and noisy observations. By applying the network generator inference framework to real-world datasets (e.g., brain networks, chromatin interactions), we reveal the hidden structure of complex systems at the functional level. The results indicate that the WMGM is capable of differentiating between various structures in brain networks and in chromatin interactions. We further show that the proposed inference algorithm can help to detect topologically associating domains (TADs) in chromosomal interaction maps.

Results

Weighted-multifractal graph model

We propose the weighted multifractal graph model (WMGM). It is meant to serve as a generalized network generating rule that captures the observed multifractal properties associated with node attributes and the heterogeneity in weights (intensities of pairwise interactions). Building on measure theory concepts, the crux of this multifractal network generating model is to construct a series of probabilities that we associate with the side lengths of rectangles that are recursively built up by repeatedly splitting a unit square. This ensures a heterogeneous self-similar network structure. These probabilities are then used to generate the node attributes and edges for the network.

We first define an initial generating measure ${\theta }^{(1)}=\left({l}^{(1)},{p}^{(1)}\right)$ on a unit square. The rationale for considering a unit square is to ensure that the probability mass function of node attributes sums to 1. Next, the unit square is divided into M² rectangles, where ${\{{l}_{i}^{(1)}\}}_{i = 1}^{M}$ are the side lengths of each rectangle. To these rectangles, we assign the probabilities ${\{{p}_{ij}^{(1)}\}}_{i,j = 1}^{M}$. We consider symmetric p⁽¹⁾ terms in this work, but as an extension, we could permit the asymmetric case which can model directed networks. Along the same lines as in the multifractal network generator^20,23, the self-similar WMGM ${\theta }^{(K)}=\left({l}^{(K)},{p}^{(K)}\right)$ is formulated recursively from this unit square θ⁽¹⁾ with interval length l^(K) = l^(K−1) ⊗ l⁽¹⁾ and linking probability p^(K) = p^(K−1) ⊗ p⁽¹⁾. Here ⊗ denotes the Kronecker product.
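The recursive Kronecker construction needs only a few lines of code. The sketch below is written in plain Python without NumPy and assumes a square, symmetric p⁽¹⁾, as in the text. The helper names `kron_vec`, `kron_mat` and `build_wmgm` are hypothetical and do not come from the authors' code.

```python
def kron_vec(a, b):
    """Kronecker product of two vectors given as lists."""
    return [x * y for x in a for y in b]


def kron_mat(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    # Row (i, k) and column (j, l) of the result hold A[i][j] * B[k][l].
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]


def build_wmgm(l1, p1, K):
    """Return (l^(K), p^(K)) via l^(K) = l^(K-1) ⊗ l^(1), p^(K) = p^(K-1) ⊗ p^(1)."""
    lK, pK = list(l1), [list(row) for row in p1]
    for _ in range(K - 1):
        lK = kron_vec(lK, l1)
        pK = kron_mat(pK, p1)
    return lK, pK


# Generator used for the synthetic experiments in the text (M = 2, K = 3).
l3, p3 = build_wmgm([0.7, 0.3], [[0.8, 0.5], [0.5, 0.4]], K=3)
print(len(l3), round(sum(l3), 12))   # M^K = 8 classes; probabilities sum to 1
```

Because l⁽¹⁾ sums to 1, each Kronecker power also sums to 1. This is the property the unit-square construction is designed to guarantee.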

An undirected weighted graph is then generated by the following procedure. (1) N nodes are distributed among M^K classes with prior probabilities l^(K). The indicator variable ϕ_uq denotes whether node u has attribute q, with ϕ_uq ∈ {0, 1} and $\mathop{\sum }\nolimits_{q = 1}^{{M}^{K}}{\phi }_{uq}=1$. The attributes follow the categorical distribution $P\{{\phi }_{uq}=1\}={l}_{q}^{(K)}$, q = 1…M^K. (2) The edge between nodes u and v is generated with linking probability p^(K). Let ${\{w(r)\}}_{r = 0}^{\infty }$ denote the predefined weight set, where w(0) = 0 and $\left\{w(r)\right\}$ is monotonically increasing. The probability that the edge between nodes u and v has weight w(r) is $P\{{e}_{uv}=w(r)| {\phi }_{uq}=1,{\phi }_{vh}=1\}={\left({p}_{qh}^{(K)}\right)}^{r}\left(1-{p}_{qh}^{(K)}\right)$. For simplicity, we denote it as ${p}_{qh}^{(K)}({r}_{uv})$, where r_uv is the weight category of the edge between nodes u and v. Here, the probability that an edge does not exist is ${p}_{qh}^{(K)}(0)=1-{p}_{qh}^{(K)}$, and the probability that an edge exists (regardless of its weight) is $\mathop{\sum }\nolimits_{r = 1}^{\infty }{p}_{qh}^{(K)}(r)={p}_{qh}^{(K)}$. The distribution therefore satisfies $\mathop{\sum }\nolimits_{r = 0}^{\infty }{p}_{qh}^{(K)}(r)=1$ and maps directly to unweighted graphs. In contrast to²⁰, where separate linking probabilities p⁽¹⁾(r) are specified for each weight level r, we model the edge distribution $P\{{e}_{uv}=w(r)| {\phi }_{uq}=1,{\phi }_{vh}=1\}$ as a geometric distribution governed by a single linking probability per pair of classes, shared by all weight levels. The rationale is that smaller weights are more common, and the geometric probabilities decrease monotonically with r. This choice also lets us capture the heterogeneous graph structure with fewer parameters.
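Steps (1) and (2) can be written as a simple sampler. The sketch below makes three assumptions. First, weights are recorded by their category r, so w(r) = r, as in the synthetic experiments. Second, every linking probability satisfies 0 ≤ p < 1, which guarantees that the geometric draw terminates. Third, the function name `sample_wmgm_graph` is hypothetical. The geometric weight is drawn by counting successes (each with probability p) until the first failure, which yields P(r) = p^r(1 − p).

```python
import random


def sample_wmgm_graph(lK, pK, n_nodes, seed=None):
    """Sample node attributes and weighted edges from a WMGM (l^(K), p^(K)).

    Returns (attr, edges): attr[u] is the class of node u, and edges maps
    (u, v) with u < v to the weight category r >= 1.
    """
    if any(not 0.0 <= p < 1.0 for row in pK for p in row):
        raise ValueError("linking probabilities must lie in [0, 1)")
    rng = random.Random(seed)
    # Step (1): categorical class label for each node with prior l^(K).
    attr = rng.choices(range(len(lK)), weights=lK, k=n_nodes)
    edges = {}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            p = pK[attr[u]][attr[v]]
            # Step (2): geometric weight category, P(r) = p**r * (1 - p).
            r = 0
            while rng.random() < p:
                r += 1
            if r > 0:  # r = 0 means "no edge" because w(0) = 0
                edges[(u, v)] = r
    return attr, edges


# For brevity, sample from the level-1 generator of the synthetic example.
attr, edges = sample_wmgm_graph([0.7, 0.3], [[0.8, 0.5], [0.5, 0.4]], 200, seed=1)
print(len(edges), max(edges.values()))
```

Any l^(K), p^(K) pair, for example one built by Kronecker powers, can be passed in place of the level-1 generator.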

The multifractality of the model emerges from the recursive construction. The derivation of the partition function and the multifractal metrics is presented in the Methods section Multifractal analysis of WMGM. Several related models are special cases of the proposed model. When M^K = N, the WMGM recovers the Kronecker model¹⁰ as a particular case. When weights are neglected (i.e., the total number of weight levels is R = 1), the model reduces to the multifractal network generator²³.

Figure 1a shows a numerical example of the model-building procedure and graph generation. The model θ⁽²⁾ in the middle is constructed from θ⁽¹⁾ (left) in the first iteration. In the simulated graph, node colors represent the attributes generated by l⁽²⁾, and link weights are generated by p⁽²⁾. Figures 1b and c show two different models that generate networks with different community structures. In the generating model θ^(K), the linking probability p^(K) encodes the connection rules and community structure. A larger value of ${p}_{qh}^{(K)}$ leads to denser connections between nodes in communities q and h. If ${p}_{qh}^{(K)}$ is nearly uniform across q and h, the heterogeneity is weak and the community structure is ambiguous (Fig. 2a). When ${p}_{qh}^{(K)}\to 1$, every pair of nodes in communities q and h is connected, creating a fully connected subgraph (Fig. 2b). When ${p}_{qh}^{(K)}\to 0$, almost no connections exist between categories q and h, leading to an m-partite structure (q = h, Fig. 2c) or a community structure (q ≠ h, Fig. 2d).

**Fig. 1: Model illustration and examples.**

**Fig. 2: Networks generated by different linking probability p in the weighted multifractal graph model (WMGM).**

To address the challenges of mining large-scale complex systems (e.g., heterogeneity in weights, scale-dependent higher-order interactions), we investigate how the proposed WMGM can decipher the hidden rules that govern their complex topological architecture and functionality. Mapping a large network to a generative model may lose some intricate details of subnetworks and their interactions. However, reconstructing a functional-level model compresses redundant information and allows us to handle incomplete or noisy data, which are common in real-world datasets. Consequently, the WMGM can learn the hidden rules that govern structures in complex systems such as brain networks (e.g., understanding and interpreting the emergence of neuronal computation), biological systems (e.g., understanding emerging genotype-phenotype relationships) and social networks. To investigate the benefits and limitations of the WMGM, we evaluate its ability to reconstruct network generating models from scarce, noisy observations of a series of artificially generated and real-world networks. First, we validate our method on synthetic data in terms of convergence and estimation error. Next, because real-world networks are usually only partially observed and often noisy, we investigate the robustness of our approach to these factors. We also apply our method to three real-world complex networks: the brain connectome of Drosophila, the chromosome interactions of yeast cells in different growth states, and the conformation maps of replicated human chromosomes. The results show that our method can reproduce and elucidate important properties of real-world complex systems.

Learning the hidden network generators (rules) from partial and noisy observations of synthetic networks

To validate the ability of the proposed WMGM framework to reconstruct the ground-truth generators and to understand its estimation error, we examine three synthetic network case studies, from least to most challenging: (i) clean, fully observed networks, (ii) partially observed networks with a varying number of observed nodes, and (iii) noisy networks with spurious edges. In the fully observed setting, we demonstrate that our model can successfully reconstruct the ground-truth generator and reproduce the graph properties of the synthetic network. We also show that WMGM inference is robust up to a certain level of missing observations for partially observed networks, and that it can handle noise by distinguishing between spurious and true links.

Network generator reconstruction

We first examine the network generator reconstruction accuracy and the ability of the WMGM to recover simulated graph properties. We use a synthetic graph G_syn of 500 nodes generated by l⁽¹⁾ = [0.7, 0.3], p⁽¹⁾ = [0.8, 0.5; 0.5, 0.4] with hyperparameters M = 2, K = 3 and the predefined weight set $\left\{w(r)=r\right\}$. We run the variational expectation maximization (EM)-based estimation method (described in the Methods section Parameter estimation of WMGM) on the fully observed simulated network, using step length γ = 10⁻⁷ for the gradient methods in the M-step. The algorithm stops when the increase of the objective function (a lower bound of the log-likelihood) after one EM iteration is smaller than 0.1. Figures 3a and b show the convergence of the log-likelihood lower bound ${\mathscr{L}}_{Q}(\theta ,R)$ and of the reconstructed parameters, respectively, as the EM iterations proceed. The lower bound converges quickly within the first 20 EM iterations, then increases slightly and converges after 120 iterations. The relative absolute error between the reconstructed lower bound ${\mathscr{L}}_{Q}(\theta ,R)$ and the true log-likelihood ${\mathscr{L}}_{{\rm{syn}}}(R)$ of the synthetic graph G_syn is $|({\mathscr{L}}_{Q}(\theta ,R)-{\mathscr{L}}_{{\rm{syn}}}(R))/{\mathscr{L}}_{{\rm{syn}}}(R)|=0.0022$, which shows the recoverability of the proposed WMGM framework. The estimated parameters follow a similar trend and converge to the ground-truth values. Figure 3c presents the mean relative absolute error (RAE) per parameter as a function of the EM iterations. The error decreases quickly within the first 20 EM iterations. When the small increase of the lower bound appears at 100 EM iterations in Fig. 3a, the error begins to drop sharply again. After 120 EM iterations, it reaches its minimum of 0.013 (1.3% error per parameter), and when the algorithm terminates, the error is 0.032 (3.2% error per parameter). Figure 3b shows that the recovered ${p}_{22}^{(1)}$ (yellow line) crosses the ground truth 0.4 at 120 EM iterations and then decreases slightly. This suggests that the best performance can be achieved by terminating the algorithm early, once the objective function starts to converge.
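The error measure and the stopping rule above are both easy to compute. The sketch below assumes that the parameter vector consists of l⁽¹⁾ followed by the upper triangle of the symmetric p⁽¹⁾, because the text does not state which entries are averaged. The "estimated" values are made up for illustration and are not results from the paper. All function names are hypothetical.

```python
def mean_rae(true_params, est_params):
    """Mean relative absolute error per parameter, |est - true| / |true|."""
    errors = [abs(e - t) / abs(t) for t, e in zip(true_params, est_params)]
    return sum(errors) / len(errors)


def flatten(l1, p1):
    """Parameter vector: l^(1) followed by the upper triangle of symmetric p^(1)."""
    n = len(p1)
    return list(l1) + [p1[i][j] for i in range(n) for j in range(i, n)]


def em_converged(prev_bound, new_bound, tol=0.1):
    """Stopping rule: stop once one EM iteration raises the bound by less than tol."""
    return new_bound - prev_bound < tol


truth = flatten([0.7, 0.3], [[0.8, 0.5], [0.5, 0.4]])
estimate = flatten([0.7, 0.3], [[0.8, 0.5], [0.5, 0.44]])  # hypothetical estimate
print(mean_rae(truth, estimate))  # only p22 is off, by 10%, averaged over 5 parameters
```

The relative error of the lower bound with respect to the true log-likelihood uses the same formula, applied to a single scalar.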

**Fig. 3: Model reconstruction, graph metrics and estimation error.**

We also investigate how well the proposed WMGM framework recovers the network structure. In Fig. 3d–f, we compare the simulated and reconstructed network properties, including the clustering coefficients, degree distribution and weight distribution. The simulated distributions (blue lines) are calculated directly from the synthetic network G_syn, and the reconstructed results are calculated from a network G_recon generated by the recovered WMGM. The proposed estimation method successfully reproduces the network properties of synthetic networks. In Supplementary Note 5 we further quantify the dissimilarity between the simulated and reconstructed graph properties, and in Supplementary Note 3 we include null models as a comparison to show the efficiency of the estimation algorithm.
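The graph properties compared in Fig. 3d–f can be computed directly from an edge list. The sketch below uses only the Python standard library. It assumes that edges are stored as a dict mapping (u, v), with u < v, to a weight category. It also follows the common convention that nodes with fewer than two neighbors have a clustering coefficient of 0. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def graph_stats(n_nodes, edges):
    """Degree histogram, local clustering coefficients and weight histogram.

    edges maps (u, v), u < v, to a weight category; clustering is computed on
    the binarized graph.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    degree_hist = Counter(len(nbrs[u]) for u in range(n_nodes))
    clustering = []
    for u in range(n_nodes):
        k = len(nbrs[u])
        if k < 2:
            clustering.append(0.0)  # convention for degree 0 or 1
            continue
        # Count edges among the neighbours of u (each pair once).
        links = sum(1 for a in nbrs[u] for b in nbrs[u] if a < b and b in nbrs[a])
        clustering.append(2.0 * links / (k * (k - 1)))
    weight_hist = Counter(edges.values())
    return degree_hist, clustering, weight_hist


# Toy graph: a weighted triangle plus one isolated node.
deg, cc, w = graph_stats(4, {(0, 1): 1, (1, 2): 2, (0, 2): 1})
print(deg, cc, w)
```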

There is no guarantee that the EM algorithm converges to the maximum likelihood estimator. If the objective function is non-convex, the algorithm may terminate at or near a local optimum. The estimation accuracy also depends on the size and density of the network. We therefore investigate, in Fig. 4e, how the mean relative absolute error depends on the multifractal spectrum width of the model and on other key properties. We select models with different levels of randomness and varying positions of the multifractal spectrum, generate a graph of 200 nodes from each, recover the model with the same initialization, and measure the mean relative absolute error per parameter. We consider three cases: all parameters are randomly generated (blue asterisks); ${p}_{12}^{(1)}={p}_{21}^{(1)}=0.5$ with random ${p}_{11}^{(1)}$, ${p}_{22}^{(1)}$, ${l}_{1}^{(1)}$, ${l}_{2}^{(1)}$ (green dots); and ${l}_{1}^{(1)}={l}_{2}^{(1)}=0.5$, ${p}_{12}^{(1)}={p}_{21}^{(1)}=0.5$ with a fixed center of the multifractal spectrum (red circles). The multifractal spectrum width is calculated as described in the Methods section Multifractal analysis of WMGM. Figure 4e shows that such local minima are particularly prominent in the random cases.

**Fig. 4: Model reconstruction from incomplete observations.**

Reconstruction of network generator from partial observations

Complex networks are usually only partially observed. This situation has many causes, including: (1) the network is still growing and new nodes may join in the future; (2) examining the whole network is computationally expensive or technologically impossible (e.g., all neurons in the human brain). We therefore investigate whether the ground-truth WMGM can be reconstructed from partial observations.

For the partial-observation experiments, we use a synthetic graph with N₀ = 100,000 nodes generated by l⁽¹⁾ = [0.7, 0.3], p⁽¹⁾ = [0.8, 0.5; 0.5, 0.4], M = 2, K = 3. In each trial, we randomly select N of the N₀ nodes and take the connections among the selected N nodes as incomplete data. We repeat this process 10 times for each N and record the recovered parameters. Figure 4a, b shows the mean and standard deviation of the recovered parameters and of the error as a function of the number of observed nodes N. The model is correctly recovered, with low standard deviation, when N = 200 or more nodes are observed. For N = 200, the mean RAE per parameter is 3.1% with a standard deviation of 0.5%. We conclude that the proposed WMGM and inference method are robust to missing components in the system.
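The node-sampling protocol amounts to taking the induced subgraph on a random subset of nodes. The sketch below assumes the same edge-dict representation (u < v) as the other examples, and `induced_subgraph` is a hypothetical helper. Kept nodes are relabeled 0 to N − 1 in sorted order, so the ordering u < v is preserved.

```python
import random


def induced_subgraph(edges, n_total, n_obs, seed=None):
    """Observe n_obs of n_total nodes uniformly at random and keep only
    the edges among them. Returns (kept_nodes, relabeled_edges)."""
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(n_total), n_obs))
    new_id = {old: new for new, old in enumerate(kept)}
    sub = {(new_id[u], new_id[v]): w
           for (u, v), w in edges.items()
           if u in new_id and v in new_id}
    return kept, sub


full = {(0, 1): 2, (1, 2): 1, (2, 3): 1, (0, 3): 3}
kept, sub = induced_subgraph(full, n_total=4, n_obs=3, seed=0)
print(kept, sub)
```

Node attributes are drawn independently, and edges are conditionally independent given the attributes. As a result, the induced subgraph on N uniformly chosen nodes has the same distribution as a WMGM graph sampled directly with N nodes. This offers one way to see why a moderately sized subgraph can still carry information about θ.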

We repeat the experiments with different original network sizes N₀ = 10³, 5 × 10³, 10⁴, 5 × 10⁴, 10⁵ and the same values of N. We calculate the mean RAE and report in Fig. 4c the minimum fraction of observed nodes f needed to achieve a given small error. Both axes are in log scale. Blue dots correspond to a mean RAE below 0.035 (3.5%) and red asterisks to a mean RAE below 0.050 (5.0%). Both sets are well fitted by power laws (solid lines): $f=332\times {N}_{0}^{-1.04}$ for errors below 0.035 and $f=95\times {N}_{0}^{-0.97}$ for errors below 0.050. Thus the fraction of observed nodes required to achieve a given small error decreases as a power law as the original network grows. Figure 4d shows the average error over 10 experiments for combinations of N = 50, 100, …, 500 and N₀ = 10³, 5 × 10³, 10⁴, 5 × 10⁴, 10⁵, with N₀ on a log scale. The underlying generating model is recoverable when the partial observation contains more than 200 nodes, regardless of the original network size N₀. This is critical because, in real-world complex systems, only partial observation is usually possible and full monitoring and detection are not. Because the WMGM is the generating rule and we assume the networks are partially but evenly observed, reconstruction from a partial observation (a subgraph) can achieve good performance while saving computational cost. This suggests that, for very large networks, the hidden generating rules can be estimated correctly from a subset of only 200 nodes. Supplementary Note 7 reports additional individual experiments that demonstrate this robustness.
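The fitted power laws can be converted into an absolute number of observed nodes. Since f·N₀ = c·N₀^(1+a), the resulting exponents are 1 − 1.04 = −0.04 and 1 − 0.97 = 0.03. Both are close to zero, so the required number of observed nodes barely changes with N₀. For N₀ = 10⁵, the two fits give roughly 209 and 134 nodes, which is consistent with the ~200-node threshold above. The snippet only evaluates the reported fits and does not re-derive them. `required_fraction` is a hypothetical name.

```python
def required_fraction(n0, coeff, exponent):
    """Power-law fit f = coeff * N0**exponent for the minimum observed fraction."""
    return coeff * n0 ** exponent


FITS = {0.035: (332, -1.04), 0.050: (95, -0.97)}  # fits reported for Fig. 4c

for n0 in (1e3, 1e4, 1e5):
    for max_rae, (c, a) in FITS.items():
        f = required_fraction(n0, c, a)
        print(f"N0={n0:.0e}  RAE<{max_rae}: f={f:.4f}  observed nodes~{f * n0:.0f}")
```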

Reconstructing the network generators from noisy observations and quantifying the reconstructed link reliability

We test the proposed WMGM and estimation algorithm on noisy networks with spurious links. For this noisy setting, we first generate a synthetic binary graph with the same model as in the section Reconstruction of network generator from partial observations. In the binary version, weights are neglected: any edge in the weighted graph with e_uv ≠ 0 is treated as an edge in the binary graph G₀. Next, spurious links are added at random with probability p = 10⁻³, 3 × 10⁻³, 10⁻², 3 × 10⁻², 10⁻¹, 3 × 10⁻¹ between pairs of nodes that are not connected in the original network, producing a noisy graph G_n. For each noise level p, we run the experiment independently 10 times. We call links that exist in both the synthetic graph G₀ and the noisy graph G_n true positives, and links added in G_n false positives. Our aim is to differentiate between true positive and false positive links in noisy graphs. We first reconstruct the generative model θ_n from the noisy observation G_n. We define the link reliability of an edge e_uv in the noisy network as its log-likelihood under the reconstructed model, $L{R}_{uv}={\mathrm{log}}\,P({e}_{uv}| {G}_{n};{\theta }_{n})$. The link reliability of the link between nodes u and v can be estimated as follows:

$$L{R}_{uv}\approx \mathop{\sum}\limits_{qh}{\tau }_{uq}{\tau }_{vh}\,{\mathrm{log}}\,{p}_{qh}^{(K)}({r}_{uv})+\mathop{\sum}\limits_{q}{\tau }_{uq}\,{\mathrm{log}}\,{l}_{q}^{(K)}+\mathop{\sum}\limits_{h}{\tau }_{vh}\,{\mathrm{log}}\,{l}_{h}^{(K)}.\tag{1}$$
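Equation (1) is a membership-weighted sum of log-probabilities. The sketch below relies on several assumptions:

- τ_uq are the variational class-membership probabilities produced by the E-step (their derivation is in Methods and is not reproduced here).
- All linking probabilities lie strictly between 0 and 1.
- For the binary graphs in this experiment (the R = 1 case), an observed link has probability p_qh and a missing link has probability 1 − p_qh.

The names `edge_log_prob` and `link_reliability` are hypothetical.

```python
import math


def edge_log_prob(p, r, binary):
    """log p_qh(r): geometric weight model, or Bernoulli when binary=True."""
    if binary:
        return math.log(p if r > 0 else 1.0 - p)
    return r * math.log(p) + math.log(1.0 - p)


def link_reliability(tau_u, tau_v, lK, pK, r_uv, binary=True):
    """Approximate LR_uv following Eq. (1)."""
    lr = sum(tq * th * edge_log_prob(pK[q][h], r_uv, binary)
             for q, tq in enumerate(tau_u)
             for h, th in enumerate(tau_v))
    lr += sum(t * math.log(l) for t, l in zip(tau_u, lK))
    lr += sum(t * math.log(l) for t, l in zip(tau_v, lK))
    return lr


l1, p1 = [0.7, 0.3], [[0.8, 0.5], [0.5, 0.4]]
# Observed link between a node in class 1 and a node in class 2 (hard memberships).
print(link_reliability([1.0, 0.0], [0.0, 1.0], l1, p1, r_uv=1))
```

With hard memberships, Eq. (1) reduces to the log-probability of the link plus the log-priors of the two classes. Links between classes with a high linking probability therefore receive higher reliability.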

Figure 5a shows the cumulative distribution of link reliability for true positive and false positive links at noise level p = 10⁻¹. Spurious links (false positives) have systematically lower reliability than true links (true positives), and their average link reliability is also smaller. This separation implies that the WMGM can detect noise in observations and can therefore help to denoise graphs. We validate graph denoising and its application in Supplementary Note 1. Figure 5b, c shows the reconstructed model parameters and the estimation error for different noise levels p. As the noise level p increases, more spurious links are added to the network, and the parameter error grows accordingly. The estimation error is below 0.05 (5%) for low noise levels p < 10⁻¹, and the estimates are robust (low variance of the reconstructed parameters and error) for p ≤ 10⁻¹. Although the estimation error is relatively large (10%) at p = 10⁻¹, Fig. 5a shows that even at this relatively high noise level our model can still distinguish noise links from true links. Supplementary Note 7 reports additional repetitions that demonstrate this robustness.
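The noise-injection protocol can be sketched as follows, assuming binary edges stored as (u, v) tuples with u < v. `add_spurious_links` is a hypothetical helper. It returns the noisy edge set G_n together with the set of false positives, which serve as ground-truth labels for the link-reliability comparison.

```python
import random


def add_spurious_links(n_nodes, true_edges, p_noise, seed=None):
    """Add each absent pair (u, v), u < v, independently with probability p_noise.

    Returns (noisy_edges, spurious_edges); spurious_edges are the false positives.
    """
    rng = random.Random(seed)
    true_set = set(true_edges)
    spurious = {(u, v)
                for u in range(n_nodes) for v in range(u + 1, n_nodes)
                if (u, v) not in true_set and rng.random() < p_noise}
    return true_set | spurious, spurious


g0 = {(0, 1), (1, 2), (2, 3)}
g_n, false_pos = add_spurious_links(10, g0, p_noise=0.1, seed=0)
labels = {e: ("false positive" if e in false_pos else "true positive") for e in g_n}
print(len(g_n), len(false_pos))
```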

**Fig. 5: Model reconstruction from noisy observations.**

Learning the hidden network generators (rules) of biological networks

To demonstrate the capabilities and benefits of the proposed WMGM inference framework, we learn the network generators (rules) of three biological networks: (i) the neuronal connections in the adult Drosophila central brain²⁸, (ii) the chromatin interactions of the yeast genome⁸, and (iii) the conformation maps of replicated human chromosomes²⁹. We show that the WMGM reveals important properties of these biological datasets. In particular, it recovers their topological network properties, differentiates growth states, identifies specific features of brain structures in different regions, and detects TADs. We also conduct experiments on various social networks; those results are given in Supplementary Note 2.

Revealing the network generators of Drosophila brain connectome

We use the largest synaptic-level connectome of the fruit fly brain, obtained through three-photon microscopy²⁸. Chemical synapses between neurons are detected, and the number of synapses is used as the intensity of each neuron connection. The original Drosophila connectome G₀ of the left alpha lobe in the mushroom body consists of 10,790 neurons (nodes) and 6444 identified synaptic connections (edges). We delete neurons without connections and use the remaining 693 connected nodes to construct a network G with all 6444 connections, from which we reconstruct the WMGM. We discard isolated nodes to avoid extra computational cost and to obtain a denser network, which yields better model estimation. When the network is large and sparse, the estimation tends to be unstable and inaccurate because very few links are observed. Supplementary Note 9 shows results for different node and edge sampling sizes. Note that the node sampling method can influence the network topological structure and the reconstructed model. In future work, we will also investigate strategies for selecting the minimum number of nodes required to accurately reconstruct the WMGM for different network properties and sizes. Sparsity can also always be introduced into the recovered WMGM by adding a negative bias to each linking probability parameter p⁽¹⁾ after the variational EM algorithm finishes. We discretize the network G with $2^{r}\le w(r)<2^{r+1}-1$ and use the result as the input to the proposed estimation algorithm.
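A logarithmic discretization of integer synapse counts can be sketched as follows. The text states the bin boundaries only loosely, so this sketch assumes power-of-two bins. It also shifts the index by one so that category 0 remains reserved for an absent edge (w(0) = 0). Under this assumption, a count s ≥ 1 falls in category r when 2^(r−1) ≤ s < 2^r, which for positive integers is exactly `s.bit_length()`. The function name is hypothetical.

```python
def weight_category(synapses):
    """Map an integer synapse count to a weight category r (0 = no edge).

    Assumed binning: category r >= 1 holds counts s with 2**(r-1) <= s < 2**r.
    """
    if synapses < 0:
        raise ValueError("synapse counts must be non-negative")
    return synapses.bit_length()  # 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ...


counts = [0, 1, 2, 3, 4, 7, 8, 100]
print([weight_category(s) for s in counts])
```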

Figure 6 shows the estimation and reconstruction results. Figure 6a, b shows the convergence of the log-likelihood lower bound and of the parameters over the EM iterations. Figure 6c illustrates the reconstructed WMGM: the colors in the square represent the values of the p^(K) probabilities, and the interval lengths reflect l^(K). The brain connectome in the alpha lobe is sparse, so most regions of the square model have small linking probabilities. The exception is the bottom-right diagonal entry, ${p}_{88}^{(K)}=0.5213$, which reflects a group of very strong interactions among around 20 neurons in the connectome. Figure 6d–f presents the clustering coefficients, degree distribution and cumulative weight distribution of the empirical and reconstructed brain networks. The blue line is the empirical distribution extracted directly from the brain connectome. The red line is calculated from a synthetic network generated by the reconstructed WMGM. The yellow dashed line represents a null network generated by the Erdős–Rényi model³⁰ with the average linking probability. Supplementary Note 8 shows the distributions in log scale. The empirical and reconstructed distributions are close to each other, showing that the WMGM and the proposed inference approach can also reconstruct the statistical properties of real networks. For brain connectomes and neuronal connections, it is extremely hard to detect and monitor all neurons or the full functional connectivity because of their complex three-dimensional structure and the unknown physico-chemical interactions. In the section Reconstruction of network generator from partial observations, we discussed the recoverability of the WMGM from limited observations. Therefore, the proposed model can help neuroscientists estimate hidden rules and learn topological properties of brain networks even when only limited, partial observations are available.
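The Erdős–Rényi null model uses a single linking probability equal to the observed edge density. The sketch below is unweighted, because the text does not say how weights are assigned in the null network. For the alpha-lobe network with 693 nodes and 6444 edges, the average linking probability is 2·6444/(693·692) ≈ 0.0269. Both function names are hypothetical.

```python
import random


def average_linking_probability(n_nodes, n_edges):
    """Edge density of an undirected simple graph: 2E / (N(N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def erdos_renyi(n_nodes, p, seed=None):
    """G(N, p): include each pair (u, v), u < v, independently with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
            if rng.random() < p}


p_avg = average_linking_probability(693, 6444)
null_edges = erdos_renyi(693, p_avg, seed=0)
print(round(p_avg, 5), len(null_edges))
```

The expected number of null edges equals the observed 6444. Any difference in clustering or degree shape between the connectome and this null model therefore reflects structure beyond edge density alone.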

**Fig. 6: Network generator reconstruction for a brain connectome.**

Networks in different brain regions have different topological structures and features. We use our WMGM inference framework to examine the structure and connectivity of four functionally distinct regions of the Drosophila optical lobe: Medulla, Accessory Medulla, Lobula and Lobula Plate. Recall that brain connections are sparse and tend to concentrate in a small subset of nodes. We therefore select the 200 most connected nodes in each of these regions and binarize the resulting networks as input to the WMGM inference algorithm. To assess reconstruction stability, we run the inference algorithm 50 times on each brain network and calculate the mean and standard deviation. For the Medulla connectome, we obtain an average network generator with parameters l⁽¹⁾ = [0.63, 0.37], p⁽¹⁾ = [0.18, 0.26; 0.26, 0.92]. For the Accessory Medulla connectome, the average network generator is l⁽¹⁾ = [0.48, 0.52], p⁽¹⁾ = [0.07, 0.14; 0.14, 0.34]. For the Lobula connectome, it is l⁽¹⁾ = [0.46, 0.54], p⁽¹⁾ = [0.41, 0.42; 0.42, 0.95]. For the Lobula Plate connectome, it is l⁽¹⁾ = [0.42, 0.58], p⁽¹⁾ = [0.12, 0.21; 0.21, 0.82]. The standard deviation of each parameter in each network is smaller than 10⁻¹⁰. The inferred p⁽¹⁾ and l⁽¹⁾ are visualized as the colors and side lengths of the yellow-green squares in Fig. 7. Supplementary Note 10 shows the clustering coefficients and degree distributions of the four brain networks. It is impossible to obtain a concise description of each network that encodes all of its properties. We conclude that the reconstructed network generators are easily distinguishable and that the WMGM can differentiate scale-dependent brain regions with different functionalities.
Moreover, the reconstructed generating rule θ = (p, l) can serve as a compact signature of regional cognitive functionality, which enables us to measure and quantify differences in neural behavior and cognition across regions.

**Fig. 7: Network generator inference for a Drosophila connectome in different regions of the optical lobe.**

Inferring the network generators of chromosomal interactions of yeast genome in different growing states

Chromosome conformation capture analysis (here, the Hi-C technique) reveals the topological structure of genomic sequences^29,31 and allows scientists to examine the 3D structure of chromatin by measuring the contacts between any pair of genomic loci³¹. In the chromosomal interaction matrix built from Hi-C data, the nodes represent genomic loci and each pairwise edge weight indicates the interaction frequency between two loci in the genome³².

Across different growth states, the yeast genome undergoes a complex topological reorganization⁸. To mine this topological complexity, we infer WMGMs from the chromosome interaction data^8,32. For each chromosomal interaction matrix, we first downsample the 12,048-by-12,048 matrix to 503-by-503 and then discretize it with 200 ≤ w(r) < 200 + 100r.
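The preprocessing can be sketched as follows, under two assumptions that the text does not pin down: downsampling sums contacts over contiguous, nearly equal blocks (12,048 is not a multiple of 503), and the discretization rule is read as category 0 for weights below 200 and category r ≥ 1 for 200 + 100(r − 1) ≤ w < 200 + 100r. Both function names and this bin layout are illustrative, not the authors' pipeline.

```python
import numpy as np

def block_downsample(W, n_out):
    """Sum a square contact matrix over n_out x n_out contiguous blocks whose
    sizes differ by at most one row/column (assumed pooling scheme)."""
    starts = [idx[0] for idx in np.array_split(np.arange(W.shape[0]), n_out)]
    rows = np.add.reduceat(W, starts, axis=0)     # sum rows inside each block
    return np.add.reduceat(rows, starts, axis=1)  # then sum columns

def discretize(W, base=200.0, width=100.0):
    """Weight categories: 0 if w < base, else r >= 1 with
    base + width*(r-1) <= w < base + width*r (hypothetical reading of the rule)."""
    r = np.floor((W - base) / width).astype(int) + 1
    return np.where(W < base, 0, r)
```

For a contact matrix `W` of size 12,048, `discretize(block_downsample(W, 503))` would yield a 503-by-503 matrix of weight categories under these assumptions.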

Figure 8a, b illustrate the reconstructed WMGM θ^(K) from the chromosome interactions of the yeast genome in the exponential growth and quiescence states, respectively. We constrain l⁽¹⁾ to be identical in both growth states so that the linking probabilities can be compared directly. The values of p^(K) in the major sub-blocks are shown in the figures. Figure 8a shows that, compared with cells in the quiescence state (Fig. 8b), exponentially growing yeast cells have larger linking probabilities on the diagonal and relatively smaller ones off the diagonal. This suggests that when the yeast is growing, the inter-chromosomal interactions become weaker (${\sum }_{i\ne j}{p}_{ij}^{(K)}{l}_{i}^{(K)}{l}_{j}^{(K)}$ changes from 0.2217 to 0.1758) and the intra-chromosomal interactions become stronger (${\sum }_{i = j}{p}_{ij}^{(K)}{l}_{i}^{(K)}{l}_{j}^{(K)}$ changes from 0.1201 to 0.1516). This conclusion is consistent with the analysis in ref. ⁸, where the authors measure the intra-chromosomal distances between two sites on one chromosome. Figure 8c shows the cumulative weight distributions of the chromosome contact maps. The difference between the two states is clearer in the recovered models than in these statistical properties of the network weights. The WMGM thus captures structural differences that the weight statistics obscure, and can therefore help to distinguish between growth states.
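The intra- and inter-chromosomal sums quoted above split the total edge mass $\sum_{q,h}p_{qh}^{(K)}l_{q}^{(K)}l_{h}^{(K)}$ into its diagonal and off-diagonal parts. A minimal sketch, assuming θ^(K) is the K-fold Kronecker power of θ⁽¹⁾; the yeast parameters are not listed in the text, so any generator passed in (including the one in the test) is a hypothetical toy example.

```python
import numpy as np
from functools import reduce

def kronecker_power(x, K):
    """K-fold Kronecker power of a vector or matrix."""
    return reduce(np.kron, [np.asarray(x, dtype=float)] * K)

def intra_inter_mass(l1, p1, K):
    """Diagonal (q = h) and off-diagonal (q != h) parts of
    sum_{q,h} p_qh^(K) l_q^(K) l_h^(K) for a level-1 generator (l1, p1)."""
    lK = kronecker_power(l1, K)
    pK = kronecker_power(p1, K)
    mass = pK * np.outer(lK, lK)          # p_qh l_q l_h for every block
    intra = float(np.trace(mass))
    inter = float(mass.sum() - intra)
    return intra, inter
```

A useful sanity check is that the two parts always add up to (Σ_ij p_ij l_i l_j)^K, so fixing l⁽¹⁾ across conditions, as done above, makes shifts between the two parts directly comparable.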

**Fig. 8: Weighted-multifractal graph model (WMGM) inference for the yeast genome in different growing states.**

Network generator inference and analysis for the conformation maps of replicated human chromosomes

Chromosome conformation capture analysis (the Hi-C technique) reveals the topological structure of genomic sequences²⁹. Detecting topologically associating domains (TADs) identifies highly self-interacting chromatin regions. TADs appear as square blocks centered on the diagonal of the interaction matrices. Although TADs have emerged as critical features characterizing high intradomain contact frequencies, an unambiguous definition is still evolving³¹. We infer the WMGM from binarized cis sister contact maps from ref. ²⁹ and show that our model can help to detect TADs in Hi-C matrices.

In the variational EM-based estimation method (see the Methods section on Parameter estimation of WMGM), we introduce the variational parameters τ_uq to calculate the lower bound of the log-likelihood $\mathscr{L}_{Q}(\theta ,R)$ in Eq. (3). For each node u, $\{\tau_{uq}\}_{q=1\ldots M^{K}}$ can be viewed as soft assignments, i.e., the probabilities that node u has attribute q. They also serve as estimators of the node attribute distribution parameters l^(K), and each node u is assigned its own vector τ_u. We calculate the entropy of the variational estimator τ_u of each node u as $H(u)=-\sum_{q=1}^{M^{K}}\tau_{uq}\log \tau_{uq}$. Figure 9a shows the binarized Hi-C interaction matrix of the cis sister contacts of human chromosome 21. For the best reconstruction of the WMGM, we downsample it to a 483-by-483 matrix and apply the proposed inference algorithm. TADs are outlined with orange and green rectangles. The node attribute entropy H(u) is shown in Fig. 9b, where entropies close to zero are circled. We find that the positions (node indices) with near-zero entropy correspond to the regions where TADs emerge. We conclude that nodes in TADs have strong intra-domain interactions and tend to have low entropy (where the WMGM has high confidence). This suggests that the WMGM can also help to detect TADs in Hi-C data analysis.
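The node attribute entropy H(u) is a one-line computation on the N × M^K matrix τ. The sketch below assumes natural logarithms and treats 0 log 0 as 0; the helper names and the threshold used to flag near-zero entropy are hypothetical choices, not values from the study.

```python
import numpy as np

def node_attribute_entropy(tau, eps=1e-300):
    """H(u) = -sum_q tau_uq log tau_uq for each row of the N x M^K matrix tau
    (natural log; entries equal to 0 contribute 0)."""
    tau = np.asarray(tau, dtype=float)
    safe = np.clip(tau, eps, 1.0)            # avoid log(0); 0 * log(eps) = 0
    return -(tau * np.log(safe)).sum(axis=1)

def low_entropy_nodes(tau, threshold=1e-3):
    """Indices of nodes whose soft assignment is nearly deterministic."""
    return np.flatnonzero(node_attribute_entropy(tau) < threshold)
```

Contiguous runs of indices returned by `low_entropy_nodes` are the kind of near-zero-entropy segments that Fig. 9b relates to TAD positions.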

**Fig. 9: Conformation maps of replicated human chromosomes.**

Discussion

Exploring topological features in complex networks has the potential to enhance our understanding of the behavior of natural and social phenomena. Among the many topological features, we focus on multifractality, an important property that is widespread in complex systems from numerous domains, including biology, sociology, neuroscience, and geology. Analyzing the multifractality of complex systems enables scientists to measure the multi-scale interactions among components in large-scale complex networks⁵.

Network multifractality is commonly studied and analyzed at the graph level, where the structure of connections among nodes is considered self-similar^6,19. However, past approaches suffer from a number of limitations. Renormalization-group-inspired algorithms, although capable of estimating the multifractality of graphs, fail to explain the emergence and evolution of networks and cannot decipher the hidden generating rules. The stochastic Kronecker graph model²⁵ captures self-similarity by building a probabilistic model with Kronecker products. However, it requires the graph and model sizes to be the same, limiting its application to networks of arbitrary size and to partially observed networks. In refs.^20,23, the authors propose multifractal network generators, but they reconstruct the model parameters by fitting graph metrics via a simulated-annealing procedure. This procedure is unstable and can return different, unrelated sets of parameters, which makes it difficult to interpret the generator's physical meaning. Most network models also neglect the weights that characterize the interactions in complex networks.

To decipher the hidden network generators (rules) governing complex system dynamics at the functional level, we proposed the weighted multifractal graph model (WMGM), which captures heterogeneity and varying degrees of self-similarity. The WMGM acts as a function that maps and compresses a large graph onto a compact model. This network generating function also provides a high-level, condensed description of complex systems that integrates various graph metrics such as degree and clustering coefficient. To infer the model parameters efficiently, we develop a variational EM inference framework that reconstructs the underlying network generating function encoded in complex networks. We investigate ground-truth recovery and the robustness of the inference against incomplete data and noisy observations. These mathematical tools can help to investigate network (topological) features by describing large-scale complex systems through functional approaches. Uncovering the generating rules enables us not only to generate synthetic graphs with different properties, but also to reproduce the topological complexity and heterogeneity of real-world systems. We applied the WMGM framework to several real-world networks: neuronal connections and chromosomal interactions. The recovered WMGMs demonstrate the potential of the framework to capture and reproduce the topological structures of real networks. The probabilistic description of the generating function also helps to assess the validity of links in a noisy graph and to denoise the system. Finally, the reconstructed generator can distinguish different functional regions in the brain and different yeast growth states, and can detect the boundaries of TADs.

In this work, we assume that the distribution of node categories is the same across nodes and communities. Generalizing the topological scales, as shown in ref. ³³, could improve the ability of the proposed method to capture the heterogeneity embedded in the topological structure of real-world complex systems such as brain networks. For example, instead of using a single rule $P\{\phi_{uq}=1\}=l_{q}^{(K)}$ to generate the node attributes, we can introduce a heterogeneous distribution in which the value of K varies across nodes in different communities. The category (community) of each node u is assigned by $\phi_{u}\sim \mathrm{Categorical}(l^{(k_{u})})$. The heterogeneity is introduced through the random variable k_u (the scale of node u, a positive integer between 1 and K drawn from a distribution f(k_u)). The linking probability between nodes u and v given the communities ϕ_u, ϕ_v and scales k_u, k_v can be calculated as $\sum_{q,h}p_{qh}^{(K)}$, where the summation over q, h satisfies $\sum_{q}l_{q}^{(K)}=l_{\phi_{u}}^{(k_{u})}$ and $\sum_{h}l_{h}^{(K)}=l_{\phi_{v}}^{(k_{v})}$.

In the future, we will also investigate the effect of different network sampling methods (as used in the Drosophila connectome and social network case studies) and develop strategies that provide a consistent WMGM generating model across subgraphs of different sizes and properties. Future applications of the WMGM framework also include inferring the WMGM models that correspond to partially observed neuronal activity and quantifying how the identified models evolve and self-optimize during the observed cognitive activities. A crucial question in neuroscience is how to measure, identify and compare higher order topological characteristics of neuronal behaviors and activities under different (cognitive) circumstances. At the spike-train level, it is difficult to compare spiking behavior across recordings of different lengths and different numbers of neurons during non-stationary brain activity. At the neuronal network level, networks of different sizes are hard to compare. In the future, we will develop an approach to mine neuronal activity that identifies compressed WMGM models and quantifies their evolution and the distances among WMGM models corresponding to different cognitive tasks. More precisely, we can first reconstruct the generating measure from observed neuronal networks exhibiting or performing different cognitive behaviors and quantify the changes in the generators programming the neural behaviors through changes in the model parameters θ. We can then define the distance between two cognitive behaviors as a distance between the parameters of the two corresponding WMGM models. Other important future work includes an implementation aimed at very sparse networks and the application of the WMGM to detect hierarchical community structures in complex systems such as cyber-physical systems.
With the aid of the proposed WMGM, we are also looking into quantifying cognition from neuronal behaviors and from neuron-glia (astrocyte) metabolic coupling and information processing under different cognitive tasks. In the future, when real-time brain activity monitoring becomes available, the WMGM can be extended to analyze the time-varying network generators underlying label-free, real-time imaging of neuron-glia activity. This could represent a major step towards a comprehensive understanding of non-Markovian learning, decision making and other cognitive functions of the brain. With respect to 4D Nucleome networks, future work will focus on more robust strategies for identifying TADs in Hi-C analysis.

Methods

Parameter estimation of WMGM

In this section, we discuss how to recover the parameters of the WMGM via a variational approach. We first provide a probabilistic description of a weighted network within the WMGM framework. Let R denote the N-by-N adjacency matrix of weight categories of the graph, and recall that ϕ is the latent node attribute indicator. The probabilistic descriptions of nodes and edges are given by $p(\phi ;l)=\prod_{u=1}^{N}\prod_{q=1}^{M^{K}}\left(l_{q}^{(K)}\right)^{\phi_{uq}}$ and $p(R\mid \phi ;p)=\prod_{u<v}\prod_{q,h=1}^{M^{K}}\left(p_{qh}^{(K)}(r_{uv})\right)^{\phi_{uq}\phi_{vh}}$. Here, we focus on undirected graphs without self-loops; directed graphs and graphs with self-loops can be handled by changing the range of the product over u, v. In this work, we treat M and K as hyperparameters that are selected prior to the inference procedure.
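To make the generative description concrete, the sketch below draws one graph from the model in the binary case only, assuming two weight categories with p_qh(1) = p_qh and p_qh(0) = 1 − p_qh, and assuming θ^(K) is the K-fold Kronecker power of θ⁽¹⁾. A weighted version would instead draw r_uv from the categorical distribution p_qh(·). The function name is hypothetical.

```python
import numpy as np
from functools import reduce

def sample_wmgm_binary(l1, p1, K, N, rng):
    """Draw an undirected, loop-free binary graph: phi_u ~ Categorical(l^(K)),
    then each pair u < v is linked with probability p^(K)[phi_u, phi_v]."""
    lK = reduce(np.kron, [np.asarray(l1, dtype=float)] * K)
    pK = reduce(np.kron, [np.asarray(p1, dtype=float)] * K)
    phi = rng.choice(len(lK), size=N, p=lK)           # latent node attributes
    probs = pK[np.ix_(phi, phi)]                      # N x N linking probabilities
    upper = np.triu(rng.random((N, N)) < probs, k=1)  # independent pairs u < v
    A = upper.astype(int)
    return A + A.T, phi
```

Returning `phi` alongside the graph makes it possible to compare inferred soft assignments τ with the ground-truth attributes on synthetic data.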

Given an observed weighted network, we seek to estimate the underlying network generating function θ⁽¹⁾ by maximizing the log-likelihood $\mathscr{L}(\theta)$, the left-hand side of Eq. (2):

$$\mathscr{L}(\theta )=\log p(R;\theta )=\log \sum_{\phi }Q(\phi )\frac{p(R,\phi ;\theta )}{Q(\phi )}=\log \mathbb{E}_{Q}\left\{\frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}\ge \mathbb{E}_{Q}\left\{\log \frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}$$

(2)

However, the summation over ϕ makes the log-likelihood $\mathscr{L}(\theta)$ intractable. Therefore, instead of maximizing the log-likelihood directly, we maximize the evidence lower bound $\mathbb{E}_{Q}\left\{\log \frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}$ (the right-hand side of Eq. (2)). The gap between the log-likelihood and this lower bound is the Kullback–Leibler divergence $\mathrm{KL}\left(Q(\phi )\,\|\,P(\phi \mid R;\theta )\right)$. To keep this gap small while retaining tractability, we choose the distribution over ϕ to be $Q(\phi )=\prod_{u=1}^{N}\prod_{q=1}^{M^{K}}\tau_{uq}^{\phi_{uq}}$, where the variational parameters τ_uq are the soft assignments of node u and satisfy $\sum_{q=1}^{M^{K}}\tau_{uq}=1$ for u = 1…N. This is known as the mean-field approach in variational inference³⁴. The lower bound of $\mathscr{L}(\theta)$ can then be computed as

$$\begin{aligned}\mathscr{L}_{Q}(\theta ,R) &= \sum_{\phi }Q(\phi )\log P(\phi ;l)+\sum_{\phi }Q(\phi )\log P(R\mid \phi ;p)-\sum_{\phi }Q(\phi )\log Q(\phi )\\ &= \sum_{u=1}^{N}\sum_{q=1}^{M^{K}}\tau_{uq}\log \left(l_{q}^{(K)}\right)+\sum_{u<v}\sum_{q,h=1}^{M^{K}}\tau_{uq}\tau_{vh}\log \left(p_{qh}^{(K)}(r_{uv})\right)-\sum_{u=1}^{N}\sum_{q=1}^{M^{K}}\tau_{uq}\log \left(\tau_{uq}\right).\end{aligned}$$

(3)
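For a graph small enough to enumerate all attribute assignments, the inequality in Eq. (2) can be checked numerically: the mean-field bound of Eq. (3) never exceeds the exact log-likelihood. This sketch assumes binary edges and passes the level-K quantities l^(K) and p^(K) directly as `l` and `P`; it is a brute-force illustration with hypothetical helper names, not a practical estimator.

```python
import itertools
import numpy as np

def log_edge_probs(R, P):
    """L[u, v, q, h] = log p_qh(r_uv) for binary r_uv, with p_qh(1) = P[q, h]."""
    Rb = R[:, :, None, None]
    return np.where(Rb == 1, np.log(P), np.log1p(-P))

def elbo(R, l, P, tau):
    """Mean-field lower bound of Eq. (3) for an undirected binary graph."""
    N = R.shape[0]
    logp = log_edge_probs(R, P)
    iu, iv = np.triu_indices(N, k=1)                  # node pairs u < v
    node_term = (tau * np.log(l)).sum()
    edge_term = np.einsum("eq,eh,eqh->", tau[iu], tau[iv], logp[iu, iv])
    entropy = -(tau * np.log(tau)).sum()
    return node_term + edge_term + entropy

def exact_loglik(R, l, P):
    """log p(R) by summing p(R, phi) over all Q^N attribute assignments."""
    N, Q = R.shape[0], len(l)
    logp = log_edge_probs(R, P)
    iu, iv = np.triu_indices(N, k=1)
    terms = []
    for phi in itertools.product(range(Q), repeat=N):
        phi = np.array(phi)
        terms.append(np.log(l[phi]).sum() + logp[iu, iv, phi[iu], phi[iv]].sum())
    return np.logaddexp.reduce(terms)
```

Because the enumeration costs Q^N terms, this check is only feasible for a handful of nodes; the variational bound exists precisely to avoid it.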

Algorithm 1: Reconstructing the WMGM through a variational EM algorithm

input: N, M, K and adjacency matrix R in weight category

output: p⁽¹⁾, l⁽¹⁾

parameter sweep: M = 1, 2, 3 and K = 1, 2, 3, 4

initialization: p⁽¹⁾, l⁽¹⁾, τ

repeat

E-step: update τ by p⁽¹⁾ and l⁽¹⁾

repeat

${\tau }_{uq}\leftarrow {\lambda }_{u}{l}_{q}^{(K)}\exp \left\{\sum_{v\ne u}\sum_{h}\tau_{vh}\log \left(p_{qh}^{(K)}(r_{uv})\right)\right\}$

until τ converges;

M-step: update p⁽¹⁾ and l⁽¹⁾ by τ

${l}_{i}^{(1)}=\frac{1}{NK}\mathop{\sum }\nolimits_{u,q}{\tau }_{uq}\mathop{\sum }\nolimits_{k = 1}^{K}{\mathbb{1}}\left\{q(k)=i\right\}$

repeat

$$\frac{\partial \mathscr{L}_{Q}(\theta )}{\partial p_{ij}^{(1)}}=\frac{1}{p_{ij}^{(1)}}\sum_{q,h}\sum_{u\ne v}\left\{\tau_{uq}\tau_{vh}\,\frac{r_{uv}-p_{qh}^{(K)}}{1-p_{qh}^{(K)}}\sum_{k=1}^{K}\mathbb{1}\left\{q(k)=i,\,h(k)=j\right\}\right\}$$

${p}_{ij}^{(1)}\leftarrow {p}_{ij}^{(1)}+\gamma \frac{\partial {{{{{{\mathscr{L}}}}}}}_{Q}(\theta )}{\partial {p}_{ij}^{(1)}}$

until p⁽¹⁾ converges;

until log-likelihood converges;
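Algorithm 1 can be sketched in NumPy for the binary case. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is linked under Code availability): edges are binary with p_qh(1) = p_qh^(K); the p⁽¹⁾ gradient is the Bernoulli derivative of r log p^(K) + (1 − r) log(1 − p^(K)), which gives the factor (r_uv − p_qh^(K))/(1 − p_qh^(K)); the E-step updates all τ_u simultaneously; fixed iteration counts replace the convergence tests; and the gradient step is scaled by 1/(N(N − 1)) with p⁽¹⁾ clipped to (0, 1). All function names are hypothetical.

```python
import numpy as np

def digits(M, K):
    """D[q, k] = 0-based level-k index of category q (k = 0 is the least
    significant base-M digit, matching q(k) in the text)."""
    q = np.arange(M ** K)
    return np.stack([(q // M ** k) % M for k in range(K)], axis=1)

def kron_params(l1, p1, D):
    """l^(K) and p^(K) as products of level-1 entries over the K levels."""
    lK = np.prod(l1[D], axis=1)                             # (M^K,)
    pK = np.prod(p1[D[:, None, :], D[None, :, :]], axis=2)  # (M^K, M^K)
    return lK, pK

def e_step(R, lK, pK, tau, n_iter=50, tol=1e-8):
    """Fixed point tau_uq ∝ l_q exp(sum_{v != u} sum_h tau_vh log p_qh(r_uv))."""
    Roff = R.astype(float)
    np.fill_diagonal(Roff, 0.0)
    notR = 1.0 - Roff
    np.fill_diagonal(notR, 0.0)                   # exclude v = u
    log1, log0 = np.log(pK), np.log1p(-pK)
    log_l = np.log(np.maximum(lK, 1e-300))
    for _ in range(n_iter):
        S = Roff @ tau @ log1.T + notR @ tau @ log0.T
        logits = log_l[None, :] + S
        new = np.exp(logits - logits.max(axis=1, keepdims=True))
        new /= new.sum(axis=1, keepdims=True)     # lambda_u normalization
        converged = np.abs(new - tau).max() < tol
        tau = new
        if converged:
            break
    return tau

def m_step_l(tau, D, M):
    """Closed form l_i = (1 / (N K)) sum_{u,q} tau_uq #{k : q(k) = i}."""
    N, K = tau.shape[0], D.shape[1]
    counts = np.stack([(D == i).sum(axis=1) for i in range(M)], axis=1)
    return tau.sum(axis=0) @ counts / (N * K)

def fit_wmgm(R, M, K, n_outer=100, n_inner=20, gamma=0.05, seed=0):
    """Variational EM for a binary, undirected graph R without self-loops."""
    rng = np.random.default_rng(seed)
    N = R.shape[0]
    D = digits(M, K)
    O = (D[:, :, None] == np.arange(M)).astype(float)  # O[q, k, i] = 1{q(k) = i}
    l1 = np.full(M, 1.0 / M)
    p1 = rng.uniform(0.2, 0.8, size=(M, M))
    p1 = (p1 + p1.T) / 2                                # symmetric generator
    tau = rng.dirichlet(np.ones(M ** K), size=N)
    Roff = R.astype(float)
    np.fill_diagonal(Roff, 0.0)
    for _ in range(n_outer):
        lK, pK = kron_params(l1, p1, D)
        tau = e_step(R, lK, pK, tau)                    # E-step
        l1 = m_step_l(tau, D, M)                        # M-step for l
        colsum = tau.sum(axis=0)
        A = tau.T @ Roff @ tau                          # sum_{u!=v} tau_uq tau_vh r_uv
        B = np.outer(colsum, colsum) - tau.T @ tau      # sum_{u!=v} tau_uq tau_vh
        for _ in range(n_inner):                        # M-step for p: gradient ascent
            _, pK = kron_params(l1, p1, D)
            W = (A - pK * B) / (1.0 - pK)
            grad = np.einsum("qh,qki,hkj->ij", W, O, O) / p1
            p1 = np.clip(p1 + gamma * grad / (N * (N - 1)), 1e-6, 1 - 1e-6)
    return l1, p1, tau
```

Calling `fit_wmgm(R, M, K)` for each (M, K) in the sweep of Algorithm 1 and comparing the resulting lower bounds is one way to select the hyperparameters; multiple random seeds help detect local optima.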

The inference procedure is performed via variational expectation maximization (EM), as shown in Algorithm 1. In the E-step, given the parameters l⁽¹⁾ and p⁽¹⁾, we maximize Eq. (3) with respect to τ. We update τ_uq by a fixed-point iteration following a strategy similar to that of ref. ³⁴, where λ_u is the normalization factor that enforces the constraint $\sum_{q=1}^{M^{K}}\tau_{uq}=1$ in Algorithm 1. In the M-step, with τ obtained from the E-step, we maximize $\mathscr{L}_{Q}(\theta ,R)$ with respect to the parameters l⁽¹⁾ and p⁽¹⁾. Because the terms involving l and p in Eq. (3) are separate, the two sets of parameters are updated independently. ${l}_{i}^{(1)}$ can be computed analytically by setting the partial derivatives (under the constraint $\sum_{i}l_{i}^{(1)}=1$) to zero, whereas ${p}_{ij}^{(1)}$ is computed numerically by gradient ascent with step size γ. Here q(k) and h(k) denote the kth level indices obtained by decomposing q and h (a 'reverse' process of taking Kronecker products): $q(k)=\left(\left\lfloor \frac{q-1}{M^{k-1}}\right\rfloor \bmod M\right)+1$. To avoid confusion between θ⁽¹⁾ and θ^(K), we use i and j for indices in θ⁽¹⁾ and q, h for indices in θ^(K). All summations over nodes u and v in this section run from 1 to N, summations over i, j from 1 to M, and summations over q, h from 1 to M^K. M and K are treated as hyperparameters in the variational EM framework; we analyze the choice of the best hyperparameters M, K further in Supplementary Notes 4 and 6.
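The level-index map q(k) and the Kronecker structure it inverts can be checked directly. The sketch below implements the 1-based formula above and confirms that the product $\prod_{k}p_{q(k)h(k)}^{(1)}$ reproduces the entries of the K-fold Kronecker power of p⁽¹⁾; function names are hypothetical.

```python
import numpy as np
from functools import reduce

def level_index(q, k, M):
    """q(k) = (floor((q - 1) / M^(k - 1)) mod M) + 1 for 1-based q and k."""
    return ((q - 1) // M ** (k - 1)) % M + 1

def p_level_K(p1, q, h, K):
    """p_qh^(K) = prod_k p^(1)_{q(k) h(k)}, with 1-based q, h."""
    M = p1.shape[0]
    return float(np.prod([p1[level_index(q, k, M) - 1, level_index(h, k, M) - 1]
                          for k in range(1, K + 1)]))

def kron_power(x, K):
    """K-fold Kronecker power of an array (0-based indices)."""
    return reduce(np.kron, [np.asarray(x, dtype=float)] * K)
```

Since all K factors are the same matrix, the order in which the levels are assigned to digits does not change the product, so `p_level_K(p1, q, h, K)` equals `kron_power(p1, K)[q - 1, h - 1]`.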

Multifractal analysis of WMGM

Next, we use the proposed WMGM to analytically compute statistical-physics-inspired multifractal metrics, such as the partition function, the Lipschitz–Hölder exponent and the multifractal spectrum. For simplicity, we first reshape the linking probabilities ${\left\{{p}_{ij}^{(1)}\right\}}_{i,j = 1:M}$ in θ⁽¹⁾ as ${\left\{{p}_{i}\right\}}_{i = 1:{M}^{2}}$. We also reshape the areas of the sub-rectangles in the unit square, ${\left\{{l}_{i}^{(1)}{l}_{j}^{(1)}\right\}}_{i,j = 1:M}$, as ${\left\{{a}_{i}\right\}}_{i = 1:{M}^{2}}$. Following^20,35, the partition function of the model at an average sub-block size $\epsilon ={(\frac{1}{M})}^{2K}$ can be written as

$$Z_{\epsilon }(q)=\sum_{\{k_{j}\}_{j=1:M^{2}}}\binom{K}{k_{1}\ldots k_{j}\ldots k_{M^{2}}}\left\{\prod_{i=1}^{M^{2}}\left[a_{i}p_{i}\right]^{k_{i}}\right\}^{q}=\left[\sum_{i,j=1}^{M}\left(l_{i}^{(1)}l_{j}^{(1)}p_{ij}^{(1)}\right)^{q}\right]^{K}.$$

(4)

In Eq. (4), $\binom{K}{k_{1}\ldots k_{j}\ldots k_{M^{2}}}$ is the number of sub-blocks that have the same area $\prod_{i=1}^{M^{2}}a_{i}^{k_{i}}$ and linking probability $\prod_{i=1}^{M^{2}}p_{i}^{k_{i}}$, where $\{k_{i}\}_{i=1:M^{2}}$ is subject to $\sum_{i=1}^{M^{2}}k_{i}=K$. The product $\prod_{i=1}^{M^{2}}[a_{i}p_{i}]^{k_{i}}$ (area times linking probability) is the proportion of edges generated with that linking probability in each of those sub-blocks.
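Eq. (4) is an instance of the multinomial theorem, and its two sides can be compared numerically for small M and K. A minimal sketch with hypothetical function names; the enumeration over compositions of K is only practical for tiny M and K, and all l_i l_j p_ij must be positive when q < 0.

```python
import itertools
import math
import numpy as np

def partition_brute_force(l1, p1, K, q):
    """Left side of Eq. (4): sum over (k_1..k_{M^2}) with sum k_i = K of the
    multinomial count times (prod_i [a_i p_i]^{k_i})^q."""
    l1, p1 = np.asarray(l1, dtype=float), np.asarray(p1, dtype=float)
    x = (np.outer(l1, l1) * p1).ravel()          # a_i p_i, reshaped to M^2 entries
    total = 0.0
    for ks in itertools.product(range(K + 1), repeat=len(x)):
        if sum(ks) != K:
            continue
        count = math.factorial(K) // math.prod(math.factorial(k) for k in ks)
        total += count * np.prod(x ** np.array(ks)) ** q
    return total

def partition_closed_form(l1, p1, K, q):
    """Right side of Eq. (4): [sum_ij (l_i l_j p_ij)^q]^K."""
    l1, p1 = np.asarray(l1, dtype=float), np.asarray(p1, dtype=float)
    return float(((np.outer(l1, l1) * p1) ** q).sum() ** K)
```

At q = 0 both sides count sub-blocks and equal M^{2K}; at q = 1 they give the total edge mass (Σ_ij l_i l_j p_ij)^K.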

In the multifractal analysis, multifractal metrics are calculated from the partition function Z_ϵ(q), where q is the order of the moment (not to be confused with the node category index q of the previous section). The mass exponent is given by

$$\tau (q)=\frac{\log Z_{\epsilon }(q)}{\log \epsilon }=\frac{K}{\log \epsilon }\log \left\{\sum_{i,j=1}^{M}\left(l_{i}^{(1)}l_{j}^{(1)}p_{ij}^{(1)}\right)^{q}\right\}$$

(5)

The Lipschitz–Hölder exponent (also called the coarse Hölder exponent or singularity index) is defined as $\alpha (q)=\frac{\mathrm{d}\tau (q)}{\mathrm{d}q}$, where τ(q) is the mass exponent of Eq. (5) rather than the variational parameters of the previous section. The multifractal spectrum reads f(α) = α(q)q − τ(q). Explicitly, the Lipschitz–Hölder exponent is:

$$\alpha (q)=\frac{1}{Z_{\epsilon }(q)\log \epsilon }\left\{K\left[\sum_{i,j=1}^{M}\left(l_{i}^{(1)}l_{j}^{(1)}p_{ij}^{(1)}\right)^{q}\right]^{K-1}\sum_{i,j=1}^{M}\left(l_{i}^{(1)}l_{j}^{(1)}p_{ij}^{(1)}\right)^{q}\ln \left(l_{i}^{(1)}l_{j}^{(1)}p_{ij}^{(1)}\right)\right\}.$$

(6)

When the order of the moment q ranges over q = −q₀: dq: q₀ (from −q₀ to q₀ in steps of dq), the width of the multifractal spectrum is defined as dα = α(q)_max − α(q)_min = α(−q₀) − α(q₀), since α(q) is non-increasing in q. The center of the multifractal spectrum is located at α_center = α(0).
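Eqs. (5) and (6) translate directly into array code. The sketch below assumes natural logarithms, ε = (1/M)^{2K} as above, and strictly positive l_i l_j p_ij; it returns τ(q), α(q) and f(α), together with the spectrum width and center. Function names and the defaults q₀ = 5, dq = 0.1 are illustrative choices, not values from the study.

```python
import numpy as np

def multifractal_spectrum(l1, p1, K, q_values):
    """tau(q), alpha(q) and f(alpha) from Eqs. (5)-(6), natural logs,
    epsilon = (1/M)^(2K); requires every l_i l_j p_ij > 0."""
    l1, p1 = np.asarray(l1, dtype=float), np.asarray(p1, dtype=float)
    M = len(l1)
    x = (np.outer(l1, l1) * p1).ravel()            # l_i l_j p_ij
    log_eps = 2 * K * np.log(1.0 / M)              # log of average sub-block size
    q = np.asarray(q_values, dtype=float)[:, None]
    xq = x[None, :] ** q                           # (len(q), M^2)
    S = xq.sum(axis=1)                             # sum_ij (l_i l_j p_ij)^q
    tau = K * np.log(S) / log_eps                  # Eq. (5)
    alpha = K * (xq * np.log(x)).sum(axis=1) / (S * log_eps)  # Eq. (6)
    f = q[:, 0] * alpha - tau                      # f(alpha) = alpha q - tau
    return tau, alpha, f

def spectrum_width_and_center(l1, p1, K, q0=5.0, dq=0.1):
    """Width alpha_max - alpha_min over q in [-q0, q0] and center alpha(0)."""
    q = np.round(np.arange(-q0, q0 + dq / 2, dq), 10)
    _, alpha, _ = multifractal_spectrum(l1, p1, K, q)
    _, alpha0, _ = multifractal_spectrum(l1, p1, K, [0.0])
    return alpha.max() - alpha.min(), alpha0[0]
```

Using `alpha.max() - alpha.min()` rather than indexing the endpoints keeps the width non-negative regardless of the sign convention; for this model it coincides with α(−q₀) − α(q₀).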

Data availability

The data supporting the results of this study are from public datasets: the Drosophila connectome (https://www.janelia.org/project-team/flyem/hemibrain) and Hi-C chromosomal interaction data (https://aidenlab.org/juicebox/).

Code availability

Source code is available at https://github.com/ruocheny/Weighted-Multifractal-Graph-Model.

References

1. Erdős, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).
2. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
3. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440 (1998).
4. Song, C., Havlin, S. & Makse, H. A. Self-similarity of complex networks. Nature 433, 392 (2005).
5. Gallos, L. K., Song, C. & Makse, H. A. A review of fractality and self-similarity in complex networks. Phys. A: Statistical Mech. Appl. 386, 686–691 (2007).
6. Xue, Y. & Bogdan, P. Reliable multi-fractal characterization of weighted complex networks: algorithms and implications. Sci. Rep. 7, 7487 (2017).
7. Lynn, C. W. & Bassett, D. S. The physics of brain network structure, function and control. Nat. Rev. Phys. 1, 318 (2019).
8. Rutledge, M. T., Russo, M., Belton, J.-M., Dekker, J. & Broach, J. R. The yeast genome undergoes significant topological reorganization in quiescence. Nucl. Acids Res. 43, 8299–8313 (2015).
9. Pigolotti, S., Jensen, M. H., Zhan, Y. & Tiana, G. Bifractal nature of chromosome contact maps. Phys. Rev. Res. 2, 043078 (2020).
10. Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C. & Ghahramani, Z. Kronecker graphs: an approach to modeling networks. J. Mach. Learning Res. 11, 985–1042 (2010).
11. Ravasz, E. & Barabási, A.-L. Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003).
12. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).
13. Mandelbrot, B. B. The Fractal Geometry of Nature, Vol. 173 (WH Freeman, 1983).
14. Mandelbrot, B. B. in Fractals in Geophysics (eds Scholz, C. H. & Mandelbrot, B. B.) 5–42 (Springer, 1989).
15. Dan-Ling, W., Zu-Guo, Y. & Anh, V. Multifractal analysis of complex networks. Chin. Phys. B 21, 080504 (2012).
16. Yin, C. et al. Network science characteristics of brain-derived neuronal cultures deciphered from quantitative phase imaging data. Sci. Rep. 10, 1–13 (2020).
17. Vázquez, A., Flammini, A., Maritan, A. & Vespignani, A. Modeling of protein interaction networks. Complexus 1, 38–44 (2003).
18. Song, C., Havlin, S. & Makse, H. A. Origins of fractality in the growth of complex networks. Nat. Phys. 2, 275 (2006).
19. Song, C., Gallos, L. K., Havlin, S. & Makse, H. A. How to calculate the fractal dimension of a complex network: the box covering algorithm. J. Statistical Mech.: Theory Exp. 2007, P03006 (2007).
20. Yang, R. & Bogdan, P. Controlling the multifractal generating measures of complex networks. Sci. Rep. 10, 1–13 (2020).
21. Dorogovtsev, S. N., Goltsev, A. V. & Mendes, J. F. F. Pseudofractal scale-free web. Phys. Rev. E 65, 066122 (2002).
22. Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. Multifractal properties of growing networks. EPL (Europhys. Lett.) 57, 334 (2002).
23. Palla, G., Lovász, L. & Vicsek, T. Multifractal network generator. Proc. Natl Acad. Sci. 107, 7640–7645 (2010).
24. Leskovec, J., Chakrabarti, D., Kleinberg, J. & Faloutsos, C. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In European Conference on Principles of Data Mining and Knowledge Discovery (eds Jorge, A., Torgo, L., Brazdil, P., Camacho, R. & Gama, J.) 133–145 (Springer, 2005).
25. Leskovec, J. & Faloutsos, C. Scalable modeling of real graphs using Kronecker multiplication. In Proc. 24th International Conference on Machine Learning (ed. Ghahramani, Z.) 497–504 (ACM, 2007).
26. Kim, M. & Leskovec, J. Modeling social networks with node attributes using the multiplicative attribute graph model. In Proc. Twenty-Seventh Conference on Uncertainty in Artificial Intelligence 400–409 (AUAI Press, Arlington, Virginia, USA).
27. Kim, M. & Leskovec, J. Multiplicative attribute graph model of real-world networks. Internet Math. 8, 113–160 (2012).
28. Xu, C. S. et al. A connectome of the adult Drosophila central brain. BioRxiv https://doi.org/10.1101/2020.01.21.911859 (2020).
29. Mitter, M. et al. Conformation of sister chromatids in the replicated human genome. Nature 586, 139–144 (2020).
30. Erdős, P. & Rényi, A. On random graphs. Publ. Math. Debrecen 6, 290–297 (1959).
31. Pal, K., Forcato, M. & Ferrari, F. Hi-C analysis: from data generation to integration. Biophys. Rev. 11, 67–78 (2019).
32. Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Systems 6, 256–258 (2018).
33. Betzel, R. F. & Bassett, D. S. Multi-scale brain networks. Neuroimage 160, 73–83 (2017).
34. Daudin, J.-J., Picard, F. & Robin, S. A mixture model for random graphs. Statistics Comput. 18, 173–183 (2008).
35. Cheng, Q. Generalized binomial multiplicative cascade processes and asymmetrical multifractal distributions. Nonlinear Process. Geophys. 21, 477–487 (2014).


Acknowledgements

The authors gratefully acknowledge the support by the National Science Foundation Career award under Grant No. CPS/CNS-1453860, the NSF award under Grant CCF-1837131, MCB-1936775, CNS-1932620, the U.S. Army Research Office (ARO) under Grant No. W911NF-17-1-0076, the Okawa Foundation award, and the Defense Advanced Research Projects Agency (DARPA) Young Faculty Award and DARPA Director Award under Grant No. N66001-17-1-4044, and a Northrop Grumman grant. The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied by the Defense Advanced Research Projects Agency, the Air Force Research Lab, the Department of Defense or the National Science Foundation.

Author information

Authors and Affiliations

University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, CA, USA
Ruochen Yang & Paul Bogdan
University of Wisconsin-Madison, School of Computer Sciences, Madison, WI, USA
Frederic Sala


Contributions

R.Y., F.S., and P.B. designed the research study. R.Y. and F.S. wrote the code and conducted the simulations. All authors contributed to the applications, results analysis, and manuscript writing.

Corresponding author

Correspondence to Paul Bogdan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Communications Physics thanks Arian Ashourvan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Yang, R., Sala, F. & Bogdan, P. Hidden network generating rules from partially observed complex networks. Commun Phys 4, 199 (2021). https://doi.org/10.1038/s42005-021-00701-5


Received: 29 March 2021
Accepted: 09 August 2021
Published: 01 September 2021
DOI: https://doi.org/10.1038/s42005-021-00701-5

