Introduction

Mining the topological complexity of networks must go beyond estimating statistical network metrics (e.g., degree1,2, clustering coefficient3, path-length distribution) or measuring the network’s geometric4,5,6 properties. Instead, we must elucidate the underlying hidden heterogeneous rules that govern the emergence and dynamics of complex networks. For instance, the interactions between brain regions or neurons (topological structures) generate cognitive functional states, but challenges remain in understanding the brain wiring mechanism and the rules related to cognitive processes in network neuroscience7. Furthermore, topological analysis of yeast chromatin maps reveals a transition from intra- to inter-chromosomal interactions when the yeast undergoes different growing states8, but fails to identify the network generators or rules corresponding to this transition. Moreover, multifractal topological analysis reveals that chromosomal interactions are bifractal9. While these multifractal network analysis efforts can detect subtle conformational changes among complex network components (e.g., chromosomes in human stem cells), they fail to explain the emergence and evolution of networks, identify their general set of hidden network generators, and explain how small changes to these generators can lead to exhibited or enhanced complex behavior. Aside from chromosomal interactions, it has also been proven that various real networks possess a hierarchically organized (self-similar) community structure which grows recursively and copies themselves10,11,12,13,14,15. For example, neuronal culture networks16 and protein interaction networks17 also possess complex multifractal behaviors.

Multifractality has been studied as a topological feature with box covering/box counting methods4,18. The scaling behavior is examined by a renormalization process which coarse-grains the network into boxes6,19. However, commonly used approaches, like renormalization group-inspired algorithms (such as box covering, sandbox) fail to illustrate the network emergence20. Various graph models have been proposed to model the growing scale-free properties and multifractal degree distribution by self-repeatably inheriting and adding nodes and connections21,22. Nevertheless, the multifractality that exists on a topological level cannot uncover the hidden community structure or the generator rules.

Unlike exploring and measuring the multifractality of various topological structures, we focus on identifying the underlying network generating functions and developing a general mathematical framework together with efficient algorithms to mine the multifractality encoded in the node attributes and the weighted interactions among nodes. The network generating function can provide high level and condensed description of complex systems. Uncovering the generating rules will enable us not only to generate synthetic graph structures with different topological properties, but also reproduce and mine their topological complexity and heterogeneity. The probabilistic description of the generating function should also help to explore the validity of links in a noisy graph and apply to various scenarios. To the best of our knowledge, a robust and general framework of multifractal generating model along with comprehensive analysis is lacking. Although the multifractal network generators23 can generate networks with multifractal properties and any given graph metrics, the simulated-annealing based parameter estimation is not robust and the model is limited to binary graphs. The stochastic kronecker graph model10,24,25 is also capable to capture self-similarity, but the network size is required to be related to the model and the heterogeneity in node attributes is neglected. The multiplicative attribute graph model generalizes the two aforementioned models and characterizes the node attributes in social networks26,27. Though the model formulation is general, the estimation algorithm targets only binary node attributes.

To address these research gaps and better understand the complexity and multifractality of real-world networks, one must address the following major challenges: (1) How can we construct a general multifractal network generating model capable of not only capturing the observed multifractal characteristics, but also provide mathematical tools for efficiently investigating and engineering their macroscopic properties? (2) How can we efficiently and correctly reconstruct the underlying network generator model? (3) Can we recover the weighted multifractal network generative model from incomplete (partial) observations and noisy or adversarial data/influences? (4) Do such techniques scale up to real-world networks and enable us to study whether multifractality appears in real-world applications such as the brain connectomes and chromosomal interactions?

To answer these questions, here we propose a weighted multifractal graph model (WMGM) constructed recursively from generative measures of linking probabilities and capable to capture the multifractality and weighted heterogeneity of functional interactions. To clarify the difference between characterizing the multifractality in topological structure and in the network generating model, we specify that the functional level and model level terms refer to analyzing networks through their reconstructed generative model. In contrast, the graph level term means that we are examining the properties of the network topology. To efficiently learn the parameters of the network generating model, we provide a rigorous variational inference framework capable of reconstructing the underlying multifractal network generator for partially observed networks. This inference method can deal with networks of arbitrary sizes and any attribute cardinality; it also offers a robust parameter estimation. We examine our proposed approach on both synthetic and real-world networks. We show that the proposed model can characterize and reproduce many graph properties (i.e., degree, clustering coefficient, weight distribution). We present the efficiency and robustness of the proposed model and inference method against incomplete and noisy observations. By applying the network generator inference framework to real-world datasets (e.g., brain networks, chromatin interactions), we reveal the hidden structure of complex systems at the functional level. The results indicate that the WMGM is capable of differentiating between various structures in brain networks and in chromatin interactions. We further show that the proposed inference algorithm can help to detect topologically associating domains (TADs) in chromosomal interaction maps.

Results

Weighted-multifractal graph model

We propose the weighted multifractal graph model (WMGM). It is meant to serve as a generalized network generating rule that captures the observed multifractal properties associated with node attributes and the heterogeneity in weights (intensities of pairwise interactions). Building on measure theory concepts, the crux of this multifractal network generating model is to construct a series of probabilities that we associate with the side lengths of rectangles that are recursively built up by repeatedly splitting a unit square. This ensures a heterogeneous self-similar network structure. These probabilities are then used to generate the node attributes and edges for the network.

We first define an initial generating measure \({\theta }^{(1)}=\left({l}^{(1)},{p}^{(1)}\right)\) on a unit square. The rationale for considering a unit square is to ensure that the probability mass function of node attributes sums to 1. Next, the unit square is divided into M2 rectangles, where \({\{{l}_{i}^{(1)}\}}_{i = 1}^{M}\) are the side lengths of each rectangle. To these rectangles, we assign the probabilities \({\{{p}_{ij}^{(1)}\}}_{i,j = 1}^{M}\). We consider symmetric p(1) terms in this work, but as an extension, we could permit the asymmetric case which can model directed networks. Along the same lines as in the multifractal network generator20,23, the self-similar WMGM \({\theta }^{(K)}=\left({l}^{(K)},{p}^{(K)}\right)\) is formulated recursively from this unit square θ(1) with interval length l(K) = l(K−1)l (1) and linking probability p(K) = p(K−1)p(1). Here  denotes the Kronecker product.

An undirected weighted graph is then generated by the following procedure: (1) N nodes are spread into MK classes with prior probabilities l(K). The indicator variable ϕuq denotes the label indicating whether a node u has attribute q. Note that ϕuq = 0 or 1, \(\mathop{\sum }\nolimits_{q = 1}^{{M}^{K}}{\phi }_{uq}=1\). The attributes follow the categorical distribution \(P\{{\phi }_{uq}=1\}={l}_{q}^{(K)}\), q = 1…MK. (2) The edges between nodes u and v are generated with a linking probability p(K). Let \({\{w(r)\}}_{r = 0}^{\infty }\) denote the predefined weight distribution, where w(0) = 0 and \(\left\{w(r)\right\}\) is monotonically increasing. The probability that an edge between node u and v has weight w(r) is given by \(P\{{e}_{uv}=w(r)| {\phi }_{uq}=1,{\phi }_{vh}=1\}={\left({p}_{qh}^{(K)}\right)}^{r}\left(1-{p}_{qh}^{(K)}\right)\). For simplicity, we denote it as \({p}_{qh}^{(K)}({r}_{uv})\), where ruv is the weight category r of the edge between node u and v. Here, the probability that an edge does not exist is \({p}_{qh}^{(K)}(0)=1-{p}_{qh}^{(K)}\) and the chance that an edge (regardless of the weight) exists is \(\mathop{\sum }\nolimits_{r = 1}^{\infty }{p}_{qh}^{(K)}(r)={p}_{qh}^{(K)}\). It naturally satisfies \(\mathop{\sum }\nolimits_{r = 0}^{\infty }{p}_{qh}^{(K)}(r)=1\) and can be easily mapped to unweighted graphs. In contrast to20, where the linking probabilities p(1)(r) are determined for each weight level r, we design the edge distribution \(P\{{e}_{uv}=w(r)| {\phi }_{uq}=1,{\phi }_{vh}=1\}\) as the geometric distribution with \({p}_{qh}^{(1)}\) identical to all weights. The rationale is that smaller weights are more common. We also aim at using fewer parameters to capture the heterogeneous graph structure.

The multifractality of the model emerges from the recursive construction. The derivation of the partition function and the multifractal metrics are presented in the method section Multifractal analysis of WMGM. Special cases of the proposed model correspond to several related models. When MK = N, the proposed weighted multifractal graph model retrieves the Kronecker model10 as a particular case. When the weight is neglected (i.e., total weight level R = 1), the proposed model reduces to the multifractal network generator23.

Figure 1a shows the numerical example of the model building procedure and graph generation. The model θ(2) in the middle is constructed by θ(1) shown on the left-hand side at the first iteration. In the simulated graph, the colors of nodes represent the attributes generated by l(2) and the weights of the links are generated by p(2). Figures 1b and c show two different models which generate networks with different community structures. In the generating model θ(K), the linking probability p(K) approximates the connection rules and community structure. Larger value of \({p}_{qh}^{(K)}\) leads to denser connection between nodes in community q and h. If \({p}_{qh}^{(K)}\) is even across different q, h, it suggests relatively ambiguous heterogeneity (Fig. 2a). When \({p}_{qh}^{(K)}\to 1\), each pair of nodes in community q and h has a connection and thus it creates a fully connected subgraph (Fig. 2b). When \({p}_{qh}^{(K)}\to 0\), no connection may exist between category q and h, leading to an m-partite structure (q = h, Fig. 2c) or a community structure (q ≠ h, Fig. 2d).

Fig. 1: Model illustration and examples.
figure 1

a Illustration of weighted multifractal graph model (WMGM) and graph generation for initial model size M = 2, model iteration K = 2. The initial generating measure θ(1) is given by node attribute probability l(1) = [0.4, 0.6] and linking probability p(1) = [0.2, 0.25; 0.25, 0.15]. Model θ(2) obtained after 1 iteration reads: l(2) = [0.16, 0.24, 0.24, 0.36] and p(2) = [0.04, 0.05, 0.05, 0.0625; 0.05, 0.03, 0.0625, 0.0375; 0.05, 0.0625, 0.03, 0.0375; 0.0625, 0.0375, 0.0375, 0.0225]. Height of each 3D bar corresponds with the value of linking probability. We use different color to differentiate diverse regions in the model block. The graph on the right is generated by θ(2). Color of node in the right graph represents the hidden node attribute. For example, the circled blue-colored node has attribute indexed by 4 and is generated under node attribute probability \({l}_{4}^{(2)}=0.36\). The pair of nodes circled on this graph represent two nodes of attributes indexed with 1 and 4, respectively. Their corresponding link is generated by linking probability \({p}_{14}^{(2)}=0.0625\). b and c Different models generate networks with different topological structures. The model in b has high intra-attribute linking probability and low inter-attribute linking probability. This model leads to a two community structure network. In contrast, the linking probability in c is relatively closer and the community structure is ambiguous. Here the superscript refers to the number of iteration when building the model. Subscripts refers to the attribute index.

Fig. 2: Networks generated by different linking probability p in the weighted multifractal graph model (WMGM).
figure 2

By choosing p, the WMGM can control the topology of generated networks. a p = [0.5, 0.5; 0.5, 0.5]. b p = [1, 1; 1, 1]. c p = [0.01, 0.2; 0.2, 0.01]. d p = [0.8, 0.02; 0.02, 0.8].

To overcome the challenges related to mining large-scale complex systems (e.g., heterogeneity in weights, scale-dependent higher order interactions), we investigate how the proposed WMGM can decipher the hidden rules that govern their complex topological architecture and functionality. Indeed, mapping a large network to a generative model can contribute to losing some intricate details of subnetworks and their interactions. However, reconstructing a function level model can compress unnecessary redundant information and allow us to deal with incomplete or noisy data, which is common in real-world datasets. Consequently, the WMGM can learn the hidden rules that govern structures in complex systems such as brain networks (e.g., understanding and interpreting the emergence of neuronal computation in brain networks), biological systems (e.g., understanding the emerging genotype-phenotype relationships) and social networks. To investigate the benefits and limitations of the WMGM, we evaluate its capabilities in reconstructing the network generating models from scarce noisy observations on a series of artificially generated and real-world networks. First, we validate our method on synthetic data in terms of convergence and estimation error. Next, because real-world networks are usually only partially observed and often noisy, we investigate the robustness of our approach to such factors. We also apply our method to three real-world complex networks, namely the brain connectome of Drosophila, the chromosome interactions of yeast cells undergoing different growth states, and the conformation maps of replicated human chromosomes. The results show that our method can reproduce and elucidate important properties of real-world complex systems.

Learning the hidden network generators (rules) from partial and noisy observations of synthetic networks

To validate the ability of the proposed WMGM framework to reconstruct the ground-truth generators and to understand its estimation error, we examine three synthetic network case studies, from least to most challenging: (i) Clean fully observed networks, (ii) Threshold-varying partially observed networks, and (iii) Noisy networks with spurious edges. In the case of fully observed network setting, we demonstrate that our model can successfully reconstruct the ground-truth generator and reproduce the graph properties of the synthetic network. We also show that the WMGM inference is robust up to a certain level of missing observations in the case of partially observed networks, and can handle noise by distinguishing between spurious and true links in the last case.

Network generator reconstruction

We first examine the network generator reconstruction accuracy and the ability of the WMGM to recover simulated graph properties. We use a synthetic graph Gsyn of 500 nodes generated by l(1) = [0.7, 0.3], p(1) = [0.8, 0.5; 0.5, 0.4] with hyperparameters M = 2, K = 3 and a predefined weight set \(\left\{w(r)=r\right\}\). We implement the variational expectation maximization (EM)-based estimation method (described in the Methods section Parameter estimation of WMGM) on the fully observed simulated network with the step length γ = 10−7 of the gradient methods in M-step. The algorithm stops when the increment of the objective function (lower bound of log-likelihood) after one EM iteration is smaller than 0.1. Figures 3a and b show the convergence of the lower bound of the log-likelihood \({{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)\) and the reconstructed parameters as the EM iteration proceeds, respectively. We note that the lower bound exhibits a fast convergence within the first 20 EM iterations and later slightly increases and converges after 120 iterations. The relative absolute error of the reconstructed lower bound \({{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)\) and the true log-likelihood \({{{{{{\mathscr{L}}}}}}}_{{{{\rm{syn}}}}}(R)\) of the synthetic graph Gsyn is \(| ({{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)-{{{{{{\mathscr{L}}}}}}}_{{{{\rm{syn}}}}}(R))/{{{{{{\mathscr{L}}}}}}}_{{{{\rm{syn}}}}}(R)| =0.0022\), which shows the recoverability of the proposed WMGM framework. Meanwhile, the estimated parameters show a similar trend and converge to the ground-truth values. Figure 3c presents the mean relative absolute error (RAE) per parameters as a function of the EM iterations. It shows that the error decreases fast within the first 20 EM iterations, and when the small increment of the log-likelihood lower bound emerges at 100 EM iterations in Figure 3a, the error begins to drop sharply again. After 120 EM iterations, it achieves the minimum error of 0.013 (1.3% error per parameter), and when the algorithm terminates, the error is 0.032 (3.2% error per parameter). Figure 3b shows that the value of the recovered \({p}_{22}^{(1)}\) (yellow line) crosses the ground truth 0.4 at 120 EM iterations and then decreases by a small quantity. This suggests that we can early terminate the algorithm when the objective function starts to converge and achieve the best performance.

Fig. 3: Model reconstruction, graph metrics and estimation error.
figure 3

We investigate the estimation algorithm on a synthetic network of 500 nodes generated by the weighted multifractal graph model (WMGM) with model parameters: node attribute probability l(1) = [0.7, 0.3], linking probability p(1) = [0.8, 0.5; 0.5, 0.4], initial model size M = 2 and model iteration K = 3. We show the objective function (lower bound of log-likelihood) in a, the reconstructed parameters in b and the mean relative absolute error (RAE) per parameter in c as a function of the number of the expectation maximization (EM) iterations. The ground-truth parameters are shown in the legend of plot in b. The objective function converges and the parameters also converge to the ground-truth values. The mean RAE at the last iteration is 0.032 (3.2%). Clustering coefficients d, degree distribution e and weight distribution f of the simulated and the reconstructed graph. The reconstructed graph is generated by the recovered WMGM. Degree and clustering coefficients distributions are calculated after binarizing the simulated and the reconstructed graphs.

We also investigate the recoverability of the network structures through the proposed WMGM framework. In Fig. 3d–f, we compare the simulated and reconstructed network properties including the clustering coefficients, degree distribution, and weight distribution. The simulated distributions (blue lines) are directly calculated from the synthetic network Gsyn and the reconstructed results are calculated from a network Grecon generated by the recovered WMGM. We observe that the proposed estimation method can successfully reproduce the network properties of synthetic networks. In Supplementary Note 5 we further quantify the dissimilarity of the simulated and reconstructed graph properties, while in Supplementary Note 3 we include null models as comparison to show the efficiency of the estimation algorithm.

No guarantee exists that the EM algorithm converges to the maximum likelihood estimator. If the objective function is non-convex, the algorithm may terminate at or near a local optimum. The estimation accuracy is also related to the size of the network and its density. Consequently, we investigate the dependency between mean relative absolute error and the multifractal spectrum width of the model and other key properties in Fig. 4e. We select different levels of randomness in the model with varying positions of the multifractal spectrum, generate a graph of 200 nodes, then recover the model with same model initialization and measure the mean relative absolute error per parameter. We consider three cases: all parameters are randomly generated (in blue asterisks); \({p}_{12}^{(1)}={p}_{21}^{(1)}=0.5\) with random \({p}_{11}^{(1)}\), \({p}_{22}^{(1)}\), \({l}_{1}^{(1)}\), \({l}_{2}^{(1)}\) (in green dots); and \({l}_{1}^{(1)}={l}_{2}^{(1)}=0.5\), \({p}_{12}^{(1)}={p}_{21}^{(1)}=0.5\) with fixed center of multifractal spectrum (in red circles). The multifractal spectrum width is calculated as in Methods section Multifractal analysis of WMGM. Figure 4e shows that the random cases make such local minima particularly prominent.

Fig. 4: Model reconstruction from incomplete observations.
figure 4

a Reconstructed parameters of the network generator as a function of the number of nodes observed from the ground-truth network. The simulated graph with N0 = 100, 000 nodes is generated by parameters shown in the legend. N nodes are randomly chosen and only the links among them are used to reconstruct the weighted multifractal graph model (WMGM). We repeat the experiment 10 times with every N = 50, 100, …, 500. The mean value of each parameter and error is shown in solid lines and shaded area represents the standard deviation. b Mean relative absolute error (RAE) as a function of the number of observed nodes from the entire network. The network generative model can be correctly estimated with low variance when the number of observed nodes exceeds 200. c Fraction of observed nodes needed to achieve a mean RAE less than 0.035 (3.5%) and 0.05 (5%) with the full original network size N0 = 103, 5 × 103, 104, 5 × 104, 105. Dots and asterisks are the empirical results, and the solid lines are fitted by power law distributions. The curve is in log-log scale and shows a power law relationship. d 3D plot of the observation size N, the full network size N0 and the mean error of 10 experiments per (N, N0). It shows that the hidden generating rule can be recovered (error < 4%) with partial observation of 200 nodes or more, regardless of the original network size. The WMGM is non recoverable when observation size is smaller than 200 nodes (gray area). e The dependency between the mean RAE per parameter and multifractal spectrum width with different properties of the generative model (blue: all parameters are random; green: fixed linking probability \({p}_{12}^{(1)}={p}_{21}^{(1)}=0.5\) with all other parameters randomly generated; and red: fixed center of multifractal spectrum). In general, introducing randomness can increase the estimation error.

Reconstruction of network generator from partial observations

Complex networks are usually partially observed. This situation has many causes, including the following scenarios: (1) the network is still growing and new nodes can join in the future; (2) it is computationally expensive or technologically impossible to examine the whole network (e.g., all neurons in the human brain). Therefore, we investigate the ability of successfully reconstructing the ground-truth WMGM from partial observations.

For the partially observed experiments, we use a synthetic graph with N0 = 100, 000 nodes generated by l(1) = [0.7, 0.3], p(1) = [0.8, 0.5; 0.5, 0.4], M = 2, K = 3. At each time, we randomly select N of N0 nodes and take the connections among the selected N nodes as incomplete data. We repeat this process 10 times for each N and measure the recovered parameters. The mean and standard deviation of recovered parameters and error are shown in Fig. 4a, b against the number of nodes observed N. We find that the model is correctly recovered with low standard deviation at N = 200 or more nodes observed, where the mean relative absolute error (RAE) per parameter with N = 200 is 3.1% and the standard deviation is 0.5%. We conclude that the proposed WMGM and the inference method is robust against missing components in the system.

We repeat the experiments with different full original network sizes N0 = 103, 5 × 103, 104, 5 × 104, 105 and the same N. We calculate the mean RAE and report the minimum fraction of observed nodes f to achieve a small certain error in Fig. 4c. Both axes are in log scale. The blue dots represent mean RAE smaller than 0.035 (3.5%) and red asterisks represent mean RAE smaller than 0.050 (5.0%). They are well fitted by power laws (shown as solid lines). The regression for error smaller than 0.035 is \(f=332\times {N}_{0}^{-1.04}\) and for error less than 0.050 is \(f=95\times {N}_{0}^{-0.97}\). It shows that the required size of observation to achieve a certain small error is decreasing and follows a power law as the original network size grows. Figure 4d shows the relationship of the average error of 10 experiments with combinations of N = 50, 100, …, 500 and N0 = 103, 5 × 103, 104, 5 × 104, 105. The axis of N0 is set as log scale. The underlying generating model is recoverable when the partial observation contains more than 200 nodes, regardless of the original network size N0. This is critical, as in real-world complex systems, only partial observation without full monitoring and detection is possible. Since we use the WMGM as the generating rule and we assume the networks are partially but evenly observed, reconstruction with partial observation (a subgraph) can achieve good performance while saving on computational cost. It suggests that when dealing with very large networks, it is possible to correctly estimate the hidden generating rules even using a small subset of the network with only 200 nodes. We further perform more individual experiments to show the robustness in Supplementary Note 7.

Reconstructing the network generators from noisy observations and quantifying the reconstructed link reliability

We test the proposed WMGM and the estimation algorithm on noisy networks with spurious links. For this noisy setting, we first generate a synthetic binary graph with the same model as in section Reconstruction of network generator from partial observations. In the binary version, weights are neglected. Any edge in the weighted graph with euv ≠ 0 is considered as an edge in the binary version G0 of the graph. Next, spurious links are randomly added with probabilities p = 10−3, 3 × 10−3, 10−2, 3 × 10−2, 10−1, 3 × 10−1 between pairs of nodes where no edges exist in the original network, producing a noisy graph Gn. For each noise level p, we individually run the experiments 10 times. We call links that exist in the synthetic graph G0 and the noisy graph Gn as true positives, and links added in Gn are false positives. We aim at differentiating true positive and false positive links in noisy graphs. We first reconstruct the generative model θn from the noisy observation Gn. We define the link reliability of an edge euv in the noisy network with its likelihood given by the reconstructed model as \(L{R}_{uv}={{{{\mathrm{log}}}}}\,P({e}_{uv}| {G}_{n};{\theta }_{n})\). The link reliability of the link between node u and node v can be estimated as follows:

$$L{R}_{uv}\approx \mathop{\sum}\limits_{qh}{\tau }_{uq}{\tau }_{vh}{{{{\mathrm{log}}}}}\,{P}_{qh}^{(K)}({r}_{uv})+\mathop{\sum}\limits_{q}{\tau }_{uq}{{{{\mathrm{log}}}}}\,{l}_{q}^{(K)}+\mathop{\sum }\limits_{h}{\tau }_{vh}{{{{\mathrm{log}}}}}\,{l}_{h}^{(K)}.$$
(1)

Figure 5a shows the cumulative distribution of link reliability for different labels, true positive and false positive, with relative noise level p = 10−1. Spurious links (false positive) have lower reliability than true ones (true positives) in their distributions. The average link reliability of false positives is also smaller than the one with true positives. The distinctness implies that the WMGM is able to detect noise in observations and therefore can help to denoise graphs. We validate the ability of graph denoising and its application in Supplementary Note 1. Figure 5b, c shows the reconstructed model parameters and the estimation error for different noise levels p. More spurious links (noise) are added to the network when the noise level p increases. As a consequence, recovered parameter error also increases as the noise level grows. The curve shows that the estimation error is smaller than 0.05 (5%) with low noise level p < 10−1. The estimations are also robust (with low variance of reconstructed parameters and error) when p ≤ 10−1. Though the estimation error is relatively large (10%) when p = 10−1, Fig. 5a shows that even with relatively high-level noise, our model is still capable of distinguishing noise links and true links in the network. We also perform more repetitions to show the robustness in Supplementary Note 7.

Fig. 5: Model reconstruction from noisy observations.
figure 5

a Cumulative distribution of link reliability with different labels, true positive and false positive. Fake links are added to a synthetic graph with spurious linking probability p = 10−1 (noise level) and the model is estimated from this noisy graph. The portion of spurious links with low reliability is larger than the portion of true connections. b Reconstructed parameters with different noise level (spurious linking probability p = 10−3, 3 × 10−3, 10−2, 3 × 10−2, 10−1, 3 × 10−1). The estimations run 10 times with each noise level p and the mean and deviation are shown as lines and shaded areas respectively. The ground-truth parameters are presented in the legend. It shows that the estimated parameters increase as the noise level grows because more spurious links are added to the the original network. c Estimation errors with different noise level. Mean and standard deviation of the mean relative absolute error (RAE) are shown in the plot. The error is low (smaller than 5%) when the noise level p ≤ 3 × 10−2.

Learning the hidden network generators (rules) of biological networks

To demonstrate the capabilities and benefits of the proposed WMGM inference framework, we investigate and learn the network generators (rules) of the following three biological networks: (i) the neuronal connections in adult Drosophila central brain28, (ii) the chromatin interactions of yeast genome8, and (iii) the conformation maps of replicated human chromosomes29. We show that the WMGM enables us to reveal important properties of these biological datasets such as recovering their topological network properties, differentiating growing states, identifying specific features of brain structures in different regions, and detecting TADs. We also conduct experiments on various social networks. The results of social networks can be found in Supplementary Note 2.

Revealing the network generators of Drosophila brain connectome

We use the largest synaptic-level connectome obtained through a three photon microscopy from fruit fly brain28. Chemical synapses between neurons are detected and the numbers of synapses are calculated as the intensity of neuron connections. The original Drosophila connectome G0 of the left alpha lobe in the mushroom body consists of 10,790 neurons (nodes) and 6444 identified synaptic connections (edges). We delete neurons without connections and use the connected 693 nodes to construct a network G with the full 6444 connections and reconstruct the WMGM. We neglect the isolated nodes to avoid the extra computational cost and construct a relatively denser network to achieve better model estimation performance. When the network is large and sparse, the estimation tends to be unstable and inaccurate because we have very limited link observations. In Supplementary Note 9 we show the results with different node and edge sampling size. Note that the method of sampling nodes could influence the network topological structure and the reconstructed model. In the future, we will also investigate and develop strategies that allow us to select the minimum number of nodes required to accurately reconstruct the WMGM model obeying different network properties and for different network sizes. Also, we can always involve the sparsity to the recovered WMGM by adding a negative bias to each linking probability parameter p(1) when the variational EM algorithm process is finished. We discretize the network G with 2rw(r) < 2r+1 − 1 and then use it as the input to the proposed estimation algorithm.

Figure 6 shows the estimation and reconstruction results. Figure 6a, b show the convergence of the lower bound of log-likelihood and parameters with EM iterations. Figure 6c illustrates the reconstructed WMGM. The colors in the square represent the values of p(K) probabilities, and the interval lengths reflect l(K). The brain connectome in the alpha lobe is sparse, therefore most regions in the square model have small linking probability values. The exception is on the right-bottom diagonal, which has the value \({p}_{88}^{(K)}=0.5213\). Its presence is due to the appearance in the connectome of a group of very strong interactions among around 20 neurons. Figure 6d–f presents the clustering coefficients, degree distribution and the cumulative weight distribution of the empirical and reconstruced brain networks. The blue line is the empirical distribution directly extracted from the brain connectome. The red line is the result calculated from a synthetic network generated by the reconstructed WMGM. The yellow dash line represents a null network which is generated by the Erdos-Renyi model30 with average linking probability. We also show the distribution in log scale in Supplementary Note 8. The empirical and reconstructed distributions are close to each other, showing that the WMGM and the proposed inference approach can also reconstruct the statistical properties of real networks. In the scenarios of brain connectome and neuronal connections, it is extremely hard to detect and monitor all neurons or the full functional connectivity due to its complex three dimensional structure and the unknown physico-chemical interactions. In the Reconstruction of network generator from partial observations section, we discussed the recoverability of the WMGM with limited observations. Therefore, the proposed model can enable neuroscientists to estimate hidden rules and learn topological properties of brain networks even if only limited and partial observations are available.

Fig. 6: Network generator reconstruction for a brain connectome.
figure 6

a Convergence of lower bound of log-likelihood in the weighted multifractal graph model (WMGM) estimation with expectation maximization (EM) iteration. b Convergence of WMGM parameters. c Reconstructed network generator model θK visualized as a square. Here the model construction iteration is K = 3. The values of node attribute probability lK are presented as the side lengths of sub-block. Linking probability pK are visualized as the sub-block colors. The value can be read with the colorbar on the right. Empirical clustering coefficient d, degree distribution e and cumulative weight distribution f the in brain connectome (empirical), a synthetic network generated by the reconstructed network generator model (reconstructed), and a network generated by the Erdos-Renyi mode (null).

Brain networks in different brain regions have varying topological structures and features. We exploit our proposed WMGM inference framework to examine the structure and connectivity in four regions with different functionalities of the Drosophila optical lobe: Medulla, Accessory Medulla, Lobula and Lobula Plate. Recall that the brain connections are sparse and tend to appear in a small subset. Therefore, we select the most connected 200 nodes in these brain regions and binarize them as the input to the WMGM inference algorithm. For reconstruction accuracy, we run the inference algorithm 50 times on each brain network and calculate their mean and standard deviation. For the Medulla connectome, we obtain an average network generator with the following parameters l(1) = [0.63, 0.37], p(1) = [0.18, 0.26; 0.26, 0.92]. For the Accessory Medulla connectome, the parameters of the average network generator are as follows: l(1) = [0.48, 0.52], p(1) = [0.07, 0.14; 0.14, 0.34]. For the Lobula connectome, the parameters of the average network generator read: l(1) = [0.46, 0.54], p(1) = [0.41, 0.42; 0.42, 0.95]. For the Lobula Plate connectome, we obtain the following average network generator model: l(1) = [0.42, 0.58], p(1) = [0.12, 0.21; 0.21, 0.82]. The standard deviation for each parameter in each network is smaller than 10−10. The inference results p(1) and l(1) are visualized as colors and side lengths of the yellow-green squares in Fig. 7. We further show the clustering coefficients and degree distribution of the four brain networks in Supplementary Note 10. It is impossible to obtain a concise description for each network while encoding all their properties. We conclude that the reconstructed network generator models can be easily distinguished and our WMGM can be used to differentiate scale-dependent brain regions with different functionalities. Moreover, we can exploit the WMGM to define the regional cognitive functionality using the reconstructed generating rule θ = (p, l). This enables us to measure and quantify the neural behaviors and cognition divergence.

Fig. 7: Network generator inference for a Drosophila connectome in different regions of the optical lobe.
figure 7

The regions are a Medulla, b Accessory Medulla, c Lobula, and d Lobula Plate. The reconstructed initial models θ(1) are visualized as the squares. The inference results linking probability p(1) and node attribute probability l(1) are presented as color and side length of sub-blocks in the yellow-green squares. Yellower sub-block represents relatively larger linking probability. The reconstructed network generator models can be easily distinguished by comparing the color and side length of the sub-blocks.

Inferring the network generators of chromosomal interactions of yeast genome in different growing states

The chromosome conformation capture analysis (also known as Hi-C technique) reveals the topological structure of the genomic sequences29,31 and allows scientists to examine the chromatin’s 3D structure. It measures the contacts between any pair of genomic loci31. In the chromosomal interaction matrix built by the Hi-C technique, the nodes represent the genomic loci and pairwise edge indicates the interaction frequency between two loci in the genome32.

During the various growing states, the yeast genome exhibits a complex topological reorganization8. To mine this topological complexity, we infer the WMGMs emerging from the chromosome interaction data8,32. For each chromosomal interaction matrix, we first downsample the 12,048-by-12,048 matrix to 503-by-503 and then discretize it with 200 ≤ w(r) < 200 + 100r.

Figure 8a, b illustrates the reconstructed WMGM θ(K) from the chromosome interactions of the yeast genome in the exponential growth and quiescence states, respectively. We fix l(1) in both growing states to be identical such that we can compare the linking probabilities. The value of p(K) in the major sub-blocks are shown in the figures. Figure 8a shows that the linking probabilities on the diagonals of exponentially growing yeast cells are larger compared to cells in quiescence state shown in Fig. 8b, while the non-diagonal elements are relatively smaller than the ones in quiescence state. This suggests that when the yeast is growing, the inter-chromosomal interactions become weaker (\({\sum }_{i\ne j}{p}_{ij}^{(K)}{l}_{i}^{(K)}{l}_{j}^{(K)}\) changes from 0.2217 to 0.1758) and intra-chromosomal interactions become stronger (\({\sum }_{i = j}{p}_{ij}^{(K)}{l}_{i}^{(K)}{l}_{j}^{(K)}\) changes from 0.1201 to 0.1516). This conclusion is consistent with the analysis in8, where the authors measure the intra-chromosomal distances between two sites on one chromosome. Figure 8c shows the cumulative weight distribution of the chromosome contact maps. The difference in the recovered model is clearer when comparing with the statistical properties of the network weights. We note that the WMGM enables us to identify these properties and our model can therefore help to distinguish between different growth states.

Fig. 8: Weighted-multifractal graph model (WMGM) inference for the yeast genome in different growing states.
figure 8

a Reconstructed WMGM θK from chromosomal interactions of exponentially growing yeast with model iteration K = 3. We visualized the reconstructed model with the square block. The interval length on square side (x, y axis) is divided by node attribute probability lK and the linking probability value pK of each block is represented by the yellow-green color. Several linking probability values are shown on the block. b Reconstructed WMGM of yeast cell in quiescence state. c Cumulative weight distribution of the interaction maps.

Network generator inference and analysis for the conformation maps of replicated human chromosomes

Chromosome conformation capture analysis (also known as Hi-C technique) reveals topological structure of the genomic sequences29. TADs detection identifies the highly self-interacting chromatin regions. TADs emerge as square blocks whose centers locate at the diagonal of the interaction matrices. Though TADs emerge as critical features to characterize the high intradomain contacts, an unambiguous definition is still evolving31. We infer the WMGM from a binarized Cis sister contact maps from29 and show that our model can help to detect the TADs in Hi-C matrices.

In the variational EM based estimation method (see Methods section on Parameter estimation of WMGM), we introduce the variational parameters τuq to calculate the lower bound of the log-likelihood \({{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)\) in Eq. (3). For each node u, \({\{{\tau }_{uq}\}}_{q = 1\ldots {M}^{K}}\) can be viewed as soft assignments regarding the probability that node u has attribute q. They are also estimators of node attribute distribution parameters l(K) and each node is assigned with one τ. We calculate the entropy of the variational estimator τu with each node u as \(H(u)=-\mathop{\sum }\nolimits_{q = 1}^{{M}^{K}}{\tau }_{uq}{{{{\mathrm{log}}}}}\,{\tau }_{uq}\). Figure 9a shows the binarized Hi-C interaction matrix of the human chromosome 21 Cis sister contacts. For the best reconstruction of the WMGM, we downsample it to a 483-by-483 matrix and apply the proposed inference algorithm. TADs are circled with orange and green rectangles. The node attribute entropy H(u) is shown in Fig. 9b, where low entropies are close to zero and are circled. We find that the positions (node index) with zero entropies are in correspondence with the region where TADs emerge. We conclude that nodes in TADs have high intra-interactions and tend to have low entropy (where the WMGM has high confidence). This suggests that our WMGM can also help to detect TADs in Hi-C data analysis.

Fig. 9: Conformation maps of replicated human chromosomes.
figure 9

a Visualization of the binarized chromosome 21 Cis sister contacts. X and Y axis are the locations of genomic loci. Topologically associating domains (TADs) are circled with orange and green squares. b Entropy of the soft assignments τu at each position. Small entropy are marked with orange and green squares in correspondence with the squares in a. We summarize that TADs are in correspondence with low entropy of node attribute distributions.

Discussion

Exploring topological features in complex networks has the potential to enhance our understanding of the behavior of natural and social phenomena. Among massive topological features, we focus on multifractality, an important property that widely exists in complex systems from numerous domains, including biology, sociology, neuroscience, and geology. Analyzing the multifractality of complex systems enables scientists to measure the multi-scale interactions among components in large-scale complex networks5.

Network multifractality is commonly studied and analyzed at the graph level, where the structure of connections among nodes are considered as self-similar6,19. However, past approaches suffer from a number of limitations. Renormalization group-inspired algorithms, capable of estimating the multifractality of graphs fail to explain the emergence and evolution of networks and cannot decipher the hidden generating rules. The stochastic Kronecker graph model25 captures self-similarity by building a probabilistic model with Kronecker products. However, it requires the graph and model size to be the same, limiting applications to arbitrary scale and partially observed networks. In20,23, the authors propose multifractal network generators, but they reconstruct the model parameters by fitting graph metrics via a simulated-annealing procedure. The simulated-annealing algorithm is unstable and can return various sets of unrelated parameters. This makes it difficult to interpret the generator’s physical meaning. The majority of network models also neglect the importance of weights characterizing the interactions in complex networks.

To decipher the hidden network generators (rules) governing the complex systems dynamics at the functional level, we proposed the weighted multifractal graph model (WMGM). It is capable of capturing their heterogeneity and varying degrees of self-similarity. The proposed WMGM serves as a function that maps and compresses the large graph onto a model. The network generating function can also provide high-level and condensed description of complex systems, which integrates varying graph metrics such as degree and clustering coefficient. To efficiently infer the model parameters, we develop a variational EM inference framework for reconstructing the underlying network generating function encoded in complex networks. We investigate the ground-truth recovery and the robustness of the model inference against incomplete data and noisy observations. The provided mathematical tools can help to investigate network (topological) features by describing large-scale complex systems through functional approaches. Uncovering the generating rules enables us not only to generate synthetic graphs with different properties, but also reproduce the topological complexity and heterogeneity of real-world systems. The proposed WMGM framework is applied to several real-world networks – neuronal connections and chromosomal interactions. The recovered WMGMs demonstrate the potential of the WMGM framework to capture and reproduce the topology structures of real networks. The probabilistic description of the generating function also helps to explore the validity of links in a noisy graph and denoise the system. The reconstructed generator is also able to distinguish different functional regions in the brain and yeast growing states, as well as to detect the boundaries of TADs.

In this work, we assume the distribution of node category is the same across nodes and communities. Generalizing the topological scales as the authors show in33 can improve the ability of the proposed method to capture the heterogeneity embedded in the topological structure of real-world complex systems such as brain networks. For example, instead of using one rule \(P\{{\phi }_{uq}=1\}={l}_{q}^{(K)}\) to generate the node attribute, we can introduce a heterogeneous distribution where the value of K is varied across nodes in different community. The category (community) of each node u is assigned by \({\phi }_{u} \sim categorical({l}^{({k}_{u})})\). The heterogeneity is introduced as the random variable ku (scale of node u, which is a positive integer between 1 and K under distribution f(ku). The linking probability between nodes u and v given the community ϕu, ϕv and scale ku, kv can be calculated as \({\sum }_{q,h}{p}_{qh}^{(K)}\), where the summation over q, h satisfies \({\sum }_{q}{l}_{q}^{(K)}={l}_{{\phi }_{u}}^{({K}_{u})}\) and \({\sum }_{h}{l}_{h}^{(K)}={l}_{{\phi }_{v}}^{({K}_{v})}\).

In the future, we will also investigate the effect of various methods to sample the networks (as we have performed in the Drosophila connectome and social networks case studies) and develop strategies that will provide consistent WMGM generating model across various subgraphs with different sizes and properties. Moreover, future applications of the proposed WMGM framework includes inferring the WMGM models that correspond to partially observed neuronal activity and quantifying how the identified WMGM models evolve and self-optimize during the observed cognition activities. A crucial question in neuroscience is, how can we measure, identify and compare higher order topological characteristics of neuronal behaviors and activities under different (cognitive) circumstances. On the spike train level, it is difficult to compare the spiking behavior of different recording lengths and varying number of neurons in the neuronal systems during non-stationary brain activity. Different network sizes are hard to compare at the neuronal network level. In the future, we will propose an approach to mine the neuronal activity that focuses on identifying the compressed WMGM models and quantifying their evolution and the distances among various WMGM models corresponding to cognition tasks. More precisely, we can first reconstruct the generating measure from observed neuronal networks exhibiting or performing different cognitive behavior and quantify the changes in generators programming the neural behaviors through modifications in the model parameters θ. Subsequently, we can calculate the distance between two cognitive behaviors as a distance between the parameters between two WMGM models. Another important future work includes an implementation aimed at networks that are very sparse and the application of the WMGM to detect hierarchical community structures in complex systems such as cyber-physical systems. With the aid of the proposed WMGM, we are also looking into quantifying cognition given neuronal behaviors and neuron-glia (astrocyte) metabolic coupling and information processing under different cognitive tasks. In the future, when real-time brain activity monitoring is available, the WMGM can be extended to analyze time-varying complex networks generators of label free real-time imaging of neuron-glia activity. This could represent a major step towards a comprehensive understanding of non-Markovian learning and decision making and other brain cognitive functions. With respect to the 4D Nucleome networks, future work will focus on constructing more robust strategies for identifying the TADs with applications to Hi-C analysis.

Methods

Parameter estimation of WMGM

In this section, we discuss how to recover the parameters for the WMGM via a variational approach. We first provide a probabilistic description of a weighted network within the WMGM framework. Let R denote the N-by-N adjacency matrix of the weight category in the graph. Recall that ϕ is the latent node attribute indicator. The probabilistic description of nodes and edges are given by \(p(\phi ;l)=\mathop{\prod }\nolimits_{u = 1}^{N}\mathop{\prod }\nolimits_{q = 1}^{{M}^{K}}{\left({l}_{q}^{(K)}\right)}^{{\phi }_{uq}}\) and \(p(R| \phi ;p)={\prod }_{u\,{ < }\,v}\mathop{\prod }\nolimits_{q,h = 1}^{{M}^{K}}{\left({p}_{qh}^{(K)}({r}_{uv})\right)}^{{\phi }_{uq}{\phi }_{vh}}\). Here, we focus on undirected graphs without self-loops. Directed graphs and graphs containing self-loops can also be expressed by changing the summation condition over u, v. In this work, we view M and K as hyperparameters; they are selected prior to the inference procedure.

Given an observed weighted network, we seek to estimate the underlying network generating function θ(1) by maximizing the likelihood function \({{{{{\mathscr{L}}}}}}(\theta )\) on the left of the following:

$${{{{{\mathscr{L}}}}}}(\theta )={{{{\mathrm{log}}}}}\,p(R;\theta )={{{{\mathrm{log}}}}}\,\mathop{\sum}\limits_{\phi }Q(\phi )\frac{p(R,\phi ;\theta )}{Q(\phi )}={{{{\mathrm{log}}}}}\,{{\mathbb{E}}}_{Q}\left\{\frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}\ge {{\mathbb{E}}}_{Q}\left\{{{{{\mathrm{log}}}}}\,\frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}$$
(2)

However, the summation over ϕ makes the log-likelihood \({{{{{\mathscr{L}}}}}}(\theta )\) intractable. Therefore, instead of maximizing the log-likelihood, we alternatively aim to maximize the evidence lower bound \({{\mathbb{E}}}_{Q}\left\{{{{{\mathrm{log}}}}}\,\frac{p(R,\phi ;\theta )}{Q(\phi )}\right\}\) (the right-hand side above). In order to minimize the gap between the log-likelihood and its lower bound, which is the Kullback–Leibler divergence from P(ϕR; θ) to Q(ϕ), we choose the distribution over ϕ to be \(Q(\phi )=\mathop{\prod }\nolimits_{u = 1}^{N}\mathop{\prod }\nolimits_{q = 1}^{{M}^{K}}{{\tau }_{uq}}^{{\phi }_{uq}}\), where the variational parameters τuq measure the soft assignments of node u, \(\mathop{\sum }\nolimits_{q = 1}^{{M}^{K}}{\tau }_{uq}=1\) for u = 1…N. This is known as the mean-field approach in variational inference34. Therefore, the lower bound of \({{{{{\mathscr{L}}}}}}(\theta )\) can be computed as

$${{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)= \, \mathop{\sum}\limits_{\phi }Q(\phi ){{{{\mathrm{log}}}}}\,P(\phi ;l)+\mathop{\sum }\limits_{\phi }Q(\phi ){{{{\mathrm{log}}}}}\,P(R| \phi ;p)-\mathop{\sum }\limits_{\phi }Q(\phi ){{{{\mathrm{log}}}}}\,Q(\phi )\\ = \,\mathop{\sum }\limits_{u = 1}^{N}\mathop{\sum }\limits_{q = 1}^{{M}^{K}}{\tau }_{uq}{{{{\mathrm{log}}}}}\,({l}_{q}^{(K)})+\mathop{\sum }\limits_{u\,{ < }\,v}\mathop{\sum }\limits_{q,h = 1}^{{M}^{K}}{\tau }_{uq}{\tau }_{vh}{{{{\mathrm{log}}}}}\,\left({p}_{qh}^{(K)}({r}_{uv})\right)-\mathop{\sum }\limits_{u = 1}^{N}\mathop{\sum }\limits_{q = 1}^{{M}^{K}}{\tau }_{uq}{{{{\mathrm{log}}}}}\,({\tau }_{uq}).$$
(3)

Algorithm 1: Reconstructing the WMGM through a variational EM algorithm

 input: N, M, K and adjacency matrix R in weight category

 output: p(1), l(1)

 parameter sweep: M = 1, 2, 3 and K = 1, 2, 3, 4

    initialization: p(1), l(1), τ

    repeat

      E-step: update τ by p(1) and l(1)

      repeat

       \({\tau }_{uq}\leftarrow {\lambda }_{u}{l}_{q}^{(K)}\exp \left\{\mathop{\sum }\nolimits_{v\ne u}\mathop{\sum }\nolimits_{h}{\tau }_{vh}{{{{\mathrm{log}}}}}\,\left({p}_{qh}^{(K)}({r}_{u}v)\right)\right\}\)

      untilτconverges;

      M-step: update p(1) and l(1) by τ

      \({l}_{i}^{(1)}=\frac{1}{NK}\mathop{\sum }\nolimits_{u,q}{\tau }_{uq}\mathop{\sum }\nolimits_{k = 1}^{K}{\mathbb{1}}\left\{q(k)=i\right\}\)

      repeat

        

$$\frac{\partial {{{{{{\mathscr{L}}}}}}}_{Q}(\theta )}{\partial {p}_{ij}^{(1)}}\;= \;\frac{1}{{p}_{ij}^{(1)}}\mathop{\sum}\nolimits_{q,h}\mathop{\sum}\nolimits_{u\ne v}\Bigg\{{\tau }_{uq}{\tau }_{vh}\left({r}_{uv}-\frac{{p}_{qh}^{(K)}}{1-{p}_{qh}^{(K)}}\right)\\ \;\times \mathop{\sum }\nolimits_{k = 1}^{K}{\mathbb{1}}\left\{q(k)\;=\;i,h(k)\;=\;j\right\}\Bigg\}$$

        \({p}_{ij}^{(1)}\leftarrow {p}_{ij}^{(1)}+\gamma \frac{\partial {{{{{{\mathscr{L}}}}}}}_{Q}(\theta )}{\partial {p}_{ij}^{(1)}}\)

      untilp(1) converges;

     untillog-likelihoodconverges;

The inference procedure is then performed via variational expectation maximization (EM), as shown in Algorithm 1. In the E-step, given the parameters l(1) and p(1), we maximize Eq. (3) with respect to τ. We update τuq by a fixed point iteration following a similar strategy as in34. λu is the normalization factor to satisfy the constraint \(\mathop{\sum }\nolimits_{q = 1}^{{M}^{K}}{\tau }_{uq}=1\) in Algorithm 1. In the M-step, with τ obtained from the E-step, we maximize \({{{{{{\mathscr{L}}}}}}}_{Q}(\theta ,R)\) with respect to the parameters l(1) and p(1). In Eq. (3), the terms regarding l and p are written separately. Therefore, they are independently updated. \({l}_{i}^{(1)}\) can be analytically computed by setting the partial derivatives to zero. Moreover, the \({p}_{ij}^{(1)}\) are numerically computed by the gradient method with step length γ. q(k) and h(k) are the kth index of decomposing q and h (a ‘reverse’ process of taking the Kronecker products). \(q(k)=\left(\lfloor \frac{q-1}{{M}^{k-1}}\rfloor \,{{{{\mathrm{mod}}}}}\,\,M\right)+1\). To avoid confusion between θ(1) and θ(K), we use i and j for indices in θ(1) and q, h in θ(K). Finally, we also note that in this section, all summations over nodes u and v are taken from 1 to N. Summations of i, j are from 1 to M and summations of q, h are taken from 1 to MK. Note that M, K are considered as hyperparameters in the variational EM framework. We perform more analysis on the choice of best hyperparameters M, K in Supplementary Notes 4 and 6.

Multifractal analysis of WMGM

Next, we use our proposed WMGM to analytically compute the statistical physics inspired and multifractal metrics, such as the partition function, the Lipschitz-Hölder exponent, and the multifractal spectrum. For simplicity, we first reshape the linking probabilities \({\left\{{p}_{ij}^{(1)}\right\}}_{i,j = 1:M}\) in θ(1) as \({\left\{{p}_{i}\right\}}_{i = 1:{M}^{2}}\). We also reshape the area of each sub-rectangle in the unit square \({\left\{{l}_{i}^{(1)}{l}_{j}^{(1)}\right\}}_{i,j = 1:M}\) as \({\left\{{a}_{i}\right\}}_{i = 1:{M}^{2}}\). Following20,35, the partition function of the model at an average sub-block size \(\epsilon ={(\frac{1}{M})}^{2K}\) can be written as

$${Z}_{\epsilon }(q)=\mathop{\sum}\limits_{{\left\{{k}_{j}\right\}}_{j = 1:{M}^{2}}}\left(\begin{array}{cc}{K}\\ {{k}_{1}\ldots {k}_{j}\ldots {k}_{{M}^{2}}}\end{array}\right){\left\{\mathop{\prod }\limits_{i = 1}^{{M}^{2}}{[{a}_{i}{p}_{i}]}^{{k}_{i}}\right\}}^{q}={\left[\mathop{\sum }\limits_{i,j = 1}^{M}{({l}_{i}^{(1)}{l}_{j}^{(1)}{p}_{ij}^{(1)})}^{q}\right]}^{K}.$$
(4)

In Eq. (4), \(\left(\begin{array}{cc}{K}\\ {{k}_{1}\ldots {k}_{j}\ldots {k}_{{M}^{2}}}\end{array}\right)\) is the number of sub-blocks which have the same area \(\mathop{\prod }\nolimits_{i = 1}^{{M}^{2}}{{a}_{i}}^{{k}_{i}}\) and linking probability \(\mathop{\prod }\nolimits_{i = 1}^{{M}^{2}}{{p}_{i}}^{{k}_{i}}\). \({\left\{{k}_{i}\right\}}_{i = 1:{M}^{2}}\) is subjected to \(\mathop{\sum }\nolimits_{i = 1}^{{M}^{2}}{k}_{i}=K\). \(\mathop{\prod }\nolimits_{i = 1}^{{M}^{2}}{[{a}_{i}{p}_{i}]}^{{k}_{i}}\) is the proportion of edges which are generated under linking probability in those sub-blocks.

In the multifractal analysis, multifractal metrics are calculated based on the partition function Zϵ(q), where q is the order of the moment. The mass exponent is given by

$$\tau (q)=\frac{{{{{\mathrm{log}}}}}\,{Z}_{\epsilon }(q)}{{{{{\mathrm{log}}}}}\,\epsilon }=\frac{K}{{{{{\mathrm{log}}}}}\,\epsilon }{{{{\mathrm{log}}}}}\,\left\{\mathop{\sum }\limits_{i,j=1}^{M}{({l}_{i}^{(1)}{l}_{j}^{(1)}{p}_{ij}^{(1)})}^{q}\right\}$$
(5)

The Lipschitz–Hölder exponent (refer to coarse Hölder exponent or singularity index in some other scientific works) is defined as \(\alpha (q)=\frac{{{{\rm{d}}}}\tau (q)}{{{{\rm{d}}}}q}\). The multifractal spectrum reads f(α) = α(q)q − τ(q). Here, we provide the expression for the Lipschitz–Hölder exponent:

$$\alpha (q)=\frac{1}{{Z}_{\epsilon }(q){{{{\mathrm{log}}}}}\,\epsilon }\left\{K{\left\{\mathop{\sum }\limits_{i,j = 1}^{M}\left[{\left.{l}_{i}^{(1)}{l}_{j}^{(1)}{p}_{ij}^{(1)}\right]}^{q}\right)\right\}}^{K-1}\mathop{\sum }\limits_{i,j=1}^{M}{\left[{l}_{i}^{(1)}{l}_{j}^{(1)}{p}_{ij}^{(1)}\right]}^{q}{{{{\mathrm{ln}}}}}\,\left[{l}_{i}^{(1)}{l}_{j}^{(1)}{p}_{ij}^{(1)}\right]\right\}.$$
(6)

When the values of the order of the moment q takes q = −q0: dq: q0, the width of the multifractal spectrum can be defined and calculated as dα = α(q)max − α(q)min = α(q0) − α(−q0). The center of the multifractal spectrum is located at αcenter = α(0).