Abstract
In this paper we devise a generative random network model with core–periphery properties whose core nodes act as sublinear dominators, that is, if the network has n nodes, the core has size o(n) and dominates the entire network. We show that instances generated by this model exhibit power law degree distributions, and incorporates smallworld phenomena. We also fit our model in a variety of realworld networks.
Introduction
Many complex networks exhibit the socalled core–periphery structure. The core–periphery structure of networks considers a network that is comprised by a core and a periphery. The core of the network is a small subset of the node set which are tightly connected with one another and the periphery “lies around” the core and nodes within the periphery are sparsely connected with one another. In the context of a social network, the core of the network refers to the individuals that possess a celebrity status in society, such as famous politicians, actors, and athletes, and the rest of the users constitute the periphery of the network.
Core–periphery networks have existed awhile in literature^{1,2,3,3,4}. The intuition behind core–periphery networks has its roots in political economy. Wallerstein in his seminal work “Worldsystems theory”^{2} theorized that the globe can be divided into core nations, which focus on “highlyskilled labor” and “capitalintensive” production whereas peripheral countries focused on “lowskilled labor” and “laborintensive” production. Moreover, trade and diplomatic ties between countries seem to follow this structure, backed by Krugman’s theory^{5} which argues that core–periphery structures emerge due to the core regions’ low centralizedproduction costs and the supplyoriented peripheral regions. Avin et al. present an axiomatic approach towards coreperiphery networks and draw strong conclusions^{6}. Generative models for core–periphery networks have also been studied at Ref.^{3,7,8,9}. The closest model to ours is the stochastic blockmodel of Ref.^{3} which assumes that core–core nodes are connected with probability \(p_{CC}\), periphery–periphery nodes are connected with probability \(p_{PP}\) and core–periphery nodes are connected with probability \(p_{CP}\), with \(p_{CC}> p_{CP} > p_{PP}\), and its recent extension to directed graphs in Ref.^{9}.
The dominating set is a wellstudied component of networks. More specifically, a subset of the nodes of an undirected network is a dominating set if and only if every node in the network has at least one neighbor belonging to the dominating set. The interesting question from an algorithmic perspective is finding the minimum dominating set which is shown to be an NPHard problem^{10}. Multiple previous works have investigated dominating sets in the context of social and biological networks^{11,12,13,14}. The work of Ref.^{15} shows that the geometric protean model exhibits a sublinear dominating set, both in theory and in practice.
However, this previous modeling work used a generative framework that was quite complex, and lacked a connection with core–periphery structure. Here we present a much simpler generative model for networks whose minimum dominating set is sublinear in size. We associate the resulting minimum dominating set with the core of the network and its neighborhood (without including nodes of itself) to the periphery of the network. The main concept behind exploiting the core–periphery structure of networks to speed up computational tasks is based on the general idea that intense computational tasks can be performed within the sublinear core and then the results can be aggregated to the periphery with relatively low query complexity. So, leveraging the connection between dominating sets and the core–periphery structure from an algorithmic viewpoint can be used in many problems such as allpairsshortestpaths computation, community detection, embedding generation, and many more.
We call our model the InfluencerGuided Attachment Model (IGAM). The IGAM model is built onto a hierarchical substructure, also known as a communitieswithincommunities (fractallike) model^{16,17}, that is a tree of fanout b and height H. Based on the tree skeleton, nodes are associated with prestige (equivalently “coreness”) values and between any two nodes, the logprobability of connection depends on the most prestigious node. The novelty of IGAM concentrates on the existence of a sublinear minimum dominating set which can be seen as defining the core of the network, with the rest of the nodes being the periphery of the network. We validate our hypothesis by efficiently fitting the IGAM model to realworld data and show almost perfect correlation between the construction of an almost dominating set based on the IGAM model and the construction of an almost dominating set via the maximum coverage greedy algorithm of Ref.^{18}. IGAM follows a power law distribution, and exhibits smallworld phenomena, which are evident in social networks. We compare the IGAM model with the logistic models of Ref.^{8,19} and conclude that IGAM is able to produce smaller almost dominating sets than the logistic models of Ref.^{8,19}. Finally, we give a generalized model that can incorporate core–periphery properties similar to the stochastic blockmodel of Zhang et al.^{3} .
The influencerguided attachment model
Realworld networks usually exhibit power laws together with selfsimilar artifacts. Selfsimilar structures are similar to a part of themselves and are common properties of fractals^{16,20}. Selfsimilar structures have been long observed in networks such as computer networks, patent networks, social networks^{21,22,23}. A model that is able to describe the communitieswithincommunities structure can be a tree structure. Moreover, we want a way to quantify that nodes have a higher affinity to be connected with more prestigious nodes that are located higher in the tree rather than other nodes below their level, which refers to a common property of core–periphery networks^{2,3,7}. This property is given by something which we call, similarly to Ref. ^{16}, a difficulty function. We want the graph to follow a power law degree distribution as well as experience smallworld phenomena^{17}.
We are ready to describe the generative model formally: The model starts with a hierarchical structure of a perfect bary tree T of height H and fanout \(b \ge 2\) where b is a constant. Every node v of the tree is associated with a height \(0 \le h(v) \le H\) which is defined to be the inverse prestige of the corresponding node. The root has a higher prestige and as we go down on the tree the nodes have lower prestige up to the leaves. Two nodes u and v are linked with a probability equal to f(u, v). We want f(u, v) to depend on the node with the higher inverse prestige, and be scalefree. For the former property we can assume that f(u, v) depends on \(\min \{ h(u) , h(v) \}\). For f to be scalefree we need f to be levelindependent (or translationinvariant). Namely, for two nodes u, v at levels h(u), h(v) and for two nodes \(u', v'\) with \(h(u') = h(u)  1\) and \(h(v') = h(v)  1\) we must have that f(u, v) and \(f(u', v')\) to be levelindependent and, thus, a constant multiplicative factor apart. Formally, if we let \(\tilde{h} = \min \{ h(u), h(v) \}\) then \(\min \{ h(u'), h(v') \} = \tilde{h}  1\) that means \(f(\tilde{h}) / f(\tilde{h}  1) = c\), and subsequently, \(f(\tilde{h}) \propto c^{h}\) for some constant \(c > 1\). This analysis yields a law of the form
where \(c \in (1, b)\) is a constant. The requirement that \(c \in (1, b)\) will become evident as we go through this paper. After the generation of the random edges according to the law f(u, v), we delete the auxiliary tree edges of T. For instance, if \(b = 3\) and \(c = 2\) then the root is connected with every leaf with probability 1/2, the first level is connected with every leaf with probability 1/4, and so on. The network has \(n = \Theta (b^H)\) nodes.
Basic definitions
We say that a subset S of the vertex set of a graph is a \(\kappa\) almost dominating set (\(\kappa\)ADS) if the set S dominates at least \(\kappa n\) of the nodes present in the graph. In other words, at least \(\kappa n\) of the nodes of the graph have a neighbor in S.
We say that an event E(n) holds with high probability (w.h.p.) if \(\Pr [E(n)] = 1  O(1 / n)\), with extreme probability (w.e.p.) if \(\Pr [E(n)] = 1  O(e^{n})\), and asymptotically almost surely (a.a.s.) if \(\Pr [E(n)] \rightarrow 1\) as \(n \rightarrow \infty\).
Sublinear domination and core–periphery structure
We show that the core of the network consists, as one should expect, from a sublinear number of nodes located at the top levels of the tree. To observe this phenomenon, we calculate the probability \(q_{h \tau }\) of a node at height h not being dominated by any node between levels 0 and \(\tau\) where \(\tau \le h\), which equals
where \(a \lesssim b\) denotes that there exists a constant \(C > 0\) independent of b such that \(a \le C \cdot b\) (i.e. inequality up to a constant factor), and the first inequality holds since \(1  t \le e^{t}\) for all \(t \in \mathbb R\). Note that \(q_{h \tau }\) does not depend on the height of the node in question, as long as \(\tau \le h\). Now the probability that there is at least one node uncovered below level \(\tau + 1\) is given by Markov’s Inequality and is at most
To assert an w.h.p. guarantee we force the above probability to be \(\Theta (b^{H})\), therefore, solving for \(\tau\) we arrive at a dominating set of size
with probability at least \(1  \Theta (b^{H})\). Consequently, a sublinear fraction of nodes \(C = \{ v : h(v) \le \tau \}\) located on a logarithmic height \(\tau\) from the root of the skeleton tree T dominate the whole periphery \(P = \{ v : h(v) \ge \tau + 1 \}\) with \(\tau = \Omega (\log H)\).
Degree distribution
To fit the model to realworld data, we infer the degree distribution of IGAM. The average degree of a node u at level h is
and the total expected number of edges at level h is \(\bar{m}_h = b^h \bar{d}_h = \Theta (b^{h + H + 1} / c^{h + 1})\). The asymptotics of the previous equation yield a power law with exponent
If the rank of u, with \(h(u) = h\) is given as \(r_h = c^{h}\), which is an increasing function of h, then the expected degree depends on the inverse rank \(1 / r_h\), yielding a Zipfian power law. The trials for connecting every node are independent Bernoulli variables, and therefore by the multiplicative Chernoff bound with probability at least \(1  \Theta (b^{ H})\) we have that the average degree at height h, \(\hat{d}_h\) is \(\Theta (1 / r_h) \pm O(\sqrt{H \log b / (2b^h)})\). By a union bound, we have that
Thus, with probability \(1  O(H b^{H})\) (i.e. w.h.p.) the degree histogram follows Zipf’s Law. The exponent of the degree distribution can be altered, if the same model is generated with parameters \(b' = b^\alpha , c' = c^\alpha\) for some \(\alpha \ge 0\). The expected number of edges \(\bar{m}\) is given as
and is superlinear with respect to the number of nodes.
Insights from data
We describe a fitting algorithm for the IGAM model (Algorithm 1). The fitting process considers of being given a sample of m edges \(\mathcal D = \{ e_i \}_{1 \le i \le m}\) on a network of n nodes where n is known. Our goal is to find the optimal fanout \(b^\star\), the optimal height function \(h^\star\) and the optimal scale factor \(c^\star\) that maximize the loglikelihood of the model, that is
where the likelihood equals
Directly optimizing the likelihood is very hard since there are O(n) possible fanouts, each fanout can generate an exponential number of possible trees, and thus height functions, and given the fanout and the height function the remaining problem consists of finding the optimal c that explains \(\ell (c  \mathcal D, b, h).\)
To optimize the loglikelihood of IGAM efficiently, we first calculate the sample degrees of each node, that is \(\bar{y}_u = \sum _{i = 1}^m \mathbf{1} \{ u \in e_i \}\), and then order the nodes on decreasing order of their sample degrees. After that, we fix a fanout b from the interval \(\{ 2, \dots , n  1 \}\), and according to that fanout we start by attributing heights of a hypothetical bary tree on the nodes according to their descending order. For example, for \(b = 2\) the first node gets a height of 0, the next two a height of 1, and so on. Then, for each height \(0 \le h \le \lceil \log n / \log b \rceil\), we form the logdegrees \(\bar{z}_h = \log \left( \sum _{u : h(u) = h} \bar{y}_u \right)\), and fit linearleastsquares with xvalues being the range of heights and yvalues being the logdegrees \(\bar{z}_h\). The optimal slope a yields c to be \(c = b \cdot e^{a}\). If \(c \ge b\) then the current fit is rejected. We can then calculate the likelihood function \(\ell\) and keep the best parameters \((b^\star , h^\star , c^\star )\). Each step is dominated by the calculation of the likelihood that costs \(O(n^2)\) time, since exactly computing the loglikelihood requires summing over all pairs of nodes (regardless of whether an edge exists or not), and thus the total complexity is \(O(n^3)\). Note that since the values of f(u, v) are small (i.e. close to 0) and realworld networks are sparse (i.e. m is of the order of n or \(n \log n\)) the loglikelihood can be approximated in time O(m) by ignoring the networkindependent term, i.e. the term that sums on \(V \times V\), which yields an algorithm with runtime O(nm) instead of \(O(n^3)\). If a full bary tree does not cover the network, we allow the last level to be incomplete.
We fit the IGAM model to networks examined in Ref.^{9}. More specifically, we examine the worldtrade network from Ref.^{24} (\(n = 76, \; m = 845\)), the faculty datasets from Ref.^{25} (csfaculty: \(n = 205, \; m = 2861\); historyfaculty: \(n = 145, \; m = 2334\); businessfaculty: \(n = 113, \; m = 3027\)), the polblogs dataset from Ref.^{26} (\(n = 852, \; m = 15,956\)), the airports dataset from Ref.^{27} (\(n = 210, \; m = 2429)\), the celegans dataset from Ref.^{28} (\(n = 279, \; m = 1.9 \text K\)), the openairlines dataset from Ref.^{8} (\(n = 7.2 \text K, \; m = 18.6 \text K\)), and the Londonunderground dataset from Ref.^{8} (\(n = 315, \; m = 270\)); treating the networks as undirected. Figure 1 presents the (total) degree distribution fits for the IGAM model, where the parameters b, c and the height function have been determined. Observe, that the total degree at each constructed level is linearly correlated (\(R^2 \ge 0.93\) except for the airports dataset) with the coreness value of each group of nodes (per level). Moreover, in Fig. 2 we do a log–log plot between the construction of the dominating set as in Section 3 and the construction of the dominating set using the maximum coverage greedy algorithm. The former algorithm treats the nodes as IGAM would do in the construction of the dominating set, i.e. by traversing the levels of the hierarchy from top to bottom. The latter algorithm picks the node with the largest active degree at each step, adds it to the set, and removes itself and all the nodes connected to it from the network up to a certain number of steps or if there are no more nodes left. Markers on the plot represent subsequent iterations of both algorithms. We observe almost perfect correlation between the two algorithms and slightly superlinear relations of the form \(y \propto x^\gamma\) for \(\gamma \in [1, 1.21]\), which is a phenomenon that we should not expect in more general networks, since choosing the nodes with the highest degrees shall not yield good coverage in general. Moreover, note that a sublinear number of iterations, denoted by the number of x markers outside the \([1.9, 2.0]^2\) box (the mapping is increasing), suffices to dominate \(10^{1.9} \% \approx 80 \%\) of the nodes. A visualization of the IGAM fitting process can be found in Fig. 7 for the small datasets whereas the various levels h of the IGAM model are colorcoded.
Qualitative insights
In this Section, we highlight the following structures that emerge from fitting the IGAM model to the real world data. We examine the first three levels of the hierarchy, devised by the height function h, for the datasets that contain labeled nodes. The analytical form of the core nodes can be found in the “Methods” section.

1.
Faculty networks In the faculty networks of each one of the three disciplines (computer science, history, and business) the core consisted of nodes referring to highly ranked universities in the United States (in each discipline), as well as an (aggregate) node referring to faculty coming from all nonUS academic institutions. To elaborate, the csfaculty network contains MIT, CMU, Stanford, UT Austin, Purdue, and UIUC in its core, together with the aggregate node. The historyfaculty core consists, for instance, of Harvard, Yale, University of Chicago, Columbia, Stanford, Johns Hopkins, and Cornell. Finally, the businessfaculty network has, for instance, the University of Michigan, UT Austin, Penn State, and the University of Pennsylvania at its core. These findings are consistent with the body of research on faculty hiring networks^{25,29} where it is stated that, for the computer science discipline, a very small percentage (9%) of departments is responsible for 50% of academic hires in faculty position.

2.
Openairlines The openairlines network has a core that consists of very large and central international airports such as AMS, FRA, CDG, IST, MUC, ATL, and PEK.

3.
Worldtrade The worldtrade dataset contains data about the trade of metals among 80 countries in 1994. The nodes represent countries who have available entries in the Commodity Trade Statistics released by the United Nations. In this network the core consists of, for instance, from Finland, Hungary, Slovenia, Singapore, Chile, and so on.

4.
Londonunderground In the Londonunderground dataset, we recover a core that consists of busy train stations such as Bank, Baker Street, Canning Town, and so on, all of which are cardinal to the British underground system.
Relation to logistic core–periphery models
We compare the IGAM model with two logistic models introduced by Jia and Benson^{8} and Tudisco and Higham^{19}. In detail, we fit both models and give empirical answers to the following question: Are the logistic core–periphery models able to explain the domination structure of core–periphery networks?
The model of Jia and Benson assigns a coreness score \(\theta _v \in \mathbb R\) for every node v in the vertex set V. The simple version of the model produces edges (u, v) randomly and independently with probability
Intuitively what this model describes is that a node with \(\theta _v \ge 0\) is considered to be a core node and a node with \(\theta _v < 0\) to be a peripheral node. That is, for a pair (u, v) if both nodes are peripheral, i.e. have \(\theta _u < 0\) and \(\theta _v < 0\), then the link probability \(\rho (u, v)\) is less than the case when one of \(\theta _u, \theta _v\) is nonnegative that represents a core–periphery link. Similarly, when both \(\theta _u \ge 0\) and \(\theta _v \ge 0\), which corresponds to a core–core node, then the edge creation law attributes a larger connection probability. When spatial features \(x: V \rightarrow \mathbb R^d\) are provided, as well as a kernel function K(u, v) (for example, \(K(u, v) = \Vert x_u  x_v \Vert _2\)), and a hyperparameter \(\varepsilon\), then (LogisticJB) is generalized to an edge law
The model of Tudisco and Higham^{19} is based on a logistic probability law determined by a ranking \(\pi\) of the nodes. The more prestigious a node v is the higher the value \(\pi _v\) is. The edge creation law is given by
where \(\sigma _{s, t} = 1 / (1 + e^{s (x  t)})\) is the smooth approximation of the Heaviside step function \(H_t(x)\) that is 1 if \(x \ge t\) and 0 otherwise. We use \(s = 10\), and \(t = 1/2\). Again, the model intuitively says that nodes tend to be associated with more prestigious nodes rather with less prestigious nodes. Finally, the authors propose an iterative method to infer the ranking \(\pi\) which has an O(m) perstep cost.
We evaluate how well can LogisticCP, LogisticJB and LogisticTH capture the domination properties of the core–periphery structure compared to IGAM. For the logistic models of Jia and Benson we fit the LogisticCP model when there are no spatial data available and the LogisticJB when spatial data are available (i.e. in the celegans, openairlines, and Londonunderground datasets). We use the optimal parameters \(\theta _v^*\) of the logistic models to build a ranking for the nodes by sorting them in decreasing order of the scores \(\theta _v^*\). For the LogisticTH model, we use the iterative method provided in their paper to infer the ranking by sorting the entries of the fixed point that their algorithm produces. Then, for all models, we report the domination curves in Figs. 2, 5 and 4. To give better visual insights on how the models perform, we visualize the outcome of fitting the models for the celegans dataset on Fig. 6 for a core set of size \(\lfloor n^{0.7} \rfloor\). For each dataset and figure we report the exponent \(p \in [0, 1]\) of a set that dominates 80% of the network (i.e. and 0.8ADS). Namely, if a fraction \(\varpi \in [0, 1]\) suffices to cover at least 80% of the network, then \(p = \log (\varpi \cdot n) / \log n\).
Key takeaway
The IGAM model can better explain the sublinear domination phenomenon in core–periphery networks than LogisticJB, LogisticCP, and LogisticTH. Also LogisticCP and LogisticJB achieve better coverage compared to LogisticTH. Perhaps the most characteristic are the faculty (csfaculty, historyfaculty, businessfaculty) and the worldtrade datasets where IGAM produces an almost dominating set with an exponent \(p \le 0.16\) whereas LogisticTH finds a similar set with \(p \ge 0.54\), and LogisticCP finds an 0.8ADS with \(p = 0.15\) in the case of businessfaculty and with \(p \ge 0.32\) in the rest of the datasets. In the polblogs dataset, IGAM is able to find an 0.8ADS with \(p = 0.27\) whereas LogisticCP finds one with \(p = 0.64\) and LogisticTH finds a much larger one with \(p = 0.81\). In the openairlines dataset the 0.8ADS corresponds to \(p = 0.61\) for IGAM and to \(p \ge 0.82\) for the logistic methods. Finally, the smallest variation between the methods exhibits the Londonunderground dataset where p ranges from \(p = 0.75\) (IGAM) to \(p = 0.85\) (LogisticTH). Concluding, the ADS constructed by IGAM are consistently smaller than the ones produced by LogisticCP and LogisticJB which are smaller than the ones produced with LogisticTH, which suggests that IGAM is able to explain the sublinear domination phenomenon where other logistic models fail to do so.
Miscellaneous properties
Smallworld behaviour
To determine the diameter (the diameter of a disconnected network is taken to be the diameter of its giant connected component) of the network, we build an Erdös–Renyi (ER) network W with n nodes and edge probability \(f^* = \min _{u, v} f(u, v) = c^{H1}\). It follows from a standard coupling argument, i.e. a “tossbytoss” comparison, that we can relate the two networks as one being subgraph of the other, in our case the ER network W being subgraph of the IGAM network, say G. The coupling is constructed as follows: \(\Pr \left[ (u, v) \in E(G) \;  \; (u, v) \in E(W) \right] = 1\), \(\Pr \left[ (u, v) \in E(G) \;  \; (u, v) \notin E(W) \right] = \tfrac{f(u, v)  f^*}{1  f^*}\), so that \(\Pr \left[ (u, v) \in E(G) \right] = f(u, v)\), and \(\Pr [(u, v) \in E(W)] = f^*\). Then it follows that the diameter of the IGAM network is at most the diameter of W. Using a result from Ref.^{30,31}, we have that since the average degree of W is \(\Theta ((b / c)^H) \rightarrow \infty\) as \(H \rightarrow \infty\), the diameter of W is close to \(\log n / \log (n f^*) = \Theta (\log b / \log (b / c)) = O(1)\) a.a.s. From that we can deduce that G has a diameter close to \(O(\log b / \log (b / c)) = O(1)\) a.a.s. This result can follow from intuition also, since all the nodes at a logarithmic height of the root dominate the periphery and a worst case path should roughly be between two peripheral nodes which are connected via a node at the core, with this node being a common dominator of them.
Global clustering coefficient (GCC)
The probability of uvw being a triangle given that \(h(u) \le h(v) \le h(w)\) is \(\beta _{uvw} = f(u, v) f(u, w) f(v, w) = c^{3  2 h(u)  h(v)}\), thus the expected total number of closed triangles \(T_C\) is
The calculation has been deferred to the “Methods” section (Eq. 1). The probability \(\gamma _{uvw}\) of uvw being a triplet (open or closed) is given as \(\gamma _{uvw} = f(u, v) f(u, w) + f(u, v) f(v, w) + f(u, w) f(v, w)\). Conditioned on the event that \(h(u) \le h(v) \le h(w)\) we can deduce that \(3 c^{22h(v)} \le \gamma _{uvw} \le 3 c^{22h(u)}\). Similarly to \(T_C\), the expected number of open triplets \(T_R\) is \(\Theta \left( \frac{b^{3H}}{c^{2H + 2}} \right)\) (see Eq. (2) in the “Methods” section). By McDiarmid’s Inequality^{32}, since \(T_C\) and \(T_R\) are \(\Theta (b^H)\)Lipschitz we have that \(\Pr \left[  T_C  {\mathbb {E}}_{} \left[ T_C \right]  = \Omega (b^{H}) \right] = O \left( e^{b^H} \right)\), and \(\Pr \left[  T_R  {\mathbb {E}}_{} \left[ T_R \right]  = \Omega (b^{H}) \right] = O \left( e^{b^H} \right)\) and therefore we can deduce that the GCC \(T_C / T_R\) is \(O(c^{H} + b^{H}) = O(c^{H})\) with probability \(1  O \left( e^{b^H} \right)\) by combining the two concentration bounds. Therefore, w.e.p. clustering coefficient is \(O(c^{H})\).
Core–periphery conductance
The expected conductance of a set \(\emptyset \subset S \subset [n]\) is given as \(\bar{\phi }(S) = {\mathbb {E}}_{} \left[ e(S, \bar{S}) \right] / \min \{ S, \bar{S} \}\), where \(\bar{S} = [n] \setminus S\). Letting \(S_\tau\) to be the nodes at the first \(\tau < H\) levels where \(S_\tau  \le \bar{S}_\tau \) yields \(\min \{  S_\tau  ,  \bar{S}_\tau  \} = b^{\tau + 1}  1\), and
\(\bar{\phi }(S_\tau ) = \Theta ( b^H / c^\tau )\). Letting \(\tau = \log (2 c H \log b) / \log (b / c)\) be the core’s height we deduce that \(\bar{\phi }(C) = \Theta ( b^H / H )\).
Model generalizations
IGAM2
We fully align with the stochastic blockmodel definition of core–periphery networks presented in Ref.^{3} by defining the following generalization of IGAM, which we call IGAM2, parametrized by \(b> c_2> c_1 > 1\). In this context, we start with the same skeleton tree of fanout b and then the law g(u, v) for generating the edges is
where \(0< H_0 < H\) is the core’s threshold. The probability g(u, v) of an edge between two nodes with \(\max \{ h(u), h(v) \} \le H_0\) (i.e. core–core edges) is greater than the probability between two nodes whose heights satisfy \(\min \{ h(u), h(v) \} \le H_0\) and \(\max \{ h(u), h(v) \} > H_0\) (core–periphery edges), which is greater than the probability of the case that \(\min \{ h(u), h(v) \} > H_0\) (periphery–periphery edges). Figure 3 presents the adjacency matrix of a sampled IGAM2 network with parameters \(c = 1.5, c_2 = 2.5, b = 3, H_0 = 2\), and \(H = 6\).
We analyze the mathematical properties of IGAM2, which are similar to the properties of IGAM, in the “Methods” section. Most of our proofs are based on a construction of a coupling of an IGAM2 network with two (simple) IGAM networks with parameters \((b, c_1, H)\) and \((b, c_2, H)\). The coupling is constructed such that the three graphs form an ordering based on the subgraph relation.
Directed and continuous versions
The IGAM model has a natural directed extension: for two nodes u, v with heights h(u) and h(v) with \(h(u) \le h(v)\) we create an edge from u to v with probability \(\xi (u, v) = c^{1h(v)}\) and a directed edge from v to u with probability \(\xi (v, u) = c^{1h(u)}\). This edge creation law corresponds to the following philosophy: a nonfamous node wants to connect to a prestigious node and a famous node does not want to connect to a nonfamous one but it has better affinity for the nodes near its prestige \(h(\cdot )\). This version of IGAM has also a sublinear dominating set, that is every node in the periphery has a directed edge to at least one node in the core w.h.p. The proof of this fact is identical to the case of the simple model.
In the continuous version of IGAM the height of a node v is allowed to be any real number \(h(v) \in [0, H]\) and the edge creation law remains the same as the simple version of IGAM. Moreover, similarly to Ref.^{19} the edge creation law f(u, v) can be approximated by the limit as \(\delta \rightarrow  \infty\) of a law \(f_{\delta }(u, v)\) that involves the generalized mean of h(u) and h(v), i.e.
The model given by (\(\delta\)IGAM) can be treated as the scalefree version of the logistic model of^{19} where the reverse ranking is replaced by the height function. The network has \(n = b^H  1\) nodes. If the heights h are latent variables drawn independently from a distribution with Cumulative Density Function (CDF) equal to
then we can easily show that the continuous model has a sublinear dominating set by partitioning [0, H] to intervals of the form \([t_i, t_i + \Delta t]\) and generalizing the analysis of the discrete model as \(\Delta t\) becomes infinitesimal.
Finally, a mathematical and empirical study and a comparison between the extensions of IGAM and logistic core–periphery models are interesting questions to be addressed in future work, and lie beyond the scope and length of the current paper.
Conclusions
The present paper observes a connection between the core–periphery structure of networks and dominating sets. We devise a simple generative model which facilitates this connection and fit it to realworld data validating our observations. We believe it is worthwhile to explore the algorithmic implications of this connection further.
Methods
Reproducibility
Code and data needed to exactly reproduce are provided in the form of a Jupyter notebook and is available here^{33}. The software has been developed in Python by the author and uses the following opensource libraries: numpy^{34}, scipy^{35}, networkx^{36}, matplotlib^{37}, pandas^{38}, and seaborn^{39}.
Qualitative results addendum
The analytical results which are briefly presented in “5.1” section can be found below for the first three levels of the hierarchy. Groups enclosed in parentheses correspond to separate levels. In the faculty hiring networks the “All others” node represents all nonUS institutions:

Worldtrade (Finland) (Hungary, Slovenia, Singapore, Chile) (Salvador, Iceland, Kuwait, Rep., Belgium, Poland, Moldava., Austria, Germany, Indonesia, Guatemala, Bolivia, Paraguay, Australia, Africa, Of).

Londonunderground (Bank) (Baker Street, Canning Town) (Kings Cross St. Pancras, Stratford, Willesden Junction, Earls Court).

Openarilines (AMS) (FRA, CDG) (IST, MUC, ATL, PEK)

csfaculty (All others) (University of Illinois, Urbana Champaign, MIT) (Purdue University, University of Texas, Austin, Carnegie Mellon University, Stanford University)

Historyfaculty (All others) (Harvard University, Yale University, University of Chicago, University of Wisconsin, Madison, Columbia University) (UC Berkeley, UCLA, Princeton University, University of Michigan, University of Pennsylvania, Stanford University, Johns Hopkins University, Rutgers University, University of Virginia, Cornell University, University of Texas, Austin, New York University, Indiana University, Northwestern University, Ohio State University, University of Illinois, Urbana Champaign, University of North Carolina, Chapel Hill, Duke University, Brown University, University of Minnesota, Minneapolis, Michigan State University, UC San Diego, UC Santa Barbara, Brandeis University, University of Washington).

Businessfaculty (All others) (University of Michigan, University of Texas, Austin) (Ohio State University, Indiana University, Pennsylvania State University, University of Pennsylvania).
Global clustering coefficient of IGAM
For the number of closed triplets (i.e. triangles) we have
For the number of open triplets we have that
Similarly, \({\mathbb {E}}[T_R] \ge 3 \sum _{w : h(w) = h(v) + 1}^H \sum _{v: h(v) = h(u) + 1}^{h(w)} \sum _{u: h(u) = 0}^{h(v)} b^{h(u) + h(v) + h(w)} c^{22h(v)} = \Theta \left( \frac{b^{3H}}{c^{2H + 2}} \right)\). Therefore, \({\mathbb {E}}[T_R] = \Theta \left( \frac{b^{3H}}{c^{2H + 2}} \right)\).
Properties of IGAM2
We describe the mathematical properties of IGAM2. First, we construct a coupling between IGAM2 and IGAM which we can use a proxy for the behaviour of IGAM2.
Remark
Throughout the proofs we use the following remark: For every two positive integers s, t with \(s < t\) and for a positive integer constant \(b \ge 2\) we have that \((1  1 / b) b^t \le b^t  b^{t  1} \le b^t  b^s \le b^t\). Therefore \(b^t  b^s = \Theta (b^t)\) with constants \(C_1 = 1  1 / b\) and \(C_2 = 1\).
Coupling construction
We consider a randomly generated network \(G \sim \text {IGAM2}(b, c_1, c_2, H_0, H) \equiv \text {IGAM}(b, c_1, H)\) for \(1< c_1 \le c_2 < b\) and \(0 \le H_0 \le H\) with edge law g. We also consider a network \(G' \sim \text {IGAM2}(b, c_2, c_2, 0, H) \equiv \text {IGAM}(b, c_2, H)\) and a network \(G'' \sim \text {IGAM2}(b, c_1, c_1, 0, H)\) with edges law \(g'\) and \(g''\) coupled with G as follows:

\(\Pr \left[ (u, v) \in E(G)  (u, v) \in E(G') \right] = 1\) and \(\Pr \left[ (u, v) \in E(G)  (u, v) \notin E(G') \right] = \tfrac{g(u, v)  g'(u, v)}{1  g'(u, v)} \in [0, 1]\).

\(\Pr \left[ (u, v) \in E(G'')  (u, v) \in E(G) \right] = 1\) and \(\Pr \left[ (u, v) \in E(G'')  (u, v) \notin E(G) \right] = \tfrac{g''(u, v)  g(u, v)}{1  g(u, v)} \in [0, 1]\).
Under this coupling, which we denote as \(\nu\), we have that \(\Pr \left[ (u, v) \in E(G) \right] = \Pr \left[ (u, v) \in E(G) \right (u, v) \in E(G') ] \Pr [(u, v) \in E(G')] + \Pr \left[ (u, v) \in E(G)  (u, v) \notin E(G') \right] \Pr [(u, v) \notin E(G') ] = g'(u, v) \cdot 1 + (1  g'(u, v)) \cdot \tfrac{g(u, v)  g'(u, v)}{1  g'(u, v)} = g'(u, v) + g(u, v)  g'(u, v) = g(u, v)\) and, similarly, \(\Pr \left[ (u, v) \in E(G'') \right] = g''(u, v)\). The coupling also satisfies that \(G'\) is a subgraph of G (\(G' \subseteq G\)) since \((u, v) \in E(G')\) implies \((u, v) \in E(G)\). Moreover, G is a subgraph of \(G''\) (\(G \subseteq G''\)) since every edge of G belongs to the edge set of \(G''\).
Sublinear dominating set
We let \((G', G, G'') \sim \nu\). We know that \(G'\) is generated from a simple IGAM model therefore it has a dominating set of size \(b^{O(\log (2 c_2 H \log b) / \log (b / c)} = b^{o(H)} = o(n)\). Since \(G' \subseteq G\), the dominating set of G has size at most the dominating set of \(G'\). Therefore, G has a dominating set of size \(b^{O(\log (2 c_2 H \log b) / \log (b / c)} = b^{o(H)} = o(n).\)
Degree distribution
We fix a node \(u \in V\). We have the following

1.
If \(h(u) > H_0\) then the law that is obeyed is \(g(u, v) = c_2^{1\min \{h(u), h(v) \}}\). From the simple IGAM model we have calculated the degree in this case to be \(\Theta \left( b^{H + 1} / c_2^{h(u) + 1} \right)\).

2.
If \(h(u) \le H_0\) then
$$\begin{aligned} \bar{d}_h&\approx \sum _{r = 0}^{H_0} b^r c_1^{\min \{ h(u), r \}  1} + \sum _{r = H_0 + 1}^H b^r c_2^{\min {h(u), r \}  1}} \\ &= \Theta \left( \frac{b^{H_0 + 1}}{c_1^{h(u) + 1}} \right) + \sum _{r = H_0 + 1}^H b^r c_2^{1h(u)} \\ &= \Theta \left( \frac{b^{H_0 + 1}}{c_1^{h(u) + 1}} \right) + \Theta \left( \frac{b^{H + 1}}{c_2^{h(u) + 1}} \right) . \end{aligned} $$
Therefore, every node, parametrized by its height h has average degree
To bound the average number of edges we refer to the coupling \(\nu\) and deduce that the average number of edges \(\bar{m}\) of G is at most the average number of edges of \(G'\), say \(\bar{m}' = \Theta (b^{2H} / c_2^H)\) (as we showed in the main part of the paper) due to the subgraph relationship. Therefore, the average number of edges is \(\bar{m} = O(b^{2H} / c_2^H)\). A better bound can be obtained by calculating the expected value analytically using the form of \(\bar{d}_h\) we derived above. Namely,
We still observe that \(\bar{m} = O(b^{2H} / c_2^H)\). Moreover, using the fact that the edges \(\bar{m}''\) of \(G''\) are \(\Theta (b^{2H} / c_1^H)\) we get, in the same logic, that \(\bar{m} \ge \bar{m}''\), and, thus \(\bar{m} = \Omega (b^{2H} / c_1^H)\). Note that setting \(c_1 = c_2\) and \(H_0 = 0\) recovers the result for the simple IGAM model.
Smallworld behaviour
We let \((G', G, G'') \sim \nu\). Since \(G' \subseteq G\), the diameter of G is at most the diameter of \(G'\) because every path between two nodes in \(G'\) is a path in G. Since the diameter of \(G'\) is close to \(\Theta (\log b / \log (b / c_2)) = O(1)\) a.a.s., then the diameter of G is also close to O(1) a.a.s.
Global clustering coefficient
Let \((G', G, G'') \sim \nu\). Let uvw be a triplet in G such that \(h(u) \le h(v) \le h(w)\). The probability that uvw is a triangle in \(G'\) is \(\beta _{uvw}'\), \(\beta _{uvw}\) if uvw is a triangle in G and \(\beta _{uvw}''\) if uvw is a triangle in \(G''\). From the subgraph relationship we have that \(c_2^{32h(u)h(v)} = \beta _{uvw}' \le \beta _{uvw} \le \beta _{uvw}'' = c_1^{32h(u)  h(v)}\). Therefore, the number of triangles \(T_C\) (respectively \(T_C'\) for \(G'\) and \(T_C''\) for \(G''\)) satisfies \({\mathbb {E}}[T_C'] \le {\mathbb {E}}[T_C] \le {\mathbb {E}}[T_C'']\). Using Eq. 1 we deduce that \({\mathbb {E}}[T_C''] = \Theta \left( \frac{b^{3H}}{c_1^{3H + 3}} \right)\) and \({\mathbb {E}}[T_C'] = \Theta (b^{3H} / c_2^{3H + 3})\).
The probability \(\gamma _{uvw}\) of uvw being a triplet in G (respectively \(\gamma _{uvw}'\) in \(G'\) and \(\gamma _{uvw}''\) in \(G''\)) satisfies \(3c_2^{22h(v)} \le \gamma _{uvw}' \le \gamma _{uvw} \le \gamma _{uvw}'' \le 3c_2^{22h(u)}\). The expected number of triplets is denoted by \({\mathbb {E}} [T_R]\) (\({\mathbb {E}}[T_R']\) for \(G'\) and \({\mathbb {E}}[T_{R}'']\) for \(G''\)) can be found by using Eq. 2. If we execute the sum mutatis mutandis, we arrive at the fact that \(\Omega (b^{3H} / c_2^{2H + 2} = {\mathbb {E}}[T_R] = O (b^{3H} / c_1^{2H + 2})\). McDiarmid’s Inequality^{32} states that \(\Pr \left[ T_C \le {\mathbb {E}}[T_C] + O(b^H) \right] = 1  O \left( e^{b^H} \right)\), and \(\Pr \left[ T_R \ge {\mathbb {E}}[T_R]  O(b^H) \right] = 1  O \left( e^{b^H} \right)\). because \(T_C, T_R\) are \(\Theta (b^H)\)Lipschitz functions [In general, for a graph G with n nodes the number of triangles of G as a function of the edge variables is a 3nLipschitz per edge, since deleting or adding an edge can change the number of triangles by 3n, and, similarly, the number of triplets is a 2nLipschitz function since each edge is part of at most 2n paths on 3 vertices]. Thus, with probability \(1  O \left( e^{b^H} \right)\) we have that \(\tfrac{T_C}{T_R} \le \tfrac{{\mathbb {E}} [T_C]}{{\mathbb {E}} [T_R]} + O (b^{H}) = O \left( \tfrac{c_2^{2H + 2}}{c_1^{3H + 3}} + b^{H} \right)\).
Core–periphery conductance
Let \((G', G, G'') \sim \nu\). Let the partition \((S_\tau , \bar{S}_\tau )\) be at level \(\tau\), i.e. all nodes with height \(h \le \tau\) and the periphery \(\bar{S}\) with \(h \ge \tau\). From the subgraph relationship we get that \(e'(S_{\tau }, \bar{S}_{\tau }) \le e(S_{\tau }, \bar{S}_{\tau }) \le e''(S_{\tau }, \bar{S}_{\tau })\), and subsequently \({\mathbb {E}} [e'(S_\tau , \bar{S}_{\tau })] \le {\mathbb {E}}[e(S_\tau , \bar{S}_{\tau }) \le {\mathbb {E}} [ e''(S_\tau , \bar{S}_{\tau })]\). Thus \(\bar{\phi }'(S_\tau ) \le \bar{\phi }(S_\tau ) \le \bar{\phi }''(S_\tau )\). Using the fact about the core periphery conductance we proved for the simple IGAM model, since \(G', G''\) are equivalently produced from the simple IGAM model, we get that, on expectation, \(\Omega \left( \tfrac{b^H}{c_2^\tau } \right) = \bar{\phi }(S_{\tau }) = O \left( \tfrac{b^H}{c_1^\tau } \right)\). If we take \(\tau = H_0 = O(\log H)\) to be the core, we can deduce that the core conductance is \(\Theta (b^H / H)\) as in the case of the simple IGAM model.
Data preprocessing
We have ignored directionality in the examined networks and have removed nodes with degree less than or equal to 4 (except in the Londonunderground network where almost all degrees are very small). The removal of nodes with degree less than or equal to 4 is done (i) to remove outlier nodes and, (ii) to refer to the removal of nonengaged nodes (Supplementary Information).
Ethical standard
The current paper proposes a theoretical model, its properties and fits it to realworld data. Thus, there are no ethical concerns.
Code availability
The data used in this study are publicly available and are located in the following resources http://casos.cs.cmu.edu/computational_tools/datasets/external/world_trade/index2.html^{24} \(^{[24]}\). http://advances.sciencemag.org/cgi/content/full/1/1/e1400005/DC1^{25}. http://networkrepository.com/polblogs.php^{26}. https://toreopsahl.com/datasets/#usairports^{27}. https://www.cs.cornell.edu/%7earb/data/spatialOpenFlights/^{28}. https://www.cs.cornell.edu/%7earb/data/spatialCelegans/^{28}. https://www.cs.cornell.edu/%7earb/data/spatialundergroundLondon/^{8}. The Jupyter notebook for reproducing the results of this study is also available in Ref.^{33}.
References
Nemeth, R. J. & Smith, D. A. International trade and worldsystem structure: A multiple network analysis. Review (Fernand Braudel Center) 8, 517–560 (1985).
Wallerstein, I. WorldSystems Analysis. Social Theory Today3 (1987).
Zhang, X., Martin, T. & Newman, M. E. Identification of coreperiphery structure in networks. Phys. Rev. E 91, 032803 (2015).
Snyder, D. & Kick, E. L. Structural position in the world system and economic growth, 1955–1970: A multiplenetwork analysis of transnational interactions. Am. J. Sociol. 84, 1096–1126 (1979).
Krugman, P. Increasing returns and economic geography. J. Polit. Econ. 99, 483–499 (1991).
Avin, C., Lotker, Z., Peleg, D., Pignolet, Y. A. & Turkel, I. Coreperiphery in networks: An axiomatic approach. Preprint at http://arxiv.org/abs/1411.2242 (2014).
Borgatti, S. P. & Everett, M. G. Models of core/periphery structures. Soc. Netw. 21, 375–395 (2000).
Jia, J. & Benson, A. R. Random spatial network models for coreperiphery structure. Proc. Twelfth ACM International Conference on Web Search and Data Mining 366–374 (2019).
Elliott, A., Chiu, A., Bazzi, M., Reinert, G. & Cucuringu, M. Core–periphery structure in directed networks. Proc. R. Soc. A 476, 20190783 (2020).
Garey, M. R. & Johnson, D. S. A Guide to the Computers and Intractability (1979).
Bonato, A., Lozier, M., Mitsche, D., PérezGiménez, X. & Prałat, P. The domination number of online social networks and random geometric graphs. In International Conference on Theory and Applications of Models of Computation 150–163 (Springer, 2015).
Molnár, F., Sreenivasan, S., Szymanski, B. K. & Korniss, G. Minimum dominating sets in scalefree network ensembles. Sci. Rep. 3, 1–10 (2013).
Cooper, C., Klasing, R. & Zito, M. Lower bounds and algorithms for dominating sets in web graphs. Internet Math. 2, 275–300 (2005).
Nacher, J. C. & Akutsu, T. Dominating scalefree networks with variable scaling exponent: Heterogeneous networks are not difficult to control. New J. Phys. 14, 073005 (2012).
Bonato, A., Janssen, J. & Prałat, P. Geometric protean graphs. Internet Math. 8, 2–28 (2012).
Leskovec, J., Kleinberg, J. & Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 2 (2007).
Kleinberg, J. M. Smallworld phenomena and the dynamics of information. Adv. Neural Inf. Process. Syst. 1, 431–438 (2002).
Nemhauser, G. L., Wolsey, L. A. & Fisher, M. L. An analysis of approximations for maximizing submodular set functionsI. Math. Program. 14, 265–294 (1978).
Tudisco, F. & Higham, D. J. A nonlinear spectral method for coreperiphery detection in networks. SIAM J. Math. Data Sci. 1, 269–292 (2019).
Schroeder, M. & Herbich, M. Fractals, chaos, power laws. Pure Appl. Geophys. 147, 601 (1996).
Watts, D. J., Dodds, P. S. & Newman, M. E. Identity and search in social networks. Science 296, 1302–1305 (2002).
Menczer, F. Growing and navigating the small world web by local content. Proc. Natl. Acad. Sci. 99, 14014–14019 (2002).
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C. & Ghahramani, Z. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11, 2 (2010).
De Nooy, W., Mrvar, A. & Batagelj, V. Exploratory Social Network Analysis with Pajek: Revised and Expanded Edition for Updated Software Vol. 46 (Cambridge University Press, 2018).
Clauset, A., Arbesman, S. & Larremore, D. B. Systematic inequality and hierarchy in faculty hiring networks. Sci. Adv. 1, e1400005 (2015).
Adamic, L. A. & Glance, N. The political blogosphere and the 2004 us election: Divided they blog. In Proc. 3rd International Workshop on Link Discovery 36–43 (2005).
Colizza, V., PastorSatorras, R. & Vespignani, A. Reactiondiffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 3, 276–282 (2007).
Kaiser, M. & Hilgetag, C. C. Nonoptimal component placement, but short processing paths, due to longdistance projections in neural systems. PLoS Comput. Biol. 2, e95. https://doi.org/10.1371/journal.pcbi.0020095 (2006).
Lee, E., Clauset, A. & Larremore, D. B. The dynamics of faculty hiring networks. Preprint at http://arxiv.org/abs/2105.02949 (2021).
Bollobás, B. Random Graphs 73 (Cambridge University Press, 2001).
Chung, F. & Lu, L. The diameter of sparse random graphs. Adv. Appl. Math. 26, 257–279 (2001).
Doob, J. L. Regularity properties of certain families of chance variables. Trans. Am. Math. Soc. 47, 455–486 (1940).
Papachristou, M. Supplementary Source Code (2021). https://colab.research.google.com/drive/1xb8cT_1Y9hcJP04VaZpUAQIjaJEOi80W#scrollTo=yJlGboymTLfA&uniqifier=2.
Van Der Walt, S., Colbert, S. C. & Varoquaux, G. The numpy array: A structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
Virtanen, P. et al. Scipy 1.0: Fundamental algorithms for scientific computing in python. Nat. Methods 17, 261–272 (2020).
Hagberg, A., Swart, P. & S Chult, D. Exploring network structure, dynamics, and function using network. In Tech. Rep., Los Alamos National Lab.(LANL) (2008).
Hunter, J. D. Matplotlib: A 2d graphics environment. IEEE Ann. Hist. Comput. 9, 90–95 (2007).
McKinney, W. et al. Pandas: A foundational python library for data analysis and statistics. Python High Perform. Sci. Comput. 14, 1–9 (2011).
Waskom, M. L. Seaborn: Statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Acknowledgements
The author would like to thank Jon Kleinberg for his useful suggestions during the preparation the present manuscript. The author would like to thank the anonymous referees for their valuable feedback. Supported in part by a Vannevar Bush Faculty Fellowship.
Author information
Authors and Affiliations
Contributions
M.P. is the sole author of the manuscript and undertook the entirety of the research.
Corresponding author
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Papachristou, M. Sublinear domination and core–periphery networks. Sci Rep 11, 15528 (2021). https://doi.org/10.1038/s41598021941058
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598021941058
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.