A simple model clarifies the complicated relationships of complex networks

Real-world networks such as the Internet and WWW have many common traits. Until now, hundreds of models have been proposed to characterize these traits for understanding the networks. Because different models used very different mechanisms, it is widely believed that these traits originate from different causes. However, we find that a simple model based on optimisation can produce many traits, including scale-free, small-world, ultra small-world, Delta-distribution, compact, fractal, regular and random networks. Moreover, by revising the proposed model, community-structure networks are generated. With this model and its revised versions, the complicated relationships of complex networks are illustrated. The model brings a new universal perspective to the understanding of complex networks and provides a universal method to model complex networks from the viewpoint of optimisation.

The supplementary material is divided into three parts: the theoretical analysis, the relationships and the experiments.

Part 1: Theoretical Analysis
This model is actually a multi-objective optimisation problem with constraints and a random variable. The model can be written as Equation (S1), which is the same as Equation (1). Here, x(i) is the degree of node i of the complex network A, y is the average shortest path, and c, xmin, a, b and N are non-negative constants. N is the number of nodes, and xmin is the minimum degree. The function δij indicates whether there is a link between node i and node j.
When considering multi-objective optimisation problems, the solutions are quite different from those of single-objective optimisation problems. We require the concept of 'the Pareto front' to discuss multi-objective optimisation.
For the convenience of discussion, we assume that all of the functions are to be minimised.
For the maximised functions, we use a transform function to obtain the minimised function.

On the Pareto front
Multi-objective optimisation problems 1 differ from single-objective optimisation problems because the different objectives may conflict with each other. As Fig. S1 shows, the solution with the smallest value for the first objective exhibits one of the worst values for the second objective.
Therefore, the "best" solutions to a multi-objective optimisation problem can be defined as "none better is the best". All of the best solutions form a set, which is called the "non-dominated set" (NDS). None of the elements in the NDS are dominated by a feasible solution, and they form the Pareto front. For convenience, multi-objective optimisation problems are commonly written with all objectives maximised or all objectives minimised. Here, F2 is larger than 1, so we rewrite Equation S1 as S1'. In Fig. S1 and Fig. S5, the horizontal axis F2 and the vertical axis F1 follow the form of S1'.
The Pareto front is called "the skyline" in the field of database management systems 2.
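The definition of the non-dominated set can be illustrated with a short sketch. It assumes that both objectives are to be minimised, as stated above; the function names and the sample points are hypothetical and are not taken from the model.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (first objective, second objective), both minimised


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every objective and strictly better in one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def non_dominated_set(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical feasible solutions.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 2.0), (5.0, 3.0), (6.0, 1.0)]
front = non_dominated_set(candidates)
print(front)
```

In this example, (3.0, 5.0) is dominated by (2.0, 4.0) and (5.0, 3.0) is dominated by (4.0, 2.0), so neither belongs to the front. The point with the smallest first objective, (1.0, 9.0), has the worst second objective, which mirrors the conflict between objectives described for Fig. S1.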

Detailed Analysis
Equation (S1) can be rewritten as Equation (S4).
Because x(i) and x(j) come from the same random variable X, and assuming that the independent and identically distributed (iid) condition is satisfied, we use x(i) to approximate x(j), so Equation (S4) can be rewritten as Equation (S5). According to the constraint that the network should be connected, we have x(i) ≥ 1. Here, C satisfies Equation (S11).
Note that Equation (S12) and Equation (S13) always hold. Because x(i) can be regarded as a sample of the random variable X, and p is a function defined on the sample space, p is the probability density function.
This section proves that the solutions of this model are scale-free networks. We refer to these scale-free networks as the optimal scale-free networks.
Note that F1 is dependent on xmin and γ based on Equation (S15).
When X=N-1, p(X) is very small, and we can truncate it to a reasonable value of xmax that has a larger probability of occurring. Thus, Equation (S15) can be rewritten as Equation (S16).
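To make the truncation concrete, the following sketch normalises and samples a discrete power law on the range xmin to xmax. The equations of this section are not reproduced here, so the form p(x) = C x^(-γ) is an assumption, as are the function names and the parameter values.

```python
import random
from typing import List


def power_law_pmf(x_min: int, x_max: int, gamma: float) -> List[float]:
    """Probabilities p(x) = C * x**(-gamma) for x = x_min .. x_max (assumed form)."""
    weights = [x ** (-gamma) for x in range(x_min, x_max + 1)]
    c = 1.0 / sum(weights)  # normalisation constant C of the truncated distribution
    return [c * w for w in weights]


def sample_degrees(n: int, x_min: int, x_max: int, gamma: float, seed: int = 0) -> List[int]:
    """Draw n iid degrees from the truncated power law."""
    rng = random.Random(seed)
    support = list(range(x_min, x_max + 1))
    return rng.choices(support, weights=power_law_pmf(x_min, x_max, gamma), k=n)


# Hypothetical parameters: xmin=2, xmax=50, gamma=2.5, N=1000.
pmf = power_law_pmf(2, 50, 2.5)
degrees = sample_degrees(1000, 2, 50, 2.5)
print(sum(pmf), min(degrees), max(degrees))
```

Truncating at xmax only changes the normalisation constant; the relative probabilities of the remaining degrees are unchanged.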
On the non-optimal scale-free networks
From a community structure perspective, a hub tends to connect many of the other nodes belonging to the same community, but the hubs of different communities are not linked together.
To obtain optimal solutions, hubs will link together to obtain the largest F2, so the community-structure networks are non-optimal solutions of this model.
The other non-optimal scale-free networks can be regarded as transitional forms between optimal scale-free networks and community-structure scale-free networks.
On the revised model for community-structure scale-free networks
Based on the Lagrangian relaxation method, the model can be rewritten as Equation (S17). When considering a network with community structure, we should consider the other distances in addition to the topological distance. We assume that the members of each community are categorized by the remainders of the modular function. Suppose that there exist two communities, where the nodes of odd numbers belong to the first community, and the nodes of even numbers belong to the second community. We can then rewrite the average distance of the network as Equation (S18).
Here, η is the distance penalty factor, and u is the number of communities.
Correspondingly, we define the constant of the average distance as Equation (S19).
Here, c is still the topological distance, and Δ represents other distances. Therefore, the model for a community-structure network can be rewritten as Equation (S20). Using a structure that is similar to that of S1, the model for a community-structure network can be rewritten as Equation (S20'). According to Equation (S18), because the model is similar to S1, we can easily prove that it can generate optimal solutions for community-structure scale-free networks in the revised form.

When xmin=1, the maximum of the average shortest path (maxASP) is (N+1)/3 because the network must satisfy the constraint that the network be connected.
When xmin=2, the maximum of the average shortest path is … . This proposition is very simple to prove. Note that the network tends to become isolated (k+1)-complete graphs, but the network also needs to be connected. Thus, each of the (k+1)-complete graphs must lend edges to connect to the others. The network will be connected in the linear form, which means that the average shortest path is the largest.
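The value (N+1)/3 for xmin=1 can be checked numerically on a linear (path) network, which is the connected network with the largest average shortest path. The sketch below uses breadth-first search; the helper names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def average_shortest_path(adj: Dict[int, List[int]]) -> float:
    """Mean hop distance over all ordered pairs of distinct nodes (connected graph)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def path_graph(n: int) -> Dict[int, List[int]]:
    """Linear network 0 - 1 - ... - (n-1); the two end nodes have degree 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


for n in (5, 20, 100):
    print(n, average_shortest_path(path_graph(n)), (n + 1) / 3)
```

For each N, the two printed values agree, because the sum of the distances over all pairs of a path with N nodes is N(N-1)(N+1)/6.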

A schematic network for a linear network when xmin=4
In Fig. S3, the network can be divided into components of 5 nodes. Each node in the previous component is 3 hops away from the corresponding node in the next component.
For example, node 11 is 3 hops away from node 16. This rule also holds between node 12 and node 17, node 13 and node 18, node 14 and node 19, and node 15 and node 20.
Therefore, this network can be regarded as a linear network with xmin=1, but the weights of the edges are approximately 3. Thus, the maximum of the average shortest path satisfies Equation (S19).
Moreover, Fig. S2 pre-assumes a smaller N. When N becomes larger, the schematic map may change slightly.

The relationships when γ=3
Here we draw the schematic map of the relationships of complex networks with the exponent value 3 as Fig. S4. The relationships for γ = 3 are similar to those for γ = 2. However, xmin must be larger than 1. When γ = 3 and xmin=1, the network cannot be connected because there are a large number of nodes with only one neighbor.

On the histogram method
In general, when dealing with a non-convex and discontinuous Pareto front, the histogram method does not guarantee that the obtained solutions are on the Pareto front. However, for the model in this paper, the histogram method is feasible.

A schematic map of the histogram method
As Fig. S5 shows, the histogram method fixes F1 and then optimises F2. The feasible solutions move toward the Pareto front under the optimisation objective, first C, then B and finally A.
As F1 increases, F2 decreases. Therefore, the solutions obtained by this method are on the Pareto front.
By using the histogram method for all of the different F1 values, we can obtain the entire Pareto front.
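The procedure can be sketched on a finite set of candidate solutions: F1 is fixed to one histogram bin at a time, and the candidate with the smallest F2 in that bin is kept. The sketch assumes that both objectives are minimised; the bin width, the function name and the sample points are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (F1, F2), both minimised in this sketch


def histogram_method(candidates: List[Point], bin_width: float) -> List[Point]:
    """For every F1 bin, keep the candidate with the smallest F2."""
    best: Dict[int, Point] = {}
    for f1, f2 in candidates:
        b = int(f1 // bin_width)  # index of the bin that fixes F1
        if b not in best or f2 < best[b][1]:
            best[b] = (f1, f2)
    return [best[b] for b in sorted(best)]


# Hypothetical candidates spread over three F1 bins of width 1.
candidates = [(1.0, 9.0), (1.2, 8.0), (2.1, 6.0), (2.4, 4.0), (3.3, 5.0), (3.6, 2.0)]
selected = histogram_method(candidates, bin_width=1.0)
print(selected)
```

In this example the selected F2 values decrease as F1 increases, so the selected points do not dominate one another. For a general non-convex or discontinuous front this is not guaranteed, which is the caveat stated at the start of this section.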

On the optimisation algorithm
This algorithm is designed to address a bi-objective optimisation problem with random variables.
To satisfy the iid condition, Equations (S15) and (S18) must be slightly modified. A weak constraint is added to ensure that the x(i) are independent and identically distributed. This constraint should be very weak to ensure that the average shortest path can govern the optimisation process.
According to the results of the theoretical analysis, we use the standard power law distribution (stP) as a benchmark, and the degree distribution of the optimising network (P) must approximate stP as closely as possible. Thus, Equation (S15) can be rewritten. Here, ψ is a constant that is much smaller than θ because the constraint of the average shortest path is much stronger than the constraint of the distribution.
When using the histogram method, according to the iid condition, the bi-objective optimisation problem S22 can be transformed into a single-objective optimisation problem, S23.
Although the benchmark is not a necessary condition, the optimisation process converges much more slowly if we do not use the benchmark to guide the algorithm.

Idea
According to the equation of the power law distribution, which can be written as Equation (S16), we calculate the number of edges. Next, we attempt to delete one edge at random under the conditions of the xmin constraint and the connected network constraint; furthermore, we randomly choose two nodes without a link between them and then add a link between them. If g(A) becomes smaller, then we accept these changes. We repeat these attempts until a satisfactory result is obtained.
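The edge-rewiring loop described above can be sketched as follows. The objective g(A) is not reproduced in this section, so the sketch uses the average shortest path alone as a stand-in; this choice, the function names and the ring-shaped starting network are assumptions made only for illustration.

```python
import random
from collections import deque


def objective(adj):
    """Stand-in for g(A): average shortest path, or infinity if disconnected."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:  # some node is unreachable
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


def hill_climb(adj, x_min, steps, seed=0):
    """Rewire one edge per step; keep the change only if the objective drops."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best = objective(adj)
    for _ in range(steps):
        edges = [(u, v) for u in nodes for v in adj[u] if u < v]
        u, v = rng.choice(edges)     # edge to delete
        a, b = rng.sample(nodes, 2)  # candidate pair to link
        if b in adj[a]:
            continue                 # the pair is already linked
        adj[u].remove(v)
        adj[v].remove(u)
        adj[a].add(b)
        adj[b].add(a)
        value = objective(adj)
        degree_ok = min(len(adj[w]) for w in nodes) >= x_min
        if degree_ok and value < best:
            best = value             # accept: constraints hold and the objective is smaller
        else:
            adj[a].remove(b)         # reject: undo the rewiring
            adj[b].remove(a)
            adj[u].add(v)
            adj[v].add(u)
    return best


# Hypothetical start: a ring of 12 nodes (12 edges), xmin=1.
n = 12
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
start = objective(ring)
final = hill_climb(ring, x_min=1, steps=300)
print(start, final)
```

A disconnected candidate receives an infinite objective value and is therefore always rejected, so the connectivity constraint and the xmin constraint are both enforced by the acceptance test. The number of edges never changes because each step deletes one edge and adds one edge.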

Performance Analysis
The algorithm is very simple. The time-consuming part is the calculation of the average shortest path. In our algorithm, each single-source shortest-path computation takes O(E log V) time using Dijkstra's algorithm with a heap structure 4, where V is the number of nodes and E is the number of edges; repeating the computation from every node gives O(VE log V) for the average shortest path.
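A minimal sketch of this computation is given below, using the binary heap from the Python standard library. The adjacency format (a list of neighbour and weight pairs per node) and the function names are assumptions; the original implementation is not shown in this material.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(neighbour, edge weight), ...]


def dijkstra(adj: Graph, source: int) -> Dict[int, float]:
    """Single-source shortest distances with a binary heap."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter distance was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def weighted_average_shortest_path(adj: Graph) -> float:
    """Run Dijkstra from every node and average over ordered pairs (connected graph)."""
    n = len(adj)
    total = sum(sum(dijkstra(adj, s).values()) for s in adj)
    return total / (n * (n - 1))


# Hypothetical example: a 4-node linear network with unit weights.
line = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
print(weighted_average_shortest_path(line))
```

For unweighted networks a breadth-first search from every node gives the same result; Dijkstra's algorithm is only needed when the edges carry weights.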

The Experimental Results
We use 100 CPU cores in personal computers to perform the experiments.
We set 24 parameter settings to illustrate the exact networks. For each parameter setting, the algorithm was performed 10 times. As the experimental results are very robust, we show only the results for the first run in Fig. S6-Fig. S29.
In all of the experiments, we set η=10 and θ=10. When obtaining the Delta-distribution networks, we set ψ=10^-7; when obtaining the random networks, we set ψ=0; in the other circumstances, we set ψ=10^-5.
For different parameter settings, the numbers of edges are different. According to Equation (S21), we list the numbers of edges that we have used in Table S1.

The upper boxes show the degree distributions, and the fitting results xmin, y and γ are given.
To reduce the noise, the fitting lines in the upper boxes are plotted by the log-binning method. Because information about the degree distribution is lost when the data are binned, the fitting results (xmin, y, γ) are fitted from the original distributions.
When the degree distribution follows a power law, the degree distribution (blue circles) fits the red line well, and the fitted xmin equals the parameter xmin.
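Logarithmic binning can be sketched as follows. The exact binning used for the figures is not specified, so the base-2 bins [2^k, 2^(k+1)), the function name and the sample degrees are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def log_binned_density(degrees: List[int]) -> List[Tuple[float, float]]:
    """Return (bin centre, density) pairs for base-2 bins; degrees must be >= 1."""
    n = len(degrees)
    # d.bit_length() - 1 is the integer k with 2**k <= d < 2**(k + 1).
    counts = Counter(d.bit_length() - 1 for d in degrees)
    binned = []
    for k in sorted(counts):
        low = 2 ** k
        width = low              # the bin holds the integers low .. 2*low - 1
        centre = low * 2 ** 0.5  # geometric midpoint of [low, 2*low)
        binned.append((centre, counts[k] / (n * width)))
    return binned


# Hypothetical degree sample.
degrees = [1, 1, 1, 1, 2, 2, 3, 4, 5, 8]
binned = log_binned_density(degrees)
print(binned)
```

Dividing each count by the bin width keeps the binned values comparable with the unbinned distribution, while the wider bins in the tail average out the noise. Several degrees share one bin, which is why the fit itself uses the original distribution.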
The fractal networks checked by the box-covering method 6 are shown in Fig. S44-Fig. S47.

Figures for γ=3 (panel label: xmin=3, γ=3.5, y=2.9)
In Table S2, the value in the third column, "power-law", is the test value for the plausibility of the power law. The power law is considered a plausible statistical hypothesis for the distribution when p≥0.1; otherwise, the distribution is considered a non-power-law distribution. The values "LR" and "p" in columns 4-7 are used to compare the power law with alternative heavy-tailed distributions ("exp.", "the stretched exp.", "log-normal" and "power law + cutoff") via a likelihood ratio test. "LR" is the log-likelihood ratio versus the alternative: if the LR is positive, the power law is favoured; if the LR is negative, the alternative distribution is favoured. Moreover, the significance of the LR depends on the corresponding value p: when p<0.1, the LR is significant; otherwise, the LR is not reliable, i.e., it cannot be used to test whether the power-law distribution is favoured over the alternative. The column "exp." represents the exponential distribution; "the stretched exp." represents the stretched exponential distribution; "log-normal" represents the log-normal distribution; "power law + cutoff" represents the power law distribution with a cutoff; "status" represents the quantitative results supporting the power law distribution, and "expected distribution" represents the optimised degree distribution.
With regard to the statuses, "none" means that the degree distribution of the network was not in accordance with the power law; "weak" means that the degree distribution fit the power law distribution well, but the alternatives are better; "moderate" means that the degree distribution fit the power law distribution very well, but the alternatives are also plausible; and "good" means that the degree distribution fit the power law distribution very well, and none of the alternatives are plausible. For example, the power law is a plausible fit for group Fig. S16 (the first row) because the p value is larger than 0.1, but the stretched exponential, log-normal and power-law + cutoff distributions are still plausible.

According to Table S2, for lower values of N, all of the networks that were expected to present the power-law distribution were given a satisfactory quantitative estimation. In group 1, the network was estimated as a power-law distribution; it is the delta distribution, which is very similar to the power law distribution.

On the Fast Algorithm
The hill-climbing algorithm wastes time on the validation of possible solutions, for example, to ensure that the solution networks are connected and that the degree of every node is no less than xmin. Moreover, the edge-exchange strategy focuses on the edges; thus, the size of the search space is approximately N^2. Thus, a significant amount of time would be consumed under this strategy.
If the degree distribution is known, a new algorithm will be more efficient. This proposed new algorithm first initialises a network with a known degree distribution, assures that the network has a large average shortest path length, and then exchanges the nodes to optimise the objectives, until a satisfactory solution is obtained.
Here, we depict the pseudocode and introduce the network initialisation method and node exchange method.

Network Initialisation Method
The fast algorithm needs to initialise the solution with large average shortest path length and the demanded degree distribution.
Because linear networks have a large average shortest path length, the satisfactory initial networks can be the linear networks with a specific degree distribution. Here we use the scale-free linear networks as an example.
First, we generated the samples based on the power law. Second, every node was given xmin neighbors, and the nodes were chained together to form a linear network. Finally, additional links were added into the linear network so that the degree distribution followed the power law distribution.
Because the initial network was based on a linear network, its average shortest path length was quite large. Because F2 was maximised, when the additional links were added, the hub nodes were allowed to link together.

Node Exchange Method
The node exchange method randomly exchanges the links of two nodes. If node A and node B exchange their edges, the minimal possible exchange number, i.e., the smaller of the degrees of the two nodes, can be calculated. We set m edges to exchange, where m is smaller than the minimal possible exchange number. For every exchange, a neighbour of node A, identified by a certain edge, becomes a neighbour of node B, and correspondingly a neighbour of node B becomes a neighbour of node A, except when that neighbourhood already exists. In the following simulations, m was set to 1.
According to the node exchange method, the degree distribution will never change in the process of optimisation. Therefore, we need to know the degree distribution in advance.
Moreover, if the initial network is connected, the node exchange method will never lead to the disconnection of the solution network. Therefore, the proposed method saves time by avoiding validation.
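One exchange with m=1 can be sketched as follows. The sketch only illustrates that the degree of every node is preserved; it does not examine connectivity. The function name, the adjacency-set representation and the small example network are hypothetical.

```python
import random
from typing import Dict, Set


def node_exchange(adj: Dict[int, Set[int]], a: int, b: int, rng: random.Random) -> bool:
    """Move one neighbour of a to b and one neighbour of b to a (m = 1).

    Returns False, leaving the network unchanged, when no valid exchange exists.
    """
    # Skip neighbours that are the partner node or that are already linked to it.
    from_a = sorted(x for x in adj[a] if x != b and x not in adj[b])
    from_b = sorted(y for y in adj[b] if y != a and y not in adj[a])
    if not from_a or not from_b:
        return False
    x, y = rng.choice(from_a), rng.choice(from_b)
    adj[a].remove(x)
    adj[x].remove(a)
    adj[b].remove(y)
    adj[y].remove(b)
    adj[b].add(x)  # x becomes a neighbour of b
    adj[x].add(b)
    adj[a].add(y)  # y becomes a neighbour of a
    adj[y].add(a)
    return True


# Hypothetical 6-node network stored as symmetric adjacency sets.
rng = random.Random(1)
adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 5}, 3: {0}, 4: {1, 5}, 5: {2, 4}}
before = {v: len(nbrs) for v, nbrs in adj.items()}
for _ in range(50):
    a, b = rng.sample(sorted(adj), 2)
    node_exchange(adj, a, b, rng)
after = {v: len(nbrs) for v, nbrs in adj.items()}
print(before == after)
```

Each of the two exchanging nodes loses one neighbour and gains one, and each moved neighbour swaps one link for another, so the degree sequence is identical before and after any number of exchanges.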
We used the proposed fast algorithm to obtain the solutions when the degree distribution was scale-free. We performed six groups of experiments to explore the network structure when N=1500. Every group was carried out 10 times to check the robustness of the algorithm. The experimental parameters are listed in Table S3. For each group, we chose the first solution network as an example, as shown in Fig. S30-Fig. S35.
For all the demonstrated networks in Fig. S30-Fig. S35, the scale-free pattern is tested statistically 7, and the corresponding results are listed in Table S4. According to Table S4, the scale-free properties were obvious for all of the demonstrated networks and the results were satisfactory.

On Arbitrary Traits and their Combinations
The optimisation algorithm used in the present study can be generalised to deal with arbitrary traits and their combinations.

We used the proposed fast algorithm to solve the optimisation problem defined by Equation (S26). For every group of parameters, we performed the algorithm ten times to check the robustness of the fast algorithm. The experimental results showed that the topology of the networks was similar. Therefore, we selected the first run, as shown in Fig. S36-Fig. S39.

Therefore, the fractality or self-similarity described by Song et al. is a type of similarity over the box diameter, which can also be described as structural self-similarity over diameter.
Additionally, scale-free networks have a probabilistic similarity over varying degree, which we refer to as degree self-similarity.
As to the other measures, such as the clustering coefficient, different self-similarity can be defined. However, for a specific network, a new method such as the box-covering method described by Song et al. can be used to test the self-similarity of the generated fractal networks.
Because the average shortest path length is also a measure of the length, the fractality over the box diameter can be approximated when applying the box-covering method to the demonstrated networks, as shown in Fig. S44-Fig. S47.
Moreover, there are some fractal and highly assortative networks in the real world, such as the movie actor network, and the explanation of fractality and assortativity is given in the paper "A fractal and scale-free model of complex networks with hub attraction behaviors", arXiv:1311.3087.
On the community-structure networks with multiple communities
To generate community-structure networks with multiple communities, we introduce a similarity distance, which stands for distances in the real world, such as geographical distance, interests and preferences. We denote the similarity distance between two nodes i and j as gij, where gij=k when the two nodes belong to different categories and gij=t when the two nodes belong to the same category. Here k and t are constants, and k>t. We only consider the simplest form, in which k and t are constants, though gij can also be defined in a more complex form. According to the definition of the similarity distance between nodes, the similarity distance of a network l can be defined. Because S1 can generate optimal scale-free networks with a proper c, and Equation S32 is similar to S1, it is easy to prove that the revised model S32 can generate optimal networks with community structure and the scale-free property.
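The pairwise similarity distance can be written in a few lines. The sketch assumes that the community of a node is given by the remainder of its index modulo the number of communities u, as in the revised model of Part 1; the default values k=10 and t=1 and the function name are chosen only for illustration.

```python
from typing import List


def similarity_distance(i: int, j: int, u: int, k: float = 10.0, t: float = 1.0) -> float:
    """g_ij = t if nodes i and j fall in the same community, otherwise k (k > t)."""
    return t if i % u == j % u else k


def similarity_matrix(n: int, u: int) -> List[List[float]]:
    """Pairwise similarity distances for nodes 0 .. n-1 (diagonal entries are unused)."""
    return [[similarity_distance(i, j, u) for j in range(n)] for i in range(n)]


# Hypothetical example: 6 nodes and 3 communities.
g = similarity_matrix(6, 3)
print(g[0][3], g[0][1])
```

Nodes 0 and 3 share the remainder 0 and are therefore at distance t, whereas nodes 0 and 1 belong to different communities and are at distance k.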
In Fig. S49-S53, we present five community-structure and scale-free networks with multiple communities. In the five figures, the similarity distance between nodes is set with k=10 and t=1; the other parameters are shown in Table S6.