Priority Attachment: a Comprehensive Mechanism for Generating Networks

We claim that networks are created according to the priority attachment mechanism. We introduce a simple model which uses priority attachment to generate both synthetic networks and networks that closely resemble empirical ones. Priority attachment generalizes previously proposed mechanisms, such as small world creation or preferential attachment, and we also observe its presence in a range of real-world networks. In this paper, we show that by using priority attachment we can generate networks of very diverse topologies, as well as re-create empirical ones. An additional advantage of the priority attachment mechanism is an easy interpretation of the latent processes of network formation. We substantiate our claims by performing numerical experiments on both synthetic and empirical networks. The two main contributions of the paper are the development of the priority attachment mechanism and the design of Priority Rank, a simple network generative model based on the priority attachment mechanism.


Introduction Motivation
Many generative models of network formation have been proposed in the scientific literature, 1 and some of them have gained significant prominence, for instance the random network model of Erdös and Rényi, 2 the small world model of Watts and Strogatz, 3 the cumulative advantage model of de Solla Price, 4 the scale-free model of Albert and Barabási, 5 or the forest fire model of Leskovec. 6 Each of these generative network models is based on some phenomenon which (as is often claimed) explains the underlying process of network formation. For instance, in the case of the small-world model, the alleged phenomenon is the tendency of many systems to form tightly connected groups (small worlds) with incidental connections between groups serving as long distance bridges. In the case of the preferential attachment model, the phenomenon which purportedly fuels the network formation process is the strong preference of vertices to connect to already well-connected vertices. Some of the network generators do not attempt to model real-world processes directly, but rely on some mathematical formalism, like the sequence of Kronecker products applied to a small seed set of networks. 7 Recently, an interesting proposal has been formulated to model complex networks using a stochastic sequence of predefined base actions 8 and turning the recreation of an empirical network into an optimization problem.
For a long time we have suspected that these individual phenomena are specialized instances of a more general mechanism of network creation. The main reason for this suspicion was the fact that the generative network models seemed to be narrowly defined, and each of them covered only a specific class of possible network topologies. A closer inspection of the generative network models further revealed that all of them were using, sometimes inadvertently, some type of prioritization when choosing target vertices during edge formation. The mechanism of prioritization using priority queues has a very long presence in almost all disciplines of science. [17] In the domain of complex networks, the idea of network growth based on rankings was introduced by Fortunato, Flammini, and Menczer. They have shown that substituting vertex feature distributions with global rankings of these features (both
There are four main reasons which make the Priority Rank model valuable:
• Priority Rank is a universal generative network model which can produce a very wide spectrum of network topologies, including the most popular network models.
• Priority Rank offers insights into the generative processes behind modeled networks. When a machine learning algorithm is applied in order to find the best-fitting distance function for the priority attachment mechanism, in most cases the resulting distance function is easily interpretable and provides explanations for the latent network formation process.
• Priority Rank allows us to generate multiple instances of networks with the same characteristics and distributions, because the model discovers the main generative process of network formation. It should be noted that this is not comparable with network sampling, which often distorts the characteristics of sampled networks. Instead, the Priority Rank model allows one to replicate networks for the purpose of A/B testing, statistical inference, simulations, etc.
• Priority Rank does not require any hyper-parameters to be set a priori, such as the edge creation probability in the random network model or the edge rewiring probability in the small world network. This feature of the Priority Rank model is very important because, contrary to popular belief, generative network models are very sensitive to the initialization of these parameters.

Priority attachment
The idea behind the priority attachment mechanism is fairly simple. Consider a new vertex which joins a network. The primary issue is the selection of target vertices to which the new vertex creates edges. Previously, several mechanisms have been proposed to model this selection process. For instance, in the Erdös-Rényi model the vertex selects target vertices randomly using the uniform distribution. In the Albert-Barabási model the vertex selects target vertices with the probability proportional to the current degree of each vertex. According to the priority attachment mechanism, each vertex has a local ranking which arranges all possible target vertices by their "importance" from the point of view of the new vertex. The new vertex selects target vertices with the probability inversely proportional to their position in the local ranking. One should regard this local ranking as a priority queue which orders all of the vertices of the network from the point of view of a single vertex. This means that each vertex uses its own local ranking when creating edges. In other words, the main mechanism of network formation is the attachment of vertices driven by their individual perception of priority of other vertices, hence the name "priority attachment".
The power of the priority attachment mechanism stems from the fact that local rankings can be computed by arbitrarily complex distance functions which can either model real-world phenomena, or model adjacency matrices of empirical networks. Interestingly, the topology of the generated network depends almost exclusively on the properties of the function $D$ which is used to generate local rankings. Figure 1 presents four different networks, each consisting of $n = 50$ vertices, generated by the priority attachment mechanism. A random network is generated for $D(v_i, v_j) \sim N(\mu, \sigma)$, i.e., when local rankings are random permutations of the set of vertices. A small-world network is generated for $D(v_i, v_j) = |a(v_i) - a(v_j)|$ for an attribute whose value is randomly chosen from a uniform distribution, $a(v) \sim U(0, 1)$. In other words, local rankings arrange vertices by the distance defined by the attribute $a$, resulting in a strong preference for vertices in the local neighborhood. A preferential attachment network is generated when $D(v_i, v_j) = \frac{1}{\deg(v_j) + \varepsilon}$, i.e., when local rankings simply represent the global ranking of vertex degrees. Finally, a cosine similarity network is generated when $D$ is the cosine distance between vertex vectors, where $v_i = (a_i^1, \ldots, a_i^m)$ is a vector of numeric values. As can be seen in Figure 1, each of these networks has a different topology and a different degree distribution.
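The four distance functions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, the three-dimensional vectors, and the randomly drawn `degree` values are hypothetical placeholders and not part of the original model description.

```python
import math
import random

rng = random.Random(0)
vertices = list(range(50))

# Hypothetical vertex data used only for illustration.
attribute = {v: rng.random() for v in vertices}            # a(v) ~ U(0, 1)
degree = {v: rng.randint(0, 10) for v in vertices}         # stand-in for current degrees
vectors = {v: [rng.random() for _ in range(3)] for v in vertices}
EPS = 1e-9  # plays the role of epsilon in 1 / (deg + epsilon)


def d_random(i, j):
    # Random distance: local rankings become random permutations.
    return rng.gauss(0.0, 1.0)


def d_attribute(i, j):
    # Distance on a single attribute: favours the local neighbourhood.
    return abs(attribute[i] - attribute[j])


def d_degree(i, j):
    # Depends only on the target: every vertex sees the same degree ranking.
    return 1.0 / (degree[j] + EPS)


def d_cosine(i, j):
    # Cosine distance, taken here as 1 minus the cosine similarity.
    dot = sum(x * y for x, y in zip(vectors[i], vectors[j]))
    norm_i = math.sqrt(sum(x * x for x in vectors[i]))
    norm_j = math.sqrt(sum(y * y for y in vectors[j]))
    return 1.0 - dot / (norm_i * norm_j)


def local_ranking(i, vertices, distance):
    # Order all other vertices by increasing distance from vertex i.
    others = [j for j in vertices if j != i]
    return sorted(others, key=lambda j: distance(i, j))
```

Swapping the `distance` argument of `local_ranking` is the only change needed to move from one family of topologies to another, which is the point the paragraph above makes about the function $D$.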
The main advantage of these simple distance function definitions is the fact that they are easily interpretable. For instance, when modeling a network of disease spreading, it is likely that disease vectors would infect vertices in their close physical proximity. Thus, a distance function based on the physical distance between vertices (which generates small-world structures) would be a sound choice for a simple model of disease spreading. In addition, if one would like to differentiate the probability of edge formation based on additional factors (e.g., the transfer of a sexually transmitted disease is more likely between vectors of similar age), the incorporation of this factor into the distance function would be trivial. Similarly, when trying to model semantic relationships between words embedded in a multidimensional space (a standard tool in contemporary NLP), it is reasonable to assume that the proximity of word embeddings is an indication of some semantic relatedness of the words. A simple way to model these relationships would be to use cosine distance to define the distance function $D$.
Let us now formally define the priority attachment mechanism and present the Priority Rank model. In this paper we will use the following notation.
• $G = \langle V, E \rangle$ is a network with the set of $n$ vertices $V = \{v_1, \ldots, v_n\}$ and the set of edges $E$,
• $v_i = (a_i^1, \ldots, a_i^m)$, vertices are vectors of attributes,
• $D(v_i, v_j) : V \times V \to \mathbb{R}$ is the generic distance function which computes the distance between vertices $v_i$ and $v_j$, such that $D(v_i, v_j) > 0 \iff i \neq j$ and $D(v_i, v_i) = 0$. Distance function $D$ does not have to be symmetrical,
• the local ranking of vertices for the vertex $v_i$ is a permutation of $V \setminus \{v_i\}$.
According to the priority attachment mechanism, the probability of selecting a vertex $v_j$ as the target vertex for an edge originating from the vertex $v_i$ is inversely proportional to the position of the vertex $v_j$ in the ranking of vertices for $v_i$. The probability mass function of selecting the $i$th element of the ranking is given by

$$P(i) = \frac{1}{i \cdot H_n} \qquad (1)$$

where $H_n$ is the $n$th harmonic number, serving as the normalizing constant so that Equation 1 presents a proper probability mass function, i.e., $\sum_{i=1}^{n} P(i) = 1$. We will use Euler's formula to approximate the $n$th harmonic number as $H_n \approx \ln(n) + \frac{1}{2n} + \gamma$, where $\gamma$ is the Euler-Mascheroni constant, $\gamma \approx 0.57722$. Algorithm 1 presents the pseudo-code for generating networks using the Priority Rank model. Procedure sample([1, ..., m], P) samples an integer from the range 1, ..., m without replacement using the probability mass function from Equation 1.
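The following Python sketch shows one way Equation 1, the harmonic number approximation, and the sampling step could be put together. It is a hedged illustration, not a transcription of Algorithm 1: the function names, the fixed number of edges `k` per vertex, and the choice to normalize over the actual ranking length are assumptions made here.

```python
import math
import random

EULER_GAMMA = 0.57722  # value quoted in the text


def harmonic_exact(n):
    # H_n computed directly as a finite sum.
    return sum(1.0 / i for i in range(1, n + 1))


def harmonic_approx(n):
    # H_n ~ ln(n) + 1/(2n) + gamma
    return math.log(n) + 1.0 / (2 * n) + EULER_GAMMA


def rank_pmf(m):
    # P(i) = 1 / (i * H_m) for ranking positions i = 1..m.
    h = harmonic_exact(m)
    return [1.0 / (i * h) for i in range(1, m + 1)]


def sample_positions(m, k, rng):
    # Draw k distinct positions from 1..m, each draw weighted by 1/position.
    positions = list(range(1, m + 1))
    weights = [1.0 / p for p in positions]
    chosen = []
    for _ in range(min(k, m)):
        pick = rng.choices(range(len(positions)), weights=weights)[0]
        chosen.append(positions.pop(pick))
        weights.pop(pick)
    return chosen


def priority_rank(vertices, distance, k, seed=0):
    # Every vertex ranks all others by distance and creates k directed edges.
    rng = random.Random(seed)
    edges = set()
    for vi in vertices:
        ranking = sorted((vj for vj in vertices if vj != vi),
                         key=lambda vj: distance(vi, vj))
        for pos in sample_positions(len(ranking), k, rng):
            edges.add((vi, ranking[pos - 1]))
    return edges


# Usage: 20 vertices on a line, distance equal to the index difference.
example_edges = priority_rank(list(range(20)), lambda i, j: abs(i - j), k=2)
```

Sampling without replacement is implemented by removing a chosen position and renormalizing the remaining weights, which is one common reading of the `sample` procedure; other readings are possible.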
Algorithm 1: Priority Rank generative network model.
In order to better illustrate the idea of priority attachment, let us consider an example of a simple network formation. Let us suppose that there are five people described by name, age, and sex. Let us also suppose that the social distance is defined in terms of the absolute difference of age, and that the fact that two people share the same sex compensates for 10 years of age difference. In this example people tend to form relationships with other people of similar age, and given two people of the same age, there is a preference to form relationships with people of the same sex. This model could be applicable, for instance, to the process of self-selecting students to form pairs in a large study group where the participants have no prior acquaintances. Figure 2 presents one possible instance of the network formation process driven by the priority attachment, where each vertex creates $k = 2$ edges. Individual priority rankings for each vertex, along with the value of the distance function and the probability of creating an edge to a vertex (computed using Equation 1), are presented in Table 1. Edges created at a given step are marked with solid lines.
The process starts with Alice computing her distance to all other vertices. The most similar vertex to Alice is Eve, and she occupies the first position in the local ranking for Alice. Analogously, the most dissimilar vertex to Alice is Bob, and he is placed at the end of the ranking. Inserting ranking positions into Equation 1 yields the final probabilities of selecting vertices as target vertices for newly created edges. For the sake of simplicity we have assumed that for each vertex the first two most probable targets have been randomly chosen when forming the network. When two or more vertices are equidistant from the given vertex, they receive the same position in the local ranking, which may therefore contain gaps. As a result, the probabilities of selecting vertices at certain positions of the local ranking may change (compare the local rankings of Alice, Bob, and Cecil).
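The ranking with ties described above can be sketched as follows. All concrete values here are hypothetical: the ages, the sexes, and the fifth person "Dave" are invented for illustration and are not the values from Table 1. The distance adds 10 years when the sexes differ, which is one assumed way of expressing that a shared sex compensates for 10 years of age difference, and probabilities with ties are obtained by normalizing 1/rank, which is also an assumption.

```python
# Hypothetical records: name -> (age, sex). Not the data from Table 1.
people = {
    "Alice": (25, "F"),
    "Bob": (60, "M"),
    "Cecil": (40, "M"),
    "Dave": (35, "M"),   # hypothetical fifth person
    "Eve": (28, "F"),
}


def social_distance(p, q):
    # Absolute age difference, plus 10 years when the sexes differ.
    (age_p, sex_p), (age_q, sex_q) = people[p], people[q]
    penalty = 0 if sex_p == sex_q else 10
    return abs(age_p - age_q) + penalty


def competition_ranks(distances):
    # Map {vertex: distance} to {vertex: rank}; ties share a rank and leave gaps.
    ordered = sorted(distances.items(), key=lambda item: item[1])
    ranks = {}
    for position, (vertex, dist) in enumerate(ordered, start=1):
        if position > 1 and dist == ordered[position - 2][1]:
            ranks[vertex] = ranks[ordered[position - 2][0]]
        else:
            ranks[vertex] = position
    return ranks


def selection_probabilities(ranks):
    # Weight each vertex by 1/rank and normalize so the weights sum to one.
    weights = {v: 1.0 / r for v, r in ranks.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


alice_distances = {q: social_distance("Alice", q) for q in people if q != "Alice"}
alice_ranks = competition_ranks(alice_distances)
alice_probs = selection_probabilities(alice_ranks)
```

With these invented values Eve is ranked first and Bob last for Alice, matching the qualitative description in the text. Two equidistant vertices would share a rank, and the next vertex would skip one position.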

Recreating empirical networks
Given an empirical network $G$, we are interested in finding the distance function $D$ such that this distance function, when used inside the Priority Rank model, generates a network which is "similar" to the empirical network $G$. The problem of defining a robust and flexible measure of network similarity has been studied for many years, and several network similarity measures have been proposed in the literature. 20,21 However, these measures tend to be computationally expensive and difficult to apply to really large networks. For this reason we have decided to use a simple and well-understood network similarity measure. In order to measure the degree of network similarity, we compare the distributions of centrality measures using the Kolmogorov-Smirnov non-parametric two-sample test of the equality of continuous one-dimensional distributions. The KS test computes the maximal distance between cumulative distribution functions and provides rejection thresholds for the null hypothesis that the compared samples are drawn from the same distribution. The question remains how to find the distance function $D$ which produces networks that minimize the KS statistic for centrality measure distributions. Let $\delta(v_i, v_j)$ be the set of shortest paths between vertices $v_i$ and $v_j$ in the network $G$, and let $\delta_k(v_i, v_j)$ be the set of shortest paths between vertices $v_i$ and $v_j$ which pass through the vertex $v_k$. Finally, let $\Delta(v_i, v_j)$ denote the length of the shortest path between vertices $v_i$ and $v_j$. A centrality measure is a function $C : V \to \mathbb{R}$ which assigns to each vertex a value representing the "importance" of the vertex in the network $G$. The four most popular centrality measures include: 22
• degree centrality $C_D(v_i) = d(v_i)$ simply measures the number of vertices adjacent to the vertex $v_i$. The assumption here is that a vertex is important if it is directly connected to many vertices in the network.
• betweenness centrality $C_B(v_i) = \sum_{j \neq i \neq k} \frac{|\delta_i(v_j, v_k)|}{|\delta(v_j, v_k)|}$ measures the number of shortest paths between any pair of vertices which pass through the vertex $v_i$. This interpretation of importance highlights the influence of a vertex on communication pathways through the network.
• closeness centrality, computed from the shortest path lengths $\Delta(v_i, v_j)$, measures the average distance from the vertex $v_i$ to all other vertices in the network. According to this definition, a vertex is important if it can quickly communicate with all remaining vertices in the network.
• page rank centrality measures the importance of a vertex as a recursive sum of importances of vertices adjacent to $v_i$. According to this definition, a vertex is important if it connects to other important vertices in the network.
Sometimes it is possible to "guess" the distance function $D$ given the description of the empirical network. Table 2 presents a list of simple distance functions that can be used when recreating empirical networks. Some of these distance functions are self-explanatory (like degree, betweenness, closeness, and page rank distances, which are simply the expressions of preferential attachment to vertices with high values of these centrality measures). If the empirical network consists of vertices with attributes, euclidean distance can be used to generate local rankings of vertex priority (for the sake of simplicity we include only one- and two-dimensional euclidean distance). If vertices are described by numerical vectors, cosine distance can be used. The aggregate distance computes the distance on each pair of attribute values of compared vertices $v_i$, $v_j$, and then applies weights $w_k$ to distances computed on each attribute.
However, in most cases it is impossible to approximate the generative process of a network using a single, simple distance function. Given an empirical network, it is often precisely the aim of a researcher to deduce the guiding generative principle of a network. The main advantage of the Priority Rank model is its ability to derive the proper distance function $D$ from the existing network, which, if applied to the model, would generate the network most similar to the original one. The learning task can be defined as follows. Consider a network $G = \langle V, E \rangle$, and in particular, consider an edge $(v_i, v_j) \in E$. For the Priority Rank model to re-generate this edge it is required that the distance $D(v_i, v_j)$ be minimized, and at the same time, the distance $D(v_i, v_k)$ be maximized for every vertex $v_k$ such that $(v_i, v_k) \notin E$. The network $G$ provides the training data for a machine learning algorithm in the form of the adjacency matrix, which can be interpreted as a function $f : V \times V \to \{0, 1\}$. The positive cases in the training set consist of tuples $(v_i^1, \ldots, v_i^m, v_j^1, \ldots, v_j^m)$ for all pairs of vertices $(v_i, v_j)$ which are adjacent in $G$, and the negative cases consist of tuples for all pairs of vertices $(v_i, v_k)$ which are not adjacent in $G$. The training set can be fed into a classification algorithm, such as logistic regression, naive Bayes classifier, or SVM, to find patterns in vertex attribute co-occurrences which influence the probability of an edge's presence or absence. The model resulting from a classification algorithm can be interpreted as a condensed representation of the underlying principle of network formation. The last two distance functions presented in Table 2 represent the learning procedure. The linear regression distance uses the least-squares method to fit the linear regression equation to the training set, and the naive Bayes classifier distance uses the well-known naive Bayes classifier to predict the probability of existence of an edge between two vertices.
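The construction of the training set can be sketched as below. The toy network, the single attribute per vertex, the use of the absolute attribute gap as the only feature, and the conversion of a fitted score into a distance as one minus the prediction are all assumptions made for illustration. The paper's own linear regression distance may be defined differently.

```python
def build_training_set(vertices, edges, attributes):
    # One row per ordered pair: (attributes of v_i + attributes of v_j, label).
    edge_set = set(edges)
    rows = []
    for vi in vertices:
        for vj in vertices:
            if vi == vj:
                continue
            features = tuple(attributes[vi]) + tuple(attributes[vj])
            rows.append((features, 1 if (vi, vj) in edge_set else 0))
    return rows


def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy network: two groups with similar attribute values.
vertices = list(range(6))
attributes = {0: [0.0], 1: [0.1], 2: [0.2], 3: [0.8], 4: [0.9], 5: [1.0]}
groups = [(0, 1, 2), (3, 4, 5)]
edges = [(i, j) for g in groups for i in g for j in g if i != j]

rows = build_training_set(vertices, edges, attributes)
xs = [abs(f[0] - f[1]) for f, _ in rows]   # single feature: attribute gap
ys = [label for _, label in rows]
intercept, slope = fit_line(xs, ys)


def learned_distance(vi, vj):
    # Assumed mapping: a higher predicted edge score means a smaller distance.
    gap = abs(attributes[vi][0] - attributes[vj][0])
    return 1.0 - (intercept + slope * gap)
```

On this toy data the fitted slope is negative, so vertices with similar attribute values receive smaller learned distances and therefore higher positions in each other's local rankings.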

Results
The main result reported in this paper is the development of the priority attachment mechanism. In this section we present the results of the conducted experiments. The aim of these experiments was to verify whether the priority attachment mechanism could explain the underlying network formation process. We have used four popular generative network models to produce synthetic networks, and we have collected 18 empirical networks from various domains to test the ability of the priority attachment to recreate these networks. The experimental protocol was as follows. Since, for a given network $G$, we cannot guess which distance function will be able to best reproduce $G$, we have applied all distance functions presented in Table 2. Then, for the best 3 distance functions we have run the generation process 20 times and we have aggregated the results. Of course, most of the distance functions require values of attributes describing vertices. These values were not always available, for instance, in the case of synthetic networks generated from theoretical network models. In these cases we have created synthetic attributes for each vertex, generating four attributes (one ordinal, one categorical, two continuous). These attributes were generated from four different distributions: the normal distribution, the uniform distribution, the log-normal distribution, and the exponential distribution. For empirical networks we have generated synthetic attributes only when no vertex attributes were present in the data; otherwise we have used only the real attributes of vertices. When comparing synthetic and empirical networks with networks generated by the Priority Rank model, we have tested the conformity of centrality measure distributions using the Kolmogorov-Smirnov two-sample test. Recall that the null hypothesis of the two-sample KS test states that the compared samples are drawn from the same distribution. We reject the null hypothesis for p-values less than the significance level α = 0.05. In Tables 4 and 6 we mark the results which pass the KS test (i.e., instances where the null hypothesis is not rejected) in boldface.
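The generation of synthetic vertex attributes could look roughly like the sketch below. The text does not state which attribute is drawn from which distribution or with what parameters, so the pairing, the parameter values, the number of categories, and the attribute names are all assumptions.

```python
import random


def synthetic_attributes(n, seed=0):
    # One record per vertex: one ordinal, one categorical, two continuous values.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append({
            # Ordinal: rounded draw from a normal distribution (assumed pairing).
            "ordinal": round(rng.gauss(5.0, 2.0)),
            # Categorical: one of four labels chosen uniformly (assumed pairing).
            "categorical": min(3, int(rng.uniform(0.0, 4.0))),
            # Continuous: log-normal and exponential draws.
            "continuous_1": rng.lognormvariate(0.0, 1.0),
            "continuous_2": rng.expovariate(1.0),
        })
    return rows


vertex_attributes = synthetic_attributes(50)
```

Fixing the seed makes repeated generation runs comparable, which matters when the same attributes are reused across the 20 generation runs per distance function.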

Synthetic networks
We have tested the ability of the Priority Rank model to recreate networks using both synthetic networks obtained from theoretical network models, and empirical networks representing various domains. Generative network models used in our experiments include the following:
• Erdös-Rényi random model: 2 an empty network with n = 50 vertices is created, and then, for each pair of vertices, an edge is formed with the probability p = 0.4.
• Watts-Strogatz small world model: 3 initially, n = 50 vertices are connected in a ring topology, with each vertex connecting to its k = 3 neighbors, and then each edge is randomly rewired with the probability p = 0.01.
• Albert-Barabási scale free model: 5 the initial topology of the network consists of n 0 vertices forming a complete graph K n 0 , the remaining vertices are added to the network sequentially until the desired number n = 50 of vertices is reached, and each newly added vertex creates k = 3 edges to existing vertices, choosing target vertices with the probability proportional to their degrees, hence the alternative name of the model: preferential attachment model. The regular, linear preferential attachment is achieved for α = 1.
• forest fire model: 6 vertices are added sequentially to the network, each of the n = 50 new vertices creates k edges to uniformly selected targets, and then adds more edges to direct neighbors of selected targets with the burning probability p = 0.  4.

Erdös-Rényi random network
The Priority Rank model is able to recreate the Erdös-Rényi random network very well. As can be seen in Table 4, the network generated by the Priority Rank model reproduces all three centrality distributions of degree, betweenness, and closeness. It also retains the reciprocity, the average shortest path length, and Freeman's centralization of degree distribution. The only network statistic which is not duplicated is the network diameter. As expected, the random distance function works very well, since in the Erdös-Rényi model vertices have no preference for other vertices and edges are created randomly.

Watts-Strogatz small world network
The initial topology of the ring is formed in one dimension. The results are less encouraging than in the case of the random network. None of the considered distance functions were able to recreate the degree distribution. The euclidean distance function managed to generate networks with similar betweenness and closeness distributions, but the standard deviation of the KS statistic is quite large, which weakens the result. The euclidean distance function used two synthetic attributes generated from exponential and normal distributions. The failure to re-generate the degree distribution is caused by the fact that Priority Rank does not have anything akin to random edge re-wiring, which is essential for the small-world model. The ring structure can be very easily reproduced, but the presence of a few randomly re-wired edges (which reduce average path lengths in the network) is very difficult to mimic using any distance function. Apart from this weakness, the Priority Rank model generates networks which are very similar to the original small world network in terms of network diameter, density, and average shortest path lengths (these characteristics are recreated almost flawlessly).

Albert-Barabási scale free network
Priority Rank successfully recreates degree and betweenness distributions, while struggling to preserve the distribution of the closeness centrality (the linear regression distance function barely manages to pass the KS test). The Priority Rank model also builds networks with larger diameters than the original network, but the density and the centralization of the degree distribution are very close to the original values. We also note that the Priority Rank model generates networks in which average shortest paths are slightly longer than in the original Albert-Barabási model. Nevertheless, we conclude that synthetic scale-free networks can be mimicked sufficiently well by the Priority Rank model.

Leskovec forest fire network
The Priority Rank model can recreate forest fire networks very precisely (Table 4). The best results are obtained when using euclidean distance based on one attribute (with the value drawn randomly from the normal distribution). This distance function produces networks with similar centrality measure distributions (degree, betweenness, closeness), and with very similar diameter, density, average shortest path lengths, and centralization of the degree distribution.
The Priority Rank model can easily mimic synthetic networks produced by popular network generative models. A simple substitution of the distance function allows the Priority Rank model to produce instances of random networks, small world networks, scale-free networks, and forest fire networks. In addition, one can easily introduce different variations of these generative models by modifying the distance function used to compute local priority rankings. This flexibility and ability to generalize multiple models is a unique feature of the Priority Rank model.

Empirical networks
As we have shown in the previous section, the Priority Rank model can recreate networks produced by popular generative network models. In addition, when provided with a custom distance function, the Priority Rank model can generate networks with topologies not available through traditional generative network models. However, the most interesting and valuable property of the Priority Rank model is its ability to learn the generative processes of empirical networks and to generate multiple instances of these networks. In this section we present the results of the experimental evaluation of this feature. Networks presented in this section are all available through The Colorado Index of Complex Networks 23 and The Network Repository. 24 Brief descriptions of these networks are presented in Table 5 and in the supplementary material. Metrics used to compare empirical networks and networks generated by the Priority Rank model are presented in Table 3. Empirical network statistics and the corresponding statistics of networks generated by the Priority Rank model are shown in Table 6.
• American bisons: The Priority Rank model correctly captures the underlying network structure and can recreate the network to a sufficient extent. All centrality measure distributions are retained in generated networks, and these networks exhibit densities and average shortest path lengths similar to the original network. The Priority Rank model slightly underestimates the diameter and the centralization of the degree distribution. Since the original network resembles the Erdös-Rényi random network, we do not find it surprising that the random distance works well when recreating the network. Let us once again stress the importance of the above result. Since Priority Rank is capable of recreating the American bison network, one could generate multiple instances of this network and hope that the generated instances reflect the same principle guiding the formation of the original network. These networks could be interpreted as observations of another herd of bisons, or observations of the same group during a different time period. Such multiple observations can be very useful. For instance, if one would like to simulate the transmission of a disease between animals living in the wild, network instances generated with Priority Rank could be used as alternative scenarios for the transmission of the disease.
• Bighorn sheep: The Priority Rank model very convincingly recreates the topology of the original network, generating networks which maintain distributions of all centrality measures, as well as the density and the average shortest path lengths. Generated networks have slightly smaller diameters than the original network, and the reciprocity is overestimated. This experiment, however, supports our claim that the Priority Rank model can discover the latent generative principle of an empirical network. The page rank distance aggregates the importance of all animals in the herd. Intuitively, page rank reflects the true importance of animals given the history of dominance relationships. For the sake of brevity we report only the best-fitting distance function for each network, but during the experiments we have analyzed several different distance functions for each network. In the case of the bighorn sheep network, very similar results were obtained for two other distance functions which both made use of the age attribute. Naive Bayes classifier distance and linear regression distance functions are machine learning algorithms which map the relationship between two animals' ages onto the probability of the existence of the dominance relationship between these animals. These two distance functions were close competitors of the page rank distance function. Thus, the interpretation of these distance functions allows us to draw some conclusions as to the principle behind the formation of this particular network.
• C. elegans: The Priority Rank model generates networks which are very similar in terms of degree and betweenness distributions to the C. elegans network, but we were unable to recreate a similar distribution of the closeness centrality measure (this could be due to the large number of vertices in the network). The diameter, the density, and the average path lengths are also faithfully recreated with Priority Rank. One missing characteristic of the original network is the reciprocity. The Priority Rank model cannot capture this feature using the closeness distance function. We note, however, that adding reciprocity to the distance function is straightforward: it is sufficient to include a component that would diminish the estimated distance between vertices $v_i$ and $v_j$ if an edge $(v_j, v_i)$ is already present in the network.
• Mouse visual cortex: This small network is very difficult to recreate, most probably due to the fact that its generative process may be complex. Unfortunately, the source network does not contain any additional attributes and we have to rely only on topological features of the network. In this case, the best result is obtained for the page rank distance, which creates networks similar to the scale free model. The Priority Rank model is able to produce networks with very similar degree and betweenness distributions.
• Enzyme 108: The Priority Rank model using euclidean distance can re-generate all centrality measure distributions and obtain very similar values of the network diameter, the density, and the average shortest paths. The euclidean distance measure uses two synthetic attributes drawn from the uniform distribution. This distance function tends to produce networks similar to the small world model, and the enzyme 108 network definitely belongs to this family of networks. This small-worldliness of the enzyme 108 network is best manifested in the density and the average shortest path length (which are the largest among all analyzed networks). Interestingly, although the Priority Rank model had problems recreating the synthetic small-world network, it manages to approximate the real world example of a small-world network very faithfully (we will see the same behavior in the case of the Illinois high school network). This might indicate that the original small-world model proposed by Watts and Strogatz over-simplifies the reality, and that priority attachment based on the similarity of attributes is a better representation of network formation phenomena.
• Cage5: The degree distance function reproduces distributions of degree and betweenness, but cannot recreate the original distribution of the closeness centrality. As for the topology of the generated network, the Priority Rank model generates very similar networks in terms of diameter, density, average shortest path length, and degree centralization. Our model underestimates the reciprocity of the network, most probably due to the fact that the degree distance function alone cannot account for the increased probability of reciprocal relationships.
• Political books: This network is easily recreated with the Priority Rank model using the euclidean distance function based on a single discrete attribute (which reduces the function to a simple binary flag comparison). This result agrees very well with our intuition. One can interpret the synthetic discrete attribute as an indicator of a broad book category, and two books are purchased together if they belong to the same category. The Priority Rank model recreates degree and betweenness distributions, but fails to retain the original distribution of the closeness centrality. The diameter, the density, the average shortest path length, and the centralization of degree are also very similar to the original network. The difference in the reciprocity estimation is the result of the fact that the original network is undirected, whereas the Priority Rank model produces inherently directed networks. This example illustrates well the ability of the Priority Rank model to discover the latent process of network formation and provide simple, interpretable explanations of the underlying network structure.
• Primary school: The main challenge that this network poses is the high density of social interactions and the very short average shortest path length. The Priority Rank model recreates this network reasonably well using the Euclidean distance function on a single discrete attribute. As with the political books network, a simple comparison of a discrete attribute is sufficient to produce a good approximation of the original network. Although the Priority Rank model cannot reproduce the original distributions of betweenness or closeness centrality, the degree distribution is very well preserved, and the remaining topological characteristics of the network are also retained (except for the reciprocity, for the same reasons as with the political books network). The interpretation of the distance function is similar to that for the political books network: the discrete attribute serves as a label of a coherent social group, and students have a strong preference to create relationships within the social group to which they belong.
• Vickers 7th graders: The Priority Rank model recreates this network almost perfectly using the closeness distance (where the closeness of a vertex is computed on incoming edges). With the exception of the reciprocity (which is slightly underestimated), all the remaining network characteristics are recreated precisely. One reason for this result is that multiple types of edges have been flattened to a single layer in the original network, which makes guessing the existence of an edge somewhat easier. Incoming closeness seems to be a very reasonable explanation for edge existence: popular students who have been nominated by many peers do indeed have high values of inbound closeness centrality, so using this measure to generate ranking lists in the Priority Rank model yields good approximations of the original network. Again, we see how the Priority Rank model provides simple and interpretable explanations of the latent network generation process.
• Freeman researchers: An interesting feature of this network is that it represents communication patterns in the pre-internet era. As expected, the network is characterized by very high reciprocity, a low diameter, and a low average shortest path length. The best distance function uses betweenness centrality as the basis for computing ranking lists for the Priority Rank model. Our model not only re-generates identical distributions of centrality measures, but also captures all the remaining network characteristics. Again, only the reciprocity has not been matched, because the betweenness distance function cannot take the existence of an edge into consideration. Since the network represents the flow of communication between people, it is not surprising that the best distance function uses the betweenness centrality, which ranks network vertices according to their impact on the communication pathways through the network. We find this result to be a strong indication that the Priority Rank model can provide a viable explanation for the phenomenon driving the network formation.
• 9/11 terrorists: As with the network of researchers discussed above, the Priority Rank model recreates the 9/11 terrorists network precisely. Although the distribution of closeness centrality is not preserved, the remaining network characteristics are almost identical to those of the original network. The choice of the distance function is also not surprising, since the network was gathered post hoc so as to outline the associations of a selected group of terrorists and, as a result, those terrorists have prominent degrees in the network. Again, the interpretation of the distance function is straightforward, and the result supports our claim about the explanatory power of the Priority Rank model.
• Karate club: The Priority Rank model is capable of recreating the structures of the original network, with the notable exception of the closeness distribution and the average shortest path length (which is overestimated). The best distance function is the closeness distance, which is a very natural explanation for this network. Recall that the network represents social contacts of the members of a university karate club, which split in half due to a conflict between its two principal members, an administrator and an instructor. Almost all of the members of the original club have links to one of the leaders of the club. Thus, one may conclude that the members had a strong preference to prioritize social contacts with these leaders (who are the two vertices with the shortest average paths to the remaining vertices).
• Illinois high school: The most distinguishing feature of this network is its very large diameter and relatively high average shortest path length. Theoretically, this network tends to follow the small world network model of Watts and Strogatz. The Priority Rank model re-generates this network almost perfectly, retaining the distributions of centrality measures and all other network characteristics. As expected, the best fit is obtained for the aggregate distance function. This function compares the values of two attributes randomly drawn from the normal distribution. As a result, vertices with many similar attribute values tend to form communities (small worlds), with very few edges interconnecting these communities, hence the large network diameter. One may conclude that the evolution of these types of small world networks is primarily driven by the homogeneity of the vertices constituting communities. We find it surprising that the Priority Rank model struggled to recreate the synthetic small world network, whereas it recreates real-world examples of small world networks very convincingly. In our opinion this indicates that the assumptions behind the small world model of Watts and Strogatz are not supported by empirical networks.
• Marseille high school: Surprisingly, the Priority Rank model struggled to reconstruct this network and was unable to recreate the distributions of betweenness and closeness. However, despite the problem with retaining the original distributions of centrality measures, the networks generated by the Priority Rank model are very similar to the original with respect to the remaining network characteristics. The best distance function is the linear regression function, which indicates that the attributes of vertices (class, sex) are crucial for inferring the existence of a relationship. This is even more encouraging than the previous examples, where networks were recreated using synthetic attributes. The linear regression distance function can be easily interpreted, and it even provides quantitative estimates of the latent network generative process by means of regression coefficients.
• CAG: The main focus of this experiment was to check whether the Priority Rank model is capable of capturing the generative process of a highly atypical network induced by integer matrices used in computations of characteristic polynomials. The Priority Rank model recreates the degree and betweenness distributions, as well as most of the characteristics of the original network, with the exception of reciprocity. Since the best fit is obtained for the degree distance, we are tempted to conclude that the CAG network is generated primarily by the preferential attachment mechanism.
• Power network: This dataset, which represents the structure of connections between power stations, is challenging due to its very low density, large diameter, and lack of degree centralization. Despite this, the Priority Rank model successfully reproduces the network, preserving the distributions of all centrality measures and all the remaining network characteristics. The best fit is obtained for the random distance function, suggesting a haphazard setup of the network.
• Football: The Priority Rank model produces networks very similar to the original network, with a slightly overestimated average shortest path length and degree centralization (the former is probably caused by the fact that the re-generated networks have larger diameters than the original network). As expected, the best fitting function is the degree distance function. Given the socio-economic reality of the process underlying the network formation (poorer countries export their best players to a few countries where very wealthy football clubs reside), we conclude that the Priority Rank model correctly identifies the main generative process for the network. Here we see that the Priority Rank model provides an interpretation of the latent network formation mechanism, even if the mechanism involves complex socio-economic processes.
• St. Marks ecosystem: In this experiment we were particularly interested in discovering the underlying generative process of network formation. The Priority Rank model re-generates this network very accurately, preserving the distributions of degree and betweenness and producing very similar diameters, densities, and average shortest path lengths. The best fit is obtained for the betweenness distance, which is not surprising given that the network focuses on modeling flows within the ecosystem. Again, we believe that the Priority Rank model discovers the main generative process of network formation.
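The remark made for the political books and primary school networks, that the Euclidean distance on a single discrete attribute reduces to a binary flag comparison, can be checked with a minimal sketch. It assumes the attribute takes only the values 0 and 1; the vertex names and the `category` mapping are hypothetical and serve only as an illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two attribute vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical binary attribute (e.g. a broad book category), one value per vertex.
category = {"book_a": 0, "book_b": 0, "book_c": 1}

def category_distance(u, v):
    """With a single 0/1 attribute the distance is 0.0 (same flag) or 1.0 (different)."""
    return euclidean((category[u],), (category[v],))

same = category_distance("book_a", "book_b")       # both vertices carry flag 0
different = category_distance("book_a", "book_c")  # flags 0 and 1
```

Under this assumption every vertex ranks all vertices of its own category ahead of all vertices of the other category, which matches the interpretation given above.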

Discussion
In this paper we have developed priority attachment, a plausible principle of network formation. We have shown that priority attachment generalizes previously proposed mechanisms of network formation, such as the small world phenomenon or preferential attachment. The second major contribution of this paper is the introduction of the Priority Rank model, a universal generative network model which utilizes the priority attachment principle. The Priority Rank model is capable both of mimicking synthetic networks produced by popular generative network models and of recreating empirical networks. The priority attachment mechanism requires a distance function to compute local rankings for each vertex. The interpretation of the distance function allows us to infer the properties of generated networks (in the case of synthetic networks), but, more importantly, it captures the nature of the generative process that shapes the formation of empirical networks. In our opinion this is the most important feature of Priority Rank. The ability to discover the guiding principle of network formation facilitates the generation of multiple similar but different realizations of a given network, resulting in a whole population of networks. The stochastic nature of the priority attachment mechanism provides the small degree of randomization required to introduce variance into such a population of networks, while at the same time preserving the distributions of centrality measures and the network profile. Many different applications of the Priority Rank model can be enumerated. Apart from the obvious substitution of multiple generative network models by a single model, the Priority Rank model can be used for A/B testing of networks, for conducting statistical inference on networks (due to the presence of a population of networks where only one data point was originally available), to simulate various scenarios of network formation and growth, and to predict network evolution given a model of the network formation principle (in the form of the distance function). Using the Priority Rank model, series of networks with a given profile may be generated. In particular, the evolution of a network from a set of isolated nodes to a full network may be obtained. Priority Rank can also be used to model the evolution of temporal networks, where snapshots of the network can be used to learn the network generating procedure, and the Priority Rank model can predict the future topology of the network.
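The generation of a population of networks from a single distance function can be illustrated with a minimal sketch. It follows only what is stated in this paper: each vertex creates k directed edges, and targets are drawn stochastically from a local ranking ordered by distance. The selection probability proportional to 1/rank is an assumption made for this example and is not the model's actual probability scheme; the function name `priority_attachment` and its parameters are hypothetical.

```python
import math
import random

def priority_attachment(vertices, k, distance, rng=None):
    """Return a set of directed edges in which every vertex creates k edges.

    Assumption (not taken from the paper): the vertex at rank r of the local
    ranking is selected with probability proportional to 1 / r.
    """
    rng = rng or random.Random()
    edges = set()
    for u in vertices:
        # Local ranking of all other vertices, nearest first.
        candidates = sorted((v for v in vertices if v != u),
                            key=lambda v: distance(u, v))
        weights = [1.0 / r for r in range(1, len(candidates) + 1)]
        for _ in range(min(k, len(candidates))):
            # Draw one target and remove it so that the k targets are distinct.
            idx = rng.choices(range(len(candidates)), weights=weights)[0]
            edges.add((u, candidates.pop(idx)))
            weights.pop(idx)
    return edges

# Two synthetic attributes per vertex, drawn from the uniform distribution.
attr_rng = random.Random(0)
attrs = {v: (attr_rng.random(), attr_rng.random()) for v in range(20)}

def euclid(u, v):
    return math.dist(attrs[u], attrs[v])

# A small population: same distance function, different random seeds.
population = [priority_attachment(list(attrs), k=3, distance=euclid,
                                  rng=random.Random(seed))
              for seed in range(5)]
```

Each element of `population` is one realization; the distance function stays fixed, and only the random draws differ between realizations.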
We note that the Priority Rank model does not always recreate empirical networks fully. The closeness distribution in particular seems to be notoriously difficult to reproduce. We also note the lack of a mechanism that would easily account for reciprocity. Such a mechanism could be introduced into the model by modifying the priority attachment mechanism to put more weight on existing edges. Another possibility is to replace the currently used distance functions (in particular, distance functions derived by applying machine learning algorithms to the adjacency matrix) with a single universal distance function discovery module. We believe that for larger networks, where adjacency matrices contain a sufficient amount of information, the application of neural networks to form distance functions is a viable direction of future research.
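One possible way to put more weight on existing edges is sketched below, purely as an illustration of the idea and not as part of the model described in this paper. A wrapper shrinks the distance from u to v whenever the reverse edge (v, u) already exists, so that v moves up in the local ranking of u. The name `reciprocity_aware` and the `weight` parameter are hypothetical.

```python
def reciprocity_aware(distance, edges, weight=0.5):
    """Wrap a distance function so that existing reverse edges shorten the distance.

    `edges` is a set of directed (source, target) pairs, read at call time.
    `weight` in (0, 1] is a hypothetical tuning parameter; 1 disables the effect.
    A base distance of 0 is left unchanged by the multiplication.
    """
    def wrapped(u, v):
        d = distance(u, v)
        return d * weight if (v, u) in edges else d
    return wrapped

# Toy example: vertex 2 already points to vertex 0.
edges = {(2, 0)}

def base(u, v):
    return abs(u - v)

adjusted = reciprocity_aware(base, edges, weight=0.25)

# Local ranking of vertex 0 over vertices 1 and 2, before and after the adjustment.
ranking_before = sorted([1, 2], key=lambda v: base(0, v))
ranking_after = sorted([1, 2], key=lambda v: adjusted(0, v))
```

In this toy example the adjusted distance from vertex 0 to vertex 2 drops from 2 to 0.5, so vertex 2 overtakes vertex 1 in the ranking and the reciprocal edge (0, 2) becomes more likely.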

Figure 1. Different network topologies and their degree distributions generated by the priority attachment mechanism

Figure 2. Priority attachment process

Require: … v_n, a set of vertices
Require: k, the number of edges each vertex creates
1: for i = 1 to n do …

Table 1. Priority attachment local rankings with distances and vertex selection probabilities

A visualization of the Priority Rank model mechanics is available online at https://priorityattachment.ml.

Table 3. Metrics used to compare networks

symbol   meaning
p_D      p-value of the Kolmogorov-Smirnov test comparing the degree distributions of the two networks
p_B      p-value of the Kolmogorov-Smirnov test comparing the betweenness distributions of the two networks
p_C      p-value of the Kolmogorov-Smirnov test comparing the closeness distributions of the two networks
d        network diameter (the longest of the shortest paths in the network)
ρ        network density (the ratio of the number of existing edges to the maximum number of possible edges)
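Some of the metrics in Table 3 can be computed with a short, dependency-free sketch. It assumes directed networks given as a set of (source, target) pairs, it ignores unreachable pairs when computing the diameter, and it returns only the two-sample Kolmogorov-Smirnov statistic rather than the p-value (a library routine such as `scipy.stats.ks_2samp` could supply the latter). The function names are hypothetical.

```python
from collections import deque

def density(n, edges):
    """Ratio of existing edges to the n * (n - 1) possible directed edges."""
    return len(edges) / (n * (n - 1))

def diameter(vertices, edges):
    """Longest shortest path over all reachable ordered pairs (BFS per source)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    longest = 0
    for source in vertices:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(1 for s in sample if s <= x) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

# Toy example: a directed cycle on four vertices.
cycle = {(0, 1), (1, 2), (2, 3), (3, 0)}
rho = density(4, cycle)           # 4 edges out of 12 possible
d = diameter(range(4), cycle)     # 3 steps from a vertex to its predecessor
gap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

The same functions can be applied to the degree sequences of an original network and of a re-generated one to obtain the quantities that the comparison tables summarize.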

Table 4. Recreation of synthetic networks (results that pass the K-S test are in bold)

Metrics used to compare synthetic networks and networks generated by the Priority Rank model are presented in Table 3. Synthetic network statistics and the corresponding statistics of networks generated by the Priority Rank model are shown in Table 3, … upon successful creation of a new edge the process continues recursively. The model produces networks of low density and relatively large diameter, but with average shortest path lengths significantly lower than in the case of small world networks.

Table 5. Empirical networks used in the experiments
