On neighbourhood degree sequences of complex networks

Network topology is a fundamental aspect of network science that allows us to gather insights into the complicated relational architectures of the world we inhabit. We provide a first specific study of neighbourhood degree sequences in complex networks. We consider how to explicitly characterise important physical concepts such as similarity, heterogeneity and organization in these sequences, as well as updating the notion of hierarchical complexity to reflect previously unnoticed organizational principles. We also point out that neighbourhood degree sequences are related to a powerful subtree kernel for unlabeled graph classification. We study these newly defined sequence properties in a comprehensive array of graph models and over 200 real-world networks. We find that these indices are neither highly correlated with each other nor with classical network indices. Importantly, the sequences of a wide variety of real world networks are found to have greater similarity and organisation than is expected for networks of their given degree distributions. Notably, while biological, social and technological networks all showed consistently large neighbourhood similarity and organisation, hierarchical complexity was not a consistent feature of real world networks. Neighbourhood degree sequences are an interesting tool for describing unique and important characteristics of complex networks.

complexity was developed in the context of electroencephalogram functional connectivity, which, in contrast to ordered and random systems, was found to have inordinately high levels of heterogeneity amongst its neighbourhood degree sequences 8 . This concept has since been utilised to help understand how best to binarise EEG functional connectivity for topological analysis 12 and has been validated in structural MRI networks 13 . However, the prevalence of such topology amongst complex networks in general is unknown. In pure mathematics, Barrus & Donovan independently initiated study of neighbourhood degree lists as a topological invariant more refined than both the degree sequence and joint degree graph matrix 14 , while Nishimura & Subramanya proposed to study neighbourhood degree lists for the combinatorial problem of changing a graph into one with given neighbourhood degrees 15 .
That is as far as work on neighbourhood degree sequences has gone to date. Yet the intriguing insights provided by hierarchical complexity in brain networks make a study of neighbourhood degree sequences across a broader range of domains worthwhile. This work follows earlier work involving neighbouring degrees and centralities, such as eigenvector centrality, a centrality index which is larger the greater the centralities of the nodes a node is connected to 16 ; assortativity, an index of degree-degree correlation between connected nodes 17 ; and network entropy, a measure of edgewise node degree eccentricity 18 . Neighbourhood degree sequences, however, are a distinct consideration of networks. Most notably, rather than comparing nodes which are connected to each other, we compare nodes which have the same degree, irrespective of whether they are connected or not, regarding such nodes as hierarchically equivalent within the network topology.
In this study a number of ways to analyse neighbourhood degree sequences are proposed. Notably, indices of node heterogeneity and neighbourhood similarity are introduced. We also consider a new notion of multi-orderedness in a network. This is based on the observation that nodes of a given degree in an ordered network may have several distinct neighbourhood degree sequences. This gives rise to another index, neighbourhood organisation, which measures the extent to which such multi-orderedness is present in the network. We then show that the existence of multi-ordered degrees can artificially raise the network's hierarchical complexity. Thus, we utilise the formulation of neighbourhood organisation to provide a version of hierarchical complexity which corrects for multi-ordered degrees. We also describe how neighbourhood degree sequences have clear links with powerful and efficient subtree kernels for graph classification. The proposed indices are then applied to a range of network models and compared with existing classical network indices, with the aim of ascertaining to what extent these indices explain unique topological properties in complex networks. They are also applied to 215 real world networks from various disciplines of study in order to assess the characteristics of neighbourhood degree sequences in the world around us and the insights these new indices offer.

Neighbourhood Degree sequences
For $k_i$ the degree of node $i$, the neighbourhood degree sequence of node $i$ is
$$s_i = \{k^i_1, k^i_2, \dots, k^i_{k_i}\},$$
where the $k^i_j$ are the degrees of the nodes to which $i$ is connected, ordered such that $k^i_1 \le k^i_2 \le \dots \le k^i_{k_i}$. For example, the graph in Fig. 1A has four degree 4 nodes (yellow) all with neighbourhood degree sequence {3, 3, 5, 8} and four

Figure 1. (A) How can we efficiently capture the organisation of this graph mathematically without reference to node placements on the plane? We can note that the neighbourhoods of nodes of a given degree are equivalent with respect to the degrees of nodes they connect to, e.g. all yellow nodes (degree 4) connect to the same number of green (degree 3), orange (degree 5) and red (degree 8) nodes. Thus neighbourhood degree sequences appear as a promising avenue. (B) Illustration of a multi-ordered degree graph whose equal-degree nodes are organised into two distinct classes with different degree sequences. (C) Illustration of a subtree of height 2 for node $i$ in panel A. The number of nodes at height 1 is the degree of $i$, while for a node at height 1, its degree is the number of nodes at height 2 extending from it, all captured by $i$'s neighbourhood degree sequence, $s_i$.

Node heterogeneity. One way to characterise neighbourhood degree sequences would be to employ the same methods used to characterise degree distributions and then average over all nodes. As a pertinent example of this, a common index of graph heterogeneity is the degree variance $v = \mathrm{var}(k)$ 19 . We can then define node heterogeneity, $V_n$, as the average variance of neighbourhood degree sequences of a graph for all nodes of degree greater than 1:
$$V_n = \frac{1}{|\{i : k_i > 1\}|} \sum_{i : k_i > 1} \mathrm{var}(s_i). \qquad (3)$$
Of course, it is then interesting to understand how average node heterogeneity compares to graph heterogeneity, i.e. comparing local and global heterogeneities of a graph. To do this we can simply divide (3) by $v$, giving
$$\frac{V_n}{v} = \frac{1}{v\,|\{i : k_i > 1\}|} \sum_{i : k_i > 1} \mathrm{var}(s_i).$$
Low values of this measure tell us that nodes tend to be connected to nodes of homogeneous degrees, given the degree distribution, and high values tell us the opposite. Specifically, if this value is below 1, the degree variance within the neighbourhoods is on average less than the global degree variance, indicating that the nodes have more homogeneous neighbourhood degrees. It is worth highlighting the distinction between this and assortativity, which seeks to measure the similarity of degrees of connected nodes. Node heterogeneity is a measure of the similarity of the degrees of all neighbouring nodes, irrespective of the degree of the node itself.
Note that $v$ is clearly minimal for regular graphs and is known to be maximal for quasi-star and quasi-complete graphs for any given number of nodes and edges 20 . On the other hand, $V_n$ is zero for regular graphs but is also small for quasi-star and quasi-complete graphs. For instance, the star graph consists of one node connected to all other nodes and no other edges. Thus it has one node of degree $n - 1$ with neighbourhood degree sequence {1, 1, …, 1} and $n - 1$ nodes of degree 1 with neighbourhood degree sequence {$n - 1$}. Clearly, these all have zero variance, giving $V_n = 0$ for the star graph. This is interesting because, while some believe star graphs should have maximum heterogeneity 21 , $V_n$ points to a possible different view. The degree distribution of a star graph is just one node away from being completely regular: take the dominant node out and you have an empty graph (trivially regular). Heterogeneity could perhaps be alternatively formulated in the sense that removing or adding nodes does not relegate the graph to being regular.
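The definitions above can be illustrated with a minimal Python sketch. It assumes a simple undirected graph stored as a dictionary of neighbour sets and uses the population variance; the text does not specify population versus sample variance, so that choice, like the function names, is an assumption made for illustration.

```python
from statistics import pvariance

def neighbourhood_degree_sequences(adj):
    """Map each node to the sorted list of its neighbours' degrees."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    return {node: sorted(degree[m] for m in nbrs) for node, nbrs in adj.items()}

def node_heterogeneity(adj):
    """Average variance of neighbourhood degree sequences over nodes of degree > 1.

    Assumption: population variance (the text does not state which variance is used).
    """
    seqs = neighbourhood_degree_sequences(adj)
    variances = [pvariance(s) for s in seqs.values() if len(s) > 1]
    return sum(variances) / len(variances) if variances else 0.0

def star_graph(n):
    """Star on n nodes: node 0 is the hub, nodes 1..n-1 are leaves."""
    adj = {0: set(range(1, n))}
    for leaf in range(1, n):
        adj[leaf] = {0}
    return adj

def path_graph(n):
    """Path 0 - 1 - ... - (n-1)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

star = star_graph(6)
star_seqs = neighbourhood_degree_sequences(star)
print(star_seqs[0])              # hub sequence: five entries equal to 1
print(star_seqs[1])              # leaf sequence: the single entry n - 1 = 5
print(node_heterogeneity(star))  # zero, as argued in the text

path = path_graph(4)
v = pvariance([len(nbrs) for nbrs in path.values()])  # global degree variance
print(node_heterogeneity(path) / v)                   # local-to-global ratio
```

For the star graph only the hub has degree greater than 1, and all of its neighbours have degree 1, which is why the sketch returns zero.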
Neighbourhood similarity. The other way of characterising neighbourhood degree sequences we shall consider is to compare all neighbourhood degree sequences of equal length. Indeed, this is the perspective employed to formulate hierarchical complexity, looking at the element-wise variance of equal-length neighbourhood degree sequences. Another, simpler characteristic can be posed by considering the number of nodes in the network whose neighbourhood degree sequence matches that of another node in the graph. We call this neighbourhood similarity (reflecting the concept of geometric similarity) and, using the Kronecker delta function $\delta(x, y)$, which is 1 if $x = y$ and 0 otherwise, write
$$S = \frac{1}{n} \sum_{i=1}^{n} \left[ 1 - \delta\!\left( \sum_{j=1,\, j \ne i}^{n} \delta(s_i, s_j),\; 0 \right) \right].$$
Notice that this uses the $\delta$ function twice. The inner $\delta$ is used to find the number of matching neighbourhood degree sequences for node $i$. The outer $\delta$ is used to determine whether there are any matching sequences, i.e. whether the sum of the inner $\delta$s is different from 0. Since this is a negation (the outer $\delta$ returns 0 if there are any matches), we then have to subtract the answer from 1 to obtain whether any match exists for node $i$. Summing over all $i$ and dividing by $n$ provides the proportion of nodes which have at least one matching neighbourhood degree sequence. It is clear that $0 \le S \le 1$ for all graphs, since it concerns a fraction of the network nodes. It certainly attains 1 for regular graphs. Further, we prove the following result with respect to graph symmetry on the plane, establishing the link between neighbourhood similarity and graph symmetry.
Proposition 1: Let $G$ be a graph which can be arranged on the plane such that $G$ has mirror or rotational symmetry whose axis does not pivot on any node. Then $S(G) = 1$.
Proof: Let $s_i$ be the neighbourhood degree sequence of a general node $i$. Then the node, $j$, in the position symmetric to $i$ with respect to the axis of symmetry has neighbourhood degree sequence $s_j$ and has the same degree as $i$. Further, each node $p_i$ in the neighbourhood of $i$ also has a node $p_j$ in the position symmetric to $p_i$ with respect to the axis of symmetry; these nodes are connected to $j$ and satisfy $k_{p_i} = k_{p_j}$, by symmetry. Thus $s_i = s_j$ and, since $i$ was arbitrary and no nodes lie on the axis of symmetry itself, $S(G) = 1$, as required.
Thus, neighbourhood similarity of a graph is indeed related to the planar symmetry of a graph. That being said, the converse is not true: not every graph with $S(G) = 1$ is planar symmetric, as can be quickly seen by regarding non-symmetric regular graphs such as the Frucht graph 22 .
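As a hedged illustration of neighbourhood similarity, the following Python sketch counts the proportion of nodes whose neighbourhood degree sequence is shared by at least one other node. The graph representation (a dictionary of neighbour sets) and the helper names are assumptions of this sketch rather than part of the original method description.

```python
from collections import Counter

def neighbourhood_similarity(adj):
    """Proportion of nodes whose neighbourhood degree sequence matches another node's."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    seqs = {v: tuple(sorted(degree[u] for u in nbrs)) for v, nbrs in adj.items()}
    counts = Counter(seqs.values())
    # A node has a match exactly when its sequence occurs more than once.
    matched = sum(1 for v in adj if counts[seqs[v]] > 1)
    return matched / len(adj)

def cycle_graph(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def path_graph(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def star_graph(n):
    adj = {0: set(range(1, n))}
    for leaf in range(1, n):
        adj[leaf] = {0}
    return adj

print(neighbourhood_similarity(cycle_graph(5)))  # regular graph
print(neighbourhood_similarity(path_graph(4)))   # mirror axis crosses the middle edge
print(neighbourhood_similarity(path_graph(5)))   # mirror axis passes through the middle node
print(neighbourhood_similarity(star_graph(5)))   # the hub has no matching node
```

The two path graphs are consistent with the condition in Proposition 1: with four nodes the mirror axis avoids every node and all sequences are matched, whereas with five nodes the central node sits on the axis and is left unmatched.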
Hierarchical complexity: oversights of multi-ordered degree graphs. Hierarchical complexity is an index developed with the aim of being low for all highly ordered graphs and graphs with simple generative mechanisms. These are simple in the sense that one needs only a few rules to compute the graph, such as in random graphs (edges exist with uniformly random probabilities) or random geometric graphs (nodes are randomly sampled in an n-dimensional Euclidean space and then connected based on distances in the space). In this sense, one can describe precisely how one can expect the graph and subsamples of the graph to behave. On the other hand, attempts to model real world networks indicate that a larger and more complicated set of rules would be required to generate complex network-like topologies where subsamples of the graph (such as node neighbourhoods) would be less likely to show similar behaviours 13 . The hypothesis is that nodes of a given degree in highly ordered graphs play equivalent roles in the topology, which implies that they have the same or similar neighbourhood degree sequences. However, what this formulation fails to take into account is the possibility of a high degree of order in which nodes of a given degree can be split into different groups of identical sequences. For example, Fig. 1B shows a graph with degree 1 and degree 6 nodes. The degree 6 nodes fall into one of two sequences, {1, 1, 6, 6, 6, 6} and {6, 6, 6, 6, 6, 6}, as illustrated by the green and orange nodes, respectively. The degree 1 nodes are connected to either degree 1 or degree 6 nodes, as illustrated by the grey and yellow nodes, respectively. We call such a graph a multi-ordered degree graph.
www.nature.com/scientificreports/

Definition 1: Let $q_p$ be the number of all $p$-length neighbourhood degree sequences and
Neighbourhood organisation. We can pose a measure for this sense of multi-ordered degrees using neighbourhood degree sequences. With $\sigma_p$ the set of unique $p$-length sequences, we could simply divide the number of unique $p$-length sequences by the total number of $p$-length sequences, giving
$$\frac{|\sigma_p|}{q_p}, \qquad (5)$$
however, this is the same no matter how many unique degree sequences occur more than once. Consider the following. Let $c_{pj}$ denote the number of neighbourhood degree sequences of length $p$ in $G$ that are equal to $s_j \in \sigma_p$. Then, for example, take $q_p = 5$ and $|\sigma_p| = 3$. We could have $c_{p1} = 1$, $c_{p2} = 1$ and $c_{p3} = 3$, or $c_{p1} = 1$, $c_{p2} = 2$ and $c_{p3} = 2$. Both of these options have the same value of (5), yet the latter is more multiply ordered than the former, since there are two distinct sequences which occur more than once rather than just one.
We can offset (5) by considering the differences between the number of $p$-length sequences, $q_p$, and the number of occurrences of each (unique) neighbourhood degree sequence in $\sigma_p$. The resulting term is maximal, $q_p(q_p - 1)$, when all $p$-length neighbourhood degree sequences are unique and zero (i.e. minimal) when all $p$-length neighbourhood degree sequences are equal. We can thus normalise this term. Just taking (6) would also not reflect the multi-order requirement. It is really the combination of (5) and (6) that is required to realise a measure of multi-ordered degrees: elements of $\sigma_p$ should occur frequently and at the same time the number of unique sequences should be as large as possible. Combining (5) and (6), then, we get the term $\omega_p$. Taking the mean of this over all degrees and subtracting from 1, we have the neighbourhood organisation coefficient, $\Omega$.
Updated hierarchical complexity. Given the above consideration of multi-ordered degrees and the neighbourhood organisation index, we can formulate an update to hierarchical complexity that takes into account multi-ordered degrees.
In the terminology of this paper, hierarchical complexity, $R$, can be written in terms of the element-wise variance of equal-length neighbourhood degree sequences, computed for each degree $p$ over the set of nodes of degree $p$, where $\mu_p(j)$ is the mean of the $j$th entries of all $p$-length neighbourhood degree sequences.
To correct for multi-ordered degrees in this index, we can include the term $\omega_p$ inside the first summand, giving the corrected index $R_\Omega$. When $\omega_p$ is small, multi-orderedness is present in the degree $p$ nodes and thus the value of hierarchical complexity for these degrees is suppressed, and vice versa. Computing this for the example in Fig. 1A we obtain $R_\Omega = 0.0029$, a 65-fold decrease from $R$ and a more reasonable expected value of neighbourhood degree sequence diversity.
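The counting argument behind neighbourhood organisation can be checked numerically. The Python sketch below evaluates the ratio (5) for the two occurrence patterns discussed in the text, and then evaluates one candidate offset term, the sum of $c_{pj}(q_p - c_{pj})$ over unique sequences. That candidate is an assumption of this sketch: it reproduces the stated maximum $q_p(q_p - 1)$ and the stated minimum of zero, but it is not claimed to be the published expression.

```python
def unique_ratio(counts):
    """Ratio (5): number of unique p-length sequences over all p-length sequences."""
    return len(counts) / sum(counts)

def candidate_offset(counts):
    """Hypothetical offset term: sum over unique sequences of c * (q - c).

    Assumption: chosen only because it matches the extremes quoted in the text
    (q * (q - 1) when all sequences are unique, 0 when all are equal).
    """
    q = sum(counts)
    return sum(c * (q - c) for c in counts)

one_repeat = [1, 1, 3]   # c_p1 = 1, c_p2 = 1, c_p3 = 3
two_repeats = [1, 2, 2]  # c_p1 = 1, c_p2 = 2, c_p3 = 2

# Ratio (5) cannot tell the two patterns apart.
print(unique_ratio(one_repeat), unique_ratio(two_repeats))

# The candidate offset does distinguish them.
print(candidate_offset(one_repeat), candidate_offset(two_repeats))

# Extremes for q_p = 5: all unique versus all equal.
print(candidate_offset([1, 1, 1, 1, 1]), candidate_offset([5]))
```

Any expression used in place of `candidate_offset` would need to satisfy the same two extremes while still separating the two example patterns.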
Link to the graph isomorphism problem. The Weisfeiler-Lehman graph isomorphism test 23 is a powerful method for distinguishing labelled graph topologies which holds for almost all graphs 24 . Based on this test, subtree kernels have been produced for assessing graph similarity in machine learning approaches which are highly efficient compared to other successful kernels 25 . Indeed, these subtree kernels have been shown to outperform the competition when implemented into a graph neural network approach while mapping similar graph topologies to similar embeddings in a low-dimensional space 26 .
The subtree of node i of height h constructs a tree rooted at i which extends out to i's neighbours and then out again to i's neighbours' neighbours and so on for h steps, see Fig. 1C. The kernel is a reduction of these subtrees to identifying labels which are then compared between two graphs to check their similarity. Subtrees of height h = 2 or 3 have been shown to achieve best performance in most cases 25 .
The link to neighbourhood degree sequences then can be established by realising that the information in a subtree of height 2 in an unlabelled graph is completely captured by the node's neighbourhood degree sequence. The length of the neighbourhood degree sequence tells us how many nodes are at height 1 of its subtree kernel (i.e. the degree of the node), while the entries of the sequence tell us how many nodes at height 2 are linked to each node at height 1 (the degrees of each neighbouring node).
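The correspondence described above can be sketched in Python. The sketch assumes the usual relabelling view of the Weisfeiler-Lehman procedure, in which every node starts from the same label and each round pairs a node's label with the sorted labels of its neighbours; the helper names and the small example graph are hypothetical. After two rounds, nodes share a label exactly when they share a neighbourhood degree sequence.

```python
from collections import defaultdict

def adjacency(edges):
    """Build a dictionary of neighbour sets from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def refine(adj, labels):
    """One relabelling round: own label plus the sorted labels of the neighbours."""
    return {v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in adj}

def partition(labels):
    """Group nodes that carry the same label."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return {frozenset(group) for group in groups.values()}

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]  # small illustrative graph
adj = adjacency(edges)

uniform = {v: 0 for v in adj}                  # unlabelled graph: identical start labels
height2 = refine(adj, refine(adj, uniform))    # two rounds, i.e. subtrees of height 2

degree = {v: len(adj[v]) for v in adj}
sequences = {v: tuple(sorted(degree[u] for u in adj[v])) for v in adj}

print(partition(height2) == partition(sequences))
```

The first round only encodes each node's degree, and the second round attaches the sorted degrees of its neighbours, which is the neighbourhood degree sequence.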

Methods
Real-world networks. Thirty networks were obtained from the network repository 27
Biological networks. The macaque cortex network freely available from the BCT was used 33 . This comes as a binary, directed network. To make it undirected, we treated every connection as undirected, so that an edge signifies whether or not any connection exists between two regions. We also look at the undirected C. elegans metabolic network 34 ; BioGRID protein networks of the fruit fly, mouse and a plant; a yeast protein interaction network 35 ; and a mouse brain network 36 .
Ecological networks. The everglades, florida and mangwet ecosystems networks 37 .
Economic networks. The global city network is a network of economic ties between cities 38 . This is a weighted network which was binarised at 20% density (the largest 20% of weights kept) for our analysis. We also used the beacxc and beaflw economic networks.
Technological networks. A router network.
In addition, we study a benchmark dataset of 406 real world networks used in 47 from the Colorado Index of Complex Networks 48 . This includes 186 static networks of which just 3 overlap with the above (dolphin social network, Macaque cortex and the uni email network). It also includes two temporal networks relating to the same data of organisation affiliations each with 111 samples taken monthly from May 2002 until August 2011 49 . The first of these is a network of organisation co-affiliations of directors while the other is a network of co-directorship among organisations.

Models. Configuration models. Random graphs with fixed degree distributions 7 were generated using a freely available algorithm in the Brain Connectivity Toolbox 33 . Fifty randomisations were computed for each real world network.
Degree variance. The degree variance, v = var(k), is a measure of network heterogeneity 19 . Here we use the normalised version 50 .
Characteristic path length. The characteristic path length, L, is the average of the shortest paths existing between all pairs of nodes in the network. It is known as a measure of network integration.
Assortativity. Assortativity, r, is a correlation of the degrees of nodes which are connected in the network. It is positive if similar degree nodes are generally connected to one another, negative if similar degree nodes are generally not connected to one another and zero if there is no pattern of correlation 17 .
Modularity. Modularity, Q, measures the propensity of nodes to form into highly connected communities which are less connected to the rest of the network 51 .

Experiments
The supplementary material contains results for the indices on a variety of different models: random graphs 3 , random geometric graphs 4 , small-world models 5 , scale-free models 52 and random hierarchy models 8 . The main article focuses on experiments using the most relevant data of all: over 200 real world networks.

Index correlations. Spearman correlations were computed between the proposed indices and classical network indices across all real networks, Fig. 2. We used Spearman's correlation since the values clearly did not follow a normal distribution (i.e. Pearson's correlation would not have been valid). The red box contains all correlations between neighbourhood degree sequence indices and classical network indices. There are no high correlations between the proposed indices and the classical indices, providing strong evidence that these new indices capture previously unrealised properties of network topology. Unsurprisingly, $R$ and $R_\Omega$ were highly correlated, although the correlation between $\Omega$ and $R_\Omega$ was only low to moderate. The fact that there were no strong correlations (>0.8) other than between $R$ and $R_\Omega$ suggests there is a rich amount of information to be obtained from neighbourhood degree sequences.
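For readers unfamiliar with rank correlation, the following Python sketch shows how a Spearman coefficient can be computed as the Pearson correlation of ranks, with tied values given their average rank. It is a generic illustration with made-up input values, not the code or data used for Fig. 2.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values (not from the paper): a monotone but strongly non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman(x, y), pearson(x, y))
```

Because only the ordering of values matters, the rank-based coefficient is unaffected by the skewed, non-normal distributions mentioned in the text, whereas the Pearson coefficient on the raw toy values is pulled down by the outlying entry.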
On the other hand, among classical network indices, strong correlations were found between L, V and Q, indicating that these indices all point mostly towards a single topological property of the networks. We suggest that this property is likely to be the dominance of hub nodes, since these nodes are those which enable generally short path lengths, while Newman's modularity is known to be confounded by hubs 53 .
Figure 2. Absolute values of index correlations (Spearman's correlation coefficient) for combined values of small-world, scale-free, random, random geometric, and random hierarchy network models and the thirty real world networks considered (a total of 150 samples). Within the red square are the correlations between neighbourhood degree indices ($S$: neighbourhood similarity, $V_n$: node heterogeneity, $\Omega$: neighbourhood organisation, $R$: hierarchical complexity, $R_\Omega$: hierarchical complexity corrected for multi-ordered degrees) and classical network indices ($C$: transitivity, $v$: degree variance, $L$: characteristic path length, $r$: assortativity, $Q$: modularity).
Although only correlations above the standard threshold of 0.8 have been highlighted, there are notable moderate correlations between L and S (0.6477), Q and S (0.6419) and V and R (0.6274). However, the average correlation across all metric pairs has a magnitude of 0.4283, which would be regarded as a low to moderate correlation. We have to expect that measurements of a network will likely have some degree of correlation, simply because they measure the same topologies and because complex networks tend to show broadly consistent features in comparison with random null models. Nonetheless, the standard deviation of the metric correlation magnitudes is 0.2269, putting one standard deviation above the mean at 0.6552, a value that none of the moderate correlations previously mentioned exceed. Thus, although in usual terms these are moderate correlations, with respect to complex network metrics they appear to be within reasonable limits to suggest they broadly measure different network properties.
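The threshold argument above is simple arithmetic on the reported values, reproduced in the short Python check below. The variable names are illustrative only; the numbers are those quoted in the text.

```python
mean_magnitude = 0.4283   # mean correlation magnitude over all metric pairs
sd_magnitude = 0.2269     # standard deviation of the correlation magnitudes
threshold = mean_magnitude + sd_magnitude

# Moderate correlations quoted in the text, keyed by index pair.
moderate = {("L", "S"): 0.6477, ("Q", "S"): 0.6419, ("V", "R"): 0.6274}
below = {pair: value < threshold for pair, value in moderate.items()}

print(round(threshold, 4))  # 0.6552
print(below)                # every quoted pair lies below the threshold
```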
It is also worth recalling that correlation does not mean causation. This means that the general tendency of complex networks to exhibit correlated metrics does not necessarily mean they are measuring the same or similar property in the network, as it may be that networks which have greater modularity have greater characteristic path lengths by virtue of an underlying joint causation.
Characteristics of real-world networks. All proposed indices were applied to the thirty real-world networks of the Network Repository and the 181 non-overlapping static networks of the ICON, alongside median values taken over the two temporal networks. In addition, ten realisations of configuration models with fixed degree distributions were generated for each real-world network and we compared the neighbourhood indices of the real networks with the average values obtained from configuration models. The results are described for each Network Repository network in Table 1. Scatter plots of all real network values against configuration model values are shown in Fig. 3.
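The comparison against configuration models can be sketched as follows. This Python example uses a generic degree-preserving double edge swap as the null model, which is an assumption of the sketch and not the Brain Connectivity Toolbox routine used in the study; the toy network, the number of swaps and the function names are likewise hypothetical.

```python
import random
from collections import Counter, defaultdict

def degree_counts(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rewire(edges, n_swaps, seed):
    """Degree-preserving randomisation: swap edges a-b, c-d for a-d, c-b."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # pick the swap orientation at random
        if len({a, b, c, d}) < 4:
            continue                         # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                         # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def neighbourhood_similarity(edges):
    """Proportion of nodes whose neighbourhood degree sequence is shared."""
    deg = degree_counts(edges)
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    seqs = {v: tuple(sorted(deg[u] for u in nbrs[v])) for v in nbrs}
    counts = Counter(seqs.values())
    return sum(counts[s] > 1 for s in seqs.values()) / len(seqs)

# Toy network (hypothetical): a ring of eight nodes with two chords.
real = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (1, 5)]
nulls = [rewire(real, n_swaps=10 * len(real), seed=s) for s in range(10)]
null_mean = sum(neighbourhood_similarity(g) for g in nulls) / len(nulls)
print(neighbourhood_similarity(real), null_mean)
```

Each randomised network keeps the degree of every node, so any difference between the real value and the null average reflects how those degrees are arranged rather than the degree distribution itself.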
Although all indices found significant differences between real networks and configuration models (Table 2, first row), the greatest general differences found were in neighbourhood similarity, $p = 4.74 \times 10^{-28}$ with a paired ranked effect size of 0.5320, and in neighbourhood organisation, $p = 1.66 \times 10^{-23}$ with a paired ranked effect size of 0.4841. This is clearly observed in Fig. 3, first and centre plots, respectively. On the other hand, hierarchical complexity was only weakly greater in real networks than in their configuration models. This was even less convincing when we took account of multi-orderedness, which increased the p-value to just below 0.05. This is interesting in light of the work done on hierarchical complexity of human brain function and structure. Hierarchical complexity was not a consistent feature of real world networks and can thus be conjectured to be a special feature of brain networks, where a great diversity of functional roles is present 13 .
Tentatively, hierarchical complexity also appears to be a strong property of ecological networks. We only studied three such networks here, but all had substantially higher hierarchical complexity than expected for their degree distributions, while other characteristics are not notably different from the expected values, Table 1.
We then looked at neighbourhood degree sequence properties among different network classes. We applied Wilcoxon signed-rank tests, as before, but this time restricted to classes and subclasses of networks, see 47 for more details. Results are shown in Fig. 4. Greater neighbourhood organisation and similarity were found consistently among all classes with high enough statistical power. On the other hand, for technological networks, including digital circuit networks, no difference in neighbourhood heterogeneity was found between real networks and their configuration models, suggesting a general topological difference between technological networks and, particularly, biological and social networks. Interestingly, technological networks (including digital circuit networks) were found to have less hierarchical complexity than their configuration models. We expect that this is due to a higher degree of order present in digital circuit networks, where different components connect in limited ways, constrained by the logical ordering of electronics. It was also very noticeable that the difference in hierarchical complexity in biological and social networks dropped away when updating for multi-orderedness, suggesting that multi-orderedness is a distinct feature of biological and social networks. In biological networks, this appeared to be driven by protein networks, whereas food webs and connectomes were not found to be more hierarchically complex than configuration models even under the original definition. The fact that connectomes of animals (3 cat, 5 primate, 2 macaque, 2 nematode, 2 visual cortical neuron level networks in human) were not found to have a general property of hierarchical complexity again suggests that this feature is special to the macro-level human brain in particular 13 and hints towards possible links with intelligence.
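The paired comparison underlying these tests can be illustrated with the signed-rank statistic itself. The Python sketch below ranks the absolute paired differences (average ranks for ties, zero differences dropped) and sums the ranks of positive and negative differences. It computes only the statistic, not a p-value or the effect size reported in the text, and the input values are toy numbers rather than results from the paper.

```python
def signed_rank_statistic(real, null):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [r - m for r, m in zip(real, null) if r != m]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1   # average rank within a tie group
        pos = end + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Toy paired scores (hypothetical): one index value per real network and
# the matching average over its configuration models.
real_scores = [9, 8, 7, 6]
null_scores = [5, 5, 5, 7]
w_plus, w_minus = signed_rank_statistic(real_scores, null_scores)
print(w_plus, w_minus)
```

A large imbalance between the two rank sums indicates that the real networks are consistently above (or below) their configuration models across the class being tested.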
Neighbourhood organisation in Norwegian director co-affiliation temporal networks. In a specific example of revealing new insights into networks using these methods, we undertook an analysis of the two temporal networks included in the ICON corpus. These were monthly sampled social networks of Norwegian company directors, where edges between directors appeared where the two were affiliated with at least one company, and concurrently sampled Norwegian company networks where edges existed where those companies shared a director 49 . Both spanned the same time period from May 2002 to August 2011 and the significance of the data was that, during this time period, legislation was passed to ensure proportional representations of women in directorships to counteract structural inequalities 49 . From an organisational standpoint, it stands to reason that this may cause a fairly dramatic disruption to these networks. Figure 5 shows neighbourhood organisation over time for both networks alongside that of their configuration models constructed at each time point.
It is striking that while the company network maintained similar levels of neighbourhood organisation throughout the period, the neighbourhood organisation of the director network steadily decreased from roughly 0.8 down to around 0.4 (coinciding with company network levels) by mid 2008, where it stayed until the end of the sampling. No particular trends were noticed in either of the configuration models. Looking more closely at the director network trend, it was apparent that the decrease in neighbourhood organisation appeared almost stepwise in two-year cycles, with steps down around May 2004, 2006 and 2008. This is consistent with the hypothesis that the overhaul in directorships in a short space of time contributed to a substantial disruption to the neighbourhood organisation of the network. Although it is beyond the scope of this study, it would be of interest to seek out explanations for this trend as well as possible correlations between this phenomenon and other factors.

Limitations and Future Work
There is significant scope to extend and improve on these proposed methods. Many of the methods developed here depend on comparing nodes of the same degree; however, it would be of great relevance to relax this property so that comparisons can be made across nodes of similar but not necessarily identical degrees. This is particularly the case for real-world networks and configuration models, where the greater spontaneity of connections means that nodes which exhibit similar properties may differ in degree by one or two connections. Furthermore, this may help to create more reliable indices with less variability within populations.
We demonstrated a link between neighbourhood degree sequences and Weisfeiler-Lehman graph subtree kernels 25 which provide powerful graph learning results 26 based on long-standing graph isomorphism results 23 . It would be of high interest to undertake a detailed study of the relevance of the neighbourhood degree sequence analyses for interpreting the embedding space of these graph classification approaches as network phenomena. At the same time, this link hints that analysing the diversity and structure of neighbourhood degree sequences within a network-such as hierarchical complexity and neighbourhood organisation-is indeed a very powerful and efficient way to describe the topological similarity within a network. Further detailed work is required to substantiate this conjecture.

Conclusion
We introduced several methods to understand complex networks through neighbourhood degree sequences. These targeted key concepts such as similarity and symmetry, organisation, complexity and heterogeneity. The developed network indices were not found to be strongly correlated with each other nor with classical network indices over 215 real world networks, indicating that neighbourhood degree sequences offer a rich and unique branch of analysis. We found that neighbourhood similarity and neighbourhood organisation were consistent general characteristics of complex networks. Evidence suggested that the hierarchical complexity evident in the human brain was not a general property of animal connectomes. Also, neighbourhood organisation was found