In physical, biological, technological and social systems, interactions between units give rise to intricate networks. These structures, which are typically non-trivial, in turn critically affect the dynamics and properties of the system. Most current research on complex networks still focuses on global network properties. A caveat of this approach is that the relevance of global properties hinges on the premise that networks are homogeneous, whereas most real-world networks have a markedly modular structure. Here, we report that networks with different functions, including the Internet, metabolic, air transportation and protein interaction networks, have distinct patterns of connections among nodes with different roles, and that, as a consequence, complex networks can be classified into two distinct functional classes on the basis of their link type frequency. Importantly, we demonstrate that these structural features cannot be captured by the global properties that are usually studied.
The structure of complex networks1,2 is typically characterized in terms of global properties, such as the average shortest path length between nodes3, the clustering coefficient3, the assortativity4 and other measures of degree–degree correlations5,6, and, especially, the degree distribution7,8. However, these global quantities are truly informative only when one of two strict conditions is fulfilled: (1) the network lacks a modular structure9,10,11,12,13,14, or (2) the network has a modular structure but (2.1) all modules were formed according to the same mechanisms, and therefore have similar properties, and (2.2) the interface between modules is statistically similar to the bulk of the modules, except for the density of links. If neither of these two conditions is fulfilled, then any theory proposed to explain, for example, a scale-free degree distribution needs to take into account the modular structure of the network.
To our knowledge, no real-world network has been shown to fulfil either of the two conditions above; this implies that global properties may sometimes fail to provide insight into the mechanisms responsible for the formation or growth of these networks. Alternative approaches that take into consideration the modular structure of real-world complex networks are therefore necessary. One such approach is to group nodes into a small number of roles, according to their pattern of intra- and inter-module connections11,12,13. Recently, we demonstrated that the role of a node conveys significant information about the importance of the node, and about the evolutionary pressures acting on it11,13. Here, we demonstrate that modular networks can be classified into distinct functional classes according to the patterns of role-to-role connections, and that the definition of link types can help us understand the function and properties of a particular class of networks.
Modularity of complex networks
We analyse four different types of real-world networks—metabolic networks11,15,16, protein interactomes17,18,19,20, global and regional air transportation networks13,21,22 and the Internet at the autonomous system (AS) level5,23 (Table 1 and Supplementary Information). To determine and quantify the modular structure of these networks, we use simulated annealing24 to find the optimal partition of the network into modules11,12,25 (see the Methods section). We then assess the significance of the modular structure of each network by comparing it with a randomization of the same network25. We find that all networks studied have a significant modular structure (Table 1). Modules correspond to functional units in biological networks11,20 and to geo-political units in air transportation networks13 and, probably, in the Internet26.
To assess whether global average properties are appropriate to describe the structure of these networks, we compare global average properties of the networks with the corresponding module-specific averages; specifically, we focus on the degree, the clustering coefficient and the normalized clustering coefficient. We find that the average degree of the network is not representative of individual-module average degrees for air transportation networks (Table 2). Most importantly, the global clustering coefficient is not representative of individual-module clustering coefficients for any network (except, possibly, for one of the 18 metabolic networks).
Role-based description of complex networks
As an alternative to the average description approach, we determine the role of each node according to two properties11,12 (see the Methods section): the relative within-module degree z, which quantifies how well connected a node is to other nodes in its module, and the participation coefficient P, which quantifies to what extent the node connects to different modules. We classify as non-hubs those nodes that have a low within-module degree (z<2.5). Depending on the fraction of connections they have to other modules, non-hubs are further subdivided into11,12: (R1) ultra-peripheral nodes, that is, nodes with all their links within their own module; (R2) peripheral nodes, that is, nodes with most links within their module; (R3) satellite connectors, that is, nodes with a high fraction of their links to other modules; and (R4) kinless nodes, that is, nodes with links homogeneously distributed among all modules. We classify as hubs those nodes that have a high within-module degree (z≥2.5). Similar to non-hubs, hubs are divided according to their participation coefficient into: (R5) provincial hubs, that is, hubs with the vast majority of links within their module; (R6) connector hubs, that is, hubs with many links to most of the other modules; and (R7) global hubs, that is, hubs with links homogeneously distributed among all modules.
Although the full rationale for this particular definition of the roles has been given elsewhere12, it is important to highlight a few properties of our classification scheme. Nodes in real and model networks, especially non-hubs, do not fill the zP-plane uniformly; our role classification scheme arises from the fact that nodes tend to congregate into a small number of densely populated regions of this space, with boundaries between these regions having a low density of nodes. In addition, especially for hubs, boundaries coincide with well-defined connectivity patterns; for example, nodes at the boundary between connector hubs (R6) and global hubs (R7) would have approximately half of their links in one module, and the other half perfectly spread over the other modules. Importantly, other definitions of the roles do not alter the results we report below (see the Supplementary Information).
We investigate how our definition of roles relates to global network properties, and to what extent global network properties are representative of nodes with different roles. As some simple properties such as the degree and the clustering coefficient trivially depend on a node’s role, we focus on degree–degree correlations4,5,6,19,27,28. Specifically, we address two questions: (1) whether nodes with the same degree but different roles have the same or different correlations and (2) to what extent the observed degree–degree correlations are a by-product of the modular structure of the network.
To answer these questions, we start by considering the Internet at the AS level (Fig. 1). Nodes with degree k=3 can be either ultra-peripheral (R1, if they have all connections in the same module), peripheral (R2, if they have two connections in one module and one in another) or satellite connectors (R3, if the three connections are to different modules). A separate analysis for each role reveals that the average degree knn(k) of the neighbours of a node5 with degree k=3 strongly depends on the role of the node. For an instance of the 1998 Internet, for example, knn(k=3)=43±8 for ultra-peripheral nodes, knn(k=3)=196±12 for peripheral nodes and knn(k=3)=290±20 for satellite connectors. We observe a dependence of knn on the nodes’ role for all the networks studied here (Fig. 1a–d).
Regarding the second question, initial research showed5 that for the Internet at the AS level $k_{nn}(k)\propto k^{-0.5}$. It was later pointed out27,28 that any network with the same degree distribution as the Internet should exhibit a similar scaling. In other words, the degree distribution of the network is responsible for most of the observed correlations. However, the degree distribution alone does not account for all the observed correlations28 (Fig. 1e). In contrast, the modular structure of the network does account for most of the remaining degree–degree correlations observed in the topology of the Internet (Fig. 1i). Similarly, the modular structure accounts for the degree–degree correlations in metabolic networks and the air transportation network, and for most of the correlations in protein interaction networks (Fig. 1i–l).
Role-to-role connectivity profiles
The findings we reported so far suggest that, once the degree distribution and the modular structure are fixed, real networks have no additional internal structure. This, however, contradicts our intuition that networks with different growth mechanisms and functional needs should have distinct connection patterns between nodes playing different roles. To investigate this possibility, we systematically analyse how nodes connect to one another depending on their roles.
For each network, we calculate the number rij of links between nodes belonging to roles i and j, and compare this number to the number of such links in a properly randomized network (see the Methods section). As in previous work19,28,29,30, we use the z-score to obtain a profile a of over- and under-representation of link types (Fig. 2), which enables us to compare different networks. We quantify the overall similarity between two profiles, a and b, by the scalar product between these profiles (see the Methods section). In Fig. 2, we show that networks of the same type have highly correlated profiles, whereas networks of different types have weaker correlations and, at times, even strong anti-correlations (Fig. 2c).
The networks considered fall into two main classes, one comprising metabolic and air transportation networks, and another comprising protein interactomes and the Internet. The main difference between the two groups is the pattern of links between: (1) ultra-peripheral nodes (links of type R1–R1) and (2) connector hubs and other hubs (links of types R5–R6 and R6–R6). These link types are over-represented for networks in the first class (except links of type R6–R6 in metabolic networks), and under-represented for networks in the second class.
We denote the first class as the stringy-periphery class (Fig. 3a,b). In networks of this class, ultra-peripheral nodes are more connected to one another than would be expected from chance, which results in long ‘chains’ of ultra-peripheral nodes. In metabolic networks, these chains correspond to loop-less pathways that, for example, degrade a complex metabolite into simpler molecules. In the air transportation network, owing to the higher overall connectivity of the network, chains contain short loops and resemble ‘braids’. Stringy-periphery networks also have a core of hubs, which we call the hub oligarchy, that are directly reachable from one another (links of type R5–R6 in metabolic and air transportation networks, and R6–R6 in air transportation networks). Moreover, connector hubs are less connected to ultra-peripheral nodes (R1) than expected by chance alone.
We denote the second class as the multi-star class (Fig. 3c,d). The multi-star class comprises the protein interactomes and the Internet, and has the opposite signature to the stringy-periphery class. Links of type R1–R1 (between ultra-peripheral nodes) are under-represented, whereas links of type R1–R5 (between ultra-peripheral nodes and provincial hubs) are over-represented, giving rise to modules with indirectly connected ‘star-like’ structures. Similarly, connector hubs are less connected to one another than would be expected, which means that these networks depend on satellite connectors to bridge connector hubs and modules.
Our findings confirm and clarify previous results in the literature. For example, the under-representation of R6–R6 links in protein interactomes is consistent with previous results suggesting a tendency for hubs to ‘repel’ each other in these networks6,19. Similarly, the role-to-role connectivity profile of the Internet is consistent with the existence of a hierarchy of types of nodes28. This hierarchy comprises end users, regional providers and global providers, which we hypothesize correspond to roles R1–R2, R5 and R6 respectively. The role-to-role connectivity profiles are consistent with a scenario in which end users connect mostly to regional providers, and in which global providers connect with each other indirectly through satellite connectors (R3), with few connections but probably large bandwidth.
By considering the modular structure of the networks and the extra dimension introduced by the participation coefficient, however, our approach provides novel insights into the relationship between structure and function in complex networks. For example, by considering the absolute degree alone, nodes with roles R5 and R6 in protein interactomes are indistinguishable from each other: in Saccharomyces cerevisiae, $\langle k\rangle_{R5}=14.0\pm1.7$ and $\langle k\rangle_{R6}=17.1\pm1.9$, whereas the average degree for the whole network is $\langle k\rangle=2.67\pm0.09$. Still, links R5–R5 between provincial hubs, unlike R6–R6 links, are not under-represented. In general, the different connection patterns of R5 and R6 (or R1 and R2) proteins enable us to hypothesize that they play distinct biological roles, with R6 proteins probably being much more important31.
A closer look at the air transportation network also shows that important structural properties may be left unexplained by focusing on degree alone, and stresses the importance of the relative within-module degree as opposed to the degree. Johannesburg, in South Africa, has degree k=84, which is 23% smaller than the degree of Cincinnati in the US, k=109. Still, it is possible to fly from most capitals in the world to Johannesburg but not to Cincinnati. There are two main reasons for this. First, Johannesburg is the most connected city in its region (sub-Saharan Africa), whereas Cincinnati is not the most connected city in its region (North America); this effect is captured by the relative within-module degree, which is z=9.3 for Johannesburg and z=4.3 for Cincinnati. Second, Johannesburg has many connections to other regions, whereas Cincinnati does not; this effect is captured by the participation coefficient, which is P=0.52 for Johannesburg and P=0.05 for Cincinnati. As a result, Johannesburg is a connector hub (R6) in our classification, whereas Cincinnati is a provincial hub (R5). Thus, it can be understood why R6–R6 connections are over-represented in air transportation networks (most connector hubs are connected to one another), whereas R5–R5 connections are not (most provincial hubs are poorly connected to provincial hubs in other regions). In general, our approach explains why R5 and R6 nodes behave so differently in air transportation networks, a difference that cannot be understood from the degree of the nodes alone.
We have shown that global properties that do not take into account the modular organization of the network may sometimes fail to capture potentially important structural features; although all networks (except, possibly, the protein interactomes) show no degree–degree correlations when compared with the appropriate ensemble of random networks, they all have clearly distinctive properties in terms of how nodes with certain roles are connected to each other. Our results thus call attention to the need to develop new approaches that will enable us to better understand the structure and evolution of real-world complex networks.
In addition, our findings demonstrate that networks with the same functional needs and growth mechanisms have similar patterns of connections between nodes with different roles. Attempts to divide complex networks into ‘classes’ or ‘families’ have been made before, for example, in terms of the degree distribution8 and in terms of the relative abundance of certain subgraphs or motifs29,30. Our work here complements those attempts, and is the first one to build on the crucial fact that most real-world networks exhibit a markedly modular structure.
Although we cannot put forward a theory for the division of the networks into two classes, we hypothesize that it might be related to the fact that networks in the stringy-periphery class are transportation networks, in which strict conservation laws must be fulfilled. Indeed, for transportation systems it has been shown that, under quite general conditions, a hub oligarchy is the most efficient organization32. Conversely, both protein interactomes and the Internet can be seen as signalling networks, which do not obey conservation laws.
The modularity $\mathcal{M}$ of a partition of a network into modules is10

$$\mathcal{M} = \sum_{s=1}^{N_M}\left[\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2\right],$$

where $N_M$ is the number of non-empty modules (smaller than or equal to the number $N$ of nodes in the network), $L$ is the number of links in the network, $l_s$ is the number of links between nodes in module $s$ and $d_s$ is the sum of the degrees of the nodes in module $s$. The objective of a module identification algorithm is to find the partition that yields the largest modularity $\mathcal{M}$. Note that $N_M$ is only constrained to satisfy $N_M\le N$, but is otherwise selected by the optimization algorithm so that $\mathcal{M}$ is maximum. The problem of identifying the optimal partition is analogous to finding the ground state of a disordered system with Hamiltonian $\mathcal{H}=-\mathcal{M}$ (ref. 25).
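As an illustration of this definition, the following Python sketch evaluates the modularity of a given partition term by term, as the sum over modules of $l_s/L-(d_s/2L)^2$. It assumes an undirected simple graph given as a list of links (each listed once) and a dictionary mapping every node to a module label; the function name `modularity` is ours and does not refer to any published code.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Modularity sum_s [l_s/L - (d_s/(2L))^2] of a partition.

    edges: list of (u, v) pairs of an undirected simple graph, each link once.
    module_of: dict mapping every node to its module label.
    """
    L = len(edges)
    links_inside = defaultdict(int)  # l_s: links with both ends in module s
    degree_sum = defaultdict(int)    # d_s: sum of the degrees of nodes in s
    for u, v in edges:
        su, sv = module_of[u], module_of[v]
        degree_sum[su] += 1
        degree_sum[sv] += 1
        if su == sv:
            links_inside[su] += 1
    return sum(links_inside[s] / L - (degree_sum[s] / (2 * L)) ** 2
               for s in degree_sum)
```

For two triangles joined by a single link, placing each triangle in its own module gives $2(3/7-1/4)=5/14$, whereas putting all nodes in one module gives 0.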
As the modularity landscape is in general very rugged, we use simulated annealing to find a close to optimal partition of the network into modules11,12,25. This method is the most accurate to date11,14.
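A minimal simulated-annealing sketch is given below. It is not the implementation of refs 11, 12 and 25, which is more elaborate; this version assumes single-node moves only, Metropolis acceptance for the cost $-\mathcal{M}$, geometric cooling and the illustrative parameter values shown (`t0`, `t_min`, `cooling`, `moves_per_t`). For simplicity, the modularity is recomputed from scratch after every move, which is slow for large networks. Function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def modularity(edges, module_of):
    """sum_s [l_s/L - (d_s/2L)^2] for an undirected simple graph."""
    L = len(edges)
    inside, dsum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        su, sv = module_of[u], module_of[v]
        dsum[su] += 1
        dsum[sv] += 1
        if su == sv:
            inside[su] += 1
    return sum(inside[s] / L - (dsum[s] / (2 * L)) ** 2 for s in dsum)


def anneal_modules(edges, t0=1.0, t_min=1e-4, cooling=0.995,
                   moves_per_t=None, seed=0):
    """Maximize modularity (minimize H = -M) by simulated annealing.

    Each move reassigns one node to a random label in 0..N-1, so the number
    of non-empty modules is free to change (N_M <= N).
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    n_nodes = len(nodes)
    module_of = {n: k for k, n in enumerate(nodes)}  # start from singletons
    current = modularity(edges, module_of)
    best, best_m = dict(module_of), current
    moves_per_t = moves_per_t or 10 * n_nodes
    T = t0
    while T > t_min:
        for _ in range(moves_per_t):
            n = rng.choice(nodes)
            old = module_of[n]
            module_of[n] = rng.randrange(n_nodes)
            proposed = modularity(edges, module_of)
            # Metropolis rule for H = -M: always accept gains in modularity,
            # accept losses with probability exp((proposed - current) / T).
            if proposed >= current or rng.random() < math.exp((proposed - current) / T):
                current = proposed
                if current > best_m:
                    best, best_m = dict(module_of), current
            else:
                module_of[n] = old  # undo the rejected move
        T *= cooling
    return best, best_m
```

On two triangles joined by one link, this sketch is expected to return the two triangles as modules, with modularity 5/14. The best partition seen during the run is kept, so the result never falls below the best state visited.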
We determine the role of each node according to two properties11,12: the relative within-module degree $z$ and the participation coefficient $P$. The within-module degree z-score measures how 'well-connected' node $i$ is to other nodes in the module compared with those other nodes, and is defined as

$$z_i = \frac{\kappa_i^{s_i} - \langle \kappa_j^{s_i}\rangle_{j\in s_i}}{\sqrt{\langle (\kappa_j^{s_i})^2\rangle_{j\in s_i} - \langle \kappa_j^{s_i}\rangle_{j\in s_i}^2}},$$

where $\kappa_i^{s}$ is the number of links of node $i$ to nodes in module $s$, $s_i$ is the module to which node $i$ belongs, and the averages $\langle\cdots\rangle_{j\in s}$ are taken over all nodes in module $s$.
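The within-module degree z-score, $z_i=(\kappa_i^{s_i}-\text{mean})/\text{std}$ with mean and standard deviation taken over the nodes of module $s_i$, can be sketched as follows. The helper name `within_module_z` is hypothetical; it assumes an adjacency dictionary for an undirected simple graph and a node-to-module dictionary. The population standard deviation matches $\sqrt{\langle\kappa^2\rangle-\langle\kappa\rangle^2}$, and assigning $z_i=0$ when all nodes of a module have the same internal degree is our own convention.

```python
import statistics
from collections import defaultdict


def within_module_z(adj, module_of):
    """Relative within-module degree z_i for every node.

    adj: dict node -> list of neighbours (undirected simple graph).
    module_of: dict node -> module label.
    """
    # kappa_i^{s_i}: number of links of node i to nodes in its own module
    kappa = {i: sum(1 for j in nbrs if module_of[j] == module_of[i])
             for i, nbrs in adj.items()}
    by_module = defaultdict(list)
    for i, k in kappa.items():
        by_module[module_of[i]].append(k)
    z = {}
    for i, k in kappa.items():
        values = by_module[module_of[i]]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # sqrt(<kappa^2> - <kappa>^2)
        # Hypothetical convention: z = 0 when kappa has no spread in the module.
        z[i] = (k - mean) / std if std > 0 else 0.0
    return z
```

For a five-node star inside a single module, the internal degrees are 4, 1, 1, 1, 1, so the centre has $z=2$ and each leaf has $z=-0.5$; the centre would therefore still be classified as a non-hub under the $z\ge 2.5$ threshold.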
The participation coefficient quantifies to what extent a node connects to different modules. We define the participation coefficient $P_i$ of node $i$ as

$$P_i = 1 - \sum_{s=1}^{N_M}\left(\frac{\kappa_i^{s}}{k_i}\right)^2,$$

where $\kappa_i^{s}$ is the number of links of node $i$ to nodes in module $s$, and $k_i=\sum_s \kappa_i^{s}$ is the total degree of node $i$. The participation coefficient of a node is therefore close to one if its links are uniformly distributed among all the modules and zero if all its links are within its own module.
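A corresponding sketch for $P_i=1-\sum_s(\kappa_i^s/k_i)^2$, under the same assumptions (hypothetical helper name, adjacency dictionary, node-to-module dictionary). Note that the formula gives $P_i=0$ whenever all links of node $i$ go to a single module, and that for $N_M$ modules its maximum is $1-1/N_M$, reached when the links are spread evenly; this is why $P$ is only "close to one" for uniformly distributed links.

```python
from collections import Counter


def participation(adj, module_of):
    """Participation coefficient P_i = 1 - sum_s (kappa_i^s / k_i)^2."""
    P = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            P[i] = 0.0  # hypothetical convention for isolated nodes
            continue
        per_module = Counter(module_of[j] for j in nbrs)  # kappa_i^s for each s
        P[i] = 1.0 - sum((c / k) ** 2 for c in per_module.values())
    return P
```

A node with two links into module a and one link each into modules b and c has $P=1-(1/4+1/16+1/16)=0.625$.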
We classify as non-hubs those nodes that have a low within-module degree (z<2.5). Depending on the number of connections they have to other modules, non-hubs are further subdivided into11,12: (R1) ultra-peripheral nodes, that is, nodes with all their links within their own module (P≤0.05); (R2) peripheral nodes, that is, nodes with most links within their module (0.05<P≤0.62); (R3) satellite connectors, that is, nodes with a high fraction of their links to other modules (0.62<P≤0.80); and (R4) kinless nodes, that is, nodes with links homogeneously distributed among all modules (P>0.80). We classify as hubs those nodes that have a high within-module degree (z≥2.5). Similar to non-hubs, hubs are divided according to their participation coefficient into: (R5) provincial hubs, that is, hubs with the vast majority of links within their module (P≤0.30); (R6) connector hubs, that is, hubs with many links to most of the other modules (0.30<P≤0.75); and (R7) global hubs, that is, hubs with links homogeneously distributed among all modules (P>0.75).
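The thresholds listed above translate directly into a lookup function. In this sketch (the function name `role` is hypothetical), boundary values are assigned exactly as in the inequalities of the text. As a check, the values quoted for the air transportation network, z=9.3 and P=0.52 for Johannesburg and z=4.3 and P=0.05 for Cincinnati, fall into R6 and R5, respectively.

```python
def role(z, P):
    """Map a node's (z, P) pair to one of the roles R1..R7."""
    if z < 2.5:                # non-hubs
        if P <= 0.05:
            return "R1"        # ultra-peripheral
        if P <= 0.62:
            return "R2"        # peripheral
        if P <= 0.80:
            return "R3"        # satellite connector
        return "R4"            # kinless node
    if P <= 0.30:              # hubs
        return "R5"            # provincial hub
    if P <= 0.75:
        return "R6"            # connector hub
    return "R7"                # global hub
```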
Network randomization and statistical ensembles
We use two different ensembles of random networks19,28. In the first ensemble, which we denote by $\mathcal{A}_1$, we only preserve the degree sequence of the original network; in the second ensemble, denoted $\mathcal{A}_2$, we preserve both the degree sequence and the modular structure of the network. Averages over the first and second ensembles are denoted $\langle\cdots\rangle_{\mathcal{A}_1}$ and $\langle\cdots\rangle_{\mathcal{A}_2}$, respectively.
To generate random networks in the first ensemble, $\mathcal{A}_1$, we randomize all the links in the network while preserving the degree of each node. To sample all possible networks uniformly, we use the Markov-chain Monte Carlo switching algorithm19,33. In this algorithm, we repeatedly select random pairs of links, for example $(i,j)$ and $(l,m)$, and swap one of the ends of each link, so that the links become $(i,m)$ and $(l,j)$.
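A minimal sketch of the switching step for an undirected simple graph, assuming the links are stored as a list of node pairs. Proposals that would create a self-loop or a repeated link are rejected and counted as steps in which the chain stays in its current state, as is usual for this kind of Markov chain. The number of steps needed for good mixing is not specified here, so `n_steps` is a free parameter; the function name is hypothetical.

```python
import random


def switch_randomize(edges, n_steps, seed=None):
    """Degree-preserving randomization of an undirected simple graph.

    Each step picks two links (i, j) and (l, m) at random and proposes
    (i, m), (l, j). Proposals creating a self-loop or a repeated link are
    rejected, and the network is left unchanged for that step.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_steps):
        a, b = rng.sample(range(len(edges)), 2)
        i, j = edges[a]
        l, m = edges[b]
        if rng.random() < 0.5:  # either end of the second link may be swapped
            l, m = m, l
        new1, new2 = frozenset((i, m)), frozenset((l, j))
        if i == m or l == j or new1 in present or new2 in present:
            continue  # rejected move: keep the current network
        present -= {frozenset((i, j)), frozenset((l, m))}
        present |= {new1, new2}
        edges[a], edges[b] = (i, m), (l, j)
    return edges
```

Every accepted swap removes one link end from each of $i$, $j$, $l$, $m$ and gives it back, so the degree of every node is unchanged.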
To generate random networks in the second ensemble, $\mathcal{A}_2$, we restrict the Markov-chain Monte Carlo switching algorithm28 to pairs of links that connect nodes in the same pair of modules; that is, we apply the switching algorithm independently to links whose ends are in modules 1 and 1, 1 and 2, and so forth for all pairs of modules. This method guarantees that, with the same partition as the original network, the modularity of the randomized network is the same as that of the original network (as the number of links between each pair of modules is unchanged) and that the role of each node is also preserved.
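Restricting the switches to links that join the same pair of modules can be sketched in the same way. The helper below is hypothetical; it assumes comparable module labels (for example, integers) and orients every inter-module link so that its first end lies in the module with the smaller label, so that swapping second ends keeps each link between the same two modules. Every node then keeps its number of links into each module, which is what preserves modularity and the (z, P) roles for the fixed partition.

```python
import random
from collections import defaultdict


def modular_switch_randomize(edges, module_of, n_steps, seed=None):
    """Switching restricted to links that join the same pair of modules."""
    rng = random.Random(seed)
    # Orient links so the first end sits in the module with the smaller label.
    links = [(u, v) if module_of[u] <= module_of[v] else (v, u) for u, v in edges]
    groups = defaultdict(list)  # (s, t) -> indices of links joining s and t
    for idx, (u, v) in enumerate(links):
        groups[(module_of[u], module_of[v])].append(idx)
    eligible = [g for g in groups.values() if len(g) >= 2]
    present = {frozenset(e) for e in links}
    for _ in range(n_steps):
        if not eligible:
            break
        a, b = rng.sample(rng.choice(eligible), 2)
        (i, j), (l, m) = links[a], links[b]
        if module_of[l] == module_of[m] and rng.random() < 0.5:
            l, m = m, l  # within-module links: either end may be swapped
        new1, new2 = frozenset((i, m)), frozenset((l, j))
        if i == m or l == j or new1 in present or new2 in present:
            continue  # rejected move: keep the current network
        present -= {frozenset((i, j)), frozenset((l, m))}
        present |= {new1, new2}
        links[a], links[b] = (i, m), (l, j)
    return links
```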
To investigate whether global properties are representative of module-specific properties, we focus on the degree $k_i$, the clustering coefficient $C_i$ and the normalized clustering coefficient of each node. For each module $s$ in the network, comprising $n_s$ nodes, we compute the average of each property in the module (for example, $\langle k_i\rangle_{i\in s}$). In addition, we compute the distribution of such averages for random modules, which we obtain by randomly selecting groups of $n_s$ nodes. If the empirical module average falls outside the central 95% of the distribution for the random modules, we consider that the global average is not representative of the module average. Finally, we compute the fraction $r$ of modules that are not properly described by the global average.
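The random-module comparison can be sketched as follows, assuming a dictionary of node properties (for example, degrees) and modules given as lists of nodes. The number of random groups (`n_samples`) and the reading of the criterion as a two-sided central 95% band are our assumptions; the helper names are hypothetical.

```python
import random
import statistics


def outside_random_band(values, module, n_samples=2000, seed=0):
    """True if the module average of `values` lies outside the central 95%
    of averages over random groups with the same number of nodes."""
    rng = random.Random(seed)
    nodes = list(values)
    observed = statistics.fmean(values[n] for n in module)
    null = sorted(
        statistics.fmean(values[n] for n in rng.sample(nodes, len(module)))
        for _ in range(n_samples)
    )
    lower = null[int(0.025 * n_samples)]
    upper = null[int(0.975 * n_samples) - 1]
    return observed < lower or observed > upper


def fraction_unrepresented(values, modules, **kwargs):
    """Fraction r of modules whose average is not described by the global one."""
    flags = [outside_random_band(values, m, **kwargs) for m in modules]
    return sum(flags) / len(modules)
```

For example, if half of 20 nodes have degree 1 and the other half degree 10, a module made only of low-degree nodes (or only of high-degree nodes) falls outside the band, whereas a module mixing the two evenly does not.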
To study degree–degree correlations, we consider the average degree $k_{nn,i}$ of the nearest neighbours of each node $i$. We define the normalized nearest-neighbours' degree $d_i$ as the ratio of $k_{nn,i}$ and: (1) the average value of $k_{nn,j}$ in the network,

$$d_i^{(1)} = \frac{k_{nn,i}}{\frac{1}{N}\sum_{j=1}^{N} k_{nn,j}},$$

where $N$ is the number of nodes in the network; (2) the expected value of $k_{nn,i}$ in the ensemble $\mathcal{A}_1$ of networks with fixed degree sequence,

$$d_i^{(2)} = \frac{k_{nn,i}}{\langle k_{nn,i}\rangle_{\mathcal{A}_1}};$$

and (3) the expected value of $k_{nn,i}$ in the ensemble $\mathcal{A}_2$ of networks with fixed degree sequence and modular structure,

$$d_i^{(3)} = \frac{k_{nn,i}}{\langle k_{nn,i}\rangle_{\mathcal{A}_2}}.$$

Note that, in spite of the similar notation, the meaning of $d_i^{(1)}$ is somewhat different from that of the other two, because its normalization involves an average over nodes, whereas in $d_i^{(2)}$ and $d_i^{(3)}$ the normalization involves averages over an ensemble of randomized networks.
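Given $k_{nn,i}$, the three normalizations differ only in the denominator. The sketch below (helper names hypothetical) computes $k_{nn,i}$, the node-averaged normalization and an ensemble normalization from any supplied list of randomized networks, so the same function yields the degree-sequence or the modular-structure normalization depending on whether those networks were drawn with unrestricted or with module-restricted switching. Nodes without links are skipped.

```python
import statistics


def knn(adj):
    """Average degree of the nearest neighbours of each node with k >= 1."""
    return {i: statistics.fmean(len(adj[j]) for j in nbrs)
            for i, nbrs in adj.items() if nbrs}


def d_node_average(adj):
    """k_nn,i divided by the average of k_nn,j over all nodes j."""
    k = knn(adj)
    mean = statistics.fmean(k.values())
    return {i: v / mean for i, v in k.items()}


def d_ensemble(adj, randomized_adjs):
    """k_nn,i divided by its mean over the supplied randomized networks."""
    k = knn(adj)
    samples = [knn(r) for r in randomized_adjs]
    return {i: v / statistics.fmean(s[i] for s in samples) for i, v in k.items()}
```

In a three-leaf star, the centre has $k_{nn}=1$ and each leaf has $k_{nn}=3$; the node average is 2.5, so the node-averaged normalization gives 0.4 for the centre and 1.2 for each leaf.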
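To obtain the role-to-role connectivity profiles, we calculate the z-score19,28,29,30 of the number of links between nodes with roles $i$ and $j$ as

$$z_{ij} = \frac{r_{ij} - \langle r_{ij}\rangle_{\mathcal{A}_2}}{\sqrt{\langle r_{ij}^2\rangle_{\mathcal{A}_2} - \langle r_{ij}\rangle_{\mathcal{A}_2}^2}},$$

where $r_{ij}$ is the number of links between nodes with roles $i$ and $j$, and the averages are taken over the ensemble $\mathcal{A}_2$ of randomized networks with fixed degree sequence and modular structure, in which the role of every node is preserved. To obtain better statistics and an estimation of the error in the z-score, we carry out this process for several partitions of each network.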
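The z-score profile, $z_{ij}=(r_{ij}-\text{mean})/\text{std}$ over a sample of randomized networks, can be computed as in the sketch below. It assumes an edge list, a node-to-role dictionary with labels 'R1' to 'R7' and a list of randomized edge lists (for example, from module-restricted switching, which keeps roles fixed). Skipping link types whose count does not vary across the sample is our own choice, and the helper names are hypothetical.

```python
import statistics
from collections import Counter


def link_type_counts(edges, role_of):
    """r_ij: number of links joining a node of role i and a node of role j."""
    return Counter(tuple(sorted((role_of[u], role_of[v]))) for u, v in edges)


def role_profile(edges, randomized_edge_lists, role_of):
    """z-scores z_ij = (r_ij - <r_ij>) / std(r_ij) over the randomized sample."""
    roles = sorted(set(role_of.values()))  # 'R1' < ... < 'R7' as strings
    observed = link_type_counts(edges, role_of)
    null = [link_type_counts(e, role_of) for e in randomized_edge_lists]
    profile = {}
    for idx, a in enumerate(roles):
        for b in roles[idx:]:
            sample = [counts[(a, b)] for counts in null]
            std = statistics.pstdev(sample)
            if std > 0:  # hypothetical choice: skip link types with no spread
                profile[(a, b)] = (observed[(a, b)] - statistics.fmean(sample)) / std
    return profile
```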
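To evaluate the similarity between two z-score profiles, $a$ and $b$, we use the scalar product

$$r_{ab} = \frac{1}{n}\sum_{k=1}^{n}\frac{a_k\, b_k}{\sigma_{z^a}\,\sigma_{z^b}},$$

where the sum runs over the $n$ link types, $a_k$ and $b_k$ are the corresponding elements of the two profiles and $\sigma_{z^a}$ is the standard deviation of the elements in $a$.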
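A sketch of the profile similarity, assuming the normalized scalar product $r_{ab}=\frac{1}{n}\sum_k a_k b_k/(\sigma_{z^a}\sigma_{z^b})$ over the $n$ link types present in both profiles, with population standard deviations. Both the $1/n$ factor and the function name are our assumptions. With this normalization, $r_{ab}$ coincides with the Pearson correlation coefficient when both profiles have zero mean.

```python
import statistics


def profile_similarity(a, b):
    """(1/n) * sum_k a_k b_k / (sigma_a sigma_b) over shared link types."""
    keys = sorted(set(a) & set(b))
    va = [a[k] for k in keys]
    vb = [b[k] for k in keys]
    sa, sb = statistics.pstdev(va), statistics.pstdev(vb)
    return sum(x * y for x, y in zip(va, vb)) / (len(keys) * sa * sb)
```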
We thank R. D. Malmgren, E. N. Sawardecker, S. M. D. Seaver, D. B. Stouffer and M. J. Stringer for useful comments and suggestions. R.G. and M.S.-P. thank the Fulbright Program. L.A.N.A. gratefully acknowledges the support of a NIH/NIGMS K-25 award, of NSF award SBE 0624318, of the J. S. McDonnell Foundation and of the W. M. Keck Foundation.