Introduction

Networks are ubiquitous, and this has triggered a vast amount of research over the last two decades. When focusing on the local connectivity of subgraphs within a network, two approaches have been identified: motifs and graphlets. Motifs are defined as sub-graphs that recur at a frequency higher than in random graphs1,2. However, they depend on the choice of the network’s null model. In contrast, graphlets3 are induced sub-graphs of a network that appear at any frequency and hence are independent of the null model. Graphlets have found numerous applications as building blocks of network analysis in various disciplines ranging from social science4,5 to biology6,7. In social science, graphlet analysis (known as sub-graph census) is widely adopted in sociometric studies4. Recently, graphlet analysis has also been adapted for directed networks8,9.

Real networks come in various structures including multilayer networks, multiplex networks, multilevel networks or multiscale networks. Understanding the local connectivity of subgraphs by using graphlets in these networks could, therefore, lead to improved predictive models of networks and/or to an enhanced description of their properties. In sociology, the importance of multiplex networks has been emphasized by many scholars. White, Boorman and Breiger10 and Boorman and White11 treated multiple networks as a foundation of social structure and argued that the patterning and interweaving of different types of ties are needed to describe and characterize social structures. Multiplexity is critical to diverse phenomena such as the mobilization of social movements12, the consolidation of political power13, the emergence of trust in economic relationships14, the creation of social bonds within civic networks15, and the organization of party coalitions16. Multiplex networks have been studied to understand scientific collaboration17, the structural logic of intra-organizational networks18, the formation of ties featuring both an economic and a social component in inter-organizational networks19, and the formation of relationships among producers in multiplex triads20.

Multilayer/multiplex networks21,22 have recently been a subject of particularly intense research by the network science and physics communities. Novel structural descriptors23,24,25,26,27 and tools from statistical physics28,29 have been developed for studying multilayer networks. By analyzing multilayer networks, instead of relying on their monolayer counterparts, scientists have documented evidence of novel features and gained novel insights about real systems (see, for example30,31,32). Multilayer networks are fundamental for understanding dynamical processes on networked systems, including, for example, spreading processes, such as flows (and congestion) in transportation networks33,34, and information and disease spreading in social networks35,36,37,38,39.

This paper aims at developing graphlet analysis for multiplex networks. The study has been motivated by two facts: (1) graphlets are a powerful tool for analyzing single plex/layer networks, and (2) networks with multiplex/multilayer structures are common in nature and societies. The central problem in addressing graphlets in a multiplex network is that the number of edge types grows exponentially as the number of plexes in the network increases linearly. We now present an illustrative example showing the challenges of graphlet analysis for multiplex networks and how they are addressed.

Consider a simple undirected (single-plex) graph \(G=(V,E)\), see Figure 1, for which

$$V=\{1,2,3,4,5,6,7,8\}$$
(1)
$$E=\{12,13,23,28,34,36,38,45,67,68,78\}.$$
(2)
Figure 1

Upper-left panel: single plex graph. Nodes of a graphlet are classified into different orbits. Upper-right panel: multiplex graph with 3 plexes \(a,b,\) and \(c\). On the same panel the multiplex graph is represented as an edge-labeled graph: each edge is labeled with a single label, an element of the set \(\{a,b,c,ab,ac,bc,abc\}\). Left table: triangles and wedge stars. Right table: wedge paths. The column ‘nodes’ shows the nodes of the corresponding orbit (triangle, wedge star, or wedge path). The node in the small bracket (for the first triangle this is the node 1 shown as (1)) indicates the first node of the orbit. The column ‘edges’ shows the full description of sub-orbits. While for the single plex graph (upper-left panel) all triangles are the same, for the multiplex graph (upper-right panel) these triangles are different and each edge is labeled with a single label from the set \({E}_{t}\). For example, the edges of the first triangle are labeled with \(ab.ab.b\). The column ‘edges_1’ shows the first reduction of the number of sub-orbits. Thus, for example, all wedge stars \(ab.ac\), \(ab.ab\), \(ab.bc\), and \(ac.bc\) are represented as \(2.2\). The column ‘edges_2’ represents another reduction in which \(ab.ac\), \(ab.bc\), and \(ac.bc\) are represented as \({2}_{x}{.2}_{y}\), while \(ab.ab\) is represented as \({2}_{x}{.2}_{x}\). The number of wedge paths is doubled, since each path is also counted from the other non-central node as its starting node.

A graphlet \(G^{\prime} =(V^{\prime} ,E^{\prime} )\) is an induced subgraph of \(G\) (see Materials and Methods). Thus, for example, \(G^{\prime} =(\{3,4,5\},\{34,45\})\) is a graphlet called a wedge. By taking into account the “symmetries” between nodes in a graphlet, the nodes of the graphlet can be classified into different orbits (see Materials and Methods). Node 3 of the graphlet \(G^{\prime} =(\{3,4,5\},\{34,45\})\) is an end node and belongs to orbit 1, called “wedge path”, while node 4, the central node of \(G^{\prime} \), belongs to orbit 2, called “wedge star”. The orbit 0 of a graphlet is the node degree. Consider now the 3-plex network shown in Fig. 1. We write \(a,b,c\) for the network plexes. Let \({E}_{t}=\{a,b,c,ab,ac,bc,abc\}\). A multiplex graph is a special type of edge-labeled graph in which each edge is labeled with exactly one label, an element of the set \({E}_{t}\). Thus, for our example, the multiplex graph can be written as the labeled graph \(G=(V,{E}_{label})\), see Fig. 1, for which

$$V=\{1,2,3,4,5,6,7,8\}$$
(3)
$${E}_{label}=\{{12}_{ab},{13}_{ab},{23}_{b},{28}_{b},{34}_{ac},{36}_{ab},{38}_{bc},{45}_{b},{67}_{abc},{68}_{c},{78}_{b}\},$$
(4)

where \({e}_{\alpha }\) means that the edge \(e\in E\) is labeled with \(\alpha \in {E}_{t}\). Two problems will be addressed when dealing with graphlet analysis of multiplex networks. The first problem is related to the fact that each orbit of a multiplex network consists of sub-orbits. Thus, the first orbit, the orbit 0, consists of 7 sub-orbits: \({0}_{a}\), \({0}_{b}\), \({0}_{c}\), \({0}_{ab}\), \({0}_{ac}\), \({0}_{bc}\), and \({0}_{abc}\). To define the sub-orbits for other orbits (wedges, triangles, and so on) we first introduce two notations: (1) permutation \([a,b,c]\), stressing the order of edge types as part of the graphlet, and (2) set \(\{a,b,c\}\), stressing the symmetry the edges have as part of the graphlet. Thus, for example, the sub-orbit of the orbit wedge star for the node 4 in the graphlet with vertex set \(\{3,4,5\}\) is \({2}_{ac.b}\), which, for simplicity, is written as \(ac.b\) in Fig. 1. For our example, all sub-orbits of the orbit 2 (wedge star) are elements of the set \({2}_{\{\alpha .\beta \}}\) where \(\alpha ,\beta \in {E}_{t}\); the cardinality of this set (the set of wedge star sub-orbits) is 28. The sub-orbit of the orbit wedge path for the node 2 in the graphlet with vertex set \(\{2,8,6\}\) is \({1}_{b.c}\) or simply \(b.c\). For our example, all sub-orbits of the orbit 1 (wedge path) are elements of the set \({1}_{[\alpha .\beta ]}\) where \(\alpha ,\beta \in {E}_{t}\); the total number of wedge path sub-orbits is 49. Section Materials and Methods describes how sub-orbits are defined for a given orbit and the size of the orbit (the cardinality of the set of sub-orbits associated with the orbit).
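The wedge example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary `E_LABEL` encodes Eq. (4), while the helper names `label`, `sort_key` and `wedge_sub_orbits` are hypothetical and not part of the original method. The wedge-star label is printed in the canonical (sorted) order described in Materials and Methods, so the unordered sub-orbit \(ac.b\) appears as `b.ac`.

```python
from itertools import combinations

# Edge-labeled example graph from Eq. (4): edge (u, v) with u < v -> label in E_t.
E_LABEL = {
    (1, 2): "ab", (1, 3): "ab", (2, 3): "b", (2, 8): "b",
    (3, 4): "ac", (3, 6): "ab", (3, 8): "bc", (4, 5): "b",
    (6, 7): "abc", (6, 8): "c", (7, 8): "b",
}

def label(u, v):
    """Return the label of edge uv, or None if the edge is absent."""
    return E_LABEL.get((min(u, v), max(u, v)))

def sort_key(lab):
    # Smaller plex count first, then lexical order among equal plex counts.
    return (len(lab), lab)

def wedge_sub_orbits(nodes):
    """Sub-orbits of the wedge induced by three nodes.

    Returns {node: sub-orbit string}. The central node gets a wedge-star
    (orbit 2) sub-orbit, each end node a wedge-path (orbit 1) sub-orbit.
    Returns None if the three nodes do not induce a wedge.
    """
    present = [(u, v) for u, v in combinations(nodes, 2) if label(u, v)]
    if len(present) != 2:
        return None  # triangle, single edge, or no edge at all
    # The centre is the node shared by both edges.
    centre = (set(present[0]) & set(present[1])).pop()
    first, second = [n for n in nodes if n != centre]
    lab_first, lab_second = label(centre, first), label(centre, second)
    star = sorted([lab_first, lab_second], key=sort_key)
    return {
        centre: "2_" + ".".join(star),            # unordered: canonical sort
        first: f"1_{lab_first}.{lab_second}",     # ordered: near edge, far edge
        second: f"1_{lab_second}.{lab_first}",
    }

print(wedge_sub_orbits((3, 4, 5)))
print(wedge_sub_orbits((2, 8, 6)))
```

For the vertex set \(\{2,8,6\}\) the sketch assigns `1_b.c` to node 2, in agreement with the text.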

The second problem is the size of the orbit. Even for the simplest graphlet, the node degree, the size of orbit 0 in a multiplex network equals \({2}^{d}-1\), where \(d\) is the number of plexes. This limits the application of graphlets to real data even for networks with a small number of plexes. In order to address this problem, we propose two different methods for reducing the number of graphlets. Both methods are illustrated in Fig. 1. The simplest possible reduction of the set \({E}_{t}=\{a,b,c,ab,ac,bc,abc\}\) is to the set \({E^{\prime} }_{t}=\{1,2,3\}\). In other words, an edge of a multiplex, which has been defined as an element of the set \({E}_{t}\) representing different relation types, is now (after reduction) defined as an element of the set \({E^{\prime} }_{t}\) representing the ‘strength’ (plex count) of the original edge. Thus the wedge stars for nodes 3, 1, 4 and 3, 1, 6, which represent two different sub-orbits \(ab.ac\) and \(ab.ab\), respectively, are now represented as a single sub-orbit \(2.2\). More generally, in this example, all sub-orbits \(\alpha .\beta \) such that \(\alpha ,\beta \in {E}_{t}\) and \(|\alpha |=|\beta |=2\) are represented as a single sub-orbit \(2.2\) (see Fig. 1). The second reduction takes into account that the sub-orbit \(2.2\) is the result of reducing two subsets:

$$A=\{\alpha .\beta :\alpha \ne \beta ,\alpha ,\beta \in {E}_{t},|\alpha |=|\beta |=2\}$$
(5)
$$B=\{\alpha .\alpha :\alpha \in {E}_{t},|\alpha |=2\}.$$
(6)

Both subsets A and B, in this reduction, are mapped to two different sub-orbits \({2}_{x}{.2}_{y}\) and \({2}_{x}{.2}_{x}\), respectively (see Fig. 1). This reduction results in a multiplex which is called plex-count multiplex with distinct links inside orbits. Section Materials and Methods provides the full description of both reductions.

The graphlet analysis for multiplex networks can be extended to graphs with node and/or link (categorical) attributes as well as to multilayer and/or multilevel networks, as discussed in the Supporting Information (SI). Further, economic and social multiplex networks are analyzed using graphlets, providing novel insight into the networks’ properties and/or their local structures. In particular, for economic networks we show that (1) countries produce/trade products in local structures of triads which are not closed and (2) countries with small diversity tend to form correlated triangles. For social networks in which a strong tie is related to the multiplex structure, we provide an example of social networks for which the wedges with only strong ties are both present and significantly correlated, in contrast to Granovetter’s seminal work on the strength of weak ties, in which it has been shown that the wedges with only strong ties are absent.

Materials and Methods

Multilayer/multiplex networks and graphlets

This paper introduces a method for graphlet analysis of multiplex networks. This analysis can also be adapted for graphs with node and/or edge (categorical) attributes. In the SI we explain how the method can be generalized to graphs with node and/or edge attributes and to complex (multiplex, multilayer, and multilevel) graphs. Next, definitions of multiplex, multilayer and multilevel networks as well as graphlets for simple graphs are provided.

Multiplex networks

A multiplex network is defined as a \((d+1)\)-tuple \(G=(V,{E}^{1},\ldots ,{E}^{d})\) where \(V\) is the set of nodes and for each \(\alpha \in \{1,2,\ldots ,d\}\), \({E}^{\alpha }\) is the set of edges describing the presence or absence of edges of type \(\alpha \) between pairs of nodes. Since a multiplex network is uniquely defined by the node set \(V\) and the edge sets \({E}^{1},\ldots ,{E}^{d}\), we write \(G(V,{E}^{1},\ldots ,{E}^{d})\) to denote the multiplex. The graph \((V,{E}^{\alpha })\) is also called a plex. A \(k\)-plex network is a subgraph of the multiplex and is defined as \(G=(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\).
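The relation between this tuple definition and the edge-labeled view used in the Introduction can be sketched as follows. The function names `to_edge_labeled` and `k_plex` and the dictionary layout are assumptions made for illustration only; the three edge sets are the plexes \(a\), \(b\), \(c\) read off from Eq. (4).

```python
def to_edge_labeled(edge_sets):
    """Collapse {plex: set of (u, v)} into {edge: frozenset of plexes}.

    Each undirected edge ends up with exactly one label, namely the set of
    plexes in which it is present (an element of E_t).
    """
    labeled = {}
    for plex, edges in edge_sets.items():
        for u, v in edges:
            key = (min(u, v), max(u, v))  # store undirected edges as (small, large)
            labeled[key] = labeled.get(key, frozenset()) | {plex}
    return labeled

def k_plex(edge_sets, plexes):
    """Restrict the multiplex to the chosen plexes (a k-plex sub-network)."""
    return {p: edge_sets[p] for p in plexes}

# Plexes of the example in Fig. 1, reconstructed from the labels in Eq. (4).
EDGE_SETS = {
    "a": {(1, 2), (1, 3), (3, 4), (3, 6), (6, 7)},
    "b": {(1, 2), (1, 3), (2, 3), (2, 8), (3, 6), (3, 8), (4, 5), (6, 7), (7, 8)},
    "c": {(3, 4), (3, 8), (6, 7), (6, 8)},
}

labeled = to_edge_labeled(EDGE_SETS)
print(sorted((e, "".join(sorted(l))) for e, l in labeled.items()))
```

Storing labels as frozensets keeps the plex sets hashable, so they can later be used directly as dictionary keys when counting sub-orbits.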

Multilayer networks

A multilayer network is defined as a graph \(G=(V,E)\) for which \(V\subseteq {V}^{1}\times {V}^{2}\times \ldots \times {V}^{d}\) and \(E\) is the set of edges. Typically, \({V}^{\alpha }=V\), \(\alpha =1,\ldots ,d\); the elements of \(V\) are called nodes. A layer is a sub-graph \(({V}^{\alpha },{E}^{\alpha })\) for which \({E}^{\alpha }=\{ij\in E:i,j\in {V}^{\alpha }\}\). Let \(A=[{a}_{ij}^{\alpha \beta }]\), \(i,j=1,\ldots ,n\) and \(\alpha ,\beta =1,\ldots ,d\), be the adjacency matrix of the graph \(G=(V,E)\). In general, \({a}_{ij}^{\alpha \beta }\) may be nonzero for \(\alpha \ne \beta \) and \(i\ne j\). In the special case when a network is represented with an adjacency matrix such that \({a}_{ij}^{\alpha \beta }=0\) when both \(\alpha \ne \beta \) and \(i\ne j\), the network is also called a multiplex network. This definition implies that a multiplex consists of layers (not plexes). In what follows, we will not consider the multilayer approach to multiplex networks.

Multilevel networks

A multilevel (interconnected) network is a graph \(G=(V,E)\) for which \(V={\cup }_{\alpha =1}^{d}{V}^{\alpha }\) and \(E\) is the set of edges. In general, \({V}^{\alpha }\ne {V}^{\beta }\) for \(\alpha \ne \beta \); each \({V}^{\alpha }\) represents a distinct type of nodes. Multilevel networks can be described with subgraphs \(({V}^{\alpha },{E}^{\alpha })\) called levels, where \({E}^{\alpha }=\{ij\in E:i,j\in {V}^{\alpha }\}\) and bipartite subgraphs \(({V}^{\alpha },{V}^{\beta },{E}^{\alpha \beta })\) such that \({E}^{\alpha \beta }=\{ij\in E:i\in {V}^{\alpha },j\in {V}^{\beta }\}\).

Graphs with attributes

Let \(G=(V,E)\) be a simple graph, \({V}_{A}\) be the set of node attributes, and \({E}_{A}\) be the set of edge attributes. Nodes are labeled \(i=1,2,\ldots ,|V|\), node attributes are labeled \(q=1,2,\ldots ,|{V}_{A}|\), while edge attributes are labeled \(\alpha =1,2,\ldots ,|{E}_{A}|\). We define the \(|V|\times |{V}_{A}|\) matrix \(D=[{d}_{iq}]\) by \({d}_{iq}=1\) if the node \(i\) has the attribute \(q\), and \({d}_{iq}=0\) otherwise. We define a sub-graph \({G}^{\alpha }=(V,{E}^{\alpha })\) by \({E}^{\alpha }=\{e\,|\,e\in E\,{\rm{and}}\,e\,{\rm{has}}\,{\rm{attribute}}\,\alpha \}\).

Graphlets

Let \(G=(V,E)\) be a graph, where \(V\) is a set of nodes and \(E\) is a set of edges. A subgraph \(G^{\prime} \) of \(G\) is a graph whose set of nodes and set of edges are subsets of the node set and edge set of \(G\). An induced subgraph of \(G\), \(G^{\prime} =(V^{\prime} ,E^{\prime} )\), is a subgraph that consists of a subset of nodes in \(G\) and all of the edges that connect them in \(G\), i.e. \(V^{\prime} \subseteq V\) and \(E^{\prime} =\{(u,v)\in E:u,v\in V^{\prime} \}\). The size of a graphlet is the cardinality of its node set. Unless we explicitly say “induced” in this paper, a subgraph is not necessarily induced. The top part of Table 1 shows all 2-, 3-, and 4-node undirected graphlets \({G}_{i}\), \(0\le i\le 8\). By taking into account the “symmetries” between nodes in the graphlet \({G}_{i}\), the nodes of \({G}_{i}\) are classified into different automorphism orbits (or just orbits, for brevity), where the nodes with the same orbit identification are topologically identical. For all \({G}_{i}\), \(0\le i\le 8\), there are 15 orbits, which are also shown as filled nodes in the top panel of Table 1.
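As a brief illustration of the induced-subgraph definition, the following sketch extracts \(E^{\prime} \) from a chosen node subset of the single-plex example graph of Eqs. (1) and (2). The function name `induced_subgraph` is hypothetical.

```python
# Edge set E of the single-plex example graph, Eq. (2).
E = {(1, 2), (1, 3), (2, 3), (2, 8), (3, 4), (3, 6),
     (3, 8), (4, 5), (6, 7), (6, 8), (7, 8)}

def induced_subgraph(edges, subset):
    """Return (V', E') where E' keeps every edge of G with both ends in V'."""
    v_prime = set(subset)
    e_prime = {(u, v) for (u, v) in edges if u in v_prime and v in v_prime}
    return v_prime, e_prime

print(induced_subgraph(E, {3, 4, 5}))  # the wedge 3-4-5
print(induced_subgraph(E, {1, 2, 3}))  # a triangle
```

Because all edges among the chosen nodes are kept, the node set \(\{1,2,3\}\) can only yield a triangle and never a wedge.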

Table 1 Sub-orbit breakdown for graphlets up to order 4. Edges that belong to symmetric sets are always lexically sorted (smaller plex count links first, lexical sort on same size links) before assigning them a specific sub-orbit instance. For a set with \(n\) elements, the number of \(k\)-combinations with repetitions is denoted by \(\left(\!\!\binom{n}{k}\!\!\right)\), where \(\left(\!\!\binom{n}{k}\!\!\right)=\binom{n+k-1}{k}\).

Graphlets in multiplex graphs

Table 1 depicts all 2–4 node graphlets with their orbits (15 in total) for a simple graph \(G=(V,E)\). For multiplex graphs, these orbits are further subdivided based on the relation (edge) types, so that each orbit consists of a number of sub-orbits that are defined by the specific edge types inside the graphlet. Let \({\mathscr{R}}=\{1,2,\ldots ,d\}\) be the set of relation/edge types and let \({\mathscr{P}}({\mathscr{R}})\) be the power set of \({\mathscr{R}}\), that is, the set of all subsets of \({\mathscr{R}}\). We define \({E}_{t}={\mathscr{P}}({\mathscr{R}})\backslash \{\emptyset \}\). The set \({E}_{t}\) is the set of combinations of different edge types, with cardinality \(|{E}_{t}|={2}^{d}-1\). Multiplex graphs are treated as having \({2}^{d}-1\) different types of edges so that each edge of a multiplex is uniquely defined by an element of the set \({E}_{t}\). Thus, for example, for \({\mathscr{R}}=\{1,2,3\}\), an edge of the multiplex is represented by an element of the set \(\{1,2,3,12,13,23,123\}\). To simplify the notation, the set \(\{a,b,c\}\), an element of the set \({E}_{t}\), will be labeled as \(abc\). Thus, the first orbit, the orbit 0 (see Table 1), consists of 7 sub-orbits: \({0}_{1}\), \({0}_{2}\), \({0}_{3}\), \({0}_{12}\), \({0}_{13}\), \({0}_{23}\), and \({0}_{123}\). To define the orbit class for each orbit we first introduce two notations: (1) permutation \([a,b,c]\), highlighting the order of the types of edges as part of a graphlet, and (2) set \(\{a,b,c\}\) (which can have repeating elements; only their order does not matter), highlighting the symmetry the edges have as part of a graphlet. Thus, for example, in a multiplex the orbit 1 consists of the orbit class \({1}_{[a,b]},a,b\in {E}_{t}\), while the orbit 2 consists of the orbit class \({2}_{\{a,b\}},a,b\in {E}_{t}\). For \({\mathscr{R}}=\{1,2\}\) the sub-orbit instances of orbit 1 are: \({1}_{1.1}\), \({1}_{1.2}\), \({1}_{1.12}\), \({1}_{2.1}\), \({1}_{2.2}\), \({1}_{2.12}\), \({1}_{12.1}\), \({1}_{12.2}\), and \({1}_{12.12}\), while the sub-orbit instances of orbit 2 are: \({2}_{1.1}\), \({2}_{1.2}\), \({2}_{1.12}\), \({2}_{2.2}\), \({2}_{2.12}\), and \({2}_{12.12}\). Lexical sorting (smaller plex count links first, lexical sort on same plex count links) is used to order the edges inside every wedge star sub-orbit. In this way every sub-orbit has a unique representation by which it is further identified. Figure 2 shows a sub-orbit breakdown for graphlets up to order 4.
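The distinction between the permutation class \([a,b]\) and the set class \(\{a,b\}\) corresponds to ordered pairs versus combinations with repetition. The sketch below enumerates both for a given relation set; the function names are illustrative assumptions, and relation names are assumed to be single characters so that a label such as `12` can be written by concatenation.

```python
from itertools import combinations, combinations_with_replacement, product

def edge_types(relations):
    """E_t: all non-empty subsets of the relation set, smaller plex count first."""
    rel = sorted(relations)
    return ["".join(c) for k in range(1, len(rel) + 1)
            for c in combinations(rel, k)]

def wedge_path_sub_orbits(relations):
    """Orbit 1, class [a, b]: the order of the two edge types matters."""
    et = edge_types(relations)
    return [f"{a}.{b}" for a, b in product(et, repeat=2)]

def wedge_star_sub_orbits(relations):
    """Orbit 2, class {a, b}: unordered, repetitions allowed."""
    et = edge_types(relations)
    return [f"{a}.{b}" for a, b in combinations_with_replacement(et, 2)]

print(wedge_path_sub_orbits("12"))
print(wedge_star_sub_orbits("12"))
print(len(wedge_path_sub_orbits("abc")), len(wedge_star_sub_orbits("abc")))
```

For \({\mathscr{R}}=\{1,2\}\) this reproduces the nine wedge-path and six wedge-star instances listed above, and for three plexes it gives the counts 49 and 28 quoted in the Introduction.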

Figure 2

Sub-orbit breakdown for graphlets up to order 4. The top panel shows all multiplex \({G}_{k}\) graphlets, \(0\le k\le 8\) and their 15 orbits. The orbit nodes are colored gray. Every edge is identified by a letter. Each blue colored edge belongs to a permutation group ([]) inside the orbit class, while each orange edge belongs to a symmetric set group ({}). Edges that belong to symmetric sets are always lexically sorted (smaller plex count links first, lexical sort on same size links) before assigning them a specific sub-orbit instance.

A straightforward way to enumerate all possible graphlet sub-orbits is described in the Supplementary Material. Table 1 visually shows the first 15 orbits and gives information about their multiplex classes. Each orbit is uniquely represented by an orbit class, labeled as shown in Table 1, and the cardinality of this class is called the size of the orbit. To simplify the notation we write \({e}_{1}^{{O}_{i}}.{e}_{2}^{{O}_{i}}.\cdots .{e}_{k}^{{O}_{i}}\) for a sub-orbit of the given orbit \({O}_{i}\). When it is clear from the text which orbit is considered, we will drop the explicit \({O}_{i}\) in order to keep the notation uncluttered. Each orbit \({O}_{i}\) can then be viewed as a set of sub-orbits, \({O}_{i}=\{{e}_{1}.{e}_{2}.{e}_{3}.\cdots .{e}_{k}\,|\,{e}_{j}\in {E}_{t}\}\). Inside the orbit every list of edges \({e}_{1}.\cdots .{e}_{k}\) is connected in the same way; the lists are differentiated by the specific values of the multiplex links \({e}_{j}\) coming from the set \({E}_{t}\), as explained visually in Figure 2.

Reducing the number of sub-orbits

A multiplex graph is a special type of labeled graph in which each edge is labeled with exactly one label, an element of the set \({E}_{t}\). The number of sub-orbits grows exponentially with the linear increase in the number of plexes in the network. Even for the simplest orbit 0 – the degree – the number of sub-orbits grows as \({2}^{d}\). This limits the application of graphlets to real data even for networks with a small number of plexes. In order to address this problem, we propose two different ways of reducing the number of sub-orbits.

Let us first consider an example for which \({\mathscr{R}}=\{a,b,c\}\), so that an edge of the multiplex is represented by an element of the set \({E}_{t}=\{a,b,c,ab,ac,bc,abc\}\). In this case, there are 49 wedge-path sub-orbits. We write \({e}_{1}.{e}_{2}\) for a wedge-path sub-orbit, where \({e}_{1},{e}_{2}\in {E}_{t}\). Thus, for example, \(ab.abc\) denotes a sub-orbit with relation types \(a\) and \(b\) associated with the first edge and \(a,b\) and \(c\) with the second edge. The set \({O}_{wp}\) then contains 49 wedge-path sub-orbits, \({O}_{wp}=\{a.a,b.b,c.c,\cdots ,abc.ab,abc.ac,abc.bc,abc.abc\}\). The set of triangle sub-orbits is even larger, \(|{O}_{tri}|=196\).

For a given orbit, the size of the set of sub-orbits (corresponding to this orbit) grows exponentially with the linear increase in the number of plexes in the network (as shown in Table 1). In fact, the number of sub-orbits of a given orbit is polynomial in \(|{E}_{t}|={2}^{d}-1\). Therefore, in order to reduce the size of the set of sub-orbits, one needs to reduce the set \({E}_{t}\).

The simplest possible reduction of the set \({E}_{t}\) is to the set \({E^{\prime} }_{t}\) defined as \({E^{\prime} }_{t}=\{|e|:e\in {E}_{t}\}\). Thus, in the above example, the set \({E}_{t}=\{a,b,c,ab,ac,bc,abc\}\) reduces to the set \({E^{\prime} }_{t}=\{1,2,3\}\). In other words, an edge of a multiplex, which has been defined as an element of the set \({E}_{t}\) representing different relation types, is now (after reduction) defined as an element of the set \({E^{\prime} }_{t}\) representing the ‘strength’ (plex count) of the original edge. This reduction leaves us with \(d\) possible edge types. The reduced multiplex is called a plex-count multiplex.

For the above example, the reduced wedge-path sub-orbit set is now \({O^{\prime} }_{wp}=\{1.1,1.2,1.3,2.1,2.2,2.3,3.1,3.2,3.3\}\) with \(|{O^{\prime} }_{wp}|=9\), while the triangle sub-orbit set has size \(|{O^{\prime} }_{tri}|=18\). Furthermore, every orbit set has a cardinality that is \(poly(d)\) instead of \(poly({2}^{d}-1)\). In general, the orbit set is now defined as \({O^{\prime} }_{i}=\{{e^{\prime} }_{1}.{e^{\prime} }_{2}.{e^{\prime} }_{3}.\cdots .{e^{\prime} }_{k}\,|\,{e^{\prime} }_{j}\in {E^{\prime} }_{t}\}\) where \({e^{\prime} }_{j}=|{e}_{j}|\).

The reduction of \({E}_{t}\) equalizes every \({e}_{i}\) that has the same plex count. This greatly reduces the information in the transformed graph. To maintain some of the original information, we can separate two different links inside a particular orbit that have the same plex count. Suppose we have two links \({e}_{k}=ab\) and \({e}_{v}=bc\) that belong to the triangle edge list \(ab.bc.abc\in {O}_{tri}\). Clearly \(|{e}_{k}|=|{e}_{v}|=2\), however \({e}_{k}\ne {e}_{v}\). With the first reduction both of these links are transformed to \(2\), so that \(ab.bc.abc\to 2.2.3\). However, in order to retain the fact that \({e}_{k}\) and \({e}_{v}\) are different, we introduce a new reduction rule \({e^{\prime\prime} }_{i}=d\times {I}_{i}+{e^{\prime} }_{i}\), where \({I}_{i}\) represents a distinct id of the original link \({e}_{i}\). We assign a different \({I}_{i}\) to every distinct link that shares the same plex count with another link inside the orbit. With this mapping, the triangle \(ab.bc.abc\) corresponds to \(2.5.3\), where \({e^{\prime\prime} }_{1}=3\times 0+2=2\) and \({e^{\prime\prime} }_{2}=3\times 1+2=5\), since \(d=3\) and \({e^{\prime} }_{1}={e^{\prime} }_{2}=|ab|=|bc|=2\). Specifically, for a given orbit list \({e}_{1}.{e}_{2}.\cdots .{e}_{l}\), the distinct links (of the same plex count) are given different indices from left to right, starting with 0. This reduction of the set \({E}_{t}\) adds some new labels to the set \({E^{\prime} }_{t}\), resulting in the set \({E^{\prime\prime} }_{t}\). For example, the set of sub-orbits of the wedge-path orbit is now \({O^{\prime\prime} }_{wp}=\{1.1,1.4,1.2,1.3,2.1,2.2,2.5,2.3,3.1,3.2,3.3\}\). Although the set \({O^{\prime\prime} }_{wp}\) is larger than the set \({O^{\prime} }_{wp}\), the cardinality \(|{O^{\prime\prime} }_{i}|\) is still polynomial in \(d\). We have \(|{O^{\prime\prime} }_{wp}|=11\) and \(|{O^{\prime\prime} }_{tri}|=34\) for \(d > 2\), and so on. This reduction results in a multiplex which will be called a plex-count multiplex with distinct links inside orbits. Figure 3 shows all possible sub-orbits of a plex-count multiplex with distinct links inside orbits with two plexes. The exact sizes of the orbit sets, for both reductions, are shown in Tables 2 and 3.
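Both reduction rules can be sketched in a few lines. The function names `plex_count` and `distinct_links` are hypothetical, and the sketch assumes that each edge label is a string of single-character plex names, so that its length equals its plex count. The second function implements \({e^{\prime\prime} }_{i}=d\times {I}_{i}+{e^{\prime} }_{i}\) with indices assigned from left to right as described above.

```python
from itertools import combinations, product

def plex_count(sub_orbit):
    """First reduction: replace each edge label by its plex count e' = |e|."""
    return [len(e) for e in sub_orbit]

def distinct_links(sub_orbit, d):
    """Second reduction: e'' = d * I + e'.

    I is the index of the label among the distinct labels of the same plex
    count seen so far in this sub-orbit (left to right, starting at 0).
    """
    seen = {}  # plex count -> distinct labels in order of first appearance
    reduced = []
    for e in sub_orbit:
        labels = seen.setdefault(len(e), [])
        if e not in labels:
            labels.append(e)
        reduced.append(d * labels.index(e) + len(e))
    return reduced

print(plex_count(["ab", "bc", "abc"]))         # first reduction of ab.bc.abc
print(distinct_links(["ab", "bc", "abc"], 3))  # second reduction of ab.bc.abc

# Enumerate the reduced wedge-path sets for d = 3 from all 49 ordered pairs.
E_t = ["".join(c) for k in range(1, 4) for c in combinations("abc", k)]
O1_wp = {tuple(plex_count(p)) for p in product(E_t, repeat=2)}
O2_wp = {tuple(distinct_links(p, 3)) for p in product(E_t, repeat=2)}
print(len(O1_wp), len(O2_wp))
```

Under these assumptions the enumeration yields 9 and 11 distinct wedge-path sub-orbits, matching \(|{O^{\prime} }_{wp}|\) and \(|{O^{\prime\prime} }_{wp}|\) for \(d=3\).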

Figure 3

Reducing the number of sub-orbits: plex-count multiplex with distinct links inside orbits. The first 4 orbits and 36 sub-orbits of a 2-plex network with \( {\mathcal R} =\{a,b\}\). Each plex is identified by a different color. Each of the nodes with dark gray color is identified with the specific second reduction sub-orbit.

Table 2 Reducing the number of sub-orbits: the size of the orbits 0, 1, 2, and 3 for a plex-count multiplex network.
Table 3 Reducing the number of sub-orbits: the size of the orbits 0, 1, 2, and 3 for a plex-count multiplex with distinct links inside orbits.

Graphlet metrics

Here several graphlet metrics for a multiplex network are defined. Given a node, its graphlet degree of a sub-orbit is the number of times the node is touched by the sub-orbit. The size of a graphlet is the cardinality of its node set. Let \(G=(V,{E}^{1},\ldots ,{E}^{D})\) be a multiplex network. We define an \((n,k)\)-signature vector of the node \(i\), \(S{I}_{i}(n,k)\), as a vector of graphlet degrees of the node’s (lexicographically ordered) sub-orbits, for all graphlets up to the size \(n\) of the \(k\)-plex \(G=(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\).

Let \(G\) be a multiplex network with \(N\) nodes and \(D\) plexes. For large \(D\), computing the full signature vector of a node is computationally infeasible, and in this case we restrict ourselves to sub-multiplexes consisting of \(k\) plexes. We first define a \(k\)-plex and then construct the vertex \((n,k)\)-signature vector of the \(k\)-plex for graphlets up to size \(n\). In this way, for each vertex and a given \(k\)-plex \(G(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\), we obtain its signature vector of length \(|SI(n,k)|\). Next, we construct an \(N\times |SI(n,k)|\) matrix whose rows are the \((n,k)\)-signature vectors of the vertices. For a given multiplex network \(G\) and its sub-network with plexes \({\alpha }_{1},\ldots ,{\alpha }_{k}\), we compute Spearman’s correlation coefficients between all pairs of columns of this matrix and present them in a \(|SI(n,k)|\times |SI(n,k)|\) symmetric matrix, which is termed the graphlet correlation matrix (GCM) of the \(k\)-plex network \(G(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\). In this way, the network topology and its local connectivity patterns, regardless of network size (the number of vertices) and network volume (the number of edges), are summarized in a \(|SI(n,k)|\times |SI(n,k)|\) matrix for the \(k\)-plex \(G(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\). Since the graphlet correlation matrix of a network provides a network statistic based on correlations between the node properties across the multiplex orbits and sub-orbits, we can compare the topology of two networks by introducing a Graphlet Correlation Distance (GCD). For two graphs \({G}_{1}\) and \({G}_{2}\) and their graphlet correlation matrices \(GC{M}_{{G}_{1}}\) and \(GC{M}_{{G}_{2}}\), which are clean of redundancies and encode the information about the local network topology of the multiplex networks that we examine, the graphlet correlation distance is defined as the Euclidean distance of their upper triangle values:

$$GCD({G}_{1},{G}_{2})=\sqrt{\mathop{\sum }\limits_{i=1}^{m}\,\mathop{\sum }\limits_{j=i+1}^{m}\,{(GC{M}_{{G}_{1}}(i,j)-GC{M}_{{G}_{2}}(i,j))}^{2}},\qquad m=|SI(n,k)|$$
(7)

This metric has been used for single-plex networks by Pržulj et al.7. Here the same distance has been adapted for \(k\)-plex networks, \(k\ge 2\), representing the network topology through the local connectivity.
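The construction of the graphlet correlation matrix and of the distance in Eq. (7) can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the signature matrix is assumed to be given as a list of rows (one per node), Spearman's coefficient is computed as Pearson's coefficient on average ranks, and assigning correlation 0 to a constant column is an assumption made here for simplicity.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return sxy / sqrt(sxx * syy)

def graphlet_correlation_matrix(signatures):
    """Spearman correlation between all column pairs of the N x m matrix."""
    cols = [average_ranks(list(c)) for c in zip(*signatures)]
    m = len(cols)
    return [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(m)]
            for i in range(m)]

def graphlet_correlation_distance(gcm1, gcm2):
    """Euclidean distance between the upper triangles of two GCMs, Eq. (7)."""
    m = len(gcm1)
    return sqrt(sum((gcm1[i][j] - gcm2[i][j]) ** 2
                    for i in range(m) for j in range(i + 1, m)))

# Toy signature matrices (3 nodes, 3 sub-orbits); the numbers are made up.
S1 = [[1, 2, 3], [2, 4, 1], [3, 6, 2]]
S2 = [[1, 3, 1], [2, 2, 2], [3, 1, 3]]
gcm1 = graphlet_correlation_matrix(S1)
gcm2 = graphlet_correlation_matrix(S2)
print(graphlet_correlation_distance(gcm1, gcm2))
```

Both matrices must be built over the same ordered list of sub-orbits, otherwise the entries compared in the distance do not refer to the same pair of sub-orbits.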

Data

Synthetic data

We generate 2000 synthetic multiplex networks, each having two plexes, using four graph algorithms: Erdős-Rényi (ER), Watts-Strogatz (WS), Barabási-Albert (BA) and a modified BA algorithm (PL, powerlaw cluster)40. The network size takes the values \(N=100,200,300,400,500\), and both plexes of a network are generated using the same algorithm with the same set of parameters. For the PL algorithm the triangle forming probability is 0.8, while WS has a rewiring probability of 0.01. The other parameters are either the size of the connected component or the sparsity probability. These are derived as follows: \(p\in \{0.2,0.35,0.5,0.75,0.8\},k=p\ast N\), where \(p\) is the sparsity probability for ER, while \(k\) represents the initial connected component parameter for BA and PL, and the number of initial neighbours in WS. This is fully explained in the SI.

Economic trade networks data

International trade network data from the most recent available year (2000), provided in41, are used to construct a “Multiplex International Trade Network”, in which plexes represent products, nodes are countries, and a link between countries \(i\) and \(j\) in the plex \(\alpha \) exists if at least one of the countries is a significant exporter of the product \(\alpha \) to the other country. The full explanation of how the network is created is given in the SI. The final network consists of \(N=125\) countries (nodes) and \(D=957\) products (plexes). From this network we then focus on a subset of products, or combine products into product categories by their hierarchical Standard International Trade Classification (SITC) code.

Social networks data

Data collected from 75 villages in the region of Karnataka42 was used. Each node represents an individual with age ranging from 18 to 57. The individuals were asked how they interact with each other in the village across 12 different aspects of everyday life: those whose homes they visit, those whom they invite to visit their home, kin, nonrelatives with whom they socialize, those from whom they receive medical advice, those from whom they would borrow money, those to whom they would lend money, those from whom they would borrow material goods (kerosene, rice, etc.), those to whom they would lend material goods, those to whom they give or from whom they get advice, and people with whom they go to pray (at a temple, church or mosque). The data is organized as a multiplex network with a number of nodes dependent on the village size and \(D=12\) plexes.

Results and Discussions

A typical multiplex network has a large number of plexes and, therefore, a full graphlet analysis is computationally infeasible. Moreover, a network might contain plexes which are less significant or contain a small number of links. Because of this, one often wants to focus the analysis on a smaller number of plexes. Two different strategies have been adopted to address this problem. First, when the problem in question is such that full graphlet analysis is needed, we consider, for a given multiplex network with \(d\) plexes, the set (or a well-defined subset) of all \(k\)-plex networks \(G(V,{E}^{{\alpha }_{1}},\ldots ,{E}^{{\alpha }_{k}})\) such that \({\alpha }_{1},\ldots ,{\alpha }_{k}\) are pairwise distinct, where \(k\) is a small number, typically \(k=2,3,4\). Second, when the problem to be addressed allows sub-orbit count reduction, we consider both reduced multiplex constructions described in the previous sections, namely the plex-count multiplex and the plex-count multiplex with distinct links inside orbits. In order to compare results for computing graphlets using the full multiplex and the two reduced multiplexes, graphlet correlation matrices (GCMs) are computed for all \(k\)-plex networks. From these GCMs we then analyze only those sub-orbit pairs for which significant correlations (above 0.7) exist in more than 60% of these \(k\)-plex networks.
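The selection of sub-orbit pairs described above can be sketched as a simple filter over a collection of GCMs. The function name `frequent_strong_pairs` and its parameters are hypothetical; the sketch assumes that all GCMs share the same sub-orbit ordering and takes the threshold literally as a correlation value above 0.7 (using the absolute value instead would be a different assumption).

```python
def frequent_strong_pairs(gcms, corr_threshold=0.7, share=0.6):
    """Return index pairs (i, j), i < j, whose correlation exceeds
    corr_threshold in more than `share` of the given GCMs."""
    m = len(gcms[0])
    kept = []
    for i in range(m):
        for j in range(i + 1, m):
            hits = sum(1 for g in gcms if g[i][j] > corr_threshold)
            if hits > share * len(gcms):
                kept.append((i, j))
    return kept

# Three toy 3 x 3 GCMs (values are made up for illustration).
toy_gcms = [
    [[1.0, 0.80, 0.80], [0.80, 1.0, 0.71], [0.80, 0.71, 1.0]],
    [[1.0, 0.90, 0.10], [0.90, 1.0, 0.72], [0.10, 0.72, 1.0]],
    [[1.0, 0.50, 0.20], [0.50, 1.0, 0.73], [0.20, 0.73, 1.0]],
]
print(frequent_strong_pairs(toy_gcms))
```

In the toy input the pair (0, 2) is strong in only one of three matrices and is therefore dropped.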

Results using all three approaches (full multiplex and two reduced multiplexes) are comparable for two data sets from two different domains, economic world trade networks and social networks, which are analyzed in more detail in this section. For this reason, we only present and discuss results obtained with the second orbit size reduction. For clarity, orbits are represented as \({O}_{{i}_{x}.{j}_{y}.{k}_{z}}\), where \(O\) is the orbit number, \(i,j,k\) are the plex counts and \(x,y,z\) are the distinction indices (different if the original plex sets are different). The links are ordered by the orbit class definition in Table 1 (edges inside symmetric sets being ordered lexically). As an example, the sub-orbit \({3}_{\mathrm{13.4.25}}\) is rewritten as \({3}_{{2}_{x}{.1}_{x}{.2}_{y}}\).
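The relabeling in the example can be sketched as follows. This is one reading of the notation, not the authors' implementation. It assumes that 13, 4 and 25 denote the plex sets {1, 3}, {4} and {2, 5}, that links arrive already in the order fixed by the orbit class definition, and that distinction indices are assigned separately for each plex count, in order of first appearance. The name `reduced_label` is hypothetical.

```python
def reduced_label(orbit, link_plex_sets):
    """Rewrite a full sub-orbit label as plex counts with distinction indices.

    Assumption: two links with the same plex count share an index only if
    their plex sets are identical.
    """
    letters = "xyzuvw"  # hypothetical index alphabet; extend if more are needed
    seen = {}           # plex count -> distinct plex sets seen so far
    parts = []
    for plexes in link_plex_sets:
        key = frozenset(plexes)
        group = seen.setdefault(len(key), [])
        if key not in group:
            group.append(key)
        parts.append(f"{len(key)}_{letters[group.index(key)]}")
    return str(orbit) + "_{" + ".".join(parts) + "}"

print(reduced_label(3, [{1, 3}, {4}, {2, 5}]))  # 3_{2_x.1_x.2_y}
```

Under this reading the 2-plex triangle with links {1, 2}, {1}, {2} becomes `3_{2_x.1_x.1_y}`, while {1, 2}, {1}, {1} becomes `3_{2_x.1_x.1_x}`, matching the sub-orbit names used in the rest of the section.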

Economic trade networks

The products in the economic trade network are labeled with a hierarchical 4-letter code (SITC Code). The first letters of the code refer to more general product classes such as products of animal or mineral origin, while the full code refers to more specific products such as skimmed milk or pork. The full network contains information for 957 specific products, which are arranged as a multiplex (957-plex) network.

To draw more general conclusions we focus on pairs of 2-letter product classes. From each of these product classes we then uniformly choose up to 2000 individual products and create 2-plex networks. For example, if the chosen product classes are paper and furniture products, then we can create 2-plex networks from (craft paper, leather furniture) or (newspaper rolls, wood furniture), etc.

Figure 4a shows the histogram (normalized frequencies) of the sub-orbits in the economic network. Dominant graphlets are wedges, reflecting how economic networks are built: as trading networks between two countries. Moreover, in economic trade networks wedge paths and wedge stars are (almost) equally represented. We now examine the correlations between graphlet sub-orbits. We randomly chose 100 product class pairs. For each class pair, we select up to 2000 2-plex networks constructed from individual products belonging to the two classes, respectively. For every product 2-plex network, we compute its GCM and record the significant correlations (>0.7). Subsequently, for each class pair we retain the strong correlations appearing in more than 60% of the individual networks. To generalize about economic networks we then examine correlations appearing in more than 90% of the 100 random product class pairs.
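The two-stage filtering described above can be sketched as follows. This is a minimal illustration, assuming each GCM is already available as a square nested list indexed by sub-orbit and that only positive correlations above the threshold count as significant. The function names and the toy matrices are hypothetical.

```python
from collections import Counter

def significant_pairs(gcm, threshold=0.7):
    """Sub-orbit index pairs (i < j) whose correlation exceeds the threshold."""
    n = len(gcm)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if gcm[i][j] > threshold}

def recurrent(pair_sets, min_share):
    """Pairs present in strictly more than min_share of the given sets."""
    counts = Counter(p for s in pair_sets for p in s)
    return {p for p, c in counts.items() if c > min_share * len(pair_sets)}

def general_correlations(gcms_by_class_pair, net_share=0.6, class_share=0.9):
    """Stage 1: filter within each class pair. Stage 2: filter across class pairs."""
    per_class = [recurrent([significant_pairs(g) for g in gcms], net_share)
                 for gcms in gcms_by_class_pair]
    return recurrent(per_class, class_share)

# Toy input: two class pairs, each with three 2x2 GCMs.
strong = [[1.0, 0.9], [0.9, 1.0]]
weak = [[1.0, 0.2], [0.2, 1.0]]
print(general_correlations([[strong, strong, weak], [strong, strong, strong]]))
# {(0, 1)}
```

In the toy input the pair (0, 1) is significant in two of three networks of the first class pair and in all networks of the second, so it passes both the 60% and the 90% filters.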

Figure 4

Histograms containing normalized frequencies of 22 (second reduction) sub-orbits from 2-plex economic trade networks (a) and social networks (b). In total, 4950 2-plex social networks and 15390 2-plex economic trade networks were analyzed. A detailed explanation of how the 2-plex networks were constructed is provided in the main text and in the SI.

The significant correlations that exist in a majority of these pairs provide a more general overview of graphlet correlations in economic trade networks. The full correlation tables are presented in the SI. The more interesting and nontrivial of the correlations that emerged are shown in Fig. 5 and described in detail below. For these networks, only positive correlations were found. In the case of wedges and triangles they follow a pattern that also emerged when we applied the same method to 3- and 4-plex networks. The corresponding figures are provided in the SI, since here we describe in detail only the 2-plex correlations and their meaning and importance.

Figure 5

(a) Full international trade network of 2 plexes: dairy and meat products. The green and orange links represent trade of dairy and meat, respectively, while blue links represent trade of both dairy and meat. A small induced subgraph of the United Kingdom’s trade network is visualized to highlight different structural patterns: (b) strong preferences for trading with the neighbors of a node’s trade partners, (c) trade hubs of the network, (d) single-product triangle relations.

The correlation between \({0}_{{2}_{x}}\) and \({2}_{{2}_{x}{.1}_{x}}\) appears in all 100 randomly chosen economic 2-plexes. This implies that there are trade relations from a certain node, some of which are strong (trading both products), while others are weak (trading one product). Another similar strong correlation is between \({2}_{{2}_{x}{.2}_{x}}\) and \({2}_{{1}_{x}{.2}_{x}}\) (appearing in 98% of observed networks). From these correlations we can infer that certain countries (nodes) act as a type of network hub, ‘specializing’ in two specific products (links), which they trade interchangeably with different countries. Furthermore, from the frequency histogram of the same networks, it is clear that strong relations are not common (which makes sense, as we choose two random products); however, wedge star relations of the type \({2}_{{2}_{x}{.1}_{x}}\) appear more frequently.

We also observe triangle trade relations in the economic networks. The sub-orbits \({0}_{{1}_{x}}\) and \({3}_{{1}_{x}{.1}_{x}{.1}_{x}}\) are strongly correlated in 96% of observed networks. This implies that countries form trade deals with a specific product. This is also supported by the histogram data, which show that single product triangles appear more frequently than other triangles in the networks. Furthermore, we observe correlations between \({0}_{{2}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{y}}\) or \({3}_{{2}_{x}{.1}_{x}{.1}_{x}}\), which appear in 97% of sampled networks. Therefore, when a country trades one or two products with two different countries, these countries also form trade deals with each other (with a single product), meaning that one of the countries in the triangle is a ‘stronger’ trader, trading in two products, while the other two countries mostly trade with a single product. This construction is further supported by the correlation between \({3}_{{2}_{x}{.1}_{x}{.1}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{y}}\), appearing in 95% of networks, suggesting that there are ‘trade triangles’ composed of a strong trade link \({2}_{x}\) (which is rarer) and single product trade links \({1}_{i}\) in either of the two products. Moreover, the formation of these deals is supported by the following correlations: \({2}_{{2}_{x}{.1}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{x}}\), in 93% of observed networks; \({2}_{{2}_{x}{.1}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{y}}\), in 93% of observed networks; and \({2}_{{1}_{x}{.1}_{x}}\) and \({3}_{{1}_{x}{.1}_{x}{.1}_{x}}\), in 92% of observed networks. This implies that some of the wedge trade relations are closed into triangles by a single product trade link. The triangle relations are represented in Fig. 5b, where we can see that the United Kingdom acts as a hub for trading meat and dairy (thus having a number of trade relations) and forms the triangle \({3}_{{1}_{x}{.2}_{x}{.1}_{x}}\) with Austria and New Zealand, and in Fig. 5d, which shows the triangle \({3}_{{1}_{x}{.1}_{x}{.1}_{x}}\) between Chile, Italy and the trading hub Netherlands.

The concept of economic complexity has been recently introduced43,44,45,46,47 with the aim of reflecting the amount of knowledge that is embedded in the productive structure of an economy. Capability-driven economic competitiveness has been analyzed using three methods: the method of reflections, the fitness-complexity method, and the modified fitness-complexity method. Two simple measures, both related to degrees, have been introduced: the first is the country degree (in the bipartite network), called diversity, and the second is the product degree (in the bipartite network), called ubiquity. Here graphlet analysis provides another view of economic competitiveness. For those countries with large diversity (computed as in43, for example) we found that sub-orbits \({2}_{{2}_{x}{.2}_{x}}\) and \({2}_{{1}_{x}{.2}_{x}}\) are correlated, as are sub-orbits \({1}_{{2}_{x}{.2}_{x}}\) and \({1}_{{1}_{x}{.2}_{x}}\), but not \({2}_{{2}_{x}{.2}_{x}}\) and \({2}_{{1}_{x}{.1}_{x}}\) or other pairs of wedges. This implies that the countries with large diversity are structurally (locally) well described by wedges and have both double and single plex links. On the other hand, correlations between degrees and triangles are found between sub-orbits \({0}_{{2}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{y}}\), \({0}_{{2}_{x}}\) and \({3}_{{2}_{x}{.1}_{x}{.1}_{x}}\), and \({0}_{{1}_{x}}\) and \({3}_{{1}_{x}{.1}_{x}{.1}_{x}}\), and not between other degree-triangle pairs, in particular not between \({0}_{{2}_{x}}\) and \({3}_{{2}_{x}{.2}_{x}{.2}_{x}}\). This provides evidence that the countries with small diversity tend to form correlated triangles.
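Since diversity and ubiquity are plain degrees in the bipartite network, they can be computed by counting links. The sketch below assumes the bipartite network is given as a list of (country, product) links. The labels are placeholders rather than real trade data, and any export threshold used in the cited works to define a link is not modeled.

```python
from collections import Counter

# Hypothetical bipartite links (country, product); placeholder labels only.
links = {("A", "p1"), ("A", "p2"), ("B", "p1"), ("C", "p3")}

diversity = Counter(country for country, _ in links)  # country degree
ubiquity = Counter(product for _, product in links)   # product degree

print(diversity["A"], ubiquity["p1"])  # 2 2
```

Storing the links in a set keeps repeated records from inflating either degree.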

Social networks

Social network data are organized as 75 (the number of villages) multiplex networks with a number of nodes dependent on the village size and \(D=12\) plexes. Each 12-plex network is further separated into all different combinations of 2-plexes, and a graphlet analysis similar to the one performed for the economic data is carried out for the social data as well. We run our analysis for all possible relation pairs, 66 in total. For each relation pair we create 2-plex networks for each village, then we extract the strong correlations that appear in a majority of villages (>60%). These correlations are assumed to be representative of the specific relation pair. By finding correlations that appear in a majority of relation pairs (>80%) we aim to find more general correlations that appear in social networks.

Figure 4b depicts the histogram (normalized frequencies) of the sub-orbits in the social network. The dominant graphlets are the sub-orbits: degree \({0}_{{2}_{x}}\), wedges with at least one \({2}_{x}\) link (that is, \({1}_{{1}_{x}{.2}_{x}}\), \({1}_{{2}_{x}{.1}_{x}}\), \({1}_{{2}_{x}{.2}_{x}}\), and \({2}_{{1}_{x}{.2}_{x}}\), \({2}_{{2}_{x}{.2}_{x}}\)), and the triangle \({3}_{{2}_{x}{.2}_{x}{.2}_{x}}\). A significant correlation (100% of all social relation combinations, in the majority of villages) was found between the degree sub-orbit \({0}_{{2}_{x}}\) and the triangle sub-orbit \({3}_{{2}_{x}{.2}_{x}{.2}_{x}}\) (shown in Fig. 6b). In his seminal work, Granovetter48 suggested the strength of dyadic ties to be the tool linking micro and macro levels of sociological theory. He showed that dyadic ties are related to larger structures by implementing the following principle: the stronger the tie between two individuals, the larger the proportion of individuals to whom they will both be tied. The impact of this principle on diffusion of influence and information, mobility opportunity, and community organization is well documented. This principle has been supported by providing evidence that the triads in which two ties are strong and the third is absent are unlikely to occur. Following Granovetter, here we suggest multiplexity as a way of indicating a strong tie. Thus, the tie \({2}_{x}\) (in which both plexes are present) is called a strong tie, while ties \({1}_{i}\) are weak ties. We also call \({3}_{{2}_{x}{.2}_{x}{.2}_{x}}\) a strong triangle. The graphlet analysis shows that triangles other than \({3}_{{2}_{x}{.2}_{x}{.2}_{x}}\) are unlikely to occur; moreover, the occurrence of the strong triangles is highly correlated with the occurrence of the strong ties, supporting Granovetter’s principle. However, in our case, this support is direct. The occurrence of wedges \({1}_{{2}_{x}{.2}_{x}}\) and \({2}_{{2}_{x}{.2}_{x}}\) is significant (in contrast to48, in which such wedges are unlikely to occur). Moreover, we found a significant correlation in 89% of 2-plexes between triads in which two ties are strong and the third is absent. In other words, we found that the wedge stars \({2}_{{2}_{x}{.2}_{x}}\) and the wedge paths \({1}_{{2}_{x}{.2}_{x}}\) are strongly correlated.
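Reading multiplexity as tie strength can be made concrete with a small counting sketch. It assumes a 2-plex given as two sets of undirected links, each link listed once per plex, and it counts only strong triangles and the open triads that Granovetter's principle deems unlikely (two strong ties, third tie absent). It is not a full sub-orbit count, and the function names and toy network are hypothetical.

```python
from itertools import combinations

def tie_strengths(plex_a, plex_b):
    """Map each undirected link to the number of plexes (1 or 2) containing it."""
    strength = {}
    for plex in (plex_a, plex_b):
        for u, v in plex:
            key = frozenset((u, v))
            strength[key] = strength.get(key, 0) + 1
    return strength

def strong_triads(strength):
    """Return (strong triangles, wedges of two strong ties with the third tie absent)."""
    strong = {e for e, s in strength.items() if s == 2}
    nodes = sorted(set().union(*strong)) if strong else []
    triangles = open_wedges = 0
    for center in nodes:
        nbrs = [n for n in nodes if frozenset((center, n)) in strong]
        for a, b in combinations(nbrs, 2):
            third = frozenset((a, b))
            if third in strong:
                triangles += 1    # every strong triangle is seen from 3 centers
            elif third not in strength:
                open_wedges += 1  # third tie missing in both plexes
    return triangles // 3, open_wedges

# Toy 2-plex: strong triangle 0-1-2, strong path 2-3-4, weak tie 0-3.
plex_a = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (0, 3)}
plex_b = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}
print(strong_triads(tie_strengths(plex_a, plex_b)))  # (1, 2)
```

In the toy network the wedge 0-2-3 is not counted as open because its third tie 0-3 exists as a weak tie.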

Figure 6

(a) 2-plex social network from one of the villages (village 28): giving advice (orange links) and friend relations (green links). Blue links represent both giving advice and friend relations. Again, a smaller induced subgraph is shown for better visualization. (b) Strong tie cliques in the network; (c) structural holes of the network; and (d) links between the structural holes and the outreach node of a clique.

Another concept related to weak ties is the concept of structural holes49, introduced to explain the origin of differences in social capital. An individual holds certain positional advantages/disadvantages from how she/he is embedded in neighborhoods or other social structures. A structural hole represents the gap between two individuals who have complementary sources of information. A simple measure of structural holes in a network is the bridge count. According to Granovetter, no strong tie is a bridge48. Several significant graphlet correlations support the concept of structural holes. Significant correlations (>0.7) are found in 100% of the tested 2-plex networks between the wedge star \({2}_{{1}_{x}{.2}_{x}}\) and \({1}_{{1}_{x}{.2}_{x}}\), and in 95% of networks between \({1}_{{1}_{x}{.1}_{x}}\) and \({1}_{{1}_{x}{.2}_{x}}\). When looking at the data we found that these structures often appear around the same person. Furthermore, we observe the following correlations between \({0}_{{1}_{x}}\) and these structures: \({2}_{{1}_{x}{.2}_{x}}\) and \({1}_{{1}_{x}{.2}_{x}}\) in 100% of relation pairs, \({1}_{{1}_{x}{.1}_{x}}\) in 97%, and \({2}_{{1}_{x}{.1}_{x}}\) in 94% of relation pairs. If we examine the connections that these people have within the network, we can conclude that they behave as structural holes, meaning they connect different social cliques (households). This can be seen in Fig. 6c, which depicts several examples of individuals who are mediators between two or more households as part of different cliques. A correlation that appears in 83% of networks is that between the wedge paths \({1}_{{1}_{x}{.1}_{y}}\) and \({1}_{{1}_{x}{.2}_{x}}\). This correlation again signifies that the network has individuals who serve as a connection between two cliques. Furthermore, these graphlet structures can also detect individuals inside a clique who are linked to the outside and thus have a broader communication reach. This can be observed in Fig. 6d, where only a few people inside a clique are connected to an outside person. For the full list of correlations we refer the reader to the SI.

We remark that these conclusions are based on the analysis of a single data set consisting of 75 social networks. Therefore, a more thorough social network analysis is needed to reach more general conclusions; this is, however, beyond the scope of this paper and will be provided in a future study.

Conclusions

Graphlets are a powerful tool for analyzing local network structure. Multiplex networks, multilayer networks, and networks with node and/or link (categorical) attributes are pervasive, and the graphlet analysis developed here can further enhance our understanding of complex networks. Graphlets discriminate between different types of networks. Even a simple graphlet histogram plot of economic and social networks (see Fig. 4) shows the differences between these structures and provides evidence on how the networks are built. Wedges occur more often in economic networks than in social networks, indicating the tendency of a country to produce/trade a product within local structures of triads that are not closed (that is, wedges, not triangles). Wedges (open triads) also appear in the social networks; however, the dominant graphlets in social networks are triangles (closed triads). If multiplexity is taken as the indicator of a strong tie, the graphlet analysis provides further evidence for the concepts of strong/weak ties and structural holes. In contrast to the work of Granovetter48, however, in our work, which relates to a single data set consisting of 75 social networks, wedges with only strong ties are not only present but also strongly correlated.

Graphlets can also provide clustering. The graphlet correlation matrix of a given network is represented as a point in a multidimensional space, which is then visualized in 3D space by using multidimensional scaling. This has been demonstrated with synthetic networks in Fig. 7. Several conclusions can be drawn from the figure: (1) different graphs are clearly separated, (2) the same graphs with different sparsity are also distinguishable, and (3) aggregating a multiplex network into a single-plex network provides less information on the network structure. Graphlet correlation matrices for economic and social networks are visualized in Fig. 8. We again notice a good separation of the economic and social networks. This is present both in the flattened and the 2-plex representation. However, due to the additional plex information, 2-plex graph clouds have better resolution. The final part of Fig. 8 shows the synthetic and real networks in the same 3D space, and again there exists a strong separation among the different graph types. A full description of how graphlets can be used for clustering will be provided in a separate manuscript.
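The embedding step can be illustrated with a minimal sketch. The text does not state which distance between GCMs or which multidimensional scaling variant was used. The code below assumes the Euclidean distance between the upper triangles of the matrices and classical (Torgerson) scaling, implemented with NumPy. The function names and the random toy matrices are hypothetical.

```python
import numpy as np

def gcm_distance_matrix(gcms):
    """Euclidean distances between the upper triangles of the given GCMs."""
    gcms = np.asarray(gcms, dtype=float)
    iu = np.triu_indices(gcms.shape[1], k=1)
    vecs = gcms[:, iu[0], iu[1]]              # one row vector per network
    diff = vecs[:, None, :] - vecs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, dim=3):
    """Embed points in `dim` dimensions from a pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]            # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Toy input: four random symmetric 5x5 matrices standing in for GCMs.
rng = np.random.default_rng(0)
raw = rng.uniform(-1, 1, size=(4, 5, 5))
toy_gcms = (raw + raw.transpose(0, 2, 1)) / 2
coords = classical_mds(gcm_distance_matrix(toy_gcms), dim=3)
print(coords.shape)  # (4, 3)
```

Each row of `coords` is the 3D position of one network. With real data, networks of the same type would be expected to form the clouds described for Figs. 7 and 8.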

Figure 7

Synthetic networks: 3D visualization using graphlet correlations. Purple points: Erdős-Rényi (ER) network, gray points: Watts-Strogatz (WS) network, blue points: Barabási-Albert (BA) network, and orange points: modified BA (PL, power-law cluster) network. (a–c) Flattened networks of 2, 3 and 4 plexes; (d–f) 2-plex, 3-plex and 4-plex networks, respectively.

Figure 8

3D visualization using graphlet correlations. Green points are the economic 2-plex networks while social 2-plexes are shown with cyan points. (a) For better visualization, 2% of 2-plex flattened networks are shown; (b) full 2-plex networks; (c) social and economic 2-plex networks positioned together with the synthetic 2-plex networks.