Frequent pattern mining in multidimensional organizational networks

Network analysis can be applied to understand organizations based on patterns of communication, knowledge flows, trust, and the proximity of employees. A multidimensional organizational network was designed, and association rule mining of the edge labels was applied to reveal how relationships, motivations, and perceptions determine each other in different scopes of activities and types of organizations. Frequent itemset-based similarity analysis of the nodes provides the opportunity to characterize typical roles in organizations and clusters of co-workers. A survey was designed to define 15 layers of the organizational network and to demonstrate the applicability of the method in three companies. The novelty of our approach resides in the evaluation of people in organizations as frequent multidimensional patterns of multilayer networks. The results illustrate that the overlapping edges of the proposed multilayer network can be used to highlight the motivation and managerial capabilities of the leaders and to find similarly perceived key persons.

The paper is organized as follows. In the first part of the Methods section, the multidimensional organizational network model is introduced. The second part of this section presents how frequent pattern mining can be used to extract information from multidimensional networks. It is believed that the proposed approach can be widely applied to find significant correlations between layers of multilayer networks. The Results and Discussion section demonstrates how the proposed approach can be used in the development of three organizations.

Methods
Multidimensional representation of organizational networks. In the proposed multidimensional organizational network, the nodes represent the employees, and the labeled edges reflect how the members of the organization communicate, work together, rate and motivate each other, as well as their personal relationships. The labeled and directed connections define multiple edges that form a multidimensional network $G = (V, E, D)$, where V represents the node set, D is the set of edge labels that defines the dimensions of the edges, and E denotes the edge set, as can be seen in Fig. 1. As each label can be mapped into an independent network, the model can be interpreted as a multilayer network. A multilayer network is a pair $\mathcal{M} = (\mathcal{G}, \mathcal{C})$, where $\mathcal{G} = \{G_\alpha;\ \alpha \in \{1, \ldots, M\}\}$ is a family of graphs $G_\alpha = (X_\alpha, E_\alpha)$ (called layers of $\mathcal{M}$) and $\mathcal{C} = \{E_{\alpha\beta} \subseteq X_\alpha \times X_\beta;\ \alpha, \beta \in \{1, \ldots, M\},\ \alpha \neq \beta\}$ is the edge set between nodes of different layers $G_\alpha$ and $G_\beta$ with α ≠ β 26. The elements of $E_\alpha$ are called intralayer connections and the elements of $E_{\alpha\beta}$ (α ≠ β) are referred to as interlayer connections.
The studied intra-organizational networks can be considered to be directed multiplex networks, which are a special type of multilayer network. Multiplex networks consist of a fixed set of nodes connected by different types of links. In our case, the multidimensional network $G = (V, E, D)$ is associated with a multiplex network whose layer α has the edge set $E_\alpha = \{(u, v) : u, v \in V,\ (u, v, d) \in E \text{ and } d = \alpha\}$. Based on our organizational development experience, the requirements of our business partners, and the literature on organizational network analysis, connection-/interaction-, rating-/perception-, and friendship-type layers were defined in our model:

1. Connection-/interaction-type layers (L1 to L5), which are referred to in the analysis as advice (L1), leadership (L2), giving feedback (L3), communication (L4) and working relationships (L5)
2. Rating-type layers
L6: he/she helps to find information
L7: he/she provides the best working relationship
L8: he/she has great professional knowledge
L9: he/she motivates me
L10: he/she is capable of solving complex tasks
L11: he/she is capable of managing colleagues
L12: he/she is a key person in the organization
3. Friendship-network layers
L13: he/she gets along easily with me
L14: I would like to have dinner with him/her
L15: I would like to work together with him/her as a part of a problem-solving team

An online survey was designed to identify the connections. In the survey, there were as many questions as layers. Respondents were asked to mark the names of co-workers that fit the question and were not restricted to a fixed number of answers to minimize measurement error 36.

www.nature.com/scientificreports
The combination of layers is believed to capture the essence of an organization, making it possible to extract information about working connections, trust, employees' perceptions of each other, and leadership.
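The representation described above can be illustrated with a minimal Python sketch. It assumes that each survey answer is stored as a triple (u, v, d); the names `edges`, `to_multidimensional_edges` and `layer`, as well as the sample answers, are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical survey answers: (respondent u, named co-worker v, layer d).
edges = [
    ("anna", "bela", "L1"),
    ("anna", "bela", "L8"),
    ("anna", "cili", "L13"),
    ("bela", "anna", "L13"),
    ("cili", "bela", "L2"),
    ("cili", "bela", "L9"),
]

def to_multidimensional_edges(edges):
    """Group the labels of every ordered node pair (u, v) into one set."""
    labels = defaultdict(set)
    for u, v, d in edges:
        labels[(u, v)].add(d)
    return dict(labels)

def layer(edges, d):
    """Return the directed edge set of the multiplex layer with label d."""
    return {(u, v) for u, v, label in edges if label == d}

multi = to_multidimensional_edges(edges)
print(sorted(multi[("anna", "bela")]))  # labels shared by one directed pair
print(sorted(layer(edges, "L13")))      # edges of the L13 layer
```

Keeping one label set per ordered pair (u, v) gives the multidimensional view, while `layer` recovers the multiplex view of the same data.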
Frequent pattern mining of edge labels in multidimensional networks. Discovering statistically significant correlations between the layers of multilayer networks is one of the major goals of network science for the coming years 26. A recently developed edge-overlap measure evaluates the conditional probability of finding a directed link on a layer given the presence of a directed link between the same nodes on another layer 34,37, but it can only handle pairs of dimensions. The method is therefore feasible only for examining the overlap of a small number of dimensions. As the coexistence of links with different labels between any nodes i and j forms frequent patterns of any number of dimensions, frequent pattern mining provides a new opportunity to describe correlations between layers.
Frequent itemset mining was initially developed for market basket analysis, and it is used nowadays for almost any task that requires the discovery of regularities between (nominal) variables 38 . This concept has been extended to frequent graph-based substructure pattern mining 39 .
Our work differs from methods developed for frequent subgraph mining in unilayered (labeled) networks 40 . Labeling network motifs in protein-protein interaction (PPI) networks 41 and text networks 42 is also a similar problem. While in these tasks the labels are attached to the nodes, in our case the problem requires the identification and characterization of the frequent multidimensional edges.
As this is the first attempt to introduce frequent itemset mining into the analysis of multidimensional networks, the technique is summarized in Table 1. The dimensions $D = \{d_1, d_2, \ldots, d_M\}$ of the network are considered to be a set of items $I = \{I_1, I_2, \ldots, I_M\}$ (in market basket analysis, $I_i$ represents a given product). The set of transactions $T = \{t_1, t_2, \ldots, t_m\}$ is defined such that each transaction $t_i \subseteq I$ is identical to a given edge in a multigraph between nodes $u_i$ and $v_j$. Our aim is to identify frequently occurring subsets of edge dimensions and to mine valuable information concerning multidimensional networks based on the analysis of these itemsets. The occurrence of an itemset C is measured as the number of transactions (multidimensional edges) that contain the itemset. When this frequency is divided by the size of the transaction set $|T|$, which is identical to the number of edges $|E|$, the calculated support $s_T(C)$ represents the probability of the multidimensional edge C. The itemset $C \subseteq I$ is referred to as frequent when its support reaches a user-specified minimum, $s_T(C) \geq s_{min}$. The goal of frequent itemset mining is to find all frequent itemsets $C \subseteq I$ in the database $T$ 38.
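The definitions of support and frequent itemsets can be sketched as follows. This is a minimal, level-wise (Apriori-style) search written for illustration, not the implementation used in the study; the transactions and the threshold `s_min=0.4` are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: one set of layer labels per multidimensional edge.
transactions = [
    {"L1", "L8"},
    {"L1", "L8", "L12"},
    {"L13"},
    {"L2", "L9"},
    {"L1", "L8", "L13"},
]

def support(itemset, transactions):
    """Fraction of multidimensional edges that contain every label of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, s_min):
    """Level-wise search for all itemsets whose support is at least s_min."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= s_min]
    result = {}
    k = 1
    while current:
        for c in current:
            result[c] = support(c, transactions)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then drop
        # every candidate that has an infrequent (k-1)-subset.
        frequent_prev = set(current)
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in frequent_prev
                          for s in combinations(c, k - 1))
                   and support(c, transactions) >= s_min]
    return result

freq = frequent_itemsets(transactions, s_min=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, which is what keeps the candidate set small when the number of layers is modest.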
The resultant frequent itemsets can be used to form association rules $A \Rightarrow B$, where A and B are disjoint subsets of C, i.e. $A \subset C$, $B \subset C$ and $A \cap B = \emptyset$ 43. The confidence of the rule represents the conditional probability $P(B|A)$:

$c(A \Rightarrow B) = \frac{s_T(A \cup B)}{s_T(A)}$

The lift l is a correlation measure that is based on the ratio of these probabilities:

$l(A \Rightarrow B) = \frac{s_T(A \cup B)}{s_T(A)\, s_T(B)}$

When l < 1, A is negatively correlated with B, meaning that the occurrence of A makes the occurrence of B less likely. When l > 1, A and B are positively correlated, meaning that the occurrence of A makes the occurrence of B more likely 44. Rules with a high lift usually exhibit relatively low support 45. An alternative to lift is leverage, which states how much more often A and B occur together than would be expected if they were independent random variables 46:

$\mathrm{leverage}(A \Rightarrow B) = s_T(A \cup B) - s_T(A)\, s_T(B)$
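The three rule measures can be computed directly from supports. The sketch below uses the standard definitions of confidence, lift and leverage; the function name `rule_measures` and the sample transactions are hypothetical.

```python
# Hypothetical transactions: one set of layer labels per multidimensional edge.
transactions = [
    {"L2", "L9"},
    {"L2", "L9", "L11"},
    {"L2"},
    {"L9"},
    {"L13"},
    {"L2", "L11"},
    {"L13", "L14"},
    {"L8"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every label of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Confidence, lift and leverage of the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    if a & b:
        raise ValueError("antecedent and consequent must be disjoint")
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    s_ab = support(a | b, transactions)
    if s_a == 0 or s_b == 0:
        raise ValueError("both sides of the rule must occur at least once")
    return {
        "confidence": s_ab / s_a,       # estimate of P(B | A)
        "lift": s_ab / (s_a * s_b),     # > 1 positive, < 1 negative correlation
        "leverage": s_ab - s_a * s_b,   # excess over independence
    }

print(rule_measures({"L2"}, {"L9"}, transactions))
```

Lift and leverage always agree in sign relative to independence: a lift above one corresponds to a positive leverage, and a lift below one to a negative leverage.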
The computational complexity of the proposed methodology is determined by the utilized frequent itemset mining algorithm. The complexity of the most widespread Apriori algorithm is $O(M^2 m)$ 47, where M represents the number of items and m the number of data records. Thus, finding the frequent connection types has a quadratic dependence on the M connection types and scales linearly with the number of connections, $m = |E|$. As M = 15, the proposed measures can be computed very quickly even for large networks.
In this section, an analogy between the measures of network science and frequent pattern mining was presented. The following section demonstrates how frequent itemsets and association rule mining can be used to understand the formation of connections.
Node characterization based on incoming multidimensional edges. In (organizational) network research, three levels of analysis (dyadic, actor/node and network) can be distinguished 48. At the dyadic level, the frequent occurrence of the edge dimensions can be analyzed. Analysis at the level of the actors requires information to be aggregated with regard to the types of edges to characterize the nodes. For example, to measure the degree of innovation and the problem-solving abilities of the employees, the centrality of the actors in the communication network of the organization can be studied 49. The selection of suitable dimensions plays an important role in these ratings, e.g. as information exchange is reflected in the advice network, the perception of information access is mostly determined by the advice centrality 4. In multilayer networks, nodes can be characterized based on their activities at different layers 26. The distribution of the degrees of a node among the layers can be described by the entropy of the multiplex degree, which is similar to the multiplex participation coefficient published in ref. 34.
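The two layer-activity measures mentioned above can be sketched as follows. The formulas are the definitions commonly used for multiplex networks (entropy of the normalized per-layer degrees, and a participation coefficient rescaled to lie between 0 and 1); they are assumed here rather than quoted from the source, and the degree vectors are hypothetical.

```python
import math

def degree_entropy(degrees):
    """Entropy of the distribution of a node's degree over the layers."""
    total = sum(degrees)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log(k / total) for k in degrees if k > 0)

def participation_coefficient(degrees):
    """M/(M-1) * (1 - sum of squared degree shares), between 0 and 1."""
    m = len(degrees)
    total = sum(degrees)
    if m < 2 or total == 0:
        return 0.0
    return m / (m - 1) * (1 - sum((k / total) ** 2 for k in degrees))

# Hypothetical per-layer in-degrees of two nodes on three layers.
evenly_spread = [2, 2, 2]   # active on every layer to the same extent
concentrated = [6, 0, 0]    # active on a single layer only

for degrees in (evenly_spread, concentrated):
    print(degrees, degree_entropy(degrees), participation_coefficient(degrees))
```

Both measures are zero when all edges of a node fall on one layer and maximal when the edges are spread evenly, which is why they carry similar information.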
In the following, a novel method for the characterization of nodes is introduced by calculating the frequent patterns of the incoming/outgoing multidimensional edges of ego-networks. The set of directed edges (v, u, d) ∈ E that point to a node u ∈ V represents the incoming edges of that node, and the outgoing edges are defined analogously. Frequent dimensions of outgoing and incoming edges are specific to the nodes. The outgoing edges are related to the perceptions, ratings and connections to others, while the incoming patterns reflect how the actor is rated. The frequent patterns of these edge sets provide the specifications of the node. As a node can support or weaken association rules with its incoming/outgoing multidimensional edges, the measures of the association rules can be utilized as fingerprints of the organizational network. The similarity between the nodes can be evaluated based on the incoming and outgoing patterns. Based on the clustering of the nodes, similar key persons and leaders can also be identified, an approach that is similar to frequent pattern mining-based community detection 50.
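One way to turn this idea into a node fingerprint is to evaluate a fixed list of rules on the incoming edges of each node. The sketch below is an illustration under stated assumptions: the rule list, the sample edges and the convention of returning 0.0 when the antecedent never occurs are hypothetical choices, not taken from the source.

```python
# Hypothetical multidimensional edges: (source, target) -> set of layer labels.
multi_edges = {
    ("a", "x"): {"L2", "L9"},
    ("b", "x"): {"L2", "L9", "L11"},
    ("c", "x"): {"L2"},
    ("a", "y"): {"L2"},
    ("b", "y"): {"L2", "L11"},
    ("x", "y"): {"L13"},
}

# Rules (antecedent, consequent) whose confidences form the fingerprint.
rules = [({"L2"}, {"L9"}), ({"L2"}, {"L11"})]

def incoming(multi_edges, node):
    """Label sets of all multidimensional edges that point to node."""
    return [labels for (u, v), labels in multi_edges.items() if v == node]

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent; 0.0 if the antecedent is absent."""
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    n_a = sum(a <= t for t in transactions)
    if n_a == 0:
        return 0.0  # assumption: an unsupported rule contributes zero
    return sum(ab <= t for t in transactions) / n_a

def fingerprint(multi_edges, node, rules):
    """Vector of rule confidences computed on the incoming edges of node."""
    transactions = incoming(multi_edges, node)
    return [confidence(a, b, transactions) for a, b in rules]

for node in ("x", "y"):
    print(node, fingerprint(multi_edges, node, rules))
```

The same function applied to the outgoing edges of a node would describe how the node rates others instead of how it is rated.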
As modularity is based on the difference between the actual and expected number of edges 51, the analysis of this difference can reflect attractiveness and talent at the individual and organizational levels. Community detection algorithms explore densely linked groups of nodes, so these algorithms can highlight central nodes 52, leaders of communities 53 and hierarchical structures 54,55.
The following section demonstrates that clusters of co-workers can be determined based on the similarities of their multidimensional incoming and outgoing connections, and that the extracted knowledge can be used to characterize typical roles in the organizations.
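Clustering on such similarities can be sketched with a small k-medoids routine that works on a precomputed distance matrix. The code below is a simplified illustration in the spirit of the Partitioning Around Medoids algorithm applied later in the paper (a greedy build step followed by best-improvement swaps); the fingerprints, the Euclidean distance and the function names are hypothetical choices.

```python
import math

def total_cost(dist, medoids):
    """Sum of the distances of all points to their closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def k_medoids(dist, k):
    """Greedy build step followed by swaps that strictly reduce the cost."""
    n = len(dist)
    medoids = []
    while len(medoids) < k:  # build: add the point that lowers the cost most
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)
    improved = True
    while improved:          # swap: replace one medoid by one non-medoid
        improved = False
        best_cost = total_cost(dist, medoids)
        best_trial = None
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is not None:
            medoids, improved = best_trial, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels

# Hypothetical rule-confidence fingerprints of four employees.
names = ["x", "y", "z", "w"]
fingerprints = [[0.67, 0.33], [0.0, 0.5], [0.7, 0.3], [0.1, 0.5]]
dist = [[math.dist(p, q) for q in fingerprints] for p in fingerprints]

medoids, labels = k_medoids(dist, k=2)
print(dict(zip(names, (names[m] for m in labels))))
```

Because the routine only needs the distance matrix, any dissimilarity between fingerprints can be substituted for the Euclidean distance used here.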

Results and Discussions
To demonstrate the applicability of the proposed methodology, leaders and key persons are identified based on their incoming edges, and the effects related to the advice network are determined based on frequent patterns that contain the advice (L1) dimension.

Table 1. Frequent itemset mining and the corresponding multidimensional network concepts (columns: frequent itemset mining; multidimensional network).
Item base: I = {I 1 , I 2 , …, I M }, where I i , for example, represents a product.
Support of A∪B: probability that a transaction contains A∪B; probability that a multidimensional edge contains A∪B.
Leverage: how much more often A and B occur together than expected under independence.

The studied organizational networks. Three organizations were studied: A, a not-for-profit arts organization with 83 employees (response rate (RR): 75%); B, a multinational manufacturing company with 57 employees (RR: 93%); and C, a cultural institute with 203 employees (RR: 94%). The complexity of Company A is illustrated in Fig. 2. The number of nodes and edges with their support is shown in Table 2. The high support of L13 in Company A indicates a friendly atmosphere. The reciprocity in the L13 layer is 43-44% for all companies, which is consistent with other studies 4.
The number of edges with two or more dimensions is shown in Table 3, which indicates that the majority of edges are multidimensional: only 26-33% of the edges are one-dimensional. The dimensions L4, L8 and L13 (55% of the one-dimensional edges in Company A), as well as L14 and L15, tend to appear alone.
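Descriptive statistics of this kind can be computed from the multidimensional edges in a few lines. The sketch below assumes that reciprocity means the fraction of directed edges of a layer that are returned in the opposite direction, which is a common definition but is not spelled out in the source; the sample edges are hypothetical.

```python
from collections import Counter

# Hypothetical multidimensional edges: (source, target) -> set of layer labels.
multi_edges = {
    ("a", "b"): {"L13"},
    ("b", "a"): {"L13", "L14"},
    ("a", "c"): {"L1", "L8", "L13"},
    ("c", "d"): {"L4"},
}

def dimensionality_share(multi_edges):
    """Share of edges that carry exactly 1, 2, 3, ... dimensions."""
    counts = Counter(len(labels) for labels in multi_edges.values())
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}

def reciprocity(multi_edges, layer):
    """Fraction of the directed edges of a layer that are reciprocated."""
    layer_edges = {pair for pair, labels in multi_edges.items() if layer in labels}
    if not layer_edges:
        return 0.0
    mutual = sum((v, u) in layer_edges for u, v in layer_edges)
    return mutual / len(layer_edges)

print(dimensionality_share(multi_edges))
print(reciprocity(multi_edges, "L13"))
```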

Analysis of the extracted association rules.
Finding meaningful association rules is one of the biggest challenges in data mining. Filtering the rules based on confidence and support is an obvious approach, but in some cases, the grouping of the rules based on variables is necessary 56, e.g., setting a high support threshold would exclude rare dimensions (like L2 and L9) from the rules.

Figure 2. Six layers of the organizational network of Company A (left: light blue is L1, orange is L4; middle: dark green is L8, magenta is L12; right: dark yellow is L13, dark blue is L15). The nodes are colored according to the departments they belong to. The shape of the nodes corresponds to their positions: triangles represent leaders and circles stand for employees.

Table 2. Support values of the edge labels in the studied organizations.
A positive correlation between the antecedent and consequent sets is indicated for all rules with a lift greater than one. Only two rules in Company A possess negative leverages. The L8 ⇒ L13 rule can be found on 370 edges, which is fewer than the 380 edges expected under independence. This suggests that, on average, it is slightly harder to get along well with people who possess a high degree of professional knowledge.
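As a worked check of these two counts, the lift and the leverage of the rule can be written in terms of the observed and expected numbers of edges. The total number of multidimensional edges |E| of Company A is not restated here, so it is kept as a symbol; the expected count is |E| times the product of the two supports.

```latex
\[
l(L8 \Rightarrow L13)
  = \frac{s_T(L8 \cup L13)}{s_T(L8)\, s_T(L13)}
  = \frac{370/|E|}{380/|E|}
  = \frac{370}{380} \approx 0.974
\]
\[
\mathrm{leverage}(L8 \Rightarrow L13)
  = s_T(L8 \cup L13) - s_T(L8)\, s_T(L13)
  = \frac{370 - 380}{|E|}
  = -\frac{10}{|E|} < 0
\]
```

The lift is only about 2.6% below one, so the negative association between professional knowledge and getting along easily is weak.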
The extracted rules are summarized as grouped matrices 45 in Fig. 3, where the antecedents that are statistically dependent on the same consequents are grouped and shown in columns with their two most frequent dimensions written on the axes. Consequents are arranged in the rows. The bubbles are colored according to the median lift of the rules in the groups, while the sizes of the bubbles represent the medians of the supports. The resultant plots highlight that the important consequents are very similar in all companies, namely L2, L9, L11, L3, L10 and L1, which refer to leadership, motivation, managerial capability, giving feedback, solving complex tasks and giving advice, respectively.
The confidence values of the rules can serve as layer-overlap measures. In Fig. 4, the columns are the antecedents and the rows are the consequents of the rules Column ⇒ Row, and the values of the matrix show the confidences of the rules. As expected, layers L2 and L9 exhibit a strong correlation with almost all other layers, while it seems that the presence of the edges L9, L10 and L11 increases the probabilities of connection types L7, L8 and L12.

Characterization of the leaders. The appearance of dimension L2 in a multidimensional edge shows who is considered to be a leader in an organization because he/she provides instruction in a workflow. The confidences of the leader-related rules are shown in the second columns of the matrices in Fig. 4. The confidence c(L2 ⇒ L9) = P(L9|L2) of the L2 ⇒ L9 rule is a good measure of how a co-worker perceives the motivation of his/her leader.
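A pairwise confidence matrix of the kind described above, with antecedents in the columns and consequents in the rows, can be sketched as follows. The transactions and the function name `confidence_matrix` are hypothetical; entries that are undefined (the diagonal, or a column whose layer never occurs) are stored as `None`.

```python
# Hypothetical transactions: one set of layer labels per multidimensional edge.
transactions = [
    {"L2", "L9"},
    {"L2", "L9", "L7"},
    {"L2"},
    {"L2"},
    {"L7"},
    {"L9", "L7"},
]

def confidence_matrix(transactions, layers):
    """matrix[(row, col)] = confidence of the rule col => row, i.e. P(row | col)."""
    counts = {d: sum(d in t for t in transactions) for d in layers}
    matrix = {}
    for col in layers:        # antecedent
        for row in layers:    # consequent
            if row == col or counts[col] == 0:
                matrix[(row, col)] = None
                continue
            both = sum(col in t and row in t for t in transactions)
            matrix[(row, col)] = both / counts[col]
    return matrix

layers = ["L2", "L7", "L9"]
matrix = confidence_matrix(transactions, layers)
for row in layers:
    print(row, [matrix[(row, col)] for col in layers])
```

Unlike a symmetric overlap count, the matrix is directional: in this toy data P(L9|L2) differs from P(L2|L9), which is what allows leader-related rules such as L2 ⇒ L9 to be read from a single column.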
The two-dimensional evaluation of actors is represented in Fig. 5. The x-axis is the in-degree on the "priorities from" (L2) layer, and the y-axis shows the conditional probability of the presence of the "motivates me" (L9) dimension along the same L2 edge. The in-degree centrality correlates at most weakly with the motivating capability (Pearson's ρ between the in-degrees and the motivating capability of the nodes is 0.38 at Company A, −0.09 at Company B, and 0.21 at Company C), so the two dimensions provide complementary information about the actors. However, high and low social capital correlate with the in-degree centrality, which is in line with the eigenvector centrality capturing the importance of the actors 57. Eigenvector centralities of actors on the L2 layer are also well correlated with in-degrees (Pearson's ρ between the in-degree and the eigenvector centrality of the nodes is 0.71 at Company A, 0.68 at Company B, and 0.67 at Company C). The differences in the eigenvector centrality among actors with the same in-degree can be studied in Fig. 5. The leader numbered as '45' in Company A has a much higher eigenvector centrality than the leader numbered as '68', although they have the same motivating capability. This indicates that leader '45' motivates more important people than leader '68', which increases his/her overall importance.
The fact that there is practically no correlation between the number of motivation-type connections and the eigenvector centrality (Company A: 0.11; Company B: −0.06; Company C: 0.17) shows that the capability of motivating may be a personal trait. The plots can be utilized to evaluate the performance of the leaders and to support decisions related to organizational development.
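The two coordinates used to evaluate leaders, and the correlation between them, can be computed as sketched below. The edges are hypothetical, the Pearson coefficient is implemented by hand to keep the example free of dependencies, and the eigenvector centrality discussed above is not covered by this sketch.

```python
import math
from collections import Counter

# Hypothetical multidimensional edges: (source, target) -> set of layer labels.
multi_edges = {
    ("a", "p"): {"L2", "L9"},
    ("b", "p"): {"L2"},
    ("c", "p"): {"L2"},
    ("d", "p"): {"L2"},
    ("a", "q"): {"L2", "L9"},
    ("b", "q"): {"L2", "L9"},
    ("c", "r"): {"L2", "L9"},
    ("d", "r"): {"L2"},
    ("e", "r"): {"L2"},
}

def leader_profile(multi_edges, leader_layer="L2", motive_layer="L9"):
    """For each node: (in-degree on L2, share of those edges that also carry L9)."""
    in_degree, motivated = Counter(), Counter()
    for (u, v), labels in multi_edges.items():
        if leader_layer in labels:
            in_degree[v] += 1
            if motive_layer in labels:
                motivated[v] += 1
    return {v: (k, motivated[v] / k) for v, k in in_degree.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

profile = leader_profile(multi_edges)
degrees = [k for k, _ in profile.values()]
motivation = [p for _, p in profile.values()]
print(profile)
print(pearson(degrees, motivation))
```

Plotting the two values of `profile` against each other for every leader gives a scatter plot of the same kind as the two-dimensional evaluation described above.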

Clustering-based identification of the key persons. Finding influential employees in organizations should differ from the analysis of formal organizational charts. Research questions like "who is considered to be a key person?" require detailed analysis. A clustering-based algorithm was developed to answer such questions. Similarly evaluated people can be clustered based on how similarly their incoming edges support the association rules. The Partitioning Around Medoids (PAM) algorithm 58 was applied to identify the clusters (see Fig. 6), as it lends itself to clustering based on a specified distance matrix 59, is robust to noise 60 and is reported to perform better for large datasets than the also popular k-means algorithm 61.

Effects of the advice network. According to the literature, at least two kinds of processes drive how sources of advice are selected, namely status recognition and homophily 13. An attempt was made to extract valuable information concerning these effects by analyzing the Lift(L1 ⇒ B) values. Table 4 shows that the edge types of leadership (L2 and L3), motivating behavior (L9), information resources (L6) and cognitive ability (L10) increase the likelihood of advice (L1). Although the networks of the different companies show some specificities, e.g. lift(L1 ⇒ L2) is much greater in the case of Company A, their trends are very similar. The high confidence values in the L1 columns of Fig. 4 indicate that this connection type has positive effects on contact type, trusted relationships and the judgment of professional knowledge.
Dimensions that predict the occurrence of L1 are described by the A ⇒ L1 family of rules. The confidence(A ⇒ L1) of these rules represents the probability of the occurrence of L1 given the existence of the A dimensions. As shown in Table 5, professional knowledge (L8) leads to a far greater increase in the probability of the advice dimension (L1) than communication (L4) when a leader (L2), a working relationship (L5), the best working relationship (L7) or an information source (L6) is present. However, L4 significantly increases the probability of the advice-type connection when motivation (L9), the capability of solving complex tasks (L10), the ability to manage colleagues (L11) or the key person dimension (L12) is present. In other words, the confidences of {L2 or L5 or L6 or L7} ∪ L8 ⇒ L1 are greater than the confidences of {L9 or L10 or L11 or L12} ∪ L8 ⇒ L1, and the confidences of {L2 or L5 or L6 or L7} ∪ L4 ⇒ L1 are less than the confidences of {L9 or L10 or L11 or L12} ∪ L4 ⇒ L1.
Most leaders (L2) give advice, especially in Company A. The probabilities increase as the number of dimensions in the rules increases. Almost the same trends are shown in Tables 4 and 5, where the types of connections that are related to the advice network are presented. This result is consistent with the findings of ref. 13, which show that advice is more likely to be sought from colleagues of higher status.

Conclusions
Organizational networks have been considered to be multilayer networks since the early 1990s, but so far no feasible method of handling their multidimensionality has been found. It has been demonstrated that frequent pattern mining can be applied to reveal statistically significant correlations between the layers and that the method is applicable to edge-, actor- and organizational-level analyses. Frequently occurring outgoing edges have been shown to be related to perceptions and ratings, while incoming patterns reflect how the actor is rated. It was also highlighted that the measures of the association rules could be used to define the fingerprints of organizational networks. The applicability of the methodology was demonstrated by the characterization of leaders and key persons in three organizations. In the future, the utilization of the extracted rule base for the design of personal development programs and the development of a property-preserving multidimensional edge reordering algorithm to support goal-oriented organizational development are desired. The method can be applied to other multilayer networks in which the layers represent dimensions that are appropriate for making rankings.

Data Availability
Original data is not available for public use.

Table 5. Confidences of the rule A⇒L1 of the companies.