Graph partitioning MapReduce-based algorithms for counting triangles in large-scale graphs

Counting the number of triangles in a graph is a major task in many large-scale graph analytics problems, such as computing the clustering coefficient, the transitivity ratio, and trusses. In recent years, MapReduce has become one of the most popular and powerful frameworks for analyzing large-scale graphs on clusters of machines. In this paper, we propose two new MapReduce algorithms based on graph partitioning. The two algorithms avoid the duplicate triangle counting that other algorithms suffer from. The experimental results show the high efficiency of the two algorithms in comparison with an existing algorithm, outperforming it in execution time, especially on very large-scale graphs.

Over the last decade, the size of the graphs underlying social networks has grown significantly due to the increase in data used in these networks. One of the most important problems in social networks is to analyze their large graphs to extract useful information. Sequential algorithms cannot deal with large graphs due to limitations in memory and processing capabilities. We can overcome those limitations by applying parallel computing to the analysis of these networks. One of the most popular parallel computing methods is MapReduce 1, the state of the art for processing large-scale graphs, implemented on a cluster of machines using Hadoop 2, an open-source framework provided by Apache.
One of the major problems in graph analysis is to count the number of triangles in a graph, a task known as triangle counting. Triangle counting is at the core of many graph analytic operations such as measuring the clustering coefficient 3, the transitivity ratio 4, triangular connectivity, and k-trusses 5. Also, many real-world applications, such as spam detection and the social networks Facebook and LinkedIn 6, rely mainly on triangle counting.
In this paper, we propose two new MapReduce algorithms to count triangles in large-scale graphs. Our algorithms partition a large graph into sub-graphs and then count the triangles in each sub-graph. After partitioning the graph, every triangle is classified into one of three categories according to the number of partitioned sub-graphs containing that triangle. The three categories are named Type-1, Type-2, and Type-3 triangles.
We evaluate our two MapReduce algorithms both locally, on a single node running Hadoop, and in a distributed setting, on a cluster of 15 nodes running Hadoop. Experimental results show that our two algorithms outperform the existing algorithm in execution time, especially on very large-scale graphs.

MapReduce
MapReduce 1 is a parallel distributed programming model for processing huge amounts of data (i.e., terabytes or petabytes in size) on large clusters of commodity machines. In this section, we give a brief overview of the MapReduce model and how it works.
MapReduce is inspired by the Map and Reduce operations of functional programming languages. Using MapReduce, a programmer can write a distributed application easily. The most important characteristics that MapReduce provides are fault tolerance, high scalability, and low cost. Hadoop 2 is an open-source framework for running MapReduce on a cluster of machines. Hadoop has two major layers: a computation layer (MapReduce) and a storage layer (the Hadoop Distributed File System [HDFS]). The inputs and outputs of MapReduce are stored as ≺ key; value ≻ pairs, and the model consists of three phases: Map, Shuffle, and Reduce. First, the Map phase is written by the programmer. Each Map instance, also called a Mapper, receives a line from an input file on HDFS in the form of a ≺ key; value ≻ pair, where the key is the start position of the line in the file and the value is the line content. The output of a Map instance is a number of ≺ key; value ≻ pairs; a Map instance may also produce no output if required. The Shuffle phase is not written by the programmer, as it is performed automatically by the framework. Its input is the output of the Map phase: the Shuffle phase sorts the Map output and merges all values that share the same key into ≺ key; {value1, value2, . . . } ≻. Finally, the Reduce phase is written by the programmer. Each Reduce instance, also called a Reducer, receives one of the pairs ≺ key; {value1, value2, . . . } ≻ produced by the Shuffle phase as its input. The output of the Reduce phase is a number of ≺ key; value ≻ pairs that are stored on HDFS. An example of how MapReduce works is shown in Fig. 1.
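The three phases above can be illustrated with a minimal word-count sketch in plain Python (a simulation of the model's data flow, not Hadoop code; the function names are ours):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: each input line yields a number of <word; 1> pairs."""
    for offset, line in enumerate(lines):   # key = line position, value = line content
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: sort and merge values sharing a key into <key; {v1, v2, ...}>."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Reduce: each <key; values> pair is reduced to a single <key; sum> pair."""
    for key, values in grouped:
        yield (key, sum(values))

lines = ["map reduce map", "reduce map"]
result = dict(reduce_phase(shuffle_phase(map_phase(lines))))
# result == {"map": 3, "reduce": 2}
```

The Shuffle step here mirrors what the framework does automatically between the user-written Map and Reduce functions.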

Triangle counting
A list of the notations used in this paper is shown in Table 1. Let G(V, E) be an undirected graph, where V is the set of vertices, E is the set of edges, n = |V|, and m = |E|. We define the set of neighbors of a vertex v as τ(v) = {u ∈ V | (v, u) ∈ E}, and the degree of a vertex v as d(v) = |τ(v)|. A triangle △(u, v, w) in a graph G is any three vertices of the graph that are connected to each other, i.e. (u, v), (v, w), (u, w) ∈ E. A basic sequential algorithm for counting the triangles in a graph is NodeIterator, which iterates over the vertices and checks which pairs of each vertex's neighbors are connected by an edge; NodeIterator++ is a modified version of NodeIterator. The problem with NodeIterator is that it counts each triangle six times: when it visits a vertex, it selects the edges connected to it, so each edge is selected twice (once for each of its two end vertices), and the three edges of a triangle are therefore counted six times. NodeIterator++ avoids this problem by using a total order ≻ on all of the vertices, e.g. u ≻ v if d(u) > d(v). The running time of NodeIterator++ is O(m^{3/2}) 8. Another triangle counting algorithm is the Edge-Iterator algorithm, which iterates over each edge (u, v) ∈ E, computes the neighbors of the source vertex u and the target vertex v, and then counts the common neighbors of u and v. The Forward algorithm is another sequential algorithm for counting triangles; it is an enhanced version of Edge-Iterator that does not compare all neighbors of two adjacent vertices. The running time of the Forward algorithm is O(m^{3/2}) 23 and its memory footprint is Θ(3m + 3n) 23. An enhanced version of the Forward algorithm is the Compact-Forward algorithm shown in 23, which reduces the memory footprint from Θ(3m + 3n) to Θ(2m + 2n). On the other hand, there are many MapReduce algorithms for counting triangles in enormous graphs, as mentioned in the "Related work" section.
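The total-order idea behind NodeIterator++ can be sketched as follows (a compact Python illustration of the technique, not the paper's implementation; ties in degree are broken by vertex id):

```python
from itertools import combinations

def node_iterator_pp(adj):
    """Count each triangle once: for every vertex v, check only pairs of
    neighbors that rank higher than v in the total order (d(u), u)."""
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    count = 0
    for v in adj:
        higher = [u for u in adj[v] if rank[u] > rank[v]]
        for u, w in combinations(higher, 2):
            if w in adj[u]:            # closing edge found: triangle (v, u, w)
                count += 1
    return count

# Triangle 1-2-3 plus a pendant edge 3-4: exactly one triangle.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
# node_iterator_pp(adj) == 1
```

Because each triangle has exactly one lowest-ranked vertex, it is discovered exactly once rather than six times.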

Our proposed algorithms
We propose two enhanced MapReduce algorithms to count the number of triangles in large-scale graphs. These algorithms avoid the duplication problem of the TTP algorithm. Before presenting the two algorithms, we introduce some terms needed to understand them. In our work, each triangle △(u, v, w) in the graph is classified as either Type-1, Type-2, or Type-3, where: • Type-1 the three nodes of the triangle are in the same partition, e.g. △(1, 2, 3) shown in Fig. 2.
• Type-2 two nodes of the triangle are in the same partition, and the third node is in a different partition, e.g. △(2, 3, 4) shown in Fig. 2. • Type-3 each of the three nodes of the triangle is in a different partition, e.g. △(3, 4, 10) shown in Fig. 2.
Moreover, when partitioning a graph into a set of sub-graphs, there are three kinds of sub-graph, namely 1-partition, 2-partition, and 3-partition, corresponding to the three triangle types. They are defined as follows: • 1-partition A 1-partition sub-graph is denoted by G_i = (V_i, E_i) for 1 ≤ i ≤ ρ. For every vertex v in this sub-graph, the partition number of the vertex, P(v), equals i. For example, for ρ = 4, the 1-partition sub-graphs of the graph shown in Fig. 2 are G_1, G_2, G_3, and G_4, as shown in Fig. 3. In general, for any graph partitioned into ρ partitions, there are ρ 1-partition sub-graphs. • 2-partition A 2-partition sub-graph is denoted by G_ij = (V_ij, E_ij) for 1 ≤ i < j ≤ ρ. This sub-graph contains every vertex v of the graph whose partition number, P(v), equals i or j. For example, for ρ = 4, the 2-partition sub-graphs of the graph shown in Fig. 2 are G_12, G_13, G_14, G_23, G_24, and G_34, as shown in Fig. 4. In general, for any graph divided into ρ partitions, there are (ρ choose 2) 2-partition sub-graphs.
• 3-partition A 3-partition sub-graph is denoted by G_ijk = (V_ijk, E_ijk) for 1 ≤ i < j < k ≤ ρ; it is the sub-graph containing every vertex v whose partition number, P(v), equals i, j, or k. For example, for ρ = 4, the 3-partition sub-graphs of the graph shown in Fig. 2 are G_123, G_124, G_134, and G_234, as shown in Fig. 5. In general, for any graph separated into ρ partitions, there are (ρ choose 3) 3-partition sub-graphs.
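The triangle classification reduces to counting how many distinct partitions the three nodes span. A minimal sketch (the modulo partitioner here is our own assumption for illustration; it does not reproduce the vertex-to-partition assignment of Fig. 2):

```python
def P(v, rho=4):
    """Hypothetical partitioner: maps a vertex id to a partition in [1, rho].
    Assumption: simple modulo partitioning, not the paper's assignment."""
    return (v - 1) % rho + 1

def triangle_type(u, v, w, rho=4):
    """Classify a triangle by the number of distinct partitions its nodes span."""
    spans = len({P(u, rho), P(v, rho), P(w, rho)})
    return {1: "Type-1", 2: "Type-2", 3: "Type-3"}[spans]

# With modulo partitioning and rho = 4:
# triangle_type(1, 5, 9) -> "Type-1"  (all three nodes fall in partition 1)
# triangle_type(1, 2, 5) -> "Type-2"  (partitions 1, 2, 1)
# triangle_type(1, 2, 3) -> "Type-3"  (partitions 1, 2, 3)
```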

An edge (u, v) is called an inner-edge if both nodes u and v are in the same partition; otherwise it is called a cross-edge. As mentioned before, the TTP algorithm processes Type-1 triangles redundantly, while Type-2 and Type-3 triangles are processed only once. Type-3 triangles contain only cross-edges, Type-2 triangles contain both inner-edges and cross-edges, and Type-1 triangles contain only inner-edges. Hence, OTP avoids the duplication problem by treating Type-1 and Type-2 triangles together in the 1-partition sub-graphs and Type-3 triangles alone in the 3-partition sub-graphs. OTP divides the graph into ρ equally sized partitions. A Type-2 triangle contains one inner-edge and two cross-edges, where each cross-edge is included in two sub-graphs. Cross-edges are therefore converted into inner-edges by duplicating each cross-edge in the two sub-graphs given by the partition numbers of its two vertices. Thus, the OTP algorithm treats a Type-2 triangle like a Type-1 triangle, since both types then belong to 1-partition sub-graphs. For example, △(1, 2, 3) is a Type-1 triangle with P(1) = P(2) = P(3) = 1, so (1, 2), (1, 3), and (2, 3) are only in G_1; while △(2, 3, 4) is a Type-2 triangle, where the inner-edge (2, 3), with P(2) = P(3) = 1, is present only in G_1, and the cross-edges (2, 4) and (3, 4) [P(2) = P(3) = 1 and P(4) = 2] are in both G_1 and G_2. So, each edge of a Type-1 or Type-2 triangle is placed in a single sub-graph according to the partition numbers of its vertices, and if the partition numbers of the two nodes differ, the edge is placed in two sub-graphs, as shown in Fig. 6. The 3-partition sub-graphs of the OTP algorithm contain only Type-3 triangles. The OTP algorithm consists of Map and Reduce functions. In the Map function (Lines 1-9), the graph is divided into 1-partition and 3-partition sub-graphs. Each edge of the graph is sent to a Map instance as input.
If the edge (u, v) is an inner-edge, the output pair of the Map instance is ≺ P(u); (u, v) ≻ (Line 2), where P(u) returns an integer within [1, ρ] that is the partition number of node u. This means that the edge (u, v) is in G_P(u). If the edge (u, v) is a cross-edge, then this edge may belong to a 1-partition or a 3-partition sub-graph, as explained earlier. So, we treat it as 1-partition and distribute the edge to the two different 1-partition sub-graphs (Line 2 and Line 4), and also distribute it to the 3-partition sub-graphs (Line 9). The output of the Map instance is then ≺ P(u); (u, v) ≻, ≺ P(v); (u, v) ≻, and ≺ (i, j, k); (u, v) ≻ for every sorted triple (i, j, k) containing both P(u) and P(v). For example, for the cross-edge (2, 4), where P(2) = 1 and P(4) = 2, the output will be ≺ 1; (2, 4) ≻, ≺ 2; (2, 4) ≻, ≺ (1, 2, 3); (2, 4) ≻, and ≺ (1, 2, 4); (2, 4) ≻. The output of the Map instance has the form ≺ key; value ≻, where the key refers to the sub-graph number (1-partition or 3-partition) and the value is the edge belonging to that sub-graph. After all Map instances complete, all values of the Map outputs that have the same key are combined together, as mentioned in the "MapReduce" section. In the Reduce function (Lines 10-24), triangles are counted and identified in each sub-graph. The input of each Reduce instance is a sub-graph number (1-partition or 3-partition) as the key and all edges belonging to that sub-graph as the value. The Reduce step is based on the Compact-Forward algorithm 23, parallelized here to enhance the running time. For each edge (u, v) in the sub-graph, we search for a common neighbor w of both u and v, i.e. w ∈ τ(u) and w ∈ τ(v). The output of the Reduce instance is ≺ (u, w, v); 1 ≻ if a common neighbor w is found between u and v (Lines 21-23); the ordering avoids processing the edges of a triangle three times. To avoid race conditions that may arise when two different iterations write their results to the same location of the file, we use a lock mechanism in Line 22. For example, in Fig. 2, if the input of a Reduce instance is ≺ (1, 2, 4); {(2, 4), (3, 4), (3, 10), (4, 10)} ≻, the output will be ≺ (3, 4, 10); 1 ≻ only, not ≺ (4, 3, 10); 1 ≻ or ≺ (3, 10, 4); 1 ≻. In Lines 16-19, we search for the next minimum neighbor of the two nodes of the edge if this neighbor is not common to the source and destination vertices of the edge.
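The edge-routing of OTP's Map function can be sketched in a few lines (a simulation of the routing logic only; `P` is supplied by the caller, and the real algorithm runs as a Hadoop Mapper):

```python
def otp_map(edge, P, rho):
    """OTP Map: route one edge to its 1-partition and 3-partition sub-graphs."""
    u, v = edge
    pu, pv = P(u), P(v)
    if pu == pv:                      # inner-edge: one 1-partition sub-graph
        yield (pu, edge)
    else:                             # cross-edge: two 1-partition copies ...
        yield (pu, edge)
        yield (pv, edge)
        for k in range(1, rho + 1):   # ... plus every 3-partition sub-graph G_ijk
            if k not in (pu, pv):     # containing both partitions
                yield (tuple(sorted((pu, pv, k))), edge)

# The paper's example: cross-edge (2, 4) with P(2) = 1, P(4) = 2 and rho = 4
keys = [k for k, _ in otp_map((2, 4), {2: 1, 4: 2}.get, 4)]
# keys == [1, 2, (1, 2, 3), (1, 2, 4)]
```

Each emitted key then becomes a Reduce instance's sub-graph identifier.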

Lemma 1 Each triangle in the graph is counted exactly once by OTP.
Proof Each Type-1 and Type-2 triangle appears exactly once, in one of the 1-partition sub-graphs G_i. A Type-1 triangle (u, w, v), i.e. P(u) = P(w) = P(v), appears only in G_P(u), i.e. u, w, v ∈ G_P(u); and a Type-2 triangle (u, w, v), with P(u) = P(w) and P(w) ≠ P(v), appears only in G_P(u), since its inner-edge exists only there. Therefore, Type-1 and Type-2 triangles are counted correctly and only once. Each Type-3 triangle appears exactly once, in one of the 3-partition sub-graphs. A Type-3 triangle (u, w, v), whose partition numbers P(u), P(w), P(v) ∈ [1, ρ] are pairwise distinct, appears only in G_P(u)P(v)P(w), i.e. (u, w), (w, v), (u, v) ∈ G_P(u)P(v)P(w). Therefore, Type-3 triangles are also counted correctly and only once. Thus, every triangle in the graph is counted exactly once.

Lemma 2 The expected number of outputs of all Map instances of OTP is m/ρ + m(ρ − 1).

Proof The proof consists of two steps. In the first step, if a Map instance input is an inner-edge, then the only output is for G_i, where i ∈ [1, ρ] is the partition number of the edge; every inner-edge therefore appears in exactly one sub-graph. The probability that an edge is an inner-edge is 1/ρ, so the expected number of inner-edges in the graph is m/ρ, and the expected size of the inner-edge output is m/ρ. In the second step, if a Map instance input is a cross-edge, then outputs are produced for both G_i and G_ijk, where 1 ≤ i < j < k ≤ ρ: the output of every cross-edge for the 1-partition sub-graphs is generated two times, and the output for the 3-partition sub-graphs is generated (ρ − 2) times, so the total output per cross-edge is 2 + (ρ − 2) = ρ. The probability that an edge is a cross-edge is 1 − 1/ρ = (ρ − 1)/ρ, so the expected number of cross-edges is m(ρ − 1)/ρ and the expected number of cross-edge outputs is (m(ρ − 1)/ρ) · ρ = m(ρ − 1). From the two steps, we conclude that the expected number of outputs of all Map instances of OTP is m/ρ + m(ρ − 1).

Lemma 3 Each Reduce instance of OTP takes O(m/ρ²) edges as input.

Proof Each Reduce instance input is either ≺ (i); E_i ≻ or ≺ (i, j, k); E_ijk ≻. The probability that the two nodes of an edge both lie in a specific partition is 1/ρ². A 1-partition sub-graph contains inner-edges and (duplicated) cross-edges of the graph: for m edges, the expected number of inner-edges whose two nodes lie in the given partition is m × 1/ρ² = m/ρ², and the expected number of cross-edge copies it receives is of the same order. Therefore, the expected number of edges in a 1-partition sub-graph is O(m/ρ²). A 3-partition sub-graph contains cross-edges only; the two nodes of an edge can fall into (3 choose 2) = 3 of its partition pairs, so the expected number of edges in a 3-partition sub-graph is also O(m/ρ²). From the two cases, we conclude that for any input, a Reduce instance takes O(m/ρ²) edges.

Lemma 4 The running time of a Reduce instance of OTP on a sparse graph is O(m/ρ²).

Proof From Lemma 3, a Reduce instance takes O(m/ρ²) edges as input; assuming the graph is sparse, the running time of the Reduce instance is therefore O(m/ρ²).

Theorem 1 The running time of a Reduce instance of the OTP algorithm is better than that of the TTP algorithm.
Proof From Lemma 4 (assuming the graph is sparse), the running time of a Reduce instance of OTP is O(m/ρ²). A Reduce instance of the TTP algorithm also takes O(m/ρ²) edges as input, and its running time is O(m^{3/2}) 9 in its input size; hence, the running time of a Reduce instance of TTP is O((m/ρ²)^{3/2}). Therefore, the running time of a Reduce instance of OTP is better than that of TTP.
ETTP consists of Map and Reduce functions. In the Map function (Lines 1-11), the graph is divided into both 2-partition and 3-partition sub-graphs. Each edge of the graph is sent to a Map instance as input. If the edge (u, v) is an inner-edge, the Map instance outputs a pair for every 2-partition sub-graph G_ij that contains the edge's partition. If the edge (u, v) is a cross-edge, then it may belong to a 2-partition or a 3-partition sub-graph; we treat it as 2-partition, distributing it to the 2-partition sub-graph it belongs to, and also distribute it to the 3-partition sub-graphs. Thus, the output of a Map instance is ≺ key; value ≻, where the key refers to the sub-graph number (2-partition or 3-partition) and the value is the edge belonging to that sub-graph. After all Map instances complete, all values of the Map outputs that have the same key are aggregated together, as mentioned in the "MapReduce" section. In the Reduce function (Lines 12-31), triangles are counted and identified in each sub-graph. The input of each Reduce instance is the sub-graph number (2-partition or 3-partition) as the key and all edges belonging to that sub-graph as the value. The Reduce instance of ETTP is also based on the Compact-Forward algorithm. For each edge (u, v) in the sub-graph, we search for a common neighbor w in τ(u) and τ(v). If w's id is between u's and v's ids (i.e. w ≺ u, v) (Line 23) and the triangle is a Type-1 triangle (Line 24), then the triangle is counted only once: when the partition number of the vertex, P(u), equals i and j = i + 1 (i.e. in the first 2-partition sub-graph it belongs to), or, if the triangle lies in the last partition [i.e. P(u) = ρ], only in the last sub-graph containing it [i.e. P(u) = ρ and j = i + 1] (Lines 25-27). So, Lines 24-27 of the algorithm count each Type-1 triangle only once. For example, in Fig. 2, although △(1, 2, 3) is a Type-1 triangle that exists in G_12, G_13, and G_14, the algorithm considers it only in G_12.
Also, △(10, 11, 12) is a Type-1 triangle whose three nodes have partition number ρ; it exists in G_14, G_24, and G_34, and the algorithm identifies it only in the last sub-graph, G_34. If w's id is between u's and v's ids (i.e. w ≺ u, v) (Line 23) and the triangle is not Type-1 (Line 28), then the triangle is counted (Line 30). For example, in Fig. 2, if the input of a Reduce instance is ≺ (1, 2); {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)} ≻, the output will be ≺ (1, 2, 3); 1 ≻ (a Type-1 triangle) and ≺ (2, 3, 4); 1 ≻ (a Type-2 triangle); if the input of a Reduce instance is ≺ (1, 2, 4); {(2, 4), (3, 4), (3, 10), (4, 10)} ≻, the output will be ≺ (3, 4, 10); 1 ≻ (a Type-3 triangle).
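The edge-routing of ETTP's Map function can be sketched analogously to OTP's (again a simulation of the routing logic only, with `P` supplied by the caller):

```python
def ettp_map(edge, P, rho):
    """ETTP Map: route one edge to its 2-partition and 3-partition sub-graphs."""
    u, v = edge
    pu, pv = P(u), P(v)
    if pu == pv:                      # inner-edge: every G_ij containing pu
        for j in range(1, rho + 1):
            if j != pu:
                yield (tuple(sorted((pu, j))), edge)
    else:                             # cross-edge: its own G_ij ...
        yield (tuple(sorted((pu, pv))), edge)
        for k in range(1, rho + 1):   # ... plus every G_ijk containing both
            if k not in (pu, pv):     # partitions of its endpoints
                yield (tuple(sorted((pu, pv, k))), edge)

# With rho = 4: an inner-edge in partition 1 goes to G_12, G_13, G_14,
# and the cross-edge (2, 4) [P(2) = 1, P(4) = 2] goes to G_12, G_123, G_124.
```

Note that both branches emit exactly ρ − 1 pairs per edge, which is the intuition behind Lemma 6 below.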

Lemma 5 Each triangle in the graph is counted exactly once by ETTP.
Proof Each Type-1 triangle △(u, v, w) appears in several 2-partition sub-graphs. A Type-1 triangle is counted only once: either in the first sub-graph G_ij it belongs to [j = i + 1 and i = P(u)], or, when the partition number of the three nodes of the triangle is the last partition, in the last sub-graph [j = i + 1 and P(u) = ρ]. These two conditions ensure each Type-1 triangle is counted only once. Each Type-2 triangle △(u, v, w) appears exactly once, in the 2-partition sub-graph G_P(u)P(w), where P(u) < P(w). Moreover, a Type-2 triangle appears only in a 2-partition sub-graph and never in a 3-partition sub-graph, because the inner-edge of a Type-2 triangle exists only in 2-partition sub-graphs. Therefore, Type-2 triangles are counted correctly. On the other hand, Type-3 triangles appear exactly once, in a single 3-partition sub-graph. Therefore, ETTP counts every triangle correctly and only once.

Lemma 6 The expected number of outputs of all Map instances of ETTP is m(ρ − 1).
Proof The proof consists of two steps. First, if a Map instance input is an inner-edge (u, v), then an output is produced for every G_ij with i, j ∈ [1, ρ], i ≠ j, where the partition number of the two nodes equals i or j; the output of every inner-edge is therefore generated ρ − 1 times. The probability that an edge is an inner-edge is 1/ρ, so the expected number of inner-edges is m/ρ and the expected size of the inner-edge output is m(ρ − 1)/ρ. Second, if a Map instance input is a cross-edge, then outputs are produced for both G_ij and G_ijk, where i, j, k ∈ [1, ρ] are pairwise distinct: the output of every cross-edge for its 2-partition sub-graph is generated once, and the output for the 3-partition sub-graphs is generated (ρ − 2) times, so the total output per cross-edge is 1 + (ρ − 2) = ρ − 1. The probability that an edge is a cross-edge is 1 − 1/ρ = (ρ − 1)/ρ, so the expected number of cross-edge outputs is (m(ρ − 1)/ρ) · (ρ − 1) = m(ρ − 1)²/ρ. From the two steps, we conclude that the expected number of outputs of all Map instances of ETTP is m(ρ − 1)/ρ + m(ρ − 1)²/ρ = m(ρ − 1).

Lemma 7 Each Reduce instance of ETTP takes O(m/ρ²) edges as input.

Proof The probability that the two nodes of an edge both lie in a specific partition is 1/ρ². A 2-partition sub-graph contains inner-edges and cross-edges of the graph: the probability that the two nodes of an inner-edge lie in one of its two partitions is 1/ρ² + 1/ρ² = 2/ρ², and the probability that the two nodes of a cross-edge span its two partitions is 1/ρ². Hence, for m edges, the expected number of inner-edges in a 2-partition sub-graph is m × 2/ρ² = 2m/ρ², and the expected number of cross-edges is m × 1/ρ² = m/ρ², so the expected number of edges in a 2-partition sub-graph is 3m/ρ² = O(m/ρ²). A 3-partition sub-graph contains cross-edges only; the two nodes of an edge can fall into (3 choose 2) = 3 of its partition pairs.
Hence, the expected number of edges in a 3-partition sub-graph is also O(m/ρ²). From the two cases, we conclude that for any input, a Reduce instance of ETTP takes O(m/ρ²) edges.
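The per-edge output counts used in Lemmas 2 and 6 can be checked with a short simulation (the modulo partitioner is an assumption for illustration; any partitioner gives the same ETTP total, since every edge emits exactly ρ − 1 pairs):

```python
import random

def map_output_counts(edges, P, rho):
    """Total Map outputs: OTP emits 1 pair per inner-edge and rho per
    cross-edge (Lemma 2); ETTP emits rho - 1 for every edge (Lemma 6)."""
    otp = ettp = 0
    for u, v in edges:
        otp += 1 if P(u) == P(v) else rho
        ettp += rho - 1
    return otp, ettp

rho = 5
P = lambda v: v % rho + 1                       # assumed modulo partitioner
edges = [(random.randrange(1000), random.randrange(1000)) for _ in range(2000)]
otp, ettp = map_output_counts(edges, P, rho)
# ettp == m * (rho - 1) exactly; otp is close to m/rho + m*(rho - 1) in expectation
```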

Lemma 8 The running time of a Reduce instance of ETTP on a sparse graph is O(m/ρ²).
Proof The argument is the same as in Lemma 4. From Lemma 7, a Reduce instance takes O(m/ρ²) edges as input; assuming the graph is sparse, the running time of the Reduce instance is therefore O(m/ρ²).

Theorem 2 The running time of a Reduce instance of the ETTP algorithm is better than that of the TTP algorithm.

Proof As in Theorem 1, a Reduce instance of TTP on the same input size runs in O((m/ρ²)^{3/2}), while by Lemma 8 a Reduce instance of ETTP runs in O(m/ρ²). Therefore, the running time of a Reduce instance of ETTP is better than that of TTP.

Experimental results
In this section, we present and discuss the experimental results of our algorithms. We ran our two algorithms on a set of datasets from SNAP 24 and compared their running times with the TTP algorithm. The experiments are divided into two parts: in the first part, the three algorithms run locally on a single node running Hadoop, and in the second part, they run in distributed mode on a cluster of machines running Hadoop. Table 2 shows the basic characteristics of the datasets used in the experiments.
Single node. In the first set of experiments, the three algorithms are run on a single machine with an Intel Core i5 processor and 4 GB of RAM, with Hadoop running on it. Table 3 shows the running times of our two algorithms and the TTP algorithm on this single node using a fixed number of partitions (ρ = 20). From Table 3, we notice that our two algorithms, OTP and ETTP, always have smaller running times than the TTP algorithm. For big datasets with a very high number of nodes and edges, such as the Brightkite_edges dataset, our two algorithms are clearly much faster than the TTP algorithm, with OTP performing better than ETTP on a single node.
Multi node. In the second set of experiments, the three algorithms are run on a cluster of 15 nodes (one master node and 14 slaves) running the Hadoop framework. The 15 nodes are homogeneous; each is a machine with an Intel Core Quad processor and 3.7 GB of RAM. We run our two algorithms on the cluster and compare the results with the TTP algorithm, as shown in Table 4, with ρ = 20. From Table 4, we notice that both of our algorithms are better than the TTP algorithm. For a big dataset such as the soc-Epinions dataset shown in Table 4, our two algorithms are much faster than the TTP algorithm, and the ETTP algorithm has a better running time than the OTP algorithm. Overall, our experimental results show that our two algorithms are faster than the TTP algorithm, and that OTP has a better running time than ETTP on a smaller cluster. Also, we study the effect of the number of partitions on the running times of the three algorithms applied to the ca-HepTh and wiki-Vote datasets, as shown in Fig. 9. The figure shows that OTP and ETTP are more efficient than the TTP algorithm across different values of ρ. Finally, we evaluate the workload of OTP, ETTP, and TTP in terms of the number of shuffles and the number of reducers, as shown in Fig. 10.
The figure shows that OTP has a lower workload than both ETTP and TTP. Nevertheless, ETTP performs better overall, as concluded earlier, and is recommended for use on a large cluster of machines.

Conclusion
Triangle counting is used heavily in many applications, especially in social network analytics. Many researchers have presented algorithms to solve this problem, but sequential algorithms cannot handle it properly due to the huge size of the data, so researchers turn to parallel algorithms over distributed frameworks (e.g. Hadoop MapReduce). Following this approach, we proposed two algorithms based on MapReduce parallel computing and graph partitioning that significantly improve running time. The two proposed algorithms, ETTP and OTP, avoid repeated triangle counting by identifying each triangle in the graph only once. The experimental results show that the ETTP and OTP algorithms achieve better execution times than the previous MapReduce algorithm, with ETTP clearly better than OTP and recommended for large clusters of machines. In the future, we plan to further improve the performance of the proposed algorithms and to evaluate them on larger datasets.