Exploring Spatio-temporal Dynamics of Cellular Automata for Pattern Recognition in Networks

Network science is an interdisciplinary field which provides an integrative approach for the study of complex systems. In recent years, network modeling has been used for the study of emergent phenomena in many real-world applications. Pattern recognition in networks has been drawing attention to the importance of network characterization, which may lead to understanding the topological properties that are related to the network model. In this paper, the Life-Like Network Automata (LLNA) method is introduced, which was designed for pattern recognition in networks. LLNA uses the network topology as a tessellation of Cellular Automata (CA), whose dynamics produces a spatio-temporal pattern used to extract the feature vector for network characterization. The method was evaluated using synthetic and real-world networks. In the latter, three pattern recognition applications were used: (i) identifying organisms from distinct domains of life through their metabolic networks, (ii) identifying online social networks and (iii) classifying stomata distribution patterns varying according to different lighting conditions. LLNA was compared to structural measurements and surpasses them in real-world applications, achieving improvement in the classification rate as high as 23%, 4% and 7% respectively. Therefore, the proposed method is a good choice for pattern recognition applications using networks and demonstrates potential for general applicability.


S1 Life-Like Cellular Automata
Cellular Automata (CA) are discrete mathematical models that can be used to describe the interactions between the elements (cells) of a system. Owing to their abstract and dynamical nature, CA have been shown to be suitable for modeling a wide range of applications. One characteristic of CA that has drawn particular attention is the temporal pattern produced by their evolution. Some of these patterns exhibit emergent behavior, which arises from simple deterministic rules and may lead to the formation of complex patterns [10].
Formally, CA consist of a tessellation defined on an n-dimensional space, whose fundamental units are called cells. For each cell $i = 1, 2, \ldots, N$ a state $s_i$ is defined at time $t$, also written $s(c_i, t)$. CA are discrete in time and space, and their states are updated according to deterministic rules, also called transition rules, $\phi$. These rules depend on the current state of cell $i$ and on the states of its neighborhood. A CA is defined by the quintuple [1] $C = \langle T, S, s_0, \mathcal{N}, \phi \rangle$, in which:
• $T$ is an infinite tessellation on an n-dimensional Euclidean space $\mathbb{R}^n$ that consists of cells $c_i$, $i \in \mathbb{N}$;
• $S$ is a set of $n$ states, such that $S \subset \mathbb{N}$;
• $s_0 : T \to S$ assigns to every cell $c_i$ an initial state, i.e., $s(c_i, 0) = s_0(c_i)$. Here $\sigma_s$ denotes the probability of a cell having initial state $s$. For instance, $\sigma_1 = 50\%$ means that each cell starts in state 1 with probability 0.5, i.e., the two binary states are uniformly distributed;
• $\mathcal{N}$ is a function that defines the neighborhood of each cell: $\mathcal{N} : T \to \bigcup_{p=1}^{\infty} T^p$. This function maps each cell $c_i$ to a finite sequence $\mathcal{N}(c_i) = (c_{ij})$ that consists of $|\mathcal{N}(c_i)|$ distinct cells;
• $\phi$ is an output function, such that $\phi : \mathcal{N} \to S$. This function determines the state of cell $c_i$ at time $t$, i.e., $s(c_i, t)$.
A tessellation $T$ may be composed of cells of various shapes, regular or irregular. The cells are assumed to be homogeneous, i.e., they present the same properties and follow the same transition rules [8]. The state of each cell depends only on what is happening in its local neighborhood, which is why CA have an inherently parallel architecture. Interesting patterns and particular properties were found in the famous CA Game of Life, or simply Life, which was devised by John Conway and introduced to a wide audience by Martin Gardner in 1970 [6]. Life is defined over an orthogonal two-dimensional grid with a set of binary states, alive or dead, where each cell interacts with its eight neighbors (Moore neighborhood).
Life-like CA are a family of cellular automata inspired by the rules of Life. They are defined on the same topology and the same neighborhood, but the transition functions vary. For example, the original Life rule is written B3/S23: a dead cell is born if exactly 3 of its neighbors are alive, and a live cell survives if 2 or 3 of its neighbors are alive. B and S are strings of digits in the range 0 to 8. Since each cell has at most 8 neighbors, each of B and S selects a subset of the 9 possible neighbor counts, giving $2^9 \times 2^9 = 2^{18}$ possible transition rules. Some of the transition rules Bx/Sy have been investigated in depth with respect to the patterns formed by their time evolution, and several have even acquired names. Moreover, each of these rules presents particular properties that have been employed for specific purposes.
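The Bx/Sy notation can be made concrete with a minimal Python sketch of one synchronous update on a grid with a Moore neighborhood. The function names `parse_rule` and `step` are illustrative, and the finite grid with dead cells beyond its border is an assumption made for brevity (Life itself is defined on an infinite tessellation).

```python
def parse_rule(rule):
    """Split a rule string such as 'B3/S23' into birth and survival sets."""
    birth, survive = rule.upper().split("/")
    return {int(d) for d in birth[1:]}, {int(d) for d in survive[1:]}


def step(grid, rule="B3/S23"):
    """Apply one synchronous Life-like update to a 2D list of 0/1 states.

    Assumption: cells outside the finite grid are treated as dead.
    """
    birth, survive = parse_rule(rule)
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count alive cells among the (up to) eight Moore neighbors.
            alive = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            if grid[r][c] == 1:
                new[r][c] = 1 if alive in survive else 0
            else:
                new[r][c] = 1 if alive in birth else 0
    return new


# A vertical "blinker" becomes horizontal after one step of B3/S23.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = step(blinker)
```

Passing a different rule string, e.g. `step(grid, "B01678/S0457")`, applies another member of the Life-like family with the same neighborhood.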

S2 Measurements extracted from Life-Like Network Automata
This section details three different measurements used to extract features from the spatio-temporal patterns obtained through the evolution of the Life-Like Network Automata.

Shannon entropy ($\mu_S$)
The Shannon entropy [9] for a given node $i$ is defined by
$$\mu_S(i) = -\sum_{s \in \{0, 1\}} p_s^i \log_2 p_s^i,$$
where $p_0^i$ is the probability of having zeros in the time series and $p_1^i$ is the probability of having ones. This measure quantifies how homogeneous the evolution pattern is and is normalized to the interval [0, 1]. Spatio-temporal series containing only zeros or only ones present $\mu_S = 0$, whereas oscillating patterns tend to present higher Shannon entropy.
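As a small illustration of this measure, the sketch below computes the binary Shannon entropy of one node's time series. The function name `shannon_entropy` is illustrative, and the base-2 logarithm is an assumption chosen because it yields values in [0, 1] for binary states.

```python
import math


def shannon_entropy(series):
    """Binary Shannon entropy (base 2) of a sequence of 0/1 states."""
    n = len(series)
    p1 = sum(series) / n          # probability of ones
    p0 = 1.0 - p1                 # probability of zeros
    # Terms with zero probability contribute nothing (0 * log 0 := 0).
    return sum(-p * math.log2(p) for p in (p0, p1) if p > 0)


constant = shannon_entropy([1, 1, 1, 1])        # homogeneous series
oscillating = shannon_entropy([0, 1, 0, 1])     # balanced oscillation
```

A constant series gives 0 and a series with as many zeros as ones gives 1, matching the bounds stated above.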

Word length ($\mu_W$)
The word length ($\mu_W$) describes how homogeneous the signal is with respect to the length of its "words". In this context, a "word" is a sequence of ones delimited by zeros. For instance, the sequence q = (0011101100) contains one word of length three and one word of length two. The largest possible word length is the number of evolution steps t, which is obtained when a node starts the evolution alive and never changes its state.
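The notion of a "word" can be illustrated with a short sketch that extracts the lengths of all runs of ones from a binary string. The helper name `word_lengths` is hypothetical, and the sketch only lists the lengths; how they are aggregated into $\mu_W$ is not specified in this section and is left open here.

```python
from itertools import groupby


def word_lengths(q):
    """Return the lengths of all maximal runs of '1' in a binary string.

    Runs that touch the start or end of the sequence are counted too,
    so a node that stays alive for all t steps yields one word of length t.
    """
    return [len(list(run)) for symbol, run in groupby(q) if symbol == "1"]


example = word_lengths("0011101100")
always_alive = word_lengths("1" * 20)
```

For the sequence used in the text, the extracted lengths are three and two, in order of appearance.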

Lempel-Ziv complexity ($\mu_L$)
The Lempel-Ziv complexity is derived from the data compression algorithm proposed by Lempel and Ziv [7]. The measurement $\mu_L$ is one of the variants proposed by these authors and is based on a dictionary of sub-sequences. The algorithm counts the number of distinct blocks a sequence contains. The leftmost bit of a binary sequence q defines the first block. From this bit, one moves rightward bit by bit until the sub-sequence that starts right after the previous block and ends at the current position has not appeared before; this new block is then added to the dictionary. For example, the binary sequence q = (01010101010101010101) of length l = 20 is decomposed into g = 7 minimum blocks "0|1|01|010|10|101|0101" (the trailing bits 0101 repeat an existing block and are not counted), so the Lempel-Ziv complexity, computed with the natural logarithm, is $\mu_L = \frac{g \log l}{l} = 1.049$.
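The block decomposition described above can be sketched as a dictionary-based parse. The names `lz_blocks` and `lempel_ziv_complexity` are illustrative, and two points are assumptions inferred from the worked example rather than stated rules: a trailing fragment that is already in the dictionary is not counted, and the logarithm is natural.

```python
import math


def lz_blocks(q):
    """Parse a binary string into blocks that have not appeared before."""
    seen = set()
    blocks = []
    current = ""
    for bit in q:
        current += bit
        if current not in seen:
            # New sub-sequence: store it and start the next block.
            seen.add(current)
            blocks.append(current)
            current = ""
    # Any leftover 'current' repeats a known block and is not counted.
    return blocks


def lempel_ziv_complexity(q):
    """Return g * ln(l) / l, with g the number of blocks and l = len(q)."""
    g = len(lz_blocks(q))
    l = len(q)
    return g * math.log(l) / l


q = "01" * 10
blocks = lz_blocks(q)
mu_l = lempel_ziv_complexity(q)
```

On the example sequence this parse yields the seven blocks listed in the text and a complexity that rounds to 1.049.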

S3 Structural network measurements
Given the adjacency matrix A, where $a_{ij} = 1$ if i is connected to j and $a_{ij} = 0$ otherwise, the number of neighbors, or degree, of node i is defined as $k_i = \sum_{j=1}^{N} a_{ij}$, where N is the total number of nodes. From this, we have the following measurements [3]:
• Average Degree ($\langle k \rangle$): $\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i$.
• Average Hierarchical Degree (level 2) ($\langle H_{k2} \rangle$): the hierarchical degree of level 2 of a given node i, $H_{k2}(i)$, is the sum of the degrees of its neighbors. The average hierarchical degree of level 2 is given by $\langle H_{k2} \rangle = \frac{1}{N} \sum_{i=1}^{N} H_{k2}(i)$.
• Average Hierarchical Degree (level 3) ($\langle H_{k3} \rangle$): the hierarchical degree of level 3 of a given node i, $H_{k3}(i)$, is the sum of the degrees of the neighbors of its neighbors. The average hierarchical degree of level 3 is given by $\langle H_{k3} \rangle = \frac{1}{N} \sum_{i=1}^{N} H_{k3}(i)$.
• Average Clustering Coefficient ($\langle cc \rangle$): the clustering coefficient of node i, $cc_i$, measures the probability of two vertices j and k being connected to each other given that both are connected to node i, and is calculated by $cc_i = \frac{\sum_{j,k} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}$. The average clustering coefficient is given by $\langle cc \rangle = \frac{1}{N} \sum_{i=1}^{N} cc_i$.
• Average Path Length (l): the average path length is the average length of the shortest paths between any two nodes, i and j, of the network.
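To make these definitions concrete, the following sketch computes several of them on a toy undirected network given as an adjacency matrix. All function names are illustrative, the level-3 hierarchical degree is omitted for brevity, and two conventions are assumptions: nodes with fewer than two neighbors get a clustering coefficient of 0, and the average path length is taken over reachable pairs only.

```python
from collections import deque


def degrees(adj):
    """Degree k_i: row sums of the adjacency matrix."""
    return [sum(row) for row in adj]


def hierarchical_degree_2(adj):
    """H_k2(i): sum of the degrees of the neighbors of node i."""
    k = degrees(adj)
    return [sum(k[j] for j, a in enumerate(row) if a) for row in adj]


def clustering(adj):
    """cc_i: fraction of connected pairs among the neighbors of node i."""
    n, k = len(adj), degrees(adj)
    cc = []
    for i in range(n):
        if k[i] < 2:
            cc.append(0.0)  # assumption: undefined cases are set to 0
            continue
        closed = sum(adj[i][j] * adj[j][m] * adj[m][i]
                     for j in range(n) for m in range(n))
        cc.append(closed / (k[i] * (k[i] - 1)))
    return cc


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS)."""
    n, total, pairs = len(adj), 0, 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def mean(values):
    return sum(values) / len(values)


# Toy network: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
avg_degree = mean(degrees(toy))
avg_h2 = mean(hierarchical_degree_2(toy))
avg_cc = mean(clustering(toy))
apl = average_path_length(toy)
```

The resulting averages can be stacked into a structural feature vector for a classifier, in the same role the LLNA measurements play elsewhere in this document.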

S4 Performance measurements
The experiments performed in this study were evaluated using the set of performance measurements described in this section. These measurements were calculated for each experiment in order to evaluate the results obtained with LLNA in each specific application. The terminology derives from the confusion matrix: True Positive (TP), True Negative (TN), False Positive (FP, type I error) and False Negative (FN, type II error).
• Accuracy (ACC): the rate of correctly classified instances, $ACC = \frac{TP + TN}{TP + TN + FP + FN}$.
• F1-score: a measure derived from the combination of precision and recall, $F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$, with $precision = \frac{TP}{TP + FP}$ and $recall = \frac{TP}{TP + FN}$.
• Area Under a ROC Curve (AUC): $AUC = \frac{R_1}{n_1 n_2} - \frac{n_1 (n_1 + 1)}{2 n_1 n_2}$, where $n_1$ and $n_2$ are the sizes of samples 1 and 2, respectively, and $R_1$ is the sum of the ranks in sample 1, following the Mann-Whitney U statistic.
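The three measurements can be sketched in a few lines of Python. The function names are illustrative; treating sample 1 as the classifier scores of the positive class and assigning average ranks to ties are assumptions, not details given in the text.

```python
def accuracy(tp, tn, fp, fn):
    """Rate of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_ranks(values):
    """1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def auc_from_ranks(sample1, sample2):
    """AUC via the rank-sum (Mann-Whitney U) formula given in the text."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # sum of the ranks in sample 1
    return r1 / (n1 * n2) - n1 * (n1 + 1) / (2 * n1 * n2)


# Assumed toy scores: sample 1 = positive class, sample 2 = negative class.
auc = auc_from_ranks([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

With these toy scores, 8 of the 9 positive-negative pairs are ordered correctly, so the rank formula returns 8/9.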

S5 Analysis and Selection of Parameters: influence of the number of Nodes N
This section analyzes the number of network nodes, N, with respect to its influence on the spatio-temporal pattern obtained by applying LLNA. Fig. S1 presents the histograms of the Shannon entropy, $\mu_S$, using rule B01678/S0457 for the four network models studied in this paper: random, small-world, scale-free and geographical. We can see that, unlike the number of evolution steps, t, and the distribution of the initial alive population, σ, the parameter N does not influence $\mu_S$: for a given network model, the histograms for distinct values of N are very similar.

S6 Classifying network models
Table S1 presents accuracy rates in classifying network models for the 10 selected Life-like rules. Columns $\mu_S$, $\mu_W$ and $\mu_L$ show the correct classification rates for each rule when using Shannon entropy, word length and Lempel-Ziv distributions as attributes, respectively. The results when combining these distributions are shown in the fifth column of the same table. This combination leads to a maximum accuracy of 99.992 ± 0.002% for rule B135678/S03456. Finally, the last column shows the accuracies when using the average values of the same measures as attributes: $\mu_S$, $\mu_W$ and $\mu_L$. Table S2 presents the performance measurements for the same experiment.
Table S2: Performance measurements corresponding to the best accuracy as shown in Table S1.
Columns: TP Rate, FP Rate, F-Measure, MCC, ROC Area, Class.

S6.1 Robustness to noise
The robustness of the LLNA method was evaluated with respect to noise tolerance. The network topology was modified by removing and adding edges according to a noise rate $\rho_N$. As this rate increases, more structural changes are made to the network topology. A noise rate of 20% ($\rho_N = 20\%$) indicates the addition of a number of edges equal to 10% of the total number of edges and the removal of 10% of the existing edges. Both sets of edges are randomly selected. Table S3 shows the correct classification rate of the four network models for different values of $\rho_N$ using an SVM classifier. The feature vector composed of the combination of the distributions, $[\mu_S, \mu_W, \mu_L]$, provided the best accuracy. In general, increasing $\rho_N$ does not influence the accuracy obtained for each rule, since the changes in accuracy were very small.
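The perturbation procedure can be sketched as follows. The function name `perturb_edges` and its signature are hypothetical, and several details are assumptions not fixed by the text: both counts are rounded from half the noise rate times the original number of edges, removed edges are drawn from the original edge set, and added edges are drawn from node pairs that were not connected in the original network.

```python
import random
from itertools import combinations


def perturb_edges(n_nodes, edges, noise_rate, seed=None):
    """Remove and add noise_rate/2 of the edges of an undirected network.

    Returns a new set of edges, each stored as a sorted (u, v) tuple.
    """
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    n_change = round(noise_rate / 2 * len(edge_set))
    # Candidate new edges: node pairs not connected in the original network.
    non_edges = [p for p in combinations(range(n_nodes), 2)
                 if p not in edge_set]
    removed = set(rng.sample(sorted(edge_set), n_change))
    added = set(rng.sample(non_edges, n_change))
    return (edge_set - removed) | added


# Ring of 20 nodes perturbed with a noise rate of 20%.
ring = [(i, (i + 1) % 20) for i in range(20)]
noisy = perturb_edges(20, ring, noise_rate=0.20, seed=0)
```

With this even split the total number of edges is preserved, so the mean degree of the network is unchanged by the perturbation.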

S6.2 Comparison with structural measurements
The comparison between the performance of LLNA and the performance using structural measurements as feature vectors is shown in Table S4. The structural measurements were also evaluated regarding their robustness to noise. The best individual scores were obtained by the average clustering coefficient ($\langle cc \rangle$) and by the Pearson degree correlation ($\rho_P$). The combination of the highlighted measurements reached 100.00% accuracy, even in the presence of noise. LLNA achieved nearly the same accuracy with rule B135678/S03456. Regarding robustness to noise, some traditional measurements were shown to be more affected by the variation of $\rho_N$, e.g., l and $\rho_P$, whereas the others are more stable.

S7 Classifying network models in combination with $\langle k \rangle$
Table S5 presents the accuracy in classifying the network models in combination with the mean degree $\langle k \rangle$ as classes, Table S6 shows the performance measurements for the best rule (B01678/S0457), and Table S7 presents the comparison of LLNA with structural measurements. The performance of LLNA in this case surpasses the performance of the structural measurements with an improvement of 25.54 ± 0.31%. The influence of the mean degree on the classification of the network models can also be analyzed in Figs S2 and S3. The first presents the confusion matrix for rule B01678/S0457 with the combination of distributions $[\mu_S, \mu_W, \mu_L]$ as feature vector, and the second presents the canonical analysis for the four network models. From both figures we can see that not all $\langle k \rangle$ values can be discriminated. For instance, for the scale-free model it is possible to distinguish networks with mean degree 4 (red), 6 (yellow) and 8 (green), while the other classes are closer to each other. A similar behavior is observed for the geographical model. Random and small-world models also have a set of $\langle k \rangle$ values that are distinguishable from the others.
Table S7: Accuracy (%) on the synthetic-dataset for the classification of network models and $\langle k \rangle$ using structural network measurements as feature vectors, together with the best accuracy of LLNA using rule B05/S13568 and $\mu_S$ as feature vector.
Figure S2: Confusion matrix for rule B01678/S0457 using the feature vector $[\mu_S, \mu_W, \mu_L]$, which corresponds to the best accuracy as shown in Table S5. Each cell represents a percentage (%) of the correctly predicted class, where filled cells represent nonzero elements.

Figure S3: Canonical analysis of the four network models (scale-free, random, geographical and small-world) for the synthetic-dataset; axes: 1st canonical variable and 2nd canonical variable. This analysis was performed using rule B01678/S0457, which provided the highest accuracy as shown in Table S5.

S8 Classifying scale-free models
Table S8 presents the accuracy obtained in classifying different scale-free models: networks with linear and nonlinear preferential attachment (α = {0.5, 1.0, 1.5, 2.0}) generated with the Barabási and Albert model [2], and networks generated according to the model of Dorogovtsev and Mendes [4]. Table S9 shows the performance measurements for the best rule (B0157/S457). Finally, Table S10 presents the comparison with structural measurements. The improvement in accuracy provided by LLNA was 2.08 ± 0.25% in this experiment.

S9 Identifying organisms using metabolic networks
Table S11 presents the accuracy results for the metabolic-dataset for the 10 selected rules. Table S12 shows the performance measurements for the classification of the best case reported for this dataset, using rule B05/S13568 and $\mu_S$ as feature vector. Moreover, Table S13 presents the comparison of LLNA with other structural measurements for the same dataset.
LLNA (B05/S13568) 87 (± 13)

S10 Identifying structural patterns in social network
Table S14 presents the accuracy results for the social-dataset for the 10 selected rules. Table S15 shows the true-positive rate, false-positive rate, precision and recall, F-measure, MCC and ROC area for the classification of the best case reported in that experiment, using rule B0167/S248 and $\mu_L$ as descriptor. Moreover, Table S16 presents the comparison with traditional measurements for the same dataset.
LLNA (B0167/S248) 92 (± 1)

S11 Classifying stomata distribution patterns
Fig. S4 depicts the construction of the network of interactions for the stomata-dataset. Fig. S4-a) is the original microscopic image of a leaf from Tradescantia zebrina. Fig. S4-b) represents the stomata centroids, which were manually segmented. Finally, Fig. S4-c) shows the resulting network given the threshold $\delta_T = 0.9062$.
Figure S4: a) Microscopic image of a Tradescantia zebrina leaf, and b) the respective stomata distribution under 4h of artificial lighting [5]. c) Network obtained using $\delta_T = 0.9062$.