Computing cliques and cavities in networks

Complex networks contain complete subgraphs such as nodes, edges, triangles, etc., referred to as simplices and cliques of different orders. Notably, cavities consisting of higher-order cliques play an important role in brain functions. Since searching for maximum cliques is an NP-complete problem, we use k-core decomposition to determine the computability of a given network. For a computable network, we design a search method with an implementable algorithm for finding cliques of different orders, obtaining also the Euler characteristic number. Then, we compute the Betti numbers by using the ranks of boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans with data from one typical dataset, and find all of its cliques and some cavities of different orders, providing a basis for further mathematical analysis and computation of its structure and function. The study of high-order networks as attracted significant attention recently. The authors introduce the concept of computability for searching maximum clique in large networks and an optimized algorithm for finding cavities with different orders.


Introduction
A network has three basic sub-structures: chain, star and cycle.Chains are closely related to the concept of average distance, while a small average distance and a large clustering coefficient together signifies a small-world network 1 , where the clustering coefficient is determined by the number of triangles, which are special cycles.Stars follow some heterogeneous degree distribution, under which the growth of node numbers and a preferential attachment mechanism together distinguishes a scale-free network 2 from a random network 3 .Cycles not only contain triangles, but also have deeper and more complicated connotation.The cycle structure brings redundant paths into the network connectivity, creating feedback effects and higher-order interactions in network dynamics.
In [4], we introduced the notion of totally homogeneous networks in studying optimal network synchronization, which are networks with the same node degree, same girth (length of the smallest cycle passing the node in concern) and same path-sum (sum of all distances from other nodes to the node).We showed 4 that totally homogeneous networks are the ones of easily self-synchronizing among all networks of the same size.Recently, we identified 5 the special roles of two invariants of the network topology expressed by the numbers of cliques and cavities, the Euler characteristic number (alternative sum of the numbers of cliques of different orders) and the Betti number (number of cavities of different orders).In fact, higher-order cliques and smallest cavities are basic components of the totally homogenous networks.
More precisely, higher-order cycles of a connected undirected network include cliques and cavities.A clique is a fully-connected sub-network, e.g., a node is a 0-clique, an edge is a 1-clique, a triangle is a 2-cliques, and a complete graph of four nodes is a 3-clique, and so on, where the numbers indicate the orders.Also, a triangle is the smallest first-order cycle, which consists of three edges.The number of 1-cycles with different lengths (number of edges) is huge in a large-scale network.Similarly, a 3clique is the smallest 2-cycle, and a 2-cycle contains some triangles.A chain is a broken cycle, where an edge is the shortest 1-chain, a triangle and two triangles adjacent by one edge are 2-chains, and so on, while a cycle is a closed chain.In the same manner, all these concepts can be extended to higher-order ones.
It is more challenging to study network cycles than node degrees; therefore, new mathematical concepts and tools are needed 6.7 , including such as simplicial complex, boundary operator, cyclic operation and equivalent cycles, to classify various cycles and select their representatives for effective analysis and computation.
In higher-order topologies, the addition of two -cycles,  and , is defined 5 by set operations as  +  = ( ∪ ) − ( ∩ ).They are said to be equivalent, if  +  is the boundary of a ( + 1)-chain 5 .All equivalent cycles constitute an equivalent class.Cavity is a cycle with the shortest length in each independent cycle-equivalent class (see Ref [5] for more details).
Cycles, cliques and cavities are found to play important roles in complex systems such as biological systems especially the brain.In the studies of the brain, computational neuroscience has a special focus on cyclic structures in neuronal networks.It was found 8 that cycles generate neural loops in the brain, which not only can transmit information all over the brain but also have an important feedback function.It was suggested 8 that such cyclic structures provide a foundation for the brain functions of memories and controls.Unlike cliques, which are placed at some particular locations (e.g.cerebral cortexes), cavities are distributed almost everywhere in the brain, connecting many different regions together.It is pointed out 9 that in both biological and artificial neural networks, one can find huge numbers of cliques and cavities which, being large and complex, have not been explored before.Of particular importance is that cavities play an indispensable role in brain functioning.All these findings indicate an encouraging and promising direction in brain science research.However, it remains unclear as how all such neuronal cliques and cavities are connected and organized.This calls for further endeavor into understanding the relationship between the complexity of higher-order topologies and the complexity of intrinsic neural functions in the brain.To do so, however, it is necessary to find most, if not all, cliques and especially cavities of different orders from the neuronal network.
Artificial intelligence, on the other hand, relies on artificial neural networks inspired by the brain neuronal network 10 , including recurrent neural networks, convolutional neural networks, Hopfield neural networks, etc.Now, given the recent discovery of higher-order cliques and cavities in the brain, the question is how to further develop artificial intelligence to an even higher level by utilizing all the new knowledge about the brain topology.Notably, an effective neuronal network construction was recently proposed by a research team from the Massachusetts Institute of Technology, inspired by the real structure of neuronal network of the C. elegans 11 .
It is important to understand how the brain stores information, learns new knowledge and reacts to external stimuli.It is also essential to understand how the brain adaptively creates topological connections and computing patterns.All these tasks depend on in-depth studies of the brain neuronal network.Recently, the Brain Initiative project of USA 12 , the Human Brain project of EU 13 and the China Brain project 14 have been established to take such big challenges.
Many renowned mathematicians had contributed a lot of fundamental work to related subjects, such as Euler characteristic number, Betti number, groups of Abel and Galois, higher-order Laplacian matrices, as well as Euler-Poincaré formula and homology.This also demonstrates the importance of studying cliques and cavities for a further development of network science.In addition, the advance from pairwise interactions to higher-order interactions in complex system dynamics requires the knowledge of higher-order cliques and cavities of networks 15 .The numbers of zero eigenvalues of higher-order Hodge-Laplacian matrices are equal to the corresponding Betti numbers, while their associate eigenvectors are closely related to higher-order cavities 16 .
Motivated by all the above observations, this paper studies the fundamental issue of computability of a complex network, based on which the investigation continues to find higher-order cliques and their Euler characteristic number, as well as all the Betti numbers and higher-order cavities.The proposed approach starts from  -core decomposition 17 and, through finding cliques of different orders, performs a sequence of computations on the ranks of the corresponding boundary matrices, to obtain all the Betti numbers.To that end, an optimized algorithm is developed for finding higherorder cavities.Finally, the optimized algorithm is applied to computing the neuronal network of C. elegans from a typical dataset, identifying its cliques and cavities of

Results
For computable undirected networks, the proposed approach is able to find all higher-order cliques, thereby obtaining the Euler characteristic number and all Betti numbers as well as some cavities of different orders.These can provide global information for understanding and analyzing the relationships between topologies and functions of various complex networks such as the brain neuronal network.

Computable Networks
For undirected networks, the cliques and their numbers of different orders in a network are both important sub-networks and parameters for analysis and computation.A simple example is shown in Figure 1, from which it is easy to find all cliques and their numbers of different orders.For a given general large-scale complex network, however, finding all cliques of different orders is never an easy task.In fact, even just searching for a maximum clique (namely, a clique with the largest possible number of nodes) from a large network is a computationally NP-complete problem 18 .It is well known that to find all cliques of a large-scale undirected network, especially when the network is dense, the number of cliques are huge and will increase exponentially as the network size becomes larger.For example, in the real USair, Jazz and Yeast networks 19 , if the number of cliques is limited to not more than 10 7 to be computable on a personal computer, the orders of the cliques are found to be only 9, 6 and 4, respectively, as summarized in Table 1.For these three real-world examples, it becomes impossible to compute the numbers of their higher-order cliques.

arXiv2101.00536 accepted by Communications Physics in October 2021
It is noticed that, even for large and dense networks, k-core decomposition can be used to efficiently determine their cells (layers), where the k-th cell has all nodes with degrees at least k, and the kernel of the network has the largest coreness value and is very dense.Therefore, the largest coreness value kmax can be used to estimate the order of a maximum clique.For this reason,  -core decomposition is used to determine whether a given network is computable or not, subject to the available limited computing resources.If the computing resources allow the number of cliques, with the first several lowest orders, to be no more than 10 7 to be computable, then the maximum coreness value should not be bigger than 30.In this paper, this coreness value threshold is set to kmax = 25, as detailed in Supplementary Note 1.

Clique-Searching Method
The Bron-Kerbosch algorithm 20 is a popular scheme for finding all cliques of an undirected graph, while the Hasse-diagram algorithm 9 is useful for finding all cliques of a directed network.For computable networks, we propose a method with an algorithm, named common-neighbors scheme, which can find all cliques of different orders and the associate Euler characteristic number.
In a network, the average degree is denoted by 〈〉 and the number of edges by || .The computational complexity of the proposed algorithm is estimated to be (||〈〉) for finding all 2-cliques.
(2) From the above list, edges are generated in increasing order of node indexes: Then, the number of edges in 1-clique is computed, yielding: m1 = 26.

Computing Betti Numbers
The above-obtained cliques of various orders can be used to generate boundary matrices Bk,  = 1,2, … .Here, B1 is the node-edge matrix, in which an element is 1 if the node is on the corresponding edge; otherwise, it is 0. Similarly, B2 is the edge-face matrix, in which an element is 1 if the edge is on the corresponding face (triangle); otherwise, it is 0. All higher-order boundary matrices Bk are generated in the same way.It is straightforward to compute the rank rk of matrices Bk for  = 1,2, … , using linear row-column operations in the binary field, following the binary operation rules, namely 1 + 1 = 0, 1 + 0 = 1, 0 + 1 = 1, 0 + 0 = 0.Then, the Betti numbers 5 can be obtained by using formulas   =   −   −  +1 , for  = 1,2, … .
One can also calculate the numbers of zero eigenvalues of higher-order Hodge-Laplacian matrices, so as to find the Betti numbers.To do so, some algebraic topology rules are needed to form oriented cliques 16 .
As an example, the left-hand sub-network shown in Figure 1 is discussed, which has Nodes 1-8.The node-edge boundary matrix B1 of rank  1 = 7 is formed as follows, where the shaded row is linearly dependent on the other rows: arXiv2101.00536accepted by Communications Physics in October 2021 Moreover, its edge-face boundary matrix of rank  2 = 4 is obtained as follows, where the shaded column is linearly dependent on the other columns: Table 2 summarizes all calculation results for the sub-network on the left-hand side of the network shown in Figure 1, in which the Euler characteristic number and Betti numbers satisfy the Euler-Poincaré formula 5   =  0 −  1 +  2 −  3 = 1 + 2 − 1 − 0 = 0

Cavity-Searching Method
The concept of cavity comes from the homology group in algebraic topology.Since a large-scale network has many 1-cycles, for instance the network shown in Figure 1 has hundreds, to facilitate investigation they are classified into equivalent classes.In a network, each 1-cavity belongs to a linearly independent cycle-equivalent class 5 with the total number equal to the Betti number  1 .It is relatively easy to understand 1-cavity, which has boundary edges consisting of 1-cliques.Imagination is needed to understand higher-order cavities, which have boundary consisting of some higher-order cliques of the same order.In the literature, only one 2-cavity consisting of 8 triangles is found and reported 8 .In this paper, we found all possible smallest cavities and list them up to order 11 in Supplementary Note 2.
Since a cavity belongs to a cycle-equivalent class, only one representative from the class with the shortest length (namely, the smallest number of cliques) is chosen for further discussion.To find the smallest one, however, optimization is needed.

Finding Cavity-Generating Cliques
The procedure is as follows.
First, a maximum linearly independent group of column vectors is selected from arXiv2101.00536accepted by Communications Physics in October 2021 the boundary matrix Bk, used as the minimum th-order spanning tree, which consists of rk k-cliques, where rk is the rank value of the boundary matrix Bk discussed above.Then, linear row-column binary operations are performed to reduce it to be in a simplest form.In every row of the resultant matrix, the column index of the first nonzero element is used as the index of the k-clique in the spanning tree.As an example, for the subnetwork with Nodes 1-8 on the left-hand side of the network shown in Figure 1, the bold-faced 1's in matrix  1 correspond to the columns indicated by (1,2), (1,3), (1,4), (1,5), (3,6), (3,8), (6,7) shown at the top of the matrix, which constitute a spanning tree in the sub-network with Nodes 1-8 in Figure 1.It should be noted that the minimum th-order spanning trees are not unique in general.
Next, the maximum group of linearly independent column vectors from boundary matrix Bk+1 is found, obtaining rk+1 (k+1)-cliques as a group of linearly independent cliques.From this group, one continues to search for a -clique that belongs to the boundary of the (k+1)-clique but does not belong to the th-order spanning tree.In other words, the rk+1 k-cliques should not be a k-clique in the minimum spanning tree.If this cannot be found, then one can choose another maximum group of linearly independent column vectors from boundary matrix Bk+1 and search again.In this way, rk+1 (k+1)-cliques are found.As an example, for the sub-network with Nodes 1-8 in Figure 1, the bold-faced 1's in the boundary matrix  2 correspond to the edges indicated by (2, 3), (2,4), (2,5), (3,4).These are edges on the left-hand side of the boundary matrix Bk+1, which are different from the cliques in the spanning tree.
Then, the formula of Betti numbers,   =   −   −  +1 , is used for computing, which is the number of linearly independent k-cavities.The task now is to find the rest k-cliques that are not in the kth-order minimum spanning tree and also not on the boundaries of linearly independent ( + 1)-cliques.These are called cavity-generating cliques.In the same sub-network example, there is only one such clique: (7,8).On the minimum spanning tree, after including all linearly independent boundaries, adding any cavity-generating k-clique will create a linearly independent k-cavity; in this example, the created one is the 1-cavity (3,6,7,8).

Searching Cavities by 0-1 Programming
Every cavity-generating k-clique corresponds to at least one k-cavity.However, a cavity-generating  -clique may correspond to several cavities of different lengths, where the length is equal to the number of cliques.Since a cavity is a linearly independent cycle with the smallest number of cliques, the task of searching for a cavity can be reformulated as a 0-1 programming problem.
As shown above, there are mk k-cliques, Bk is the boundary matrix between a ( − 1)-clique and a k-clique, Bk+1 is the boundary matrix between a k-clique and a ( + 1)clique, and a -cavity consists of some -cliques.In the following, the vector space arXiv2101.00536accepted by Communications Physics in October 2021 formed by k-cliques is denoted by Ck.A k-cavity can be expressed as  = ( 1 ,  2 , … ,    ) ∈   , in which each component   takes value 1 or 0, where 1 represents a -clique with index  in the cavity, while 0 means no such cliques there.Now, a cavity-generating k-clique with index  is taken from all k-cliques and a vector  = (1, 1, … , 1)  is introduced.Then, the problem of searching for a k-cavity becomes the following optimization problem, which is to be solved for a nonzero solution: Here, the first constraint means that the cavity comes from the cavity-generating kclique with index .The second constraint implies that the cavity is a k-cycle, namely the boundaries of  -cliques that form the cavity should appear in pairs.The third constraint shows that the k-cavity to be found will not be a linear representation of the ( + 1)-cliques, where   indicates that the operations are performed in the binary field, which can avoid generating false cavities.

Cliques and Cavities of C. elegans
For a dataset of C. elegans with 297 neurons and 2148 synapses 21 , all cliques and some cavities are obtained here by using the above-described techniques and algorithms, which is compared to the homogeneous Erdös-Rényi (ER) random network with the same numbers of nodes and edges.The results are shown in Figure 2 and Table 3.

Discussions
For a given directed network, how can one analyze its higher-order cliques and cavities?
In [9], a Hasse algorithm was designed to find all directed cliques.However, both concepts of cycle and cavity were not precisely defined there.For an undirected network, the length of a cavity, namely the number of cliques that compose it, is longer than the lengths of the cliques (1-order higher cavity) as a cycle.For example, an undirected triangle of length 3 not only is a 2-clique but also is a 1-cycle, while 1-cavity at least is a quadrangle of length 4. For a directed network, however, this may not be true.For instance, the smallest directed 1-cavity could be composed by two reversely directed edges between two nodes, where both edges have length 2. But, a directed 2clique could be a directed triangle of length 3.This shows the extreme complexity of directed cavities, which will be a research topic for future investigation.
It should be noted that the key technique used in this paper is to examine various combinations of cliques and cavities, which differs from the studies based on node degrees in the current literature, where the focus is on statistical rather than topological properties of the network.After comparing the neuronal network of a C. elegans to a random network, it was found that they are very different regarding the numbers of cliques and cavities.From the perspective of brain science, various combinations of arXiv2101.00536accepted by Communications Physics in October 2021 higher-order topological components such as cliques and cavities are of extreme importance, without which it is very difficult or even impossible to understand and explain the functional complexity of the brain.In fact, this provides reasonable supports to many recent studies on the brain 12,13,14 .
The intrinsic combination of cliques and cavities also brings some unexpected problems to programming the proposed optimization algorithm.For example, because a minimum spanning tree of a network is not unique, the algorithm may not produce the expected results when searching for cavities.Efforts have been made to determine the information of cavities by eigenvectors corresponding to zero eigenvalues of higherorder Hodge-Laplacian matrices.However, similar non-uniqueness problem occurred in finding eigenvectors, demonstrating the extreme complexity of the clique and cavitysearching problems.

Method and Algorithms
To solve the above optimization problem (Eq.( 1)) with   k-cavities is difficult due to the third constraint in the optimization, and because the minimum spanning tree is not unique.As a remedy, the optimization problem is separated into two parts.
(3) can be transformed to the following 0-1 programming problem for a linear system of equations: Equation ( 4) can be solved by using Matlab toolbox for 0-1 linear system of equations, and the algorithm is described as follows.Output：specific cycles x*

Finding all cavities
The third constraint in the optimization problems (Eqs.( 3) and ( 4)) is needed to check, so as to identify which cycles found by Algorithm 1 are -cavities and then to determine their cavity-generating cliques.For cavity-generating cliques not included in the list in Algorithm 1, or if there are many cavity-generating cliques appearing in the same cavity, one has to search new cycles obtained by Algorithm 1 again by increasing the lengths of the 2 k-1 cliques.
Summarizing the above steps gives the following cavity-searching algorithm: Correspondence and requests for materials should be addressed to DS and GC.If the number of cliques is limited to not more than 10 7 to be computable on a personal computer, the orders of the cliques are found to be only 9, 6 and 4, respectively, as indicated by the three bold numbers in Table 1.For these three real-world examples, it becomes impossible to compute the numbers of their higher-order cliques.
It is noted that in any network of a fixed size, except trees, its number of cliques of different orders has a peak value as the order number increases, namely it is first increasing and then decreasing.For instance, for a fully-connected network of size , the numbers of its -th order cliques are: , where it peaks at the (  Given limited computational resources, how can one determine if a given network is computable?For relatively large-scale and dense networks,  -core decomposition 17 can provide an estimate.The -core technique is used to determine the cells of different orders, where all nodes on the -sell have degrees larger than or equal to .The cell with the largest coreness value is the core of the network, in which the connection is dense; therefore, it can be used to measure the order of the largest clique in the network.For example, in the Jazz network, the 29th cell has 30 nodes and 435 edges, implying that this is a fully-connected network; therefore, its core is a 29-clique, which also shows the order of the largest clique in the Jazz network.In the USAir network, the largest coreness value is 26, where the core has 35 nodes and 539 edges; therefore, its largest clique is a 21-clique, which is close to the true coreness value 26.For the Yeast network, its core has 64 nodes and 1623 edges, which is known to have the largest coreness value 40; although the computation here reaches up to 6-cliques, it can be seen that the order of the largest clique would not be small.The detailed coreness values of USAir, Jazz and Yeast are summarized in Table SI-2, where mi is the coreness value of the -core,  = 0, 1, 2, … , 29.

Yeast-40
The above analysis shows that, given the limited computational resources today, if the number of -cliques is up to the order of 10 7 then the largest coreness value of a network should not be larger than 30, or even should not be larger than 25 to be feasible on a personal computer.The third 3-cavity with 8 nodes (171,13,3,195,185,119,118,173) is surrounded by the following 16 3-cliques: (171,13,3,195) (13,3,195,185) (3,195,185,119) arXiv2101.00536accepted by Communications Physics in October 2021 different orders.
arXiv2101.00536accepted by Communications Physics in October 2021 arXiv2101.00536accepted by Communications Physics in October 2021

Figure 2 .
Figure 2. Cliques and Betti numbers.The number of cliques and the Betti numbers for the C. elegans neuronal network versus the Erdös-Rényi (ER) random network arXiv2101.00536accepted by Communications Physics in October 2021
arXiv2101.00536accepted by Communications Physics in October 2021

For 2 -
cavities, only those with eight nodes are listed here.A total of 4 have been found, which are divided into two types, as shown in Figure SI-2.

Table 1 .
Three real networks: their sizes (number of nodes |N| and number of edges |E|), maximum coreness kmax, maximum number of cliques cmax, number of -cliques   , and the maximum order k of cliques with their numbers mk < 10 7

Table 2 .
Computational results for the sub-network on the left-hand side of the network shown in Figure1.Here,   is the number of -cliques,   is the rank of the boundary matrix   , and   is the th Betti number

Table 3 .
Euler characteristic number, Betti numbers and the Euler-Poincaré formula for the C. elegans network and the Erdös-Rényi (ER) network

Table SI - 1
Number of cliques of different orders in three real networks

Table SI - 2
Coreness values of three real networks