Network clique cover approximation to analyze complex contagions through group interactions

Contagion processes have been proven to fundamentally depend on the structural properties of the interaction networks conveying them. Many real networked systems are characterized by clustered substructures representing either collections of all-to-all pair-wise interactions (cliques) and/or group interactions, involving many of their members at once. In this work, focusing on interaction structures represented as simplicial complexes, we present a discrete-time microscopic model of complex contagion for a susceptible-infected-susceptible dynamics. Introducing a particular edge clique cover and a heuristic to find it, the model accounts for the higher-order dynamical correlations among the members of the substructures (cliques/simplices). The analytical computation of the critical point reveals that higher-order correlations are responsible for its dependence on the higher-order couplings. While such dependence eludes any mean-field model, the possibility of a bi-stable region is extended to structured populations.


Introduction
Epidemics 1 , rumor spreading 2 , adoption 3 and opinion dynamics 4 are well-known manifestations of real-world contagion processes.
One of the most remarkable achievements of network science has been the characterization of the dependence of contagion processes on the structural properties of the interaction network through which they spread 5;6;7;8 . In particular, many real-world networks of interest, especially the social ones, boast a clustered and cycles-rich structure. Triadic closures are indeed renowned to be a distinguishing feature of social systems 9;10 , together with the presence of larger communities in which every element is connected to (nearly) any other element in them 11 . Households and workplaces are common contact-based examples of that, while online social communities and groups in messaging apps are information-based ones. In the language of graph theory, such all-to-all substructures are called cliques. Specifically, a n-clique (or clique of order n) * sergio.gomez@urv.cat consists of n nodes all pair-wisely connected to each other, i.e., a complete subgraph of n nodes.
Apart from a collection of dyadic interactions, cliques can be also regarded as the pair-wise projection of richer substructures representing group interactions (also known as 'higher-order' interactions), involving more than two agents (nodes) at once. In fact, from few years on, a growing branch of literature dedicated to the study of various dynamical processes involving group interactions 12;13;14;15;16;17;18 , has been showing that such interactions can heavily affect the dynamics, and neglecting them can therefore lead to wrong predictions.
The interaction patterns can be properly formalized by means of hypergraphs 19;20 , a generalization of graphs in which the nodes can be grouped in hyperedges of any order -not only in pairs. Specifically, here we make use of simplicial complexes 21 , defined as sets of faces, whereby a face is a set of nodes with the hereditary property, stating that each of its subsets is also a face of the simplicial complex (SC). Associating a group interaction to each face, a SC then represents the entire set of interactions among the nodes in it.
Going to the dynamics, we are interested in complex contagion processes 22;23;24;25;26 , in which the outcome of a potentially spreading interaction depends on how many different contagious agents take part to it, and not -as in simple contagions-only on the strength of the interaction. The presence of complex contagion mechanisms has been assessed in various contexts 27 , but largely in online social networks and forums 28;29 , thanks to their unique data traceability.
Several studies have recently appeared showing the dynamical effects of group interactions on complex contagion.
They provide either a qualitative understanding by means of mean-field approximations 30;31;32 , finding such interactions as responsible for critical mass effects; or a more quantitative one, reinforcing the previous qualitative findings, but for very particular hypergraphs 33;34;35 . All the studies considered continuous-time dynamics and, noteworthy, uncorrelated nodes' states. For the case of 2-dimensional SCs (i.e., consisting of edges and triangles), Matamalas et al. 36 provided a microscopic discrete-time model accounting for first-order (two-nodes) and, only partially, second-order (three-nodes) dynamical correlations. However, as shown afterwards, the same subtle inconsistencies that make the model applicable to any 2-dimensional SC, also impede the computation of the critical point.
In this work, looking for a discrete-time model holding for SCs of any dimension, we reveal that fixing those inconsistencies puts a topological condition on the interaction structure the model can describe, namely that two group interactions can only share one node. In order to satisfy this constraint, we introduce the notion of edge-disjoint edge clique cover (EECC) of a SC and a heuristic to find it. By means of our Microscopic Epidemic Clique Equations (MECLE), we provide a two-fold extension of the existent discrete-time complex contagion models, accounting for higher-order correlations and group interactions (if any) at the same time. We prove those dynamical correlations to be essential to describe how the critical point depends on the higher-order couplings. Lastly, different approaches to treat group interactions sharing multiple nodes are also discussed, while hinting at easy adaptations of ours.

Results and Discussion
The over-counting problem. Let us first introduce all the basic notions needed to state the problem. We start from the interaction structures we used, simplicial complexes. A simplicial complex K is a subset of the power-set 2 V of a vertex set V , endowed with the hereditary property: given f ∈ K and f ⊆ f , then f ∈ K. Note that we can neglect the empty set from 2 V , for it does not have any practical interest here. The elements of K are called faces, and a n-dimensional face (or n-face) is a subset of V made of n + 1 nodes. Given a face f , its power-set 2 f is called a simplex. If f is a n-face, 2 f is a n-simplex. Giving a geometrical interpretation to K, a n-simplex is also the n-dimensional polytope being the convex closure of its n + 1 vertices. For example, given V = {i, j, k} and K = 2 V , {i} is a 0-simplex (a point), {{i, j}, {i}, {j}} is a 1-simplex (a segment), {{i, j, k}, {i, j}, {i, k}, {j, k}, {i}, {j}, {k}} is a 2-simplex (a triangle).
If a simplex in K is not included in any other simplex in K, then it is said to be maximal. If d is the maximum dimension of the faces in K, then K is d-dimensional and is called simplicial d-complex. The underlying graph K (1) of K is the graph induced by the 1-simplices in K, i.e., the graph whose node set is V and whose edge set is the set of all the 1-faces in K. Consequently, a n-simplex induces a (n + 1)-clique, made of n+1 2 edges (i.e., 1-faces), in K (1) . If a clique is not part of a larger one, it is said to be maximal. Finally, K is said to be q-connected if, given any two simplices s 1 , s 2 ⊂ K, there exists a sequence of simplices connecting s 1 and s 2 such that any two adjacent simplices of the sequence have (at least) a q-face in common; if K is connected, then K (1) is 0-connected -as any other connected graph.
To identify whether a n-clique c in K (1) corresponds to a (n − 1)-simplex in K or it is just a n-clique also in K, we introduce a binary variable g, the group classifier, defined to be 1 or 0, respectively; by which c is regarded as a (g, n)-clique. Since a 1-simplex is equivalent to a 2-clique, we choose to assign g = 0 to any 2-clique, so that n 3 when g = 1. From now on, unless explicitly specified, we refer to the g-classified cliques in K (1) simply as 'cliques'. These are the building blocks of our description.
Going to the dynamics of interest, we adopt, without loss of generality, a standard epidemiological terminology.
We consider a discrete-time susceptible-infected-susceptible (SIS) dynamics on a SC. Let β (n) be the probability with which a susceptible node in a group of n + 1 nodes (a n-face of the SC) gets infected when all the other n nodes are infected; and µ be the probability with which an infected node recovers. Due to the hereditary property, if n nodes are infected, and thus can pass the infection as a group of n nodes, then also any subset of that group can: for any k n, there are n k subsets passing the infection with probability β (k) as a group of k nodes. In other words, to each state configuration of the n nodes (i.e., which of them are infected) corresponds a set of concurrent channels of infections, and these channels are correlated. Compared to existing models 36 , we take a first step forward by accounting for such correlation, allowing us to compute the critical point. Then, when computing the probability for a node to get infected within a group, the contribution coming from each state configuration of the group consists of a product over the concurrent channels that configuration admits. Additionally, each contribution is weighted by the probability for the group to be in that configuration. Now, given a clique, no matter whether it conveys (g = 1) or not (g = 0) group interactions, we want to account also for the dynamical correlations among the states of the nodes in it. However, if s of such cliques have m 2 nodes in common, the contribution to the infection of one of the m nodes, coming from the others m−1, is counted s times instead of once. This is because the m − 1 nodes would appear in the state configuration probabilities associated with each of the s cliques. To note that if the cliques are just edges, then necessarily m = 1, meaning that the over-counting is excluded in pair-wise models.
Edge-disjoint Edge Clique Cover. To avoid the over-counting, we aim at covering all the edges of K (1) by means of a set of cliques such that any two of them share at most one node, giving what we call here an edge-disjoint edge clique cover. Cliques sharing more than one node are consequently decomposed in lower-order cliques. This comes with no essential repercussions when the decomposed cliques have all g = 0. Otherwise, some group interactions would be ignored, implying the model to strictly apply for group interactions sharing at most one node. Interestingly, as shown later on, the model remains reliable when the over-counted interactions are relatively scarce.
Covering the structure by cliques only, we are able to give a unique equation holding for any order n; an impossible task via generic, less symmetric substructures. . The SC, shown on the left, consists of fourteen nodes (0-simplices), identified via letters, connected by one 3-simplex, one 2-simplex, and fourteen 1-simplices. Grey areas indicate r-simplices with r 2, including r + 1 nodes each. The EECC of the SC is shown on the right, where colored and dotted areas are used to visualize, respectively, the (1, r)-cliques and (0, r)-cliques in it, with colors carrying no specific meanings. The SC is decomposed in: one (1,4) Furthermore, since we want to capture as much correlations and group interactions as possible, we want the cover to consist of the least possible number of cliques. Finding such minimal set of edge-disjoint cliques is closely related to the edge clique cover problem, known to be NP-complete ? . Heuristics are thus necessary to estimate the solution in large graphs. For convenience, from now on, we reserve the acronym EECC to sets which are solutions of the problem.
If all the maximal cliques in K (1) are edge-disjoint, then K (1) admits a unique EECC, simply given by the set of the maximal cliques in it. Otherwise, K (1) generally admits multiple EECCs. See Fig. 1 for illustration.
To estimate an EECC, we propose the following greedy heuristic (the best among several options we conceived for this task). Given a graph G, the heuristic proceeds as follows: 1. Find the set C of all the maximal cliques in G 2. Include in the EECC and remove from C all the elements in C that do not share edges with other elements in C 3. While C is not empty (a) For every maximal clique c ∈ C, compute the score rc, defined as the fraction of edges that c shares with the other elements in C (b) Consider the elements of C with the lowest score; include in the EECC and remove from C (one, randomly chosen, of) the element(s) of highest order among them Noteworthy, when dealing with highly modular structures -as those representing many real social systems-in which communities of nodes are loosely connected among them, the search for an (edge-disjoint) edge clique cover can be speeded up. Indeed, each time two regions of the structure are joined by bridging cliques (which are evidently maximal), the problem of finding the optimal cover of the whole structure reduces to that of finding it in each of the two, smaller regions. Now, whenever the maximal cliques in K (1) are not all edge-disjoint, some cliques are forced to be decomposed in sub-cliques during step 4.
Since decomposing a (1, ·)-clique in (0-connected) sub-cliques means also neglecting some group interactions, whenever a (0, ·)-clique and a (1, ·)-clique are not edge-disjoint, we prefer to include the latter in the EECC. This additional difficulty disappears whenever the SC is 0-connected or when it has dimension 1 (i.e., when it is a graph). Thus, being K a SC and K (1) its g-classified underlying graph, a EECC of K is constructed as follows: I. Consider the subgraph G1 ⊆ K (1) induced by the nodes in the maximal (1, ·)-cliques in K. Find a EECC of G1; let us call S(K) the resulting EECC II. Consider the subgraph G0 = K (1) \G1. Find a EECC of G0; let us call it C(K) III. D(K) ≡ (S(K), C(K)) is the estimated EECC of K A basic question is about the dependence of the prediction made by the model when different EECCs are estimated for a given structure. Indeed, while the effectiveness of the heuristic ensures a better performance of the model, its robustness is an indispensable quality, as we look for a reliable model giving certain results when fed up with a certain structure, of which a minimal EECC is estimated. In Supplementary Note 1 and Supplementary Figs. 1 and 2, our model is shown to be robust under EECC variability. Therefore, given a SC, it is safe to make use of the first EECC computed for it.
Calling m 1 the maximum order of the cliques we want to include in S(K) (i.e., considering simplices of dimension up to m 1 − 1 in K), m 1 could be smaller than ω 1 , the maximum order of the cliques in G 1 . In such case, when looking for an EECC of G 1 , those maximal cliques in G 1 of order greater than m 1 must be decomposed in edge-disjoint sub-cliques (corresponding to sub-simplices in K) of variable order m ∈ {1, . . . , m 1 }. Clearly, the higher is m 1 , the higher is the order of the group interactions (and of the correlations within them) included in the description. Overall, as long as the proportion of obviated group interactions is small enough, the deviations from the complete dynamics are comparatively negligible; or alternatively, the error made by including non-0-connected simplices is negligible.
In the building of an EECC, different values m 0 2 for the maximum order to be considered for (0, ·)-cliques can also be chosen. The higher is m 0 , the higher is the order of the captured dynamical correlations within the cliques in K. In any case, m 0 ω 0 , being ω 0 the maximum order of the cliques in G 0 .
Summarizing, the couple (m 0 , m 1 ) identifies the considered implementation of the MECLE.
Microscopic Epidemic Clique Equations. Given a (g, n)-clique {i 1 , . . . , i n } in D(K), with n m ≡ max{m 0 , m 1 }, we indicate with P σi 1 ...σi n i1...in,g the joint probability that node i 1 is in the state σ i1 , node i 2 is in the state σ i2 , etc., where {σ i1 , . . . , σ in } ∈ {S, I} n . Besides, with P σi 1 ...σi k−1 σi k+1 ...σi n |σi k i1...i k−1 i k+1 ...in|i k ,g we indicate the conditional probability that nodes i 1 , . . . , i k−1 , i k+1 , . . . , i n are in their respective states σ i1 . . . σ i k−1 σ i k+1 . . . σ in , given that node i k is in the state σ i k . Clearly, the normalization condition must hold: Indicating with {σ i k } k=1,...,n the states at time t and with {σ i k } k=1,...,n those at time t + 1, the MECLE model dynamic equation governing the evolution of the state of a (g, n)-clique {i 1 , . . . , i n } reads where is the transition probability from the starting state {σ i k } k=1,...,n to the arrival state {σ i k } k=1,...,n . It is understood that Φ g is computed at time t. This is expressed as a product over the single-node transition probabilities {φ i k ,g }, given by where 1[p] gives 1 if condition p is fulfilled and 0 otherwise; N I = |{i k=1,...,n |σ i k = I}| is the number of infected nodes in the starting state; w (n−1) {β (s) } is the probability that a susceptible node (i k ) does not get infected within a (g, n)-clique ({i 1 , . . . , i k , . . . , i n }) whose state configuration (N I n − 1) is known, reading w (n−1) and q (r) } is the probability that node i k does not get infected via any of the (g , r + 1)-cliques incident on it, that is where Γ (r) i k ,g indicates the set of r-tuples of indexes corresponding to subsets of r nodes forming a (g , r + 1)-clique with i k , and q (n−1) i k (¬i k ),g coincides with q (n−1) i k ,g expect for excluding the considered (g, n)-clique {i 1 , . . . , i k , . . . , i n } from the product.
Finally, the products in Eq. (4) are performed over the couples {(g , r)} such that 2 r m 0 for g = 0, and 3 r m 1 for g = 1.
In Eq. (3) the single-node transition probabilities, φ, are treated as independent from each other within the time step. This merely derives from the implicit assumption that all the events within a given time step are simultaneous, and therefore not causally related. Simply put, the state of a node at time step t + 1 only depends on its state and on the states of its neighbors at the previous time step t, as in any markovian model. Importantly, to get the expression for q (r) i k ,g we have adopted the following closure (the classifier g is assigned only to the cliques in D(K)). Therefore, by definition of conditional probability, the probability P is treated as an independent dynamical unit of the system, accounting for the correlations among the states of the nodes it includes. In the form of a generalization of the classical pair approximation 37 , the dynamical correlations between two adjacent cliques are conveyed by the state probability of the node they share (in the denominator to avoid double counting). In this regard, the larger the adjacent cliques, the smaller the expected influence of the state of the shared node on all the other ones. For this reason, the presented closure -and consequently the MECLE-is expected to gain further accuracy with the increasing of the order of the cliques.
The probability P I i for the single node i of being infected is computed as a marginal probability from any (g, n)-clique including i, as in which {¬i} and {σ ¬i } indicate, respectively, the set of the other n − 1 nodes in the clique and their states. Equation (8) is, consistently, also found taking n = 1 in Eq.
(2) (w (0) i,g = 1). The average value ρ of the single node probabilities, is the epidemic prevalence, which is also the order parameter of the system. It is important to note that the used closure, Eq. (7), apart from preserving the n-node state correlations of a considered (g, n)-clique, is the only one making feasible the marginalization of Eq. (2) to get Eqs. (8)- (9) and, consequently, get the expression for the epidemic threshold, through Eqs. (15) to (19) in Methods.
At any time step t, Eq. (8) yields one constraint for each of the n nodes in a (·, n)-clique, leaving 2 n − n − 1 independent state probabilities to be determined, one being fixed by the normalization. Therefore, if K has N nodes and D(K) consists of C (n) (·, n)-cliques, n m, the MECLE is defined by a system of N + Finally, we can frame the existent models in the (m 0 , m 1 ) notation.
As m 1 = 2 implies S(K) = ∅, {(m 0 , 2)} 2 m0 ω0 is the class of models accounting for correlations within cliques of order up to m 0 on graphs (simple contagion). In particular, (2,2) gives the Epidemic Link Equations (ELE) model 38 . The Microscopic Markov Chain Approach (MMCA) 39 is then recovered from (2, 2) by assuming that P IS ji = P I j P S i (i.e., P I|S j|i = P I j in Eq. (6)). Considering group interactions, the simplicial ELE and MMCA 36 fall outside the MECLE class of models, for they do not account for the correlations among the concurrent channels of infection within simplices. This, alongside the consideration of higher-order dynamical correlations among nodes' states, is precisely the refinement made here.

Results for simplicial 2-complexes.
We apply here the developed formalism to the case of simplicial 2-complexes. As we are only using synthetic structures, the SCs are constructed from some graph (the future underlying graph) converting into (1, 3)-cliques (i.e., 2-faces) a fraction p of the 3-cliques allowed by the EECC of the graph, while considering as (0, 3)-cliques the remaining fraction 1 − p . Specifically, if all the 3-cliques are converted (p = 1.0), the resulting SCs are the clique complexes of the respective original graphs. To better appreciate the improvement made here, we mainly show results for clique complexes, although notable improvements can be generally found for any value of p depending on the used structure. Conveniently, if p is not specified then the structure is understood to be a clique complex. We identify a such generated SC adding 'SC' to the name of the graph model used for it: a 'Dorogovtsev-Mendes SC', for instance, is built upon a graph given by the Dorogovtsev-Mendes generative model 40 . Specifically, the 'random SC' is obtained as follows: first we generate a simplicial 2-complex through the random simplicial complex model 30 ; then we consider its underlying graph and compute its EECC; finally, the random SC is got, as before, by converting 3-cliques in 2-faces, in this way ensuring the SC to be 0-connected.
In Figure 2 we compare the prevalence ρ obtained using Monte Carlo (MC) simulations (see Methods for details), the MECLE model, and the other discrete-time markovian models, i.e., the simplicial ELE and MMCA models 36 , for different 0-connected clique complexes. For all structures, the improvement brought by the MECLE with respect to the simplicial ELE is substantial, in both predicting the epidemic prevalence and, even more, locating the critical point, for which the relative errors ε β (1) are reported. Note that the predictions is expected to improve for increasing order of the cliques, since the precision of the used closure, Eq. (7), grows with it as well, especially in the case of populations arranged in lowly inter-connected dense communities.
We then leverage the prominent jump of the discontinuous transition found for random SCs, to illustrate, in Fig. 3, the existence of a hysteresis cycle, enclosing a bi-stable region. This is in line with recent findings 30;31;33;34;35;36 . Again, the simplicial ELE is outperformed by the MECLE, especially in predicting the 'backward' curves, as the overestimation made by the former (see next subsection) is emphasized in that case.
More interestingly, our analysis clearly shows that models with uncorrelated nodes' states generally fail to pinpoint the ('forward') epidemic threshold at even a qualitative level. Indeed, by neglecting those dynamical correlations, Eq. (22) (see Methods) predicts β (1) cr , the value of β (1) at which the epidemic-free state becomes unstable, to be independent from the values of the higher-order infection probabilities, {β (s) } s>1 . In particular, in the case of 2-dimensional simplicial complexes, increasing enough β (2) results in the appearance of a bi-stable region, but β (1) cr does not change in those models 30;31 . On the contrary, when the actual structure of the interactions is retained, increasing β (2) does lead β (1) cr to decrease, as arises from MC simulations and correctly predicted by the MECLE. We show this dependence in Fig. 4, while it can also be grasped by comparing panels (c) and (d) in Fig. 2.
The calculation of the critical point in the MECLE (see Methods), together with the evidences coming from the MC simulations, reveal the necessity of accounting  for higher-order dynamical correlations. An interaction (infection) of order s, with coupling (infection probability) β (s) , requires s nodes in a simplex to be active (infected). Therefore, in order to preserve its contribution near the critical point, the state probability with s active nodes must not be neglected -as done instead in models with uncorrelated nodes' states. In other words, near the epidemic threshold, higher-order interactions rely exclusively on comparatively higher-order correlations, so that their effect when varying the higher-order couplings, {β (s) } s>1 , is observed only if such correlations are preserved. Accordingly, such dependence eludes the MF and MMCA approximations of the system. Besides, as second-order correlations within triangles are partially accounted for in the simplicial ELE, the latter predicts this dependence, but not as accurately as MECLE does. Finally, in the endemic state and away from the threshold, low-order correlations suffice to sustain higher-order interactions. Nevertheless, the higher the order of the accounted correlations, the more accurate is the quantification of the prevalence.
While a closed equation for β (1) cr is generally inaccessible for complex interaction structures, in Supplementary Note 3 and Supplementary Figs. 3 and 4, we explicitly derive the monotonous decrease of β (1) cr with respect to the higher-order infection probabilities for selected symmetrical structures. In particular, we study regular SCs, like the herein studied periodic triangular SC, and SCs built upon Friendship graphs, taken as proxies for homogeneous and heterogeneous structures, respectively. We prove that β (1) cr decays with β (2) whatever the system size N , and that the dependence considerably increases with the structural heterogeneity.
It should be noted that the herein observed and proved dependence of the critical point on the higher-order couplings, already showed up in previously reported numerical simulations, especially in SCs built from real data 30 . However, it went seemingly overlooked,  the mean number of (g, n + 1)-cliques incident on a node, and triangle infection probability β (2) = 0.25; the relative errors in locating the epidemic threshold are ε for. β (1) ≈ 0.14 and ε back. β (1) ≈ 0.10 for MECLE, and ε for.
Dependence of the epidemic threshold on the triangle infection probability β (2) . β (1) cr , computed via Eqs. (15) to (19) in Methods, is shown against β (2) for a Dorogovtsev-Mendes simplicial complex withk (0,1) = 1.10,k (0,2) = 0.00 and k (1,2) = 1.45, beingk (g,r) the mean number of (g, n + 1)-cliques incident on a node. Note that Eq. (22), disregarding dynamical correlations, wrongly predicts β (2) . eventually leading to claim 41 that only pair-wise interactions govern the value of β (1) cr . Even though we have considered a discrete-time dynamic, instead of the continuous-time' used in those works upon which that claim is based on, we predict the qualitative shift brought by our analysis to hold in continuous-time as well. The continuous-time limit is here recovered by neglecting all those terms in the equations appearing as second or greater powers of any combination of the infection probabilities {β (s) } and the recovery probability µ, i.e., allowing only single-node state changes. Still, the linear terms proportional to any of the infection probabilities show up in Eq. 19, thus contributing to the value of the critical point. In Supplementary Note 4, we derive the continuous-time limit of the MECLE equations for simplicial 2-complexes. Taking the example of a Dorogovtsev-Mendes SC, the predicted dependence of the critical point on β (2) is shown in Supplementary Fig. 5.
Lastly, considering clique complexes with some fraction p s of edges shared by two or more 2-faces, we have studied how the MECLE behaves out of the bounds of the 0-connectedness. As shown in Fig. 5, when p s is low enough, it performs comparably to or still better than the simplicial ELE. As expected, the value of p s above which the MECLE performs worse turns out to specifically depend on the structure, preventing us to find a simple relation. Precisely, that value resulted to be around 0.06 for Dorogovtsev-Mendes SCs and around 0.04 for RSCs.
Possible approaches for group interactions sharing multiple nodes. In discrete-time models, allowing groups to share two or more nodes comes with the drawback of impeding the computation of the epidemic threshold, as shown in Methods. Forgoing the latter, let us discuss some approaches which may still come in handy.
The first option is the simplicial ELE model 36 . It describes a SIS dynamics in simplicial 2-complexes by means of a pair approximation that, via a specific triangle closure, is able to partially account for second-order correlations. However, considering the multiple channels of infection within a simplex as mutually uncorrelated, it is not possible to marginalize the 1-faces equations to get the single nodes one, hence neither the epidemic threshold. To further elucidate the effects of disregarding the correlations among the concurrent infections within a simplex, let us consider the toy example of a triangle upon the set of nodes {i, j, k}. In the MECLE model, the probability for i to be infected by j and k, reads

II|S jk|i
In particular, taking β (1) = 1, the dependence on β (2) correctly disappears: no matter the infectiousness β (2) of the couple {j, k}, the probability for node i to be infected equals the probability that at least one node between j and k is infected. On the contrary, the simplicial ELE neglects that correlation, and the above infection probability becomes where P II|S jk|i is expressed in terms of edge probabilities. Here, taking β (1) = 1, there is still a dependence on β (2) , with the effect of both overestimating the probability of infection, hence the prevalence, and wrongly anticipating the position of the epidemic threshold. Evidently, the error grows with both β (2) and P II|S . To notice that this analysis does not make any reference to the correlations among nodes' states a model considers. Consequently, a similar comparison holds also between the MMCA and MF approximations of the MECLE and those of the simplicial ELE, with the latter overestimating more the prevalence.
Curiously, when the 0-connectedness is heavily broken (making the MECLE unreliable), the simplicial ELE appears to gain accuracy, especially in locating the critical point 36 . However, while we could explain the reasons why the MECLE outperforms the simplicial ELE in (nearly) 0-connected SCs, the unexpected improvement of the latter for higher connectedness remains unclear. Changes in topological factors, like the spectral dimension (note that, independently from their geometrical dimension, 0-connected SCs have spectral dimension very close to that of a random tree 42 ) or the triangles' percolation, are probably able to soften the approximations of the model. Future work addressing the role of those factors may help to better understand the limits of the simplicial ELE, while giving new, general insights about the relation between dynamical correlations and topology.
An alternative approach, assuming the considered contagion would make it usable, is to opt for more general hypergraphs by dropping the hereditary property out. One could then generalize the approach of Matamalas et al. 36 to any hypergraph, but at the aforementioned cost of accepting some inconsistencies in the marginalization of the probabilities. Besides, referring to the class of simple hypergraphs 19 , i.e., those in which a hyperedge -what in a SC is a face-cannot be subset of any other hyperedge, one can still resort to the fully consistent approach of the MECLE. Indeed, in such structures, to each configuration of the group corresponds a unique channel of infection, so the necessity of constraining groups to share not more than a single node can be relaxed. A model can then be easily constructed adapting Eq. (2) by solely modifying the form of w (r) l,1 , being g = 1 for hyperedges including more than two nodes. For example, supposing the contagion to be maximally conservative, we would write w (r) l,1 = 1[l = r] 1 − β (r) . Moreover, for linear hypergraphs 19 , in which two hyperedges can only share one node, the critical point is again calculable.
Lastly, in a regime of both high and rapid infectiousness and recovery, each set of nodes shared by a sufficient number of groups (e.g., two partners carrying out many of their activities together) could be effectively treated as a single super-node, whose state always represents that of each of the nodes it contains. This effective approach can be combined with any of the ones that have been discussed.

Methods
Epidemic threshold. Here we derive the critical point β (1) cr , defined as the value of β (1) at which the inactive (epidemic-free) state becomes unstable, thus marking the onset of the active (endemic) state. In the presence of a bi-stable region, it identifies the rightmost transition.
We linearize Eq. (2) by regarding of the same order 1, all the state probabilities containing at least one infected node, i.e., P Being interested in stationary states, the value of the time step is omitted from now on. All the states appearing in q (r) i k ,g , see Eq. (6), include at least one infected node, therefore it takes the form, where the squared brackets contain O( ) terms only. It is important to remark that, if in the considered clique there is at least another node ik forming with i k an edge included in some other clique, let us say a (g , r)-clique, q (r) i k ,g cannot be linearized, since the product corresponding to that (g , r)-clique would be made of state probabilities conditioned to the state of both i k and ik (not of i k only, as in Eq. (10)). Indeed, those states in which ik is in state I, would give O(1) terms, for both numerator and denominator would be O( ). This is the reason why in markovian models, when aiming to account for the dynamical correlations within some subsets of nodes (e.g., cliques), a consistent expression for the critical point can be given only when those subsets are edge-disjoint.
Returning to the derivation, since P S...S i 1 ...in,g is fixed by the normalization condition, we only need the linearized equations for arrival states with at least one infected node, i.e., {σ i 1 , . . . , σ in } = {S, . . . , S}. Retaining the O( ) terms in φ i k ,g , after some algebra, we find where N σ→σ = {i k=1,...,n |σ i k = σ, σ i k = σ } is the number of nodes going from state σ to state σ . The terms in curly brackets derive from those transitions starting in state {S, . . . , S} and arriving to a state with exactly one infected node.
In particular, Eq. (8), the dynamic equation for a single node, becomes At this point, to get an expression for the critical threshold, we put Eq. (13) in the form of an eigenvalue equation for the vector of single-node probabilities P I = {P I i } i∈V . To this purpose, given a (·, n)-clique {i 1 , . . . , in}, we need to express every joint probability over its nodes states as a linear combination of the marginal probabilities, P I i 1 , . . . , P I in . Eq. (12) provides a linearized equation for each of the 2 n − 1 unknown state probabilities, hence a system admitting a unique solution. At this point, instead of the n equations for the transition to a state with a single node in state I (second term in Eq. (12)), we use the n consistency relations for the marginal probabilities, i.e., P I i k = {σ ¬i k } P I{σ ¬i k } i k {¬i k },g , ∀k ∈ {1, . . . , n}. In this way, the system is still made of 2 n − 1 equations, hence is determined, but now it includes the marginal probabilities. Eventually, with some algebra, one gets the decomposition with its linear coefficients. Alternatively, asserted the uniqueness of the solution of the linear system, the problem can be approached in the other way around. That is, firstly expressing each of the joint probabilities as the most general linear combination of the marginal probabilities and then inserting them into the 2 n − 1 equations. Doing so, we get a new, determined linear system whose unknowns are the linear coefficients of the original system. In the end, given a (g, n)-clique {i 1 , . . . , in}, the most general and proper linear decomposition of P σ i 1 ...σ in i 1 ...in,g in terms of P I i 1 , . . . , P I in , reads where we can take Y (n−1) n,g = 0, being null the term it multiplies, as there are no nodes in state S for N I = n. We get one coefficient for N I = n and two coefficients for each N I ∈ {1, . . . , n − 1}, leading to 2n − 1 of them in total. Summing for every order n from 2 to m 0 for g = 0, and from 3 to m 1 for g = 1, we get a maximum of m 0 2 + m . Being the decomposition edge-disjoint, only one of those matrices can have a non-zero element in the position corresponding to a given pair of nodes. Consequently, the D(K)-independent adjacency matrix of K (1) , the underlying graph of K, is simply obtained by the sum of all those D(K)-dependent adjacency matrices. It also follows that the degree k i of node i in K (1) is computed as are the total number of neighbors of i within, respectively, (0, ·)-cliques and (1, ·)-cliques.
Substituting Eq. (14) in Eq. (13), and doing some algebra and combinatorics, we get where we have defined the matrices M and D, of elements being δ ij the elements of the N × N identity matrix. Equations (15) to (17) define a generalized eigenvalue problem. The form taken by M and D is easily understood in this way. Each sum multiplying A (g,r) ij in M ij represents the marginal contribution to the infection of node i coming from its neighbor j through the (g, r + 1)-clique they share. Given j in state I, r−1 a is the number of ways in which a out of r − 1 nodes can be chosen to be in state I (a = l − 1) or S (a = r − l − 1). Similarly, each sum multiplying k (g,r) i in D ii represents the contribution coming from all the configurations of any of the (g, r + 1)-cliques incident on node i, in which i is in state S. r l is the number of ways in which l out of r nodes can be chosen to be in state I. Now, M is non-negative. Indeed, for any fixed (g, r), the sum multiplying A (g,r) ij must be positive whenever A Mean-field approximation. The homogeneous mean-field (MF) approximation of Eq. (8) is found by neglecting both the state correlations and the local structural heterogeneity among the nodes, i.e., regarding every node as the "average node" in the structure 44 . Given any (g, r)-clique {j 1 , . . . , jr}, it follows P I...IS...S j 1 ...j l j l+1 ...jr ,g = ρ l (1 − ρ) r−l ; and, for any node i, k (g,r) i =k (g,r) , wherek (g,r) is the average value of the (g, r)-degree of the nodes in the structure. Thus, Eq. (8) becomes The stationary solution is then got imposing ρ (t + 1) = ρ (t) in Eq. (20). The linearization around the epidemic-free state is then implemented by taking ρ = 1. Looking at Eq. (21), the only O( ) terms are given by l = 1, whatever the couple (g, r). That is, only the pair-wise probability β (1) contributes in the MF approximation. With few algebra, one gets the renowned formula wherek = 1 N N i=1 k i , is the average degree of a node in K (1) . More generally, Eq. (22) holds for any model treating the nodes states as independent, thus including the MMCA approximation of the MECLE, the MMCA and MF approximations of the simplicial ELE, and also, expect for substitutingk with k 2 /k, the heterogeneous MF approximation 44 of both. The same result is found in continuous time 31 .
Numerical simulations. The equilibrium value of the prevalence ρ is computed using synchronous Monte Carlo simulations and the quasistationary state (QS) method 45 . In the specific case of the simplicial 2-complexes, for each node i ∈ V , a simulated time step proceeds as follows: (1) if i is currently infected, it recovers with probability µ; (2) if i is currently susceptible, (2.1) it gets infected with probability 1 − (1 − β (1) ) n (1) i , being n  (2) ) n (2) i , being n (2) i the number of currently infected couples of neighbors of i through triangles. For higher dimensions, n > 2, step (2) consists of n − 2 additional sub-steps, analogously defined and ordered as shown here.
In accordance with the QS method, every time the absorbing state ρ = 0 is reached, it is replaced by one of the previously stored active states of the system, i.e., one of those states with at least an active individual. Since for finite systems, when approaching the critical point, a large number of realizations end up in the absorbing state, the QS method properly reduces to a single run the wasteful method of performing many simulations. We have made use of 50 stored active states and an update probability of 0.25. We have given the systems a transient time of 10 5 time steps, and then calculated ρ as an average over 2 × 10 4 additional time steps.

Connectivity structures. We have referred to various interaction structures.
For the numerical evaluation, we have made use of synthetic simplicial 2-complexes with around N = 10 4 nodes, presenting dissimilar structural properties: regular structures, as the clique complexes built from a triangular lattice; homogeneous structures, generated from the random simplicial complex model 30 , in which both the (·, 1)-and the (·, 2)-degree follow a Poisson distribution; heterogeneous structures, derived from the Dorogovtsev-Mendes model 40 , having (·, 1)-and (·, 2)-degree distributions nearly following power-laws with exponent around 3.

Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Code availability
The code for estimating a minimal edge-disjoint edge clique cover (EECC) of a graph has been implemented as the DisjointCliqueCover.jl ? package for the Julia language, available at github and archived at zenodo.org.

Supplementary Note 2. MECLE for simplicial 2-complexes
We here illustrate the form taken by the MECLE model when the interaction structure is a simplicial 2-complex, i.e., the (3, 3) implementation of the model. Cliques and faces can only have order n = 2, 3.
The evolution of the probability P I i for node i being infected is governed by Eq. (8), which now takes the form According to Eq. (2), the state of a (0, 1)-clique {i, j} is governed by the following equations where q (1) i(j),0 coincides with q i,0 except for excluding the (0, 1)-clique {i, j} from the product, and analogously for the other similar terms.
where q (2) i(jk),0 coincides with q i,0 except for excluding the (0, 2)-clique {i, j, k} from the product, and analogously for the other similar terms.

Supplementary Note 3. Epidemic threshold for highly symmetrical structures
Given the high accuracy of the MECLE -as shown in the main text-, here we make further predictions for some particularly symmetrical structures, for which a closed form of β (1) cr , expressing it in terms of the higher-order infection probabilities, {β (s) } s>1 , and the recovery probability, µ, can be provided from Eq. (19). As a particular case, we show the monotonous decrease of β (1) cr with respect to β (2) in the herein examined 2-dimensional SCs. We stress that such dependence, as like as any other one, of β (1) cr on the higher-order couplings, is completely overlooked by models treating the nodes' states as uncorrelated.
a. Regular SCs. For a SC such that k (g,r) i = k (g,r) , ∀i ∈ V , ∀(g, r), since all the non-zero elements of M are equal to the same constant, from the Collatz-Wielandt formula 43 it follows where both equalities hold for any chosen i ∈ V . Using Eq. (19), Λ max (M ) = 1, one can solve with respect to β (1) and express β (1) cr in terms of all the other parameters. The decreasing of β (1) cr with β (2) is shown in Supplementary Fig. 3 for two classes of regular simplicial 2-complexes. In particular, the periodic triangular clique complex, considered in the main text, falls within this class of structures. Note there is no dependence on the number of nodes, N = |V |. b. Friendship SCs. As an opposite case, we consider now extremely heterogeneous SCs. A Friendship graph F n is a Windmill graph Wd(m, n) whose "sails" are cliques of order m = 3. It consists of N = 2n + 1 nodes, where n is the number of 3-cliques incident on the central node. Starting from F n , we convert a fraction p of the n 3-cliques in 2-faces. The central node has k (0,2) = (1 − p )n ≡ n 0 and k (1,2) = p n ≡ n 1 . Then, 2n 0 of the peripheral nodes have each k (0,2) = 1 and k (1,2) = 0, while the remaining 2n 1 have k (1,2) = 1 and k (0,2) = 0. The greater N (hence, n), the higher the heterogeneity between the central node and the peripheral ones. In the large N limit, the average number of neighborsk = 3 N −1 N tends to 3, whereas any higher m-moment diverges as N m−1 . Accordingly, we expect the epidemic threshold to vanish in that limit.
In order to find a closed expression for the epidemic threshold β 1,g + Y (2) 1,g for g = 0, 1. P is a (N − 1) × 1 matrix whose elements equal M (P,0) or M (P,1) depending on whether the peripheral node corresponding to the considered row participates, respectively, to a (0, 3)-or a (1, 3)-clique. Similarly, C is a 1 × (N − 1) matrix whose elements equal M (C,0) or M (C,1) depending on whether the central node participates, respectively, to a (0, 3)-or a (1, 3)-clique with the peripheral node corresponding to the considered column; where 1,g + Y (2) 1,g for g = 0, 1. We now compute the determinant of where I N is the N × N identity matrix. Thanks to the Schur complement formula 43 , we can compute it as Using the properties of block diagonal matrices 43 , (S14) implying that λ (P,g) ≡ M (P,g) , ∀g ∈ {0, 1}, solves det(M − λI N ) = 0; and therefore one between λ (P,0) and λ (P,1) is the leading eigenvalue of B. However, receiving contributions from peripheral nodes only, the latter can be shown to never coincide with the largest eigenvalue of M . In particular, when both n 0 > 1 and n 1 > 1, we already know this is true, for the largest one is a simple eigenvalue 43 . Therefore, let us suppose λ = λ (P,g) , ∀g ∈ {0, 1}, and look for Λ max (M ) in the other factor, the one containing the contribution coming also from the central node. It is easily found that the inverse matrix ofB g = B g − λI 2 , g = 0, 1. With a few algebra, it follows We finally look for λ = Λ max (M ) among the solutions of Eq. (S18). As before, we solve Λ max (M ) = 1 with respect to β (1) to find β cr as a function of the microscopic parameters, β (2) and µ, and of N and p . Results are shown in Supplementary Fig. 4, where several values of N are explored for p = 0.5. There is shown that the epidemic threshold vanishes in the limit of large N , while always decreasing with β (2) . Interestingly, the dependence on β (2) grows with N , hence with the degree disparity between the central node and the peripheral ones. This, together with the weaker dependence found for regular structures ( Supplementary Fig. 3), suggests that the dependence of β (1) cr on β (2) grows with the heterogeneity of connections. In fact, as shown in Fig. 4, a very similar dependence is found for Dorogovtsev-Mendes SCs. This indicates that, despite their simplicity, the Friendship SCs are able to capture some important dynamical properties of more complex heterogeneous structures. To notice that, a strong correlation between edge-degree, k (0,1) , and triangle-degree, k (1,2) , of a node exists in both Dorogovtsev-Mendes SCs and Friendship SCs.
Supplementary Note 4. Continuous-time limit of MECLE for simplicial 2-complexes It is possible to derive the continuous-time equations of the SIS dynamic in SCs as a limit of the MECLE model. Here, we show this process for the particular case of interaction structures arranged in simplicial 2-complexes, i.e., the continuous-time version of Eqs. S1-S5. In order to take the continuous-time limit, we make the substitutions where now µ and β (s) represent rates, i.e., probabilities per unit time, instead of the original discrete-time probabilities 44 . Then, we take the limit ∆t → 0, which means neglecting all those terms in Eqs. S1-S5 appearing as second or greater powers of ∆t or, equivalently, any combination of β (1) , β (2) and µ. In other words, only single-node state changes are allowed during an infinitesimal interval dt.
The qs, Eqs. S2a-S2c, now become i,1 , giving the probability that node i gets infected, reads The evolution of the probability P I i for node i being infected, now takes the form i,1 is given by Eq. S21.
The state of a (0, 1)-clique {i, j} is governed by the following equations where q (1) i(j),0 coincides with q i,0 except for excluding the (0, 1)-clique {i, j} from the sum, and analogously for the other similar terms.
Finally, for a (1, 2)-clique {i, j, k}, we get the following equations where q (2) i(jk),1 coincides with q i,1 except for excluding the (1, 2)-clique {i, j, k} from the sum, and analogously for the other similar terms.
Evidently, second-order dynamical correlations let β (2) appear in the dynamical equations. When linearizing Eqs. S20-S25 as done in Methods, the products between a state probability with some infected node and 1−q (1) i,0 q (2) i,0 q cr . In Supplementary Fig. 5 we report the dependence of β (1) cr on β (2) as computed for a Dorogovtsev-Mendes SC.
Non-0-connected structures. Eq. S21 (or Eq. S20b) holds for a 0-connected SC. Nonetheless, differently from its discrete-time version, it can be adapted to hold for any connectedness. Indeed, the sum over the (·, 3)-cliques in Eq. S21 (or Eq. S20b) can now be split into two sums: one regarding the first-order infections coming from the edges of the (·, 3)-cliques, and one regarding the second-order infections coming from the (1, 3)-cliques. In the former, rather than over the couples of neighbors of a node, one can sum over its single neighbors, in this way avoiding to over-count their contribution. To this end, one may defineΓ (2) i as the set of neighbors of node i such that j ∈Γ (2) i if ∃k such that {j, k} ∈ Γ (2) i,g , ∀g ∈ {0, 1}. Relaxing the edge-disjoint requirement, we can now look for a standard edge clique cover of the structure. Keeping the (g, n) nomenclature for the cliques in the cover, Eq. S21 adopts the form whereP I|S j|i ≡ P II|S jk|i,g + P IS|S jk|i,g for k such that {j, k} ∈ Γ (2) i,g . Since the edge {i, j} is now allowed to be included in more than one (·, 3)-clique, k can identify more than one node. Differently from single nodes, a complete marginalization of the dynamical equation for the edge {i, j}, that would makeP I|S j|i independent from the chosen k, is however unfeasible (and not only with the chosen closure, Eq. (7)). Therefore, a more symmetrical way to evaluateP I|S j|i is to compute it as the average value of P II|S jk|i,g + P IS|S jk|i,g over all nodes k such that {j, k} ∈ Γ (2) i,g , g = 0, 1. What has been done here for simplicial 2-complexes, can be extended to define the continuous-time limit of the MECLE for non-0-connected simplicial complexes of any higher dimension. The ratio σ rel between the sample standard deviation σ and ρavg for each β (1) . The pick at about β (1) = 0.0047 is due to some curves transitioning at slightly different values of β (1) . In fact, the uncertainty about the location of the critical point is of only about 0.0002, corresponding to a relative uncertainty of less than 5%. The inset plot shows a zoom of σ rel to the right of the transition: σ rel stays below the 6% next to the transition, while rapidly decreasing towards zero for larger β (1) s, hence enlightening the robustness of the model.