Introduction

Epidemics1, rumor spreading2, adoption3, and opinion dynamics4 are well-known manifestations of real-world contagion processes. One of the most remarkable achievements of network science has been the characterization of the dependence of contagion processes on the structural properties of the interaction network through which they spread5,6,7,8. In particular, many real-world networks of interest, especially the social ones, boast a clustered and cycles-rich structure. Triadic closures are indeed renowned to be a distinguishing feature of social systems9,10, together with the presence of larger communities in which every element is connected to (nearly) any other element in them11. Households and workplaces are common contact-based examples of that, while online social communities and groups in messaging apps are information-based ones. In the language of graph theory, such all-to-all substructures are called cliques. Specifically, a n-clique (or clique of order n) consists of n nodes all pair-wisely connected to each other, i.e., a complete subgraph of n nodes.

Apart from a collection of dyadic interactions, cliques can be also regarded as the pair-wise projection of richer substructures representing group interactions (also known as ‘higher-order’ interactions), involving more than two agents (nodes) at once. In fact, from few years on, a growing branch of literature dedicated to the study of various dynamical processes involving group interactions12,13,14,15,16,17,18, has been showing that such interactions can heavily affect the dynamics, and neglecting them can therefore lead to wrong predictions.

The interaction patterns can be properly formalized by means of hypergraphs19,20, a generalization of graphs in which the nodes can be grouped in hyperedges of any order—not only in pairs. Specifically, here we make use of simplicial complexes (SCs)21, defined as sets of faces, whereby a face is a set of nodes with the hereditary property, stating that each of its subsets is also a face of the SC. Associating a group interaction to each face, a SC then represents the entire set of interactions among the nodes in it.

Going to the dynamics, we are interested in complex contagion processes22,23,24,25,26, in which the outcome of a potentially spreading interaction depends on how many different contagious agents take part to it, and not—as in simple contagions—only on the strength of the interaction. The presence of complex contagion mechanisms has been assessed in various contexts27, but largely in online social networks and forums28,29, thanks to their unique data traceability.

Several studies have recently appeared showing the dynamical effects of group interactions on complex contagion. They provide either a qualitative understanding by means of mean-field approximations30,31,32, finding such interactions as responsible for critical mass effects; or a more quantitative one, reinforcing the previous qualitative findings, but for very particular hypergraphs33,34,35. All the studies considered continuous-time dynamics and, noteworthy, uncorrelated nodes’ states. For the case of two-dimensional SCs (i.e., consisting of edges and triangles), Matamalas et al.36 provided a microscopic discrete-time model accounting for first-order (two-nodes) and, only partially, second-order (three-nodes) dynamical correlations. However, as shown afterwards, the same subtle inconsistencies that make the model applicable to any two-dimensional SC, also impede the computation of the critical point.

In this work, looking for a discrete-time model holding for SCs of any dimension, we reveal that fixing those inconsistencies puts a topological condition on the interaction structure the model can describe, namely that two group interactions can only share one node. In order to satisfy this constraint, we introduce the notion of edge-disjoint edge clique cover (EECC) of a SC and a heuristic to find it. By means of our Microscopic Epidemic Clique Equations (MECLE), we provide a two-fold extension of the existent discrete-time complex contagion models, accounting for higher-order correlations and group interactions (if any) at the same time. We prove those dynamical correlations to be essential to describe how the critical point depends on the higher-order couplings. Lastly, different approaches to treat group interactions sharing multiple nodes are also discussed, while hinting at easy adaptations of ours.

Results and discussion

The over-counting problem

Let us first introduce all the basic notions needed to state the problem. We start from the interaction structures we used, SCs. A SC \({\mathcal{K}}\) is a subset of the power-set 2V of a vertex set V, endowed with the hereditary property: given \(f\in {\mathcal{K}}\) and \(f^{\prime} \subseteq f\), then \(f^{\prime} \in {\mathcal{K}}\). Note that we can neglect the empty set from 2V, for it does not have any practical interest here. The elements of \({\mathcal{K}}\) are called faces, and a n-dimensional face (or n-face) is a subset of V made of n + 1 nodes. Given a face f, its power-set 2f is called a simplex. If f is a n-face, 2f is a n-simplex. Giving a geometrical interpretation to \({\mathcal{K}}\), a n-simplex is also the n-dimensional polytope being the convex closure of its n + 1 vertices. For example, given V = {i, j, k} and \({\mathcal{K}}={2}^{V}\), {i} is a 0-simplex (a point), {{i, j}, {i}, {j}} is a 1-simplex (a segment), {{i, j, k}, {i, j}, {i, k}, {j, k}, {i}, {j}, {k}} is a 2-simplex (a triangle).

If a simplex in \({\mathcal{K}}\) is not included in any other simplex in \({\mathcal{K}}\), then it is said to be maximal. If d is the maximum dimension of the faces in \({\mathcal{K}}\), then \({\mathcal{K}}\) is d-dimensional and is called simplicial d-complex. The underlying graph \({{\mathcal{K}}}^{(1)}\) of \({\mathcal{K}}\) is the graph induced by the 1-simplices in \({\mathcal{K}}\), i.e., the graph whose node set is V and whose edge set is the set of all the 1-faces in \({\mathcal{K}}\). Consequently, a n-simplex induces a (n + 1)-clique, made of \(\left(\genfrac{}{}{0.0pt}{}{n+1}{2}\right)\) edges (i.e., 1-faces), in \({{\mathcal{K}}}^{(1)}\). If a clique is not part of a larger one, it is said to be maximal. Finally, \({\mathcal{K}}\) is said to be q-connected if, given any two simplices \({s}_{1},{s}_{2}\subset {\mathcal{K}}\), there exists a sequence of simplices connecting s1 and s2 such that any two adjacent simplices of the sequence have (at least) a q-face in common; if \({\mathcal{K}}\) is connected, then \({{\mathcal{K}}}^{(1)}\) is 0-connected—as any other connected graph.

To identify whether a n-clique c in \({{\mathcal{K}}}^{(1)}\) corresponds to a (n − 1)-simplex in \({\mathcal{K}}\) or it is just a n-clique also in \({\mathcal{K}}\), we introduce a binary variable g, the group classifier, defined to be 1 or 0, respectively; by which c is regarded as a (g, n)-clique. Since a 1-simplex is equivalent to a 2-clique, we choose to assign g = 0 to any 2-clique, so that n 3 when g = 1. From now on, unless explicitly specified, we refer to the g-classified cliques in \({{\mathcal{K}}}^{(1)}\) simply as ‘cliques’. These are the building blocks of our description.

Going to the dynamics of interest, we adopt, without loss of generality, a standard epidemiological terminology. We consider discrete-time susceptible-infected-susceptible (SIS) dynamics on a SC. Let β(n) be the probability with which a susceptible node in a group of n + 1 nodes (a n-face of the SC) gets infected when all the other n nodes are infected; and μ be the probability with which an infected node recovers. Due to the hereditary property, if n nodes are infected, and thus can pass the infection as a group of n nodes, then also any subset of that group can: for any kn, there are \(\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\) subsets passing the infection with probability β(k) as a group of k nodes. In other words, to each state configuration of the n nodes (i.e., which of them are infected) corresponds a set of concurrent channels of infections, and these channels are correlated. Compared to existing models36, we take a first step forward by accounting for such correlation, allowing us to compute the critical point. Then, when computing the probability for a node to get infected within a group, the contribution coming from each state configuration of the group consists of a product over the concurrent channels that configuration admits. In addition, each contribution is weighted by the probability for the group to be in that configuration.

Now, given a clique, no matter whether it conveys (g = 1) or not (g = 0) group interactions, we want to account also for the dynamical correlations among the states of the nodes in it. However, if s of such cliques have m 2 nodes in common, the contribution to the infection of one of the m nodes, coming from the others m − 1, is counted s times instead of once. This is because the m − 1 nodes would appear in the state configuration probabilities associated with each of the s cliques. To note that if the cliques are just edges, then necessarily m = 1, meaning that the over-counting is excluded in pair-wise models.

Edge-disjoint edge clique cover

To avoid the over-counting, we aim at covering all the edges of \({{\mathcal{K}}}^{(1)}\) by means of a set of cliques such that any two of them share at most one node, giving what we call here an EECC. Cliques sharing more than one node are consequently decomposed in lower-order cliques. This comes with no essential repercussions when the decomposed cliques have all g = 0. Otherwise, some group interactions would be ignored, implying the model to strictly apply for group interactions sharing at most one node. Interestingly, as shown later on, the model remains reliable when the over-counted interactions are relatively scarce.

Covering the structure by cliques only, we are able to give a unique equation holding for any order n; an impossible task via generic, less symmetric substructures. Furthermore, since we want to capture as many correlations and group interactions as possible, we want the cover to consist of the least possible number of cliques. Finding such minimal set of edge-disjoint cliques is closely related to the edge clique cover problem, known to be NP-complete37. Heuristics are thus necessary to estimate the solution in large graphs. For convenience, from now on, we reserve the acronym EECC to sets which are solutions to the problem.

If all the maximal cliques in \({{\mathcal{K}}}^{(1)}\) are edge-disjoint, then \({{\mathcal{K}}}^{(1)}\) admits a unique EECC, simply given by the set of the maximal cliques in it. Otherwise, \({{\mathcal{K}}}^{(1)}\) generally admits multiple EECCs. See Fig. 1 for illustration.

Fig. 1: Edge-disjoint edge clique cover (EECC) of a small simplicial complex (SC).
figure 1

The SC, shown on the left, consists of fourteen nodes (0-simplices), identified via letters, connected by one 3-simplex, one 2-simplex, and fourteen 1-simplices. Gray areas indicate r-simplices with r 2, including r + 1 nodes each. The EECC of the SC is shown on the right, where colored and dotted areas are used to visualize, respectively, the (1, r)-cliques and (0, r)-cliques in it, with colors carrying no specific meanings. The SC is decomposed in: one (1, 4)-clique, {b, d, m, n}; one (1, 3)-clique, {i, j, k}; three (0, 3)-cliques, {c, d, e}, {f, g, m}, {h, i, m}; and five (0, 2)-cliques, {a, b}, {a, n}, {g, h}, {k, l}, {l, m}. The underlying subgraph induced by the subset {a, b, d, m, n} originally consists of a (1, 4)-clique and a (0, 3)-clique. To preserve the group interaction mediated by the (1, 4)-clique, is preferable to include this in the EECC and then break the (0, 3)-clique into two (non-maximal) (0, 2)-cliques. Besides, the underlying subgraph induced by the subset {f, g, h, i, m} is made of three overlapping (0, 3)-cliques, and the EECC is in this case obtained by including {f, g, m} and {h, i, m} (and then the remaining edge {g, h}), instead of {g, h, m} first.

To estimate an EECC, we propose the following greedy heuristic (the best among several options we conceived for this task). Given a graph G, the heuristic proceeds as follows:

  1. 1.

    Find the set C of all the maximal cliques in G

  2. 2.

    Include in the EECC and remove from C all the elements in C that do not share edges with other elements in C

  3. 3.

    While C is not empty

    1. (a)

      For every maximal clique cC, compute the score rc, defined as the fraction of edges that c shares with the other elements in C

    2. (b)

      Consider the elements of C with the lowest score; include in the EECC and remove from C (one, randomly chosen, of) the element(s) of highest order among them

Noteworthy, when dealing with highly modular structures—as those representing many real social systems—in which communities of nodes are loosely connected among them, the search for an (edge-disjoint) edge clique cover can be speeded up. Indeed, each time two regions of the structure are joined by bridging cliques (which are evidently maximal), the problem of finding the optimal cover of the whole structure reduces to that of finding it in each of the two, smaller regions.

Now, whenever the maximal cliques in \({{\mathcal{K}}}^{(1)}\) are not all edge-disjoint, some cliques are forced to be decomposed in sub-cliques during step 4. Since decomposing a (1, )-clique in (0-connected) sub-cliques means also neglecting some group interactions, whenever a (0,  )-clique and a (1, )-clique are not edge-disjoint, we prefer to include the latter in the EECC. This additional difficulty disappears whenever the SC is 0-connected or when it has dimension 1 (i.e., when it is a graph). Thus, being \({\mathcal{K}}\) a SC and \({{\mathcal{K}}}^{(1)}\) its g-classified underlying graph, a EECC of \({\mathcal{K}}\) is constructed as follows:

  1. I.

    Consider the subgraph \({G}_{1}\subseteq {{\mathcal{K}}}^{(1)}\) induced by the nodes in the maximal (1, )-cliques in \({\mathcal{K}}\). Find a EECC of G1; let us call \({\mathcal{S}}({\mathcal{K}})\) the resulting EECC

  2. II.

    Consider the subgraph \({G}_{0}={{\mathcal{K}}}^{(1)}\backslash {G}_{1}\). Find a EECC of G0; let us call it \({\mathcal{C}}({\mathcal{K}})\)

  3. III.

    \({\mathcal{D}}({\mathcal{K}})\equiv ({\mathcal{S}}({\mathcal{K}}),{\mathcal{C}}({\mathcal{K}}))\) is the estimated EECC of \({\mathcal{K}}\)

A basic question is about the dependence of the prediction made by the model when different EECCs are estimated for a given structure. Indeed, while the effectiveness of the heuristic ensures better performance of the model, its robustness is an indispensable quality, as we look for a reliable model giving certain results when fed up with a certain structure, of which a minimal EECC is estimated. In Supplementary Note 1 and Supplementary Figs. 1 and 2, our model is shown to be robust under EECC variability. Therefore, given a SC, it is safe to make use of the first EECC computed for it.

Calling m1 the maximum order of the cliques we want to include in \({\mathcal{S}}({\mathcal{K}})\) (i.e., considering simplices of dimension up to m1 − 1 in \({\mathcal{K}}\)), m1 could be smaller than ω1, the maximum order of the cliques in G1. In such case, when looking for an EECC of G1, those maximal cliques in G1 of order greater than m1 must be decomposed in edge-disjoint sub-cliques (corresponding to sub-simplices in \({\mathcal{K}}\)) of variable order \(m^{\prime} \in \{1,\ldots ,{m}_{1}\}\). Clearly, the higher is m1, the higher is the order of the group interactions (and of the correlations within them) included in the description. Overall, as long as the proportion of obviated group interactions is small enough, the deviations from the complete dynamics are comparatively negligible; or alternatively, the error made by including non-0-connected simplices is negligible.

In the building of an EECC, different values m0 2 for the maximum order to be considered for (0,  )-cliques can also be chosen. The higher is m0, the higher is the order of the captured dynamical correlations within the cliques in \({\mathcal{K}}\). In any case, m0ω0, being ω0 the maximum order of the cliques in G0.

Summarizing, the couple (m0, m1) identifies the considered implementation of the MECLE.

Microscopic epidemic clique equations

Given a (g, n)-clique {i1, …, in} in \({\mathcal{D}}({\mathcal{K}})\), with \(n\,\leqslant \,m\equiv \max \{{m}_{0},{m}_{1}\}\), we indicate with \({P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}\) the joint probability that node i1 is in the state \({\sigma }_{{i}_{1}}\), node i2 is in the state \({\sigma }_{{i}_{2}}\), etc., where \(\{{\sigma }_{{i}_{1}},\ldots ,{\sigma }_{{i}_{n}}\}\in {\{S,I\}}^{n}\). Besides, with \({P}_{{i}_{1}\ldots {i}_{k-1}{i}_{k+1}\ldots {i}_{n}| {i}_{k},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{k-1}}{\sigma }_{{i}_{k+1}}\ldots {\sigma }_{{i}_{n}}| {\sigma }_{{i}_{k}}}\) we indicate the conditional probability that nodes i1, …, ik−1, ik+1, …, in are in their respective states \({\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{k-1}}{\sigma }_{{i}_{k+1}}\ldots {\sigma }_{{i}_{n}}\), given that node ik is in the state \({\sigma }_{{i}_{k}}\). Clearly, the normalization condition must hold:

$$\mathop{\sum}\limits_{{\{{\sigma }_{{i}_{k}}\}}_{k = 1,\ldots ,n}}{P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}=1$$
(1)

Indicating with \({\{{\sigma }_{{i}_{k}}\}}_{k = 1,\ldots ,n}\) the states at time t and with \({\{{\sigma }_{{i}_{k}}^{\prime}\}}_{k = 1,\ldots ,n}\) those at time t + 1, the MECLE model dynamic equation governing the evolution of the state of a (g, n)-clique {i1, …, in} reads

$${P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}^{\prime}\ldots {\sigma }_{{i}_{n}}^{\prime}}\left(t+1\right)=\mathop{\sum}\limits_{\{{\sigma }_{{i}_{1}},\ldots ,{\sigma }_{{i}_{n}}\}}{P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}\left(t\right)\,{{{\Phi }}}_{g}\left(\{{\sigma }_{{i}_{k}}\},\{{\sigma }_{{i}_{k}}^{\prime}\};\{{\beta }^{(s)}\},\mu \right)$$
(2)

where

$${{{\Phi }}}_{g}\left(\{{\sigma }_{{i}_{k}}\},\{{\sigma }_{{i}_{k}}^{\prime}\};\{{\beta }^{(s)}\},\mu \right)=\mathop{\prod }\limits_{k=1}^{n}{\phi }_{{i}_{k},g}\left(\{{\sigma }_{{i}_{k}}\},{\sigma }_{{i}_{k}}^{\prime};\{{\beta }^{(s)}\},\mu \right)$$
(3)

is the transition probability from the starting state \({\{{\sigma }_{{i}_{k}}\}}_{k = 1,\ldots ,n}\) to the arrival state \({\{{\sigma }_{{i}_{k}}^{\prime}\}}_{k = 1,\ldots ,n}\). It is understood that Φg is computed at time t. This is expressed as a product over the single-node transition probabilities \(\{{\phi }_{{i}_{k},g}\}\), given by

$${\phi }_{{i}_{k},g}\left(\{{\sigma }_{{i}_{k}}\},{\sigma }_{{i}_{k}}^{\prime};\{{\beta }^{(s)}\},\mu \right) = \, {\mathbb{1}}{\left[{\sigma }_{{i}_{k}}=I,{\sigma }_{{i}_{k}}^{\prime}=I\, \right]} \left(1-\mu \right)\\ \, +{\mathbb{1}}{\left[{\sigma }_{{i}_{k}}=I,{\sigma }_{{i}_{k}}^{\prime}=S\right]}\,\mu \\ \, +{\mathbb{1}}{\left[{\sigma }_{{i}_{k}}=S,{\sigma }_{{i}_{k}}^{\prime}=I\,\right]}\left[1-\frac{{w}_{{N}_{I},g}^{(n-1)}}{{q}_{{i}_{k}(\neg {i}_{k}),g}^{(n-1)}}\mathop{\prod}\limits_{(g^{\prime} ,r)}{q}_{{i}_{k},g^{\prime} }^{(r)}\right]\\ \, +{\mathbb{1}}{\left[{\sigma }_{{i}_{k}}=S,{\sigma }_{{i}_{k}}^{\prime}=S\right]}\left[\frac{{w}_{{N}_{I},g}^{(n-1)}}{{q}_{{i}_{k}(\neg {i}_{k}),g}^{(n-1)}}\mathop{\prod}\limits_{(g^{\prime} ,r)}{q}_{{i}_{k},g^{\prime} }^{(r)}\right]$$
(4)

where \({\mathbb{1}}\)[p] gives 1 if condition p is fulfilled and 0 otherwise; \({N}_{I}=\left|\{{i}_{k = 1,\ldots ,n}| {\sigma }_{{i}_{k}}=I\}\right|\) is the number of infected nodes in the starting state; \({w}_{{N}_{I},g}^{(n-1)}\equiv {w}_{{N}_{I},g}^{(n-1)}\left(\{{\beta }^{(s)}\}\right)\) is the probability that a susceptible node (ik) does not get infected within a (g, n)-clique ({i1, …, ik, …, in}) whose state configuration (NIn − 1) is known, reading

$${w}_{{{N}_{I},g}}^{(n-1)}=\, {\mathbb{1}}{[g=0]} \left(1-{\beta }^{(1)}\right)^{{{N}_{I}}}\, + {\mathbb{1}}{[g=1]}\mathop{\prod }\limits_{s=1}^{{{N}_{I}}} \left(1-{\beta }^{(s)}\right)^{{\left( {{{N}_{I}}\atop {s}}\right)}}$$
(5)

and \({q}_{{i}_{k},g^{\prime} }^{(r)}\equiv {q}_{{i}_{k},g^{\prime} }^{(r)}\left(\{{\beta }^{(s)}\}\right)\) is the probability that node ik does not get infected via any of the \((g^{\prime} ,r+1)\)-cliques incident on it, that is

$${q}_{{i}_{k},g^{\prime} }^{(r)} = \, \mathop{\prod}\limits_{ {{\{{j}_{1},\ldots ,{j}_{r}\}}\atop {\in {{{\Gamma }}}_{{i}_{k},g^{\prime} }^{(r)}}}}\left[1-\mathop{\sum }\limits_{l=1}^{r}\frac{1-{w}_{l,g^{\prime} }^{(r)}}{l!\left(r-l\right)!}\right.\\ \, \left.\times \mathop{\sum }\limits_{{k}_{1}\ne \ldots \ne {k}_{r}=1}^{r}{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}| {i}_{k},g^{\prime} }^{I\ldots IS\ldots S| S}\right]$$
(6)

where \({{{\Gamma }}}_{{i}_{k},g^{\prime} }^{(r)}\) indicates the set of r-tuples of indexes corresponding to subsets of r nodes forming a \((g^{\prime} ,r+1)\)-clique with ik, and \({q}_{{i}_{k}(\neg {i}_{k}),g}^{(n-1)}\) coincides with \({q}_{{i}_{k},g}^{(n-1)}\) expect for excluding the considered (g, n)-clique {i1, …, ik, …, in} from the product. Finally, the products in Eq. (4) are performed over the couples \(\{(g^{\prime} ,r)\}\) such that 2 rm0 for \(g^{\prime} =0\), and 3 rm1 for \(g^{\prime} =1\).

In Eq. (3) the single-node transition probabilities, ϕ, are treated as independent from each other within the time step. This merely derives from the implicit assumption that all the events within a given time step are simultaneous, and therefore not causally related. Simply put, the state of a node at time step t + 1 only depends on its state and on the states of its neighbors at the previous time step t, as in any markovian model.

Importantly, to get the expression for \({q}_{{i}_{k},g^{\prime} }^{(r)}\) we have adopted the following closure

$${P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}{i}_{1}\ldots {i}_{k}\ldots {i}_{n}}^{I\ldots IS\ldots S{\sigma }_{{i}_{1}}\ldots S\ldots {\sigma }_{{i}_{n}}}=\frac{{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}{i}_{k},g^{\prime} }^{I\ldots IS\ldots SS}{P}_{{i}_{1}\ldots {i}_{k}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots S\ldots {\sigma }_{{i}_{n}}}}{{P}_{{i}_{k}}^{S}}$$
(7)

(the classifier g is assigned only to the cliques in \({\mathcal{D}}({\mathcal{K}})\)). Therefore, by definition of conditional probability, the probability \({P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}| {i}_{1}\ldots {i}_{k}\ldots {i}_{n}}^{I\ldots IS\ldots S| {\sigma }_{{i}_{1}}\ldots S\ldots {\sigma }_{{i}_{n}}}\) appears in Eq. (6) as \({P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}| {i}_{k},g^{\prime} }^{I\ldots IS\ldots S| S}\). Intuitively, each clique in \({\mathcal{D}}({\mathcal{K}})\) is treated as an independent dynamical unit of the system, accounting for the correlations among the states of the nodes it includes. In the form of a generalization of the classical pair approximation38, the dynamical correlations between two adjacent cliques are conveyed by the state probability of the node they share (in the denominator to avoid double counting). In this regard, the larger the adjacent cliques, the smaller the expected influence of the state of the shared node on all the other ones. For this reason, the presented closure—and consequently the MECLE—is expected to gain further accuracy with the increasing of the order of the cliques.

The probability \({P}_{i}^{I}\) for the single node i of being infected is computed as a marginal probability from any (g, n)-clique including i, as

$${P}_{i}^{I}\left(t+1\right)= \, \mathop{\sum}\limits_{\{{\sigma }_{\neg i}\}}{P}_{i\{\neg i\},g}^{I\{{\sigma }_{\neg i}\}}\left(t+1\right)\\ = \, {P}_{i}^{I}\left(t\right)\left(1-\mu \right)+{P}_{i}^{S}\left(t\right)\left(1-\mathop{\prod}\limits_{(g,r)}{q}_{i,g}^{(r)}\right)$$
(8)

in which {¬i} and {σ¬i} indicate, respectively, the set of the other n − 1 nodes in the clique and their states. Equation (8) is, consistently, also found taking n = 1 in Eq. (2) (\({w}_{{N}_{I},g}^{(0)}=1\), \({q}_{i,g}^{(0)}=1\)). The average value ρ of the single node probabilities,

$$\rho \left(t\right)=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{P}_{i}^{I}\left(t\right)$$
(9)

is the epidemic prevalence, which is also the order parameter of the system.

It is important to note that the used closure, Eq. (7), apart from preserving the n-node state correlations of a considered (g, n)-clique, is the only one making feasible the marginalization of Eq. (2) to get Eqs. (8) and (9) and, consequently, get the expression for the epidemic threshold, through Eqs. (15)–(19) in “Methods”.

At any time step t, Eq. (8) yields one constraint for each of the n nodes in a (, n)-clique, leaving 2n − n − 1 independent state probabilities to be determined, one being fixed by the normalization. Therefore, if \({\mathcal{K}}\) has N nodes and \({\mathcal{D}}({\mathcal{K}})\) consists of C(n) (, n)-cliques, nm, the MECLE is defined by a system of \(N+\mathop{\sum }\nolimits_{n = 2}^{m}\left({2}^{n}-n-1\right){C}^{(n)}\) independent equations.

To help in the understanding of the model, we show in Supplementary Note 2 all the equations for the particular case of the simplicial 2-complexes, i.e., the (3, 3) implementation of the MECLE.

Finally, we can frame the existent models in the (m0, m1) notation. As m1 = 2 implies \({\mathcal{S}}({\mathcal{K}})=\varnothing \), \({\{({m}_{0},2)\}}_{2\leqslant {m}_{0}\leqslant {\omega }_{0}}\) is the class of models accounting for correlations within cliques of order up to m0 on graphs (simple contagion). In particular, (2, 2) gives the Epidemic Link Equations (ELE) model39. The Microscopic Markov Chain Approach (MMCA)40 is then recovered from (2, 2) by assuming that \({P}_{ji}^{IS}={P}_{j}^{I}{P}_{i}^{S}\) (i.e., \({P}_{j| i}^{I| S}={P}_{j}^{I}\) in Eq. (6)). Considering group interactions, the simplicial ELE and MMCA36 fall outside the MECLE class of models, for they do not account for the correlations among the concurrent channels of infection within simplices. This, alongside the consideration of higher-order dynamical correlations among nodes’ states, is precisely the refinement made here.

Results for simplicial 2-complexes

We apply here the developed formalism to the case of simplicial 2-complexes. As we are only using synthetic structures, the SCs are constructed from some graph (the future underlying graph) converting into (1, 3)-cliques (i.e., 2-faces) a fraction p of the 3-cliques allowed by the EECC of the graph, while considering as (0, 3)-cliques the remaining fraction 1 − p. Specifically, if all the 3-cliques are converted (p = 1.0), the resulting SCs are the clique complexes of the respective original graphs. To better appreciate the improvement made here, we mainly show results for clique complexes, although notable improvements can be generally found for any value of p depending on the used structure. Conveniently, if p is not specified then the structure is understood to be a clique complex. We identify a such generated SC adding ‘SC’ to the name of the graph model used for it: a ‘Dorogovtsev-Mendes SC’, for instance, is built upon a graph given by the Dorogovtsev-Mendes generative model41. Specifically, the ‘random SC’ is obtained as follows: first we generate a simplicial 2-complex through the random SC model30; then we consider its underlying graph and compute its EECC; finally, the random SC is got, as before, by converting 3-cliques in 2-faces, in this way ensuring the SC to be 0-connected.

In Fig. 2 we compare the prevalence ρ obtained using Monte Carlo (MC) simulations (see Methods for details), the MECLE model, and the other discrete-time markovian models, i.e., the simplicial ELE and MMCA models36, for different 0-connected clique complexes. For all structures, the improvement brought by the MECLE with respect to the simplicial ELE is substantial, in both predicting the epidemic prevalence and, even more, locating the critical point, for which the relative errors \({\varepsilon }_{{\beta }^{(1)}}\) are reported. Note that the predictions is expected to improve for increasing order of the cliques, since the precision of the used closure, Eq. (7), grows with it as well, especially in the case of populations arranged in lowly inter-connected dense communities.

Fig. 2: Epidemic prevalence ρ as a function of the edge infection probability β(1) on different 2-dimensional simplicial complexes (SCs).
figure 2

Results obtained from Monte Carlo (MC) simulations are depicted by dots, while lines represent the analytically computed prevalence using the indicated models. MMCA and MMCA(MECLE) refer to the Microscopic Markov Chain approximation of, respectively, the simplicial Epidemic Link Equations (ELE) model and the Microscopic Epidemic Clique Equations (MECLE) model, as obtained by considering the state probabilities of the nodes as uncorrelated; while MF and MF(MECLE) refer to their homogeneous mean-field approximations (see Methods). Note that MF and MF(MECLE) are indistinguishable at the used scale. The value of the epidemic threshold, as computed in the MECLE through Eqs. (15)–(19) in Methods, is marked with a vertical dotted line. The recovery probability is fixed to μ = 0.2. a Periodic triangular SC with \({\overline{k}}^{(0,1)}=0.00\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=3.00\), being \({\overline{k}}^{(g,r)}\) the mean number of (g, n + 1)-cliques incident on a node, and triangle infection probability β(2) = 0.25; the relative error in locating the epidemic threshold is \({\varepsilon }_{{\beta }^{(1)}}\approx 0.08\) for MECLE and \({\varepsilon }_{{\beta }^{(1)}}\approx 0.12\) for ELE. b Random SC with \({\overline{k}}^{(0,1)}=4.10\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=3.95\), and β(2) = 0.15; \({\varepsilon }_{{\beta }^{(1)}}\approx 0.06\) for MECLE and \({\varepsilon }_{{\beta }^{(1)}}\approx 0.09\) for ELE. c Dorogovtsev-Mendes SC with \({\overline{k}}^{(0,1)}=1.10\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=1.45\), and β(2) = 0.25; \({\varepsilon }_{{\beta }^{(1)}}\approx 0.07\) for MECLE and \({\varepsilon }_{{\beta }^{(1)}}\approx 0.16\) for ELE. d Same as c but with β(2) = 0.50; \({\varepsilon }_{{\beta }^{(1)}}\approx 0.23\) for MECLE and \({\varepsilon }_{{\beta }^{(1)}}\approx 0.50\) for ELE.

We then leverage the prominent jump of the discontinuous transition found for random SCs, to illustrate, in Fig. 3, the existence of a hysteresis cycle, enclosing a bi-stable region. This is in line with recent findings30,31,33,34,35,36. Again, the simplicial ELE is outperformed by the MECLE, especially in predicting the ‘backward’ curves, as the overestimation made by the former (see next subsection) is emphasized in that case.

Fig. 3: Hysteresis cycle of the epidemic prevalence ρ with respect to the edge infection probability β(1) on different two-dimensional simplicial complexes (SCs).
figure 3

Results obtained from Monte Carlo (MC) simulations are depicted by dots, while lines represent the analytically computed prevalence using the simplicial Epidemic Link Equations (ELE) model and the Microscopic Epidemic Clique Equations (MECLE) model. The ʻforward' and ʻbackward' curves are obtained through small equilibrium transformations taking as initial value ρ0 the equilibrium value of ρ got at the next smaller and next greater value of β(1), respectively. The hysteresis cycle reveals the bi-stable region. The computed value of the (forward) epidemic threshold is marked with a vertical dotted line. The recovery probability is fixed to μ = 0.2. a Random SC with \({\overline{k}}^{(0,1)}=4.10\), \({\overline{k}}^{(0,2)}=1.58\) and \({\overline{k}}^{(1,2)}=2.37\) (p = 0.6), being \({\overline{k}}^{(g,r)}\) the mean number of (g, n + 1)-cliques incident on a node, and triangle infection probability β(2) = 0.25; the relative errors in locating the epidemic threshold are \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{for.}\,}\approx 0.14\) and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{back.}\,}\approx 0.10\) for MECLE, and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{for.}\,}\approx 0.18\) and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{back.}\,}\approx 0.38\) for ELE. b Random SC with \({\overline{k}}^{(0,1)}=4.10\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=3.95\) (p = 1.0), and β(2) = 0.15; \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{for.}\,}\approx 0.06\) and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{back.}\,}\approx 0.15\) for MECLE, and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{for.}\,}\approx 0.09\) and \({\varepsilon }_{{\beta }^{(1)}}^{\,\text{back.}\,}\approx 0.47\) for ELE.

More interestingly, our analysis clearly shows that models with uncorrelated nodes’ states generally fail to pinpoint the (‘forward’) epidemic threshold at even a qualitative level. Indeed, by neglecting those dynamical correlations, Eq. (22) (see “Methods”) predicts \({\beta }_{{\rm{cr}}}^{(1)}\), the value of β(1) at which the epidemic-free state becomes unstable, to be independent from the values of the higher-order infection probabilities, \({\{{\beta }^{(s)}\}}_{s \,{> }\,1}\). In particular, in the case of two-dimensional SCs, increasing enough β(2) results in the appearance of a bi-stable region, but \({\beta }_{{\rm{cr}}}^{(1)}\) does not change in those models30,31. On the contrary, when the actual structure of the interactions is retained, increasing β(2) does lead \({\beta }_{{\rm{cr}}}^{(1)}\) to decrease, as arises from MC simulations and correctly predicted by the MECLE. We show this dependence in Fig. 4, while it can also be grasped by comparing panels (c) and (d) in Fig. 2.

Fig. 4: Dependence of the epidemic threshold on the triangle infection probability β(2).
figure 4

\({\beta }_{{\rm{cr}}}^{(1)}\), computed via Eqs. (15)–(19) in Methods, is shown against β(2) for a Dorogovtsev-Mendes simplicial complex with \({\overline{k}}^{(0,1)}=1.10\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=1.45\), being \({\overline{k}}^{(g,r)}\) the mean number of (g, n + 1)-cliques incident on a node. Note that Eq. (22), disregarding dynamical correlations, wrongly predicts \({\beta }_{{\rm{cr}}}^{(1)}=\mu /\overline{k}=\mu /4\), β(2).

The calculation of the critical point in the MECLE (see “Methods”), together with the evidences coming from the MC simulations, reveal the necessity of accounting for higher-order dynamical correlations. An interaction (infection) of order s, with coupling (infection probability) β(s), requires s nodes in a simplex to be active (infected). Therefore, in order to preserve its contribution near the critical point, the state probability with s active nodes must not be neglected—as done instead in models with uncorrelated nodes’ states. In other words, near the epidemic threshold, higher-order interactions rely exclusively on comparatively higher-order correlations, so that their effect when varying the higher-order couplings, \({\{{\beta }^{(s)}\}}_{s \,{> }\,1}\), is observed only if such correlations are preserved. Accordingly, such dependence eludes the MF and MMCA approximations of the system. Besides, as second-order correlations within triangles are partially accounted for in the simplicial ELE, the latter predicts this dependence, but not as accurately as MECLE does. Finally, in the endemic state and away from the threshold, low-order correlations suffice to sustain higher-order interactions. Nevertheless, the higher the order of the accounted correlations, the more accurate is the quantification of the prevalence.

While a closed equation for \({\beta }_{{\rm{cr}}}^{(1)}\) is generally inaccessible for complex interaction structures, in Supplementary Note 3 and Supplementary Figs. 3 and 4, we explicitly derive the monotonous decrease of \({\beta }_{{\rm{cr}}}^{(1)}\) with respect to the higher-order infection probabilities for selected symmetrical structures. In particular, we study regular SCs, like the herein studied periodic triangular SC, and SCs built upon Friendship graphs, taken as proxies for homogeneous and heterogeneous structures, respectively. We prove that \({\beta }_{{\rm{cr}}}^{(1)}\) decays with β(2) whatever the system size N, and that the dependence considerably increases with the structural heterogeneity.

It should be noted that the herein observed and proved dependence of the critical point on the higher-order couplings, already showed up in previously reported numerical simulations, especially in SCs built from real data30. However, it went seemingly overlooked, eventually leading to claim42 that only pair-wise interactions govern the value of \({\beta }_{{\rm{cr}}}^{(1)}\). Even though we have considered a discrete-time dynamic, instead of the continuous-time’ used in those works upon which that claim is based on, we predict the qualitative shift brought by our analysis to hold in continuous-time as well. The continuous-time limit is here recovered by neglecting all those terms in the equations appearing as second or greater powers of any combination of the infection probabilities {β(s)} and the recovery probability μ, i.e., allowing only single-node state changes. Still, the linear terms proportional to any of the infection probabilities show up in Eq. (19), thus contributing to the value of the critical point. In Supplementary Note 4, we derive the continuous-time limit of the MECLE equations for simplicial 2-complexes. Taking the example of a Dorogovtsev-Mendes SC, the predicted dependence of the critical point on β(2) is shown in Supplementary Fig. 5.

Lastly, considering clique complexes with some fraction ps of edges shared by two or more 2-faces, we have studied how the MECLE behaves out of the bounds of the 0-connectedness. As shown in Fig. 5, when ps is low enough, it performs comparably to or still better than the simplicial ELE. As expected, the value of ps above which the MECLE performs worse turns out to specifically depend on the structure, preventing us to find a simple relation. Precisely, that value resulted to be around 0.06 for Dorogovtsev-Mendes SCs and around 0.04 for RSCs.

Fig. 5: Epidemic prevalence ρ as function of the link infection probability β(1) on 2-dimensional simplicial complexes (SCs) with low percentage ps of shared non-maximal edges.
figure 5

ps is computed as the fraction of edges within 2-faces which are included in more than one 2-face. Besides, with \(\overline{s}\) we indicate the average number of 2-faces in which the edges corresponding to the fraction ps are included (\(\overline{s}\,\geqslant\, 2\)). Results obtained from Monte Carlo (MC) simulations are depicted by dots, while lines represent the analytically computed prevalence using the indicated models. MMCA and MMCA(MECLE) refer to the Microscopic Markov Chain approximation of, respectively, the simplicial Epidemic Link Equations (ELE) model and the Microscopic Epidemic Clique Equations (MECLE) model, as obtained by considering the state probabilities of the nodes as uncorrelated; while MF and MF(MECLE) refer to their homogeneous mean-field approximations (see “Methods”). Note that MF and MF(MECLE) are indistinguishable at the used scale. The recovery probability is fixed to μ = 0.2. a Random SC with \({\overline{k}}^{(0,1)}=2.25\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=5.20\), being \({\overline{k}}^{(g,r)}\) the mean number of (g, n + 1)-cliques incident on a node; ps = 0.03, \(\overline{s}=2.01\), and triangle infection probability β(2) = 0.15. b Dorogovtsev-Mendes SC with \({\overline{k}}^{(0,1)}=1.10\), \({\overline{k}}^{(0,2)}=0.00\) and \({\overline{k}}^{(1,2)}=1.60\), ps = 0.05, \(\overline{s}=2.06\), and β(2) = 0.25.

Possible approaches for group interactions sharing multiple nodes

In discrete-time models, allowing groups to share two or more nodes comes with the drawback of impeding the computation of the epidemic threshold, as shown in “Methods”. Forgoing the latter, let us discuss some approaches which may still come in handy.

The first option is the simplicial ELE model36. It describes a SIS dynamics in simplicial 2-complexes by means of a pair approximation that, via a specific triangle closure, is able to partially account for second-order correlations. However, considering the multiple channels of infection within a simplex as mutually uncorrelated, it is not possible to marginalize the 1-faces equations to get the single nodes one, hence neither the epidemic threshold. To further elucidate the effects of disregarding the correlations among the concurrent infections within a simplex, let us consider the toy example of a triangle upon the set of nodes {i, j, k}. In the MECLE model, the probability for i to be infected by j and k, reads

$${\beta }^{(1)}\left({P}_{jk| i}^{IS| S}+{P}_{jk| i}^{SI| S}\right)+\left[1-{\left(1-{\beta }^{(1)}\right)}^{2}\left(1-{\beta }^{(2)}\right)\right]{P}_{jk| i}^{II| S}$$

In particular, taking β(1) = 1, the dependence on β(2) correctly disappears: no matter the infectiousness β(2) of the couple {j, k}, the probability for node i to be infected equals the probability that at least one node between j and k is infected. On the contrary, the simplicial ELE neglects that correlation, and the above infection probability becomes

$$1-\left(1-{\beta }^{(1)}{P}_{j| i}^{I| S}\right)\left(1-{\beta }^{(1)}{P}_{k| i}^{I| S}\right)\left(1-{\beta }^{(2)}{P}_{jk| i}^{II| S}\right)$$

where \({P}_{jk| i}^{II| S}\) is expressed in terms of edge probabilities. Here, taking β(1) = 1, there is still a dependence on β(2), with the effect of both overestimating the probability of infection, hence the prevalence, and wrongly anticipating the position of the epidemic threshold. Evidently, the error grows with both β(2) and PIIS. To notice that this analysis does not make any reference to the correlations among nodes’ states a model considers. Consequently, a similar comparison holds also between the MMCA and MF approximations of the MECLE and those of the simplicial ELE, with the latter overestimating more the prevalence.

Curiously, when the 0-connectedness is heavily broken (making the MECLE unreliable), the simplicial ELE appears to gain accuracy, especially in locating the critical point36. However, while we could explain the reasons why the MECLE outperforms the simplicial ELE in (nearly) 0-connected SCs, the unexpected improvement of the latter for higher connectedness remains unclear. Changes in topological factors, like the spectral dimension (note that, independently from their geometrical dimension, 0-connected SCs have spectral dimension very close to that of a random tree43) or the triangles’ percolation, are probably able to soften the approximations of the model. Future work addressing the role of those factors may help to better understand the limits of the simplicial ELE, while giving new, general insights about the relation between dynamical correlations and topology.

An alternative approach, assuming the considered contagion would make it usable, is to opt for more general hypergraphs by dropping the hereditary property out. One could then generalize the approach of Matamalas et al.36 to any hypergraph, but at the aforementioned cost of accepting some inconsistencies in the marginalization of the probabilities. Besides, referring to the class of simple hypergraphs19, i.e., those in which a hyperedge—what in a SC is a face—cannot be a subset of any other hyperedge, one can still resort to the fully consistent approach of the MECLE. Indeed, in such structures, to each configuration of the group corresponds a unique channel of infection, so the necessity of constraining groups to share not more than a single node can be relaxed. A model can then be easily constructed adapting Eq. (2) by solely modifying the form of \({w}_{l,1}^{(r)}\), being g = 1 for hyperedges including more than two nodes. For example, supposing the contagion to be maximally conservative, we would write \({w}_{l,1}^{(r)}={\mathbb{1}}[l=r]\left(1-{\beta }^{(r)}\right)\). Moreover, for linear hypergraphs19, in which two hyperedges can only share one node, the critical point is again calculable.

Lastly, in a regime of both high and rapid infectiousness and recovery, each set of nodes shared by a sufficient number of groups (e.g., two partners carrying out many of their activities together) could be effectively treated as a single super-node, whose state always represents that of each of the nodes it contains. This effective approach can be combined with any of the ones that have been discussed.

Methods

Epidemic threshold

Here we derive the critical point \({\beta }_{{\rm{cr}}}^{(1)}\), defined as the value of β(1) at which the inactive (epidemic-free) state becomes unstable, thus marking the onset of the active (endemic) state. In the presence of a bi-stable region, it identifies the rightmost transition.

We linearize Eq. (2) by regarding of the same order ϵ 1, all the state probabilities containing at least one infected node, i.e., \({P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}} \sim {\mathcal{O}}(\epsilon)\) iff \(\exists k:{\sigma }_{{i}_{k}}=I\); and consequently, \({P}_{{i}_{1}\ldots {i}_{n},g}^{S\ldots S} \sim 1-{\mathcal{O}}(\epsilon)\). Without this assumption, higher-order dynamical correlations would be lost, and the critical point would not be correctly located. This can be interpreted as an extension of the results in Matamalas et al.39, where it is shown that \({P}^{II} \sim {\mathcal{O}}(\epsilon)\) for the state probability of an edge having both nodes infected.

Being interested in stationary states, the value of the time step is omitted from now on. All the states appearing in \({q}_{{i}_{k},g^{\prime} }^{(r)}\), see Eq. (6), include at least one infected node, therefore it takes the form,

$${q}_{{i}_{k},g^{\prime} }^{(r)}= \, 1-\mathop{\sum}\limits_{{{\{{j}_{1},\ldots ,{j}_{r}\}}\atop \in {{{\Gamma }}_{{i}_{k},g^{\prime} }^{(r)}}}}\left[\mathop{\sum }\limits_{l=1}^{r}\frac{1-{w}_{l,g^{\prime} }^{(r)}}{l!\left(r-l\right)!}\right.\\ \, \left.\times \mathop{\sum }\limits_{{k}_{1}\ne \ldots \ne {k}_{r}=1}^{r}{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}{i}_{k},g^{\prime} }^{I\ldots IS\ldots SS}\right]+{\mathcal{O}}({\epsilon }^{2})$$
(10)

where the squared brackets contain \({\mathcal{O}}(\epsilon)\) terms only. It is important to remark that, if in the considered clique there is at least another node \({i}_{\tilde{k}}\) forming with ik an edge included in some other clique, let us say a \((g^{\prime} ,r)\)-clique, \({q}_{{i}_{k},g^{\prime} }^{(r)}\) cannot be linearized, since the product corresponding to that \((g^{\prime} ,r)\)-clique would be made of state probabilities conditioned to the state of both ik and \({i}_{\tilde{k}}\) (not of ik only, as in Eq. (10)). Indeed, those states in which \({i}_{\tilde{k}}\) is in state I, would give \({\mathcal{O}}(1)\) terms, for both numerator and denominator would be \({\mathcal{O}}(\epsilon)\). This is the reason why in markovian models, when aiming to account for the dynamical correlations within some subsets of nodes (e.g., cliques), a consistent expression for the critical point can be given only when those subsets are edge-disjoint.

Returning to the derivation, since \({P}_{{i}_{1}\ldots {i}_{n},g}^{S\ldots S}\) is fixed by the normalization condition,

$${P}_{{i}_{1}\ldots {i}_{n},g}^{S\ldots S}=1-\mathop{\sum}\limits_{{{\{{\sigma }_{1},\ldots ,{\sigma }_{n}\}}\atop {\ne \{S,\ldots ,S\}}}}{P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}$$
(11)

we only need the linearized equations for arrival states with at least one infected node, i.e., \(\{{\sigma }_{{i}_{1}}^{\prime},\ldots ,{\sigma }_{{i}_{n}}^{\prime}\}\,\ne\, \{S,\ldots ,S\}\). Retaining the \({\mathcal{O}}(\epsilon)\) terms in \({\phi }_{{i}_{k},g}\), after some algebra, we find

$${P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}^{\prime}\ldots {\sigma }_{{i}_{n}}^{\prime}} = \, \mathop{\sum}\limits_{{{\{{\sigma }_{{i}_{1}},\ldots ,{\sigma }_{{i}_{n}}\}}\atop {\ne \{S,\ldots ,S\}}}}{P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}\left[{\left(1-\mu \right)}^{{N}_{I\to I}}\,{\mu }^{{N}_{I\to S}}{\left(1-{w}_{{N}_{I},g}^{(n-1)}\right)}^{{N}_{S\to I}}{\left({w}_{{N}_{I},g}^{(n-1)}\right)}^{{N}_{S\to S}}\right]\\ \, +{\mathbb{1}}[\exists !k:{\sigma }_{{i}_{k}}^{\prime}=I\,]\left\{\mathop{\sum}\limits_{(g^{\prime} ,r)\ne (g,n-1)}\mathop{\sum}\limits_{{{\{{j}_{1},\ldots ,{j}_{r}\}}\atop {\in {{{\Gamma }}}_{{i}_{k},g^{\prime} }^{(r)}}}}\left[\mathop{\sum }\limits_{l=1}^{r}\frac{1-{w}_{l,g^{\prime} }^{(r)}}{l!\left(r-l\right)!}\mathop{\sum }\limits_{{k}_{1}\ne \ldots \!\ne {k}_{r}=1}^{r}{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}{i}_{k},g^{\prime} }^{I\ldots IS\ldots SS}\right]\right.\\ \,\left.+\mathop{\sum}\limits_{{{\{{j}_{1},\ldots ,{j}_{n-1}\}}\atop {\in {{{\Gamma }}}_{{i}_{k},g}^{(n-1)}\backslash \{\neg {i}_{k}\}}}}\left[\mathop{\sum }\limits_{l=1}^{n-1}\frac{1-{w}_{l,g}^{(n-1)}}{l!\left(n-1-l\right)!}\mathop{\sum }\limits_{{k}_{1}\ne \ldots \ne {k}_{n-1}=1}^{n-1}{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{n-1}}{i}_{k},g}^{I\ldots IS\ldots SS}\right]\right\}+{\mathcal{O}}({\epsilon }^{2})$$
(12)

where \({N}_{\sigma \to \sigma ^{\prime} }=\left|\{{i}_{k = 1,\ldots \!,n}| {\sigma }_{{i}_{k}}=\sigma ,{\sigma }_{{i}_{k}}^{\prime}=\sigma ^{\prime} \}\right|\) is the number of nodes going from state σ to state \(\sigma ^{\prime} \). The terms in curly brackets derive from those transitions starting in state {S, …, S} and arriving to a state with exactly one infected node.

In particular, Eq. (8), the dynamic equation for a single node, becomes

$${P}_{i}^{I}= \, {P}_{i}^{I}(1-\mu)+\mathop{\sum}\limits_{(g,r)}\mathop{\sum}\limits_{{{\{{j}_{1},\ldots \!,{j}_{r}\}}\atop {\in {{{\Gamma }}}_{i,g}^{(r)}}}}\left[\mathop{\sum }\limits_{l = 1}^{r}\frac{1-{w}_{l,g}^{(r)}}{l!\left(r-l\right)!}\,\right.\\ \, \left.\times \mathop{\sum }\limits_{{k}_{1}\ne \ldots \!\ne {k}_{r}=1}^{r}{P}_{{j}_{{k}_{1}}\ldots {j}_{{k}_{l}}{j}_{{k}_{l+1}}\ldots {j}_{{k}_{r}}i,g}^{I\ldots IS\ldots SS}\right]+{\mathcal{O}}({\epsilon }^{2})$$
(13)

At this point, to get an expression for the critical threshold, we put Eq. (13) in the form of an eigenvalue equation for the vector of single-node probabilities \({{\bf{P}}}^{I}=\left({\{{P}_{i}^{I}\}}_{i\in V}\right)\). To this purpose, given a (, n)-clique {i1, …, in}, we need to express every joint probability over its nodes states as a linear combination of the marginal probabilities, \({P}_{{i}_{1}}^{I},\ldots ,{P}_{{i}_{n}}^{I}\). Eq. (12) provides a linearized equation for each of the 2n − 1 unknown state probabilities, hence a system admitting a unique solution. At this point, instead of the n equations for the transition to a state with a single node in state I (second term in Eq. (12)), we use the n consistency relations for the marginal probabilities, i.e., \({P}_{{i}_{k}}^{I}={\sum }_{\{{\sigma }_{\neg {i}_{k}}\}}{P}_{{i}_{k}\{\neg {i}_{k}\},g}^{I\{{\sigma }_{\neg {i}_{k}}\}}\), k {1, …, n}. In this way, the system is still made of 2n − 1 equations, hence is determined, but now it includes the marginal probabilities. Eventually, with some algebra, one gets the decomposition with its linear coefficients. Alternatively, asserted the uniqueness of the solution of the linear system, the problem can be approached in the other way around. That is, firstly expressing each of the joint probabilities as the most general linear combination of the marginal probabilities and then inserting them into the 2n − 1 equations. Doing so, we get a new, determined linear system whose unknowns are the linear coefficients of the original system. In the end, given a (g, n)-clique {i1, …, in}, the most general and proper linear decomposition of \({P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}\) in terms of \({P}_{{i}_{1}}^{I},\ldots ,{P}_{{i}_{n}}^{I}\), reads

$${P}_{{i}_{1}\ldots {i}_{n},g}^{{\sigma }_{{i}_{1}}\ldots {\sigma }_{{i}_{n}}}={X}_{{N}_{I},g}^{(n-1)}\mathop{\sum}\limits_{k:{\sigma }_{{i}_{k}}=I}{P}_{{i}_{k}}^{I}+{Y}_{{N}_{I},g}^{(n-1)}\mathop{\sum }_{k:{\sigma }_{{i}_{k}} = S}{P}_{{i}_{k}}^{I}+{\mathcal{O}}({\epsilon }^{2})$$
(14)

where we can take \({Y}_{n,g}^{(n-1)}=0\), being null the term it multiplies, as there are no nodes in state S for NI = n. We get one coefficient for NI = n and two coefficients for each NI {1, …, n − 1}, leading to 2n − 1 of them in total. Summing for every order n from 2 to m0 for g = 0, and from 3 to m1 for g = 1, we get a maximum of \({{m}_{0}}^{2}+{{m}_{1}}^{2}-5\) linear coefficients to fix. These coefficients, as functions of all the m1 microscopic parameters of the model, \({\{{\beta }^{(s)}\}}_{s = 1,\ldots \!,{m}_{1}-1}\) and μ, weigh the probability of finding a clique in a given state, when the system approaches the critical point.

Once all the coefficients have been found by insertion of Eq. (14) in the original linear system of 2n − 1 equations, we substitute them in Eq. (13) to finally get an eigenvalue equation. To this end, we define the set of \({\mathcal{D}}({\mathcal{K}})\)-dependent adjacency matrices \(\left\{{\left\{{A}^{(0,r)}\right\}}_{r\in \{1,\ldots \!,{m}_{0}-1\}},{\left\{{A}^{(1,r)}\right\}}_{r\in \{2,\ldots \!,{m}_{1}-1\}}\right\}\), such that \({A}_{ij}^{(g,r)}\) equals 1 if nodes i and j share a common incident (g, r + 1)-clique in \({\mathcal{D}}({\mathcal{K}})\), and 0 otherwise. Besides, we define the (g, r) degree of node i, \({k}_{i}^{(g,r)}\), as the number of (g, r + 1)-cliques incident on node i in \({\mathcal{D}}({\mathcal{K}})\), computed as \({k}_{i}^{(g,r)}=\mathop{\sum }\nolimits_{j = 1}^{N}{A}_{ij}^{(g,r)}\). Being the decomposition edge-disjoint, only one of those matrices can have a non-zero element in the position corresponding to a given pair of nodes. Consequently, the \({\mathcal{D}}({\mathcal{K}})\)-independent adjacency matrix of \({{\mathcal{K}}}^{(1)}\), the underlying graph of \({\mathcal{K}}\), is simply obtained by the sum of all those \({\mathcal{D}}({\mathcal{K}})\)-dependent adjacency matrices. It also follows that the degree ki of node i in \({{\mathcal{K}}}^{(1)}\) is computed as \({k}_{i}={k}_{i}^{(0)}+{k}_{i}^{(1)}\), where \({k}_{i}^{(0)}=\mathop{\sum }\nolimits_{r = 1}^{{m}_{0}}r{k}_{i}^{(0,r)}\) and \({k}_{i}^{(1)}=\mathop{\sum }\nolimits_{r = 2}^{{m}_{1}}r{k}_{i}^{(1,r)}\) are the total number of neighbors of i within, respectively, (0, )-cliques and (1, )-cliques.

Substituting Eq. (14) in Eq. (13), and doing some algebra and combinatorics, we get

$$M{{\bf{P}}}^{I}=D{{\bf{P}}}^{I}$$
(15)

where we have defined the matrices M and D, of elements

$${M}_{ij}=\mathop{\sum}\limits_{(g,r)}{A}_{ij}^{(g,r)}\mathop{\sum }\limits_{l=1}^{r}\left(1-{w}_{l,g}^{(r)}\right)\left[\left(\begin{array}{l}{r-1}\\ {l-1}\end{array}\right){X}_{l,g}^{(r)}+\left(\begin{array}{l}{r-1}\\ {r-l-1}\end{array}\right){Y}_{l,g}^{(r)}\right]$$
(16)
$${D}_{ij}={\delta }_{ij}\left[\mu -\mathop{\sum}\limits_{(g,r)}{k}_{i}^{(g,r)}\mathop{\sum }\limits_{l=1}^{r}\left(\begin{array}{l}{r}\\ {l}\end{array}\right)\left(1-{w}_{l,g}^{(r)}\right){Y}_{l,g}^{(r)}\right]$$
(17)

being δij the elements of the N × N identity matrix. Equations (15)–(17) define a generalized eigenvalue problem. The form taken by M and D is easily understood in this way. Each sum multiplying \({A}_{ij}^{(g,r)}\) in Mij represents the marginal contribution to the infection of node i coming from its neighbor j through the (g, r + 1)-clique they share. Given j in state I, \(\left(\genfrac{}{}{0.0pt}{}{r-1}{a}\right)\) is the number of ways in which a out of r − 1 nodes can be chosen to be in state I (a = l − 1) or S (a = r − l − 1). Similarly, each sum multiplying \({k}_{i}^{(g,r)}\) in Dii represents the contribution coming from all the configurations of any of the (g, r + 1)-cliques incident on node i, in which i is in state S. \(\left(\genfrac{}{}{0.0pt}{}{r}{l}\right)\) is the number of ways in which l out of r nodes can be chosen to be in state I.

Now, M is non-negative. Indeed, for any fixed (g, r), the sum multiplying \({A}_{ij}^{(g,r)}\) must be positive whenever \({A}_{ij}^{(g,r)}> 0\), since it represents the by-definition positive contribution to the infection of a node (i) coming from one of its neighbors (j). Moreover, being \({\mathcal{K}}\) undirected and so M symmetric, it follows that M is also irreducible44. Looking now at D, which is diagonal, the non-negativity of M implies the diagonal elements of D to be positive. Indeed, Eq. (15) holds for any value of the microscopic parameters; so let us suppose μ = 0. Since the non-zero elements of M are positive \(\forall \mu \in \left[0,1\right]\), the sum multiplying \({k}_{i}^{(g,r)}\) in Dij must be negative whenever \({A}_{ij}^{(g,r)}\,> \, 0\), proving any diagonal element of D to be positive. Therefore, D is invertible and its inverse as well, with elements \({\left[{D}^{-1}\right]}_{ii}={\left({D}_{ii}\right)}^{-1}\), i = 1, …, N. Applying D−1 to both sides of Eq. (15), we finally get the sought eigenvalue equation,

$$M^{\prime} {{\bf{P}}}^{I}={{\bf{P}}}^{I}$$
(18)

where we have defined the matrix \(M^{\prime} \equiv {D}^{-1}M\), of elements \({M}_{ij}^{\prime}={M}_{ij}/{D}_{ii}\). Thus, \(M^{\prime} \) is a non-negative irreducible matrix as well and, by the Perron-Frobenius theorem44, it admits a unique leading eigenvector \({{\bf{P}}}_{\star }^{I}\). This is the only one associated with the largest eigenvalue \({{{\Lambda }}}_{\max }\left(M^{\prime} \right)\) and the only one with all its entries positive, hence representing the unique physically acceptable expected state of the system at the onset of the epidemic. Fixed the values of the recovery probability, μ, and the higher-order infection probabilities, \({\{{\beta }^{(s)}\}}_{s \,{> }\,1}\), the critical threshold \({\beta }_{{\rm{cr}}}^{(1)}\) is implicitly found as the smallest non-negative value of β(1) such that

$${{{\Lambda }}}_{\max }\left(M^{\prime} \right)=1$$
(19)

Mean-field approximation

The homogeneous mean-field (MF) approximation of Eq. (8) is found by neglecting both the state correlations and the local structural heterogeneity among the nodes, i.e., regarding every node as the "average node” in the structure45. Given any (g, r)-clique {j1, …, jr}, it follows \({P}_{{j}_{1}\ldots {j}_{l}{j}_{l+1}\ldots {j}_{r},g}^{I\ldots IS\ldots S}={\rho }^{l}{\left(1-\rho \right)}^{r-l}\); and, for any node i, \({k}_{i}^{(g,r)}={\overline{k}}^{(g,r)}\), where \({\overline{k}}^{(g,r)}\) is the average value of the (g, r)-degree of the nodes in the structure. Thus, Eq. (8) becomes

$$\rho \left(t+1\right)=\rho \left(t\right)\left(1-\mu \right)+\left(1-\rho \left(t\right)\right)\left(1-\mathop{\prod}\limits_{(g,r)}{\overline{q}}_{g}^{(r)}\right)$$
(20)

where

$${\bar{q}}_{g}^{(r)}=\left[1-\mathop{\sum }\limits_{l = 1}^{r}\left(\begin{array}{l}{r}\\ {l}\end{array}\right)\left(1-{w}_{l,g}^{(r)}\right){\rho }^{l} \left(1-\rho \right)^{r-l}\right]^{{{\bar{k}}^{(g,r)}}}$$
(21)

The stationary solution is then got imposing \(\rho \left(t+1\right)=\rho \left(t\right)\) in Eq. (20). The linearization around the epidemic-free state is then implemented by taking ρ = ϵ 1. Looking at Eq. (21), the only \({\mathcal{O}}(\epsilon)\) terms are given by l = 1, whatever the couple (g, r). That is, only the pair-wise probability β(1) contributes in the MF approximation. With few algebra, one gets the renowned formula

$${\beta }_{{\rm{cr}}}^{(1)}=\frac{\mu }{{\bar{k}}}$$
(22)

where \({\bar{k}}=\frac{1}{N}\mathop{\sum }\nolimits_{i = 1}^{N}{k}_{i}\), is the average degree of a node in \({{\mathcal{K}}}^{(1)}\).

More generally, Eq. (22) holds for any model treating the nodes states as independent, thus including the MMCA approximation of the MECLE, the MMCA and MF approximations of the simplicial ELE, and also, except for substituting \(\overline{k}\) with \(\overline{{k}^{2}}/\overline{k}\), the heterogeneous MF approximation45 of both. The same result is found in continuous time31.

Numerical simulations

The equilibrium value of the prevalence ρ is computed using synchronous Monte Carlo simulations and the quasistationary state (QS) method46. In the specific case of the simplicial 2-complexes, for each node iV, a simulated time step proceeds as follows: (1) if i is currently infected, it recovers with probability μ; (2) if i is currently susceptible, (2.1) it gets infected with probability \(1-{(1-{\beta }^{(1)})}^{{n}_{i}^{(1)}}\), being \({n}_{i}^{(1)}\) the number of currently infected neighbors of i through edges; (2.2) if i is still susceptible after sub-step (2.1), it gets infected with probability \(1-{(1-{\beta }^{(2)})}^{{n}_{i}^{(2)}}\), being \({n}_{i}^{(2)}\) the number of currently infected couples of neighbors of i through triangles. For higher dimensions, n > 2, step (2) consists of n − 2 additional sub-steps, analogously defined and ordered as shown here.

In accordance with the QS method, every time the absorbing state ρ = 0 is reached, it is replaced by one of the previously stored active states of the system, i.e., one of those states with at least an active individual. Since for finite systems, when approaching the critical point, a large number of realizations end up in the absorbing state, the QS method properly reduces to a single run the wasteful method of performing many simulations. We have made use of 50 stored active states and an update probability of 0.25. We have given the systems a transient time of 105 time steps, and then calculated ρ as an average over 2 × 104 additional time steps.

Connectivity structures

We have referred to various interaction structures. For the numerical evaluation, we have made use of synthetic simplicial 2-complexes with around N = 104 nodes, presenting dissimilar structural properties: regular structures, as the clique complexes built from a triangular lattice; homogeneous structures, generated from the random SC model30, in which both the (, 1)- and the (, 2)-degree follow a Poisson distribution; heterogeneous structures, derived from the Dorogovtsev-Mendes model41, having (, 1)- and (, 2)-degree distributions nearly following power-laws with exponent around 3.