Full reconstruction of simplicial complexes from binary time-series data



I. INTRODUCTION
In network science and engineering, an active line of research is to infer the network topology and nodal dynamical equations from data [1]. This is important because networks are ubiquitous in the real world, but the details of their connection topology and of the intrinsic dynamical systems governing the properties and physical observables of the network are often unknown. These details are desired not only for understanding but also for protecting, disabling, or controlling the dynamical behaviors of the network (depending on the specific application), and a viable approach is to solve the inverse problem of determining the network details from observational data when such data are available. As with any inverse problem in mathematics and the physical sciences, the network inverse problem is challenging. Previous works in this area focused on "conventional" networks with pairwise interactions only [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. Existing methods include those based on drive-response [3,5], adaptive synchronization [2,11], noise correlation [6,15], compressive sensing [7,9,17], maximum likelihood estimation [13,14,16], and Granger causality [4,8]. The data can come from continuous- or discrete-time dynamical processes. For example, the drive-response and adaptive synchronization methods use data from continuous-time nonlinear coupled systems [2,3,5,11], while the maximum likelihood estimation method is suitable for data from discrete-time dynamics [13,14,16]. In this paper, motivated by the fact that high-order networks have become a state-of-the-art subfield of research in network science [18][19][20][21][22][23][24], we develop a reconstruction framework for finding, from time-series data, the network topology with high-order interactions.
While pairwise or node-to-node interactions are the familiar type in networks, it has been recognized that high-order interactions are also ubiquitous and important. For example, in a social network, the collective recommendation of multiple friends can often be more persuasive than the recommendation of a single friend to convince the individual to buy a new product. In a rumor spreading process, a piece of false news is likely to be accepted by an individual if it is shared or promoted simultaneously by many people [25][26][27]. A similar situation occurs in neuronal networks, where a firing event is often the result of excitatory and inhibitory interactions among many neurons. In all these cases, the interaction arises simultaneously among a group of nodes in the network, and to describe the network by the conventional pairwise interactions is no longer adequate [28]: high-order interactions beyond the pairwise relationship must be taken into account. Mathematically, high-order interactions can be described as hypergraphs or simplicial complexes [29], i.e., networks containing high-order simplexes. In particular, a k-simplex describes the simultaneous interaction among (k + 1) nodes, where a zero-simplex specifies an isolated node (i.e., without any interaction), a 1-simplex represents the conventional pairwise interaction, a 2-simplex underlies the simultaneous interaction among three nodes, and so on, as shown schematically in Fig. 1.
The past two years have witnessed a growing interest in high-order networks. For example, random walks on hypergraphs were studied, where a walker chooses the next destination depending on the number and the size of the shared hyperedges [30]. A family of random walks on hypergraphs with a parameter controlling the bias of the dynamics towards hyperedges of small or large size was constructed, and the impacts of walk strategy and walk time on community detection were elucidated [31]. The stability conditions of general dynamical processes on hypergraphs were found [18], and a social contagion model on hypergraphs was constructed that exhibits dynamical phenomena such as first- and second-order transitions, bistability, and hysteresis [32]. A simplicial model of social contagion was proposed, and it was demonstrated that the reinforcement mechanisms in 2-simplexes can lead to a discontinuous phase transition [33]. The impacts of the heterogeneity of simplicial complexes on the SIS (susceptible-infected-susceptible) spreading model with collective and individual contagion were analyzed [34], and a pair approximation theory for SIS dynamics in simplicial complexes was developed, which was argued to be more accurate than the Markov-chain and mean-field methods [35]. A social communication model including idea integration and information transmission in simplicial complexes was proposed, and the critical condition leading to the outbreak of information was identified [36]. In terms of network reconstruction, a statistical method to detect high-order interactions from network data of pairwise links was recently developed [21].
In this paper, we develop a framework to reconstruct complex networks with high-order interactions from time-series data. To be concrete, we focus on networks with 2-simplexes and assume that the dynamical process on the network is social contagion that generates binary time-series data. Our method is of the statistical inference type pivoted on maximum likelihood estimation, with the aim to fully reconstruct both pairwise interactions (links) and 2-simplexes at the same time, thereby distinguishing our work from the recent method based on link data [21]. In particular, the central task is to estimate the probabilities of each node connecting to the reconstruction or target node (pairwise interaction) and of any two nodes forming a three-body 2-simplex with the target node. We articulate a two-step process to greatly enhance the computational efficiency and an effective truncation process to determine the final reconstructed structure of the simplicial complex. Using three synthetic and four real-world simplicial complexes, we demonstrate the remarkable accuracy of our reconstruction method and establish its robustness with respect to variations in the average degree of the network and stochastic fluctuations. Our work represents the first effort in reconstructing complex networks with high-order interactions based on observed time-series data.

II. RESULTS

A. Basics
Simplicial complexes. A k-simplex σ is formed by a filled clique of a set of k + 1 nodes $[v_0, \dots, v_k]$, which defines a (k + 1)-body interaction [37]. As shown in Fig. 1, a 0-simplex is a single node, a 1-simplex is two nodes connected by an edge, a 2-simplex is three nodes connected pairwise by edges together with a single filled face, i.e., a triangle, and a 3-simplex is four nodes connected pairwise by edges and joined by four faces that are filled in to form a solid tetrahedron, and so on. A simplicial complex K on a set of nodes V is a collection of simplexes with the additional requirement [37,38] that if a simplex σ is in K (σ ∈ K), then any simplex ρ formed by a subset of the nodes of σ must also be included in K. For example, a 2-simplicial complex K is a collection of 0-, 1-, and 2-simplexes.
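The closure requirement can be illustrated with a short Python sketch. The function name `closure` and the representation of a simplex as a sorted tuple of node labels are choices made here for illustration only; they are not taken from the paper.

```python
from itertools import combinations

def closure(simplices):
    """Smallest simplicial complex containing the given simplexes.

    Each simplex is a tuple of node labels; every non-empty subset of
    its nodes is added, as required by the closure condition.
    """
    complex_ = set()
    for s in simplices:
        s = tuple(sorted(s))
        for size in range(1, len(s) + 1):
            complex_.update(combinations(s, size))
    return complex_

# One 2-simplex (0, 1, 2) plus one extra edge (2, 3).
K = closure([(0, 1, 2), (2, 3)])
print(sorted(K, key=lambda s: (len(s), s)))
```

Here the 2-simplex (0, 1, 2) brings its three edges and three nodes with it, so K holds four 0-simplexes, four 1-simplexes, and one 2-simplex.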
Social contagion dynamics. Peer influence and reinforcement mechanisms are ubiquitous in the dynamical process of social contagion [39] and give rise to high-order interactions in the network. A social contagion model that takes reinforcement into account on 2-simplicial complexes was proposed [33], which exploits the SIS type of spreading dynamics with binary-state dynamical variables. In particular, let $S_i^t$ be the state of node i at time t. Each node has two possible states: susceptible ($S_i^t = 0$) or infected ($S_i^t = 1$). At the initial time, a fraction $\rho_0$ of nodes is infected. A susceptible node i can be infected by an infected neighbor j through their pairwise interaction (i, j) with probability $\beta_1$. Node i can also be infected through a 2-simplex (i, j, k) in which both j and k are already infected, with probability $\beta_2$; this event can be understood as a synergistic reinforcement effect. In general, we have $\beta_2 > \beta_1$ so that the role of the 2-simplexes is embodied in the spreading dynamics. For convenience, we set $\beta_1 = \alpha/k_1$ and $\beta_2 = \omega/k_2$, where α and ω (α < ω) are two positive constants, and $k_1$ and $k_2$ are the average degrees of two-body and three-body interactions in a 2-simplicial complex, respectively. Each infected node recovers to the susceptible state with probability µ.
For an SIS process taking place on a 2-simplicial complex of size N, the available time-series data representing the states of nodes at different time steps can be stored in a data matrix S, where each row is a time string representing all nodes' states at that time step and each column is a node's state at different time steps. See Sec. A in Methods for an example.
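A minimal simulation sketch of how such a data matrix could be generated is given below. It assumes synchronous updates in which every infected neighbor and every 2-simplex with two infected partners makes an independent infection attempt; this update rule, the function name `simulate_contagion`, and all argument names are assumptions introduced for illustration.

```python
import numpy as np

def simulate_contagion(edges, triangles, n, T, beta1, beta2, mu, rho0, seed=0):
    """Return a T x n binary data matrix S (rows: time steps, columns: nodes)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=bool)
    for i, j in edges:
        A[i, j] = A[j, i] = True
    S = np.zeros((T, n), dtype=np.int8)
    S[0] = rng.random(n) < rho0                  # initial infected fraction rho0
    for t in range(T - 1):
        x = S[t].astype(bool)
        m1 = A[:, x].sum(axis=1)                 # infected neighbors of each node
        m2 = np.zeros(n, dtype=int)              # 2-simplexes with both partners infected
        for i, j, k in triangles:
            m2[i] += int(x[j] and x[k])
            m2[j] += int(x[i] and x[k])
            m2[k] += int(x[i] and x[j])
        # Assumed rule: independent attempts through links and 2-simplexes.
        p_inf = 1.0 - (1.0 - beta1) ** m1 * (1.0 - beta2) ** m2
        infected_now = ~x & (rng.random(n) < p_inf)
        still_infected = x & (rng.random(n) >= mu)   # recovery with probability mu
        S[t + 1] = infected_now | still_infected
    return S

# Toy example: a single filled triangle on three nodes.
S = simulate_contagion([(0, 1), (1, 2), (0, 2)], [(0, 1, 2)],
                       n=3, T=50, beta1=0.3, beta2=0.6, mu=0.2, rho0=0.5, seed=1)
```

The sketch does not restart the process if the contagion dies out; for long time series one may want to reseed infections in that case, which is again a modeling choice not specified here.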
Quantification of reconstruction performance. We use the F1 score to quantify the reconstruction accuracy [40], a global performance indicator defined as

$$F_1 = \frac{2PR}{P + R}, \tag{1}$$

where P = TP/(TP + FP) and R = TP/(TP + FN) are the precision and recall, with TP, FP, TN, and FN being the numbers of true positives, false positives, true negatives, and false negatives, respectively. A larger value of F1 corresponds to a higher accuracy, and F1 = 1 indicates that the original network structure has been fully reconstructed with zero error.
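As a small sketch of how this score can be evaluated, the function below compares a set of true connections with a set of predicted ones. The set-based representation and the name `f1_score` are illustrative choices; the same function applies to links (pairs) and to 2-simplexes (triples).

```python
def f1_score(true_set, pred_set):
    """F1 = 2PR/(P+R) from sets of true and predicted connections."""
    tp = len(true_set & pred_set)      # predicted and real
    fp = len(pred_set - true_set)      # predicted but not real
    fn = len(true_set - pred_set)      # real but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_links = {(0, 1), (1, 2), (2, 3)}
pred_links = {(0, 1), (1, 2), (0, 3)}
score = f1_score(true_links, pred_links)   # P = R = 2/3, so F1 = 2/3
```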
B. Main results: Reconstructing synthetic and real-world simplicial complexes

Figure 2 presents results of reconstructing three synthetic 2-simplicial complexes (see Sec. B in Methods on how these networks are constructed), where the squares and circles represent the reconstruction accuracy for two-body and three-body connections, respectively. Several features can be seen from Fig. 2. First, the reconstruction accuracy increases with the length T of the time series and can reach unity once T reaches about 8000. Second, the average degrees $k_1$ and $k_2$ of two-body and three-body simplexes, respectively, have different effects on the reconstruction accuracy. In particular, as shown in Figs. 2(a-c), a small value of $k_1$ tends to increase the reconstruction accuracy of both types of simplexes. This can be understood by noting that a small value of $k_1$ means that there are fewer two-body connections to be reconstructed, thereby enhancing the accuracy of the two-body connections for the same length of the time series. At the same time, fewer two-body connections reduce the complexity of reconstructing three-body connections, thereby improving their reconstruction accuracy. Regarding the effects of $k_2$, Figs. 2(d-f) reveal that its value affects only the reconstruction accuracy of three-body connections and has little effect on the accuracy of reconstructing two-body connections, which do not depend on the three-body connections in a 2-simplicial complex. Third, the reconstruction accuracy of three-body interactions is lower than that of two-body interactions owing to the more complicated structure of the former and its dependence on the latter.

Figure 3 shows the results of reconstructing four real-world 2-simplicial complexes: Hypertext2009 [41], Thiers12 [42], InVS15 [43], and LyonSchool [44,45] (see Sec. B in Methods for details of these real-world networks). The basic parameters of these 2-simplicial complexes constructed from the data sets are listed in Tab. I.
It can be seen from Fig. 3 that, as for the real-world networks, the reconstruction accuracies for both the two-body and three-body interactions increase with the length of the time series. Remarkably, these network structures are quite irregular, complicating the reconstruction. Nonetheless, for T = 20000, the F1 score can exceed 80%.
An issue of practical significance is the robustness of our reconstruction framework against random perturbations. To address this issue, we randomly flip a fraction ρ of infected states and the same number of susceptible states in the data matrix S (see Sec. A in Methods) and investigate the effect of ρ on the reconstruction accuracy as characterized by F1. The results are shown in Fig. 4 for three synthetic 2-simplicial complexes and three real-world 2-simplicial complexes. It can be seen that increasing the fraction ρ of flipped states leads to a reduction of F1. In particular, the value of F1 for the two-body connections can be as high as 50% even when 30% of the infected states have been flipped (ρ = 0.3), attesting to the robustness of our framework in reconstructing pairwise links against stochastic fluctuations in the data.
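The perturbation can be sketched as follows, assuming that the flipped entries are drawn uniformly at random over the whole data matrix; the function name `flip_states` and the rounding of the number of flips are illustrative assumptions.

```python
import numpy as np

def flip_states(S, rho, seed=0):
    """Flip a fraction rho of the 1-entries of S and the same number of 0-entries."""
    rng = np.random.default_rng(seed)
    noisy = S.copy()
    flat = noisy.reshape(-1)                   # view on the (contiguous) copy
    ones = np.flatnonzero(S == 1)              # flat indices of infected states
    zeros = np.flatnonzero(S == 0)             # flat indices of susceptible states
    n_flip = min(int(round(rho * ones.size)), zeros.size)
    flat[rng.choice(ones, n_flip, replace=False)] = 0
    flat[rng.choice(zeros, n_flip, replace=False)] = 1
    return noisy

S_demo = np.zeros((20, 10), dtype=np.int8)
S_demo[:5] = 1                                 # 50 infected entries out of 200
S_noisy = flip_states(S_demo, rho=0.2, seed=3) # 10 entries flipped in each direction
```

Because equal numbers of entries are flipped in each direction, the overall fraction of infected states in the data matrix is unchanged.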

III. DISCUSSION
Inferring network structure from observational data has been an active research field for more than fifteen years [1]. In previous studies, the term "network structure" largely refers to the collection of pairwise connections as characterized by the adjacency matrix of the network. Since the goal is to determine whether there is a link between any two nodes, the existing methods focused on measures that are suitable for ascertaining "two-body" interactions, such as those based on pairwise correlation or synchronization. Since the beginning of modern network science and engineering slightly over two decades ago, networks with only pairwise connections have represented the standard setting of study. Likewise, the inverse problem of data-based discovery of the network structure has been carried out exclusively in this setting. To our knowledge, the problem of finding high-order connections in complex networks from time-series data has not been addressed in the current literature.
High-order interactions are nonetheless ubiquitous in complex networks, and their importance has been gradually recognized, leading to an explosive growth of research recently [18][19][20][21][22][23][24]. The structure of networks with high-order connections, also known as simplicial complexes, is represented by tensors of high order. For example, three-body interactions or 2-simplexes in a network can be described by a tensor of rank 3. Structurally, simplicial complexes are significantly more sophisticated than conventional networks with pairwise links only, and richer dynamics can be anticipated in the former, which have begun to be studied. From the point of view of the inverse problem, reconstructing simplicial complexes from time-series data represents a great challenge.
We have taken the first step to address this inverse problem. Focusing on complex networks with 2-simplexes, we have developed a statistical inference framework by which all two-body and three-body interactions in the network can be found simultaneously from binary time-series data only, i.e., no prior knowledge about the network to be reconstructed is required. The backbone of our reconstruction framework is maximum likelihood estimation that yields the probabilities of all the possible pairwise and three-body connections and a criterion to associate the probabilities with the actual interactions. To significantly increase the computational efficiency, we have proposed and tested a two-step process and a truncation process to determine the true structure of the simplicial complexes. The reconstruction framework has withstood tests on synthetic and real-world simplicial complexes with respect to accuracy and robustness against random fluctuations.
Many open problems remain. For example, our reconstruction framework is formulated in terms of binary time series data from social contagion dynamics. How to reconstruct high-order networks from data generated by different dynamical processes needs to be studied. Also, our statistical inference method is developed for 2-simplicial complexes that are perhaps the "simplest" network structure beyond the conventional networks with pairwise interactions. To reconstruct networks with higher-order interactions such as 3-simplicial complexes and hypergraphs is worth pursuing. It is also necessary to develop methods to improve the reconstruction accuracy with shorter time series. We hope our work will stimulate further research in this emerging subfield of data-based reconstruction of complex networks with high-order interactions.

IV. METHODS

A. Statistical inference framework
Our statistical inference framework for reconstructing 2-simplicial complexes has three steps: (1) establishing the likelihood function based on the available data matrix S; (2) obtaining the connection probabilities of two- and three-body interactions by maximizing the likelihood function following the idea of the expectation-maximization (EM) method; and (3) executing an improved two-step reconstruction strategy to significantly increase the computational efficiency.

Establishing the likelihood function
A 2-simplicial complex with N = 30 nodes and its data matrix are illustrated in Figs. 5(a) and 5(b), respectively. For such a network hosting SIS dynamics, the probability of a susceptible node i (i.e., $S_i^t = 0$) being infected (i.e., $S_i^{t+1} = 1$) is determined only by its infected neighbors and by the 2-simplexes containing i in which the two other nodes are both infected at time t. The transition probability from the infected state to the susceptible state does not depend on the states of the neighbors, so it is only necessary to consider the transition probability from the susceptible state to the infected state for reconstructing the network. We stress that the details of the dynamical process, such as the infection probabilities $\beta_1$ and $\beta_2$ as well as the recovery probability µ, are assumed to be unknown; only the binary time series of the nodal states are available.
Let j → i denote the event that node j has a direct impact on the state of node i. For example, node j can directly spread the virus or send a piece of information to node i, which means that node j is one of the immediate neighbors of node i. Nodes i and j thus form a 1-simplex, a property that is independent of time t. Similarly, let jk → i denote the event that the synergistic reinforcement effect coming from nodes j and k has a direct impact on the state of node i, which is also independent of time. In the following, we determine the probabilities of node i and node j being connected and of three nodes i, j, k forming a three-body connection (i, j, k).
The conditional probability of $S_i^{t+1} = 1$ and j → i given $S_j^t = 1$ and $S_i^t = 0$ can be written as

$$P(S_i^{t+1} = 1,\ j \to i \mid S_j^t = 1,\ S_i^t = 0) = P_{j\to i}\, P^{i}_{j}, \tag{2}$$

where $P_{j\to i}$ is the probability of node i being infected by node j under the conditions $S_i^t = 0$, $S_j^t = 1$ and $S_i^{t+1} = 1$, $P_{j\to i} > 0$ indicates that node j is a neighbor of node i, and $P^{i}_{j}$ is the probability of $S_i^{t+1} = 1$ under the conditions $S_i^t = 0$ and $S_j^t = 1$, which can be estimated from the data matrix S. Take the matrix in Fig. 5(b) as an example and suppose we wish to estimate the value of $P^{13}_{7}$, where nodes 13 and 7 are highlighted by red and green frames, respectively. We extract every pair of consecutive time strings whose first member has $S_{13}^t = 0$. Seven such pairs can be extracted: (t, t + 1), (t + 1, t + 2), (t + 2, t + 3), (t + 3, t + 4), (t + 4, t + 5), (t + 6, t + 7), and (t + 8, t + 9). Two time instants, t + 1 and t + 8, satisfy the conditions that node 13 is in the susceptible state and node 7 is in the infected state. Of these two, the only one at which node 13 is infected at the next time step is t + 8. As a result, we have $P^{13}_{7} = 1/2$.

Similarly, the conditional probability of $S_i^{t+1} = 1$ and jk → i given $S_j^t = 1$, $S_k^t = 1$ and $S_i^t = 0$ can be written as

$$P(S_i^{t+1} = 1,\ jk \to i \mid S_j^t = 1,\ S_k^t = 1,\ S_i^t = 0) = P_{jk\to i}\, P^{i}_{jk}, \tag{3}$$

where $P_{jk\to i}$ is the probability of node i being infected through the synergistic interaction from nodes j and k under the conditions $S_i^t = 0$, $S_j^t = 1$, $S_k^t = 1$, and $S_i^{t+1} = 1$, and $P_{jk\to i} > 0$ indicates that the three nodes i, j, k form a 2-simplex. The quantity $P^{i}_{jk}$, the probability of $S_i^{t+1} = 1$ under the conditions $S_i^t = 0$, $S_j^t = 1$ and $S_k^t = 1$, can be estimated from the data matrix S in a similar way. Again, take the three nodes 13, 28 and 30 in Fig. 5(b) as an example. The time instants at which $S_{13}^t = 0$, $S_{28}^t = 1$ and $S_{30}^t = 1$ are fulfilled are t + 6 and t + 8. Because $S_{13}^{t+7} = 1$ and $S_{13}^{t+9} = 1$, we have $P^{13}_{28,30} = 1$.

According to Eqs. (2) and (3), the expected number of times that the susceptible node i is infected at $t_m + 1$ is given by

$$\langle \Psi_i^{t_m+1} \rangle = \sum_{j \neq i} P_{j\to i} P^{i}_{j} \Psi_j^{t_m} + \sum_{\substack{j<k \\ j,k \neq i}} P_{jk\to i} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m} + \varepsilon_i, \tag{4}$$

where $t_m$ labels the time steps at which node i is susceptible, and $\Psi_j^{t_m}$ and $\Psi_k^{t_m}$ represent the events that nodes j and k, respectively, are infected at time $t_m$; their values are zero or one. For example, $\Psi_j^{t_m} = 1$ means that node j is infected at time $t_m$, and $\Psi_j^{t_m} = 0$ means that it is not. The quantity $\varepsilon_i$ represents the noise due to errors in the collected data.
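The counting procedure used in the two worked examples can be sketched in a few lines. The helper below returns the fraction of time steps at which the target node becomes infected, among those time steps at which it is susceptible and all nodes in `others` are infected; the function name, the argument names, and the toy matrix are illustrative assumptions. Passing one node in `others` corresponds to the pairwise quantity, and passing two nodes corresponds to the three-body quantity.

```python
import numpy as np

def cond_infection_prob(S, i, others):
    """Estimate P(S_i^{t+1} = 1 | S_i^t = 0 and S_j^t = 1 for all j in others)."""
    now, nxt = S[:-1], S[1:]
    mask = now[:, i] == 0                 # node i susceptible at time t
    for j in others:
        mask &= now[:, j] == 1            # every node in `others` infected at time t
    if mask.sum() == 0:
        return 0.0                        # condition never met in the data
    return float(nxt[mask, i].mean())

# Toy data matrix: column 0 is the target node, column 1 a candidate neighbor.
S_toy = np.array([[0, 1],
                  [1, 1],
                  [0, 1],
                  [0, 0],
                  [0, 1],
                  [1, 0]], dtype=np.int8)
p = cond_infection_prob(S_toy, 0, [1])    # conditions met at t = 0, 2, 4; infected after t = 0 and 4
```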
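In general, the probability of a given number of events occurring in a fixed interval of time is characterized by the Poisson distribution, so we use it to capture the random nature of the number of times that node i is infected. An advantage of the Poisson distribution is that it makes mathematical analysis and computations with the EM algorithm feasible [46][47][48][49]. In particular, the likelihood function can be written as

$$P(\{\Psi_i^{t_m+1}\} \mid \Theta) = \prod_{m} \frac{\langle \Psi_i^{t_m+1} \rangle^{\Psi_i^{t_m+1}}}{\Psi_i^{t_m+1}!}\, e^{-\langle \Psi_i^{t_m+1} \rangle}, \tag{5}$$

where $\langle \Psi_i^{t_m+1} \rangle$ is the expected value given in Eq. (4), Θ denotes the set of variables $P_{j\to i}$, $P_{jk\to i}$ and $\varepsilon_i$, and the observed value $\Psi_i^{t_m+1}$ is either zero or one.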

Maximizing the likelihood function based on EM algorithm
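We next use the EM method to maximize the likelihood function [50] for determining the parameter set Θ in Eq. (5). Let $\langle \Psi_i^{t_m+1} \rangle$ denote the expected value in Eq. (4), i.e., the sum of the pairwise terms $P_{j\to i} P^{i}_{j} \Psi_j^{t_m}$, the three-body terms $P_{jk\to i} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m}$, and the noise term $\varepsilon_i$, and let M be the number of time steps $t_m$. Taking the logarithm of Eq. (5), we get

$$L(\Theta) = \sum_{m} \left[ \Psi_i^{t_m+1} \ln \langle \Psi_i^{t_m+1} \rangle - \langle \Psi_i^{t_m+1} \rangle - \ln \left( \Psi_i^{t_m+1}! \right) \right], \tag{6}$$

where the last term vanishes because $\Psi_i^{t_m+1}$ is zero or one. Applying Jensen's inequality to the logarithmic term on the right side of Eq. (6) yields

$$\ln \langle \Psi_i^{t_m+1} \rangle \geq \sum_{j \neq i} \rho_j^{t_m} \ln \frac{P_{j\to i} P^{i}_{j} \Psi_j^{t_m}}{\rho_j^{t_m}} + \sum_{\substack{j<k \\ j,k \neq i}} \rho_{jk}^{t_m} \ln \frac{P_{jk\to i} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m}}{\rho_{jk}^{t_m}} + \rho_{\varepsilon_i}^{t_m} \ln \frac{\varepsilon_i}{\rho_{\varepsilon_i}^{t_m}}, \tag{7}$$

where the weights $\rho_j^{t_m}$, $\rho_{jk}^{t_m}$ and $\rho_{\varepsilon_i}^{t_m}$ are nonnegative and sum to one, and the equality holds if

$$\rho_j^{t_m} = \frac{P_{j\to i} P^{i}_{j} \Psi_j^{t_m}}{\langle \Psi_i^{t_m+1} \rangle}, \tag{8}$$

$$\rho_{jk}^{t_m} = \frac{P_{jk\to i} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m}}{\langle \Psi_i^{t_m+1} \rangle}, \tag{9}$$

and

$$\rho_{\varepsilon_i}^{t_m} = \frac{\varepsilon_i}{\langle \Psi_i^{t_m+1} \rangle}. \tag{10}$$

The maximization problem of Eq. (6) can then be transformed into maximizing the following quantity:

$$\tilde{L}(\Theta) = \sum_{m} \left\{ \Psi_i^{t_m+1} \left[ \sum_{j \neq i} \rho_j^{t_m} \ln \frac{P_{j\to i} P^{i}_{j} \Psi_j^{t_m}}{\rho_j^{t_m}} + \sum_{\substack{j<k \\ j,k \neq i}} \rho_{jk}^{t_m} \ln \frac{P_{jk\to i} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m}}{\rho_{jk}^{t_m}} + \rho_{\varepsilon_i}^{t_m} \ln \frac{\varepsilon_i}{\rho_{\varepsilon_i}^{t_m}} \right] - \langle \Psi_i^{t_m+1} \rangle \right\}. \tag{11}$$

Calculating the partial derivatives of $\tilde{L}(\Theta)$ with respect to $P_{j\to i}$, $P_{jk\to i}$ and $\varepsilon_i$ and setting them to zero, we get

$$\sum_{m} \left[ \frac{\Psi_i^{t_m+1} \rho_j^{t_m}}{P_{j\to i}} - P^{i}_{j} \Psi_j^{t_m} \right] = 0, \tag{12}$$

$$\sum_{m} \left[ \frac{\Psi_i^{t_m+1} \rho_{jk}^{t_m}}{P_{jk\to i}} - P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m} \right] = 0, \tag{13}$$

$$\sum_{m} \left[ \frac{\Psi_i^{t_m+1} \rho_{\varepsilon_i}^{t_m}}{\varepsilon_i} - 1 \right] = 0, \tag{14}$$

which give

$$P_{j\to i} = \frac{\sum_{m} \Psi_i^{t_m+1} \rho_j^{t_m}}{\sum_{m} P^{i}_{j} \Psi_j^{t_m}}, \tag{15}$$

$$P_{jk\to i} = \frac{\sum_{m} \Psi_i^{t_m+1} \rho_{jk}^{t_m}}{\sum_{m} P^{i}_{jk} \Psi_j^{t_m} \Psi_k^{t_m}}, \tag{16}$$

$$\varepsilon_i = \frac{1}{M} \sum_{m} \Psi_i^{t_m+1} \rho_{\varepsilon_i}^{t_m}. \tag{17}$$

The six equations, Eqs. (8)-(10) and Eqs. (15)-(17), can be used to solve for $P_{j\to i}$, $P_{jk\to i}$ and $\varepsilon_i$. In particular, we initialize all values of $P_{j\to i}$, $P_{jk\to i}$ and $\varepsilon_i$ (∀ j ≠ k ≠ i) to one, calculate the values of $\rho_j^{t_m}$, $\rho_{jk}^{t_m}$ and $\rho_{\varepsilon_i}^{t_m}$ from Eqs. (8)-(10), and substitute them into Eqs. (15)-(17) to find new values of $P_{j\to i}$, $P_{jk\to i}$ and $\varepsilon_i$. We repeat this process until convergence is achieved. Since a single iterative process does not ensure global optimization, we carry out the above iteration process a number of times and choose the values that give the maximum of the quantity in Eq. (11).
As an example, as shown in Fig. 5(c), the values of $P_{j\to 13}$ and $P_{jk\to 13}$ are obtained from this iteration process, where only the values $P_{j\to 13} > 0$ and the top 10 values of $P_{jk\to 13}$ are shown. Similarly, all the values of $P_{j\to i}$ and $P_{jk\to i}$ can be calculated for each node i. As presented in Fig. 5(d), each column above the abscissa corresponds to the predicted 1-simplex probabilities [the left subgraph of Fig. 5(d)] and 2-simplex probabilities [the right subgraph of Fig. 5(d)] of a node, and the blue and red dots denote the actual and nonexistent two-body or three-body connections, respectively.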
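A compact NumPy sketch of this alternating iteration for one target node is given below. It assumes the standard Poisson-EM form of the updates: each term of the expected infection count is divided by their sum to obtain the weights ρ, and each parameter is then set to the ρ-weighted number of observed infections divided by the summed data-estimated factor (with the noise term divided by the number of time steps). The function name `em_for_node`, the argument `cand` (the candidate nodes considered), the stopping rule, and the single run from all-ones initial values are assumptions made for illustration; the repeated runs mentioned in the text are omitted.

```python
import numpy as np
from itertools import combinations

def em_for_node(S, i, cand, n_iter=500, tol=1e-10):
    """Poisson-EM estimates of P_{j->i}, P_{jk->i} and eps_i for target node i."""
    now, nxt = S[:-1], S[1:]
    rows = now[:, i] == 0                       # time steps t_m with node i susceptible
    y = nxt[rows, i].astype(float)              # Psi_i^{t_m + 1}
    X1 = now[rows][:, cand].astype(float)       # Psi_j^{t_m}
    pairs = list(combinations(range(len(cand)), 2))
    X2 = np.zeros((len(y), len(pairs)))
    for c, (a, b) in enumerate(pairs):
        X2[:, c] = X1[:, a] * X1[:, b]          # Psi_j^{t_m} * Psi_k^{t_m}
    X = np.hstack([X1, X2, np.ones((len(y), 1))])   # last column: noise term

    # Data-estimated conditional probabilities; the noise column has weight 1.
    count = X.sum(axis=0)
    w = np.divide(X.T @ y, count, out=np.zeros_like(count), where=count > 0)
    w[-1] = 1.0

    theta = np.ones(X.shape[1])                 # all parameters initialized to one
    for _ in range(n_iter):
        terms = X * (theta * w)                 # individual terms of the expected count
        lam = terms.sum(axis=1, keepdims=True)
        rho = np.divide(terms, lam, out=np.zeros_like(terms), where=lam > 0)
        num = y @ rho                           # rho-weighted observed infections
        den = count * w                         # summed data-estimated factor
        new = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        converged = np.max(np.abs(new - theta)) < tol
        theta = new
        if converged:
            break
    p1 = dict(zip(cand, theta[:len(cand)]))
    p2 = {(cand[a], cand[b]): theta[len(cand) + c] for c, (a, b) in enumerate(pairs)}
    return p1, p2, theta[-1]

rng = np.random.default_rng(0)
S_rand = (rng.random((400, 5)) < 0.4).astype(np.int8)   # structureless toy data
p1, p2, eps = em_for_node(S_rand, 0, [1, 2, 3])
```

Restricting `cand` to a small set of nodes is what keeps the number of pair terms manageable; with all N − 1 other nodes as candidates the number of columns grows quadratically in N.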

An improved two-step reconstruction strategy
For a 2-simplicial complex with N nodes, predicting the 2-simplexes of a node i by brute force means considering every pair of other nodes (e.g., j and k) and calculating the probability $P_{jk\to i}$, which requires calculating $C_{N-1}^{2} = (N-1)(N-2)/2$ values. To reduce the computational load and increase the reconstruction accuracy, we articulate an improved two-step strategy. The closure property of simplicial complexes stipulates that the other two nodes forming a 2-simplex with node i must be neighbors of node i, so it is not necessary to calculate the probability $P_{jk\to i}$ if node j or node k is not a neighbor of node i. The reconstruction process can then be divided into two steps. In the first step, the "approximate" neighborhood of each node is predicted and the corresponding columns in the data matrix S are extracted, leading to a compressed data matrix. In the second step, based on the compressed data matrix, the values of $P_{j\to i}$ and $P_{jk\to i}$ for each node i are predicted by iterating Eqs. (8)-(10) and (15)-(17).
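As a rough illustration of the savings, consider the N = 30 example of Fig. 5 and suppose, purely hypothetically, that the first step retains $n_i = 8$ approximate neighbors for node i (this number is an assumption chosen for the arithmetic, not a value reported in the paper). The number of candidate pairs per target node then drops as follows:

```latex
\[
C_{N-1}^{2} = \binom{29}{2} = \frac{29 \times 28}{2} = 406,
\qquad
C_{n_i}^{2} = \binom{8}{2} = \frac{8 \times 7}{2} = 28 .
\]
```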
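In the first step, the predicted neighbors are not accurate because the three-body interactions are ignored. In fact, the main purpose of this step is to determine an approximate range of neighbors in order to reduce the time for calculating $P_{jk\to i}$ (∀ j ≠ k ≠ i). Without taking into account three-body interactions, the expected number of times that the susceptible node i is infected at $t_m + 1$ can simply be expressed as

$$\langle \Psi_i^{t_m+1} \rangle = \sum_{j \neq i} P^{0}_{j\to i} P^{i}_{j} \Psi_j^{t_m} + \varepsilon_i, \tag{18}$$

where the notation $P^{0}_{j\to i}$ is used to emphasize that node j is only an "approximate" neighbor of node i. Assuming that the number $\Psi_i$ of times that node i is infected in each time period obeys the Poisson distribution, we obtain the likelihood function as

$$P(\{\Psi_i^{t_m+1}\} \mid \tilde{\Theta}) = \prod_{m} \frac{\langle \Psi_i^{t_m+1} \rangle^{\Psi_i^{t_m+1}}}{\Psi_i^{t_m+1}!}\, e^{-\langle \Psi_i^{t_m+1} \rangle}, \tag{19}$$

where $\langle \Psi_i^{t_m+1} \rangle$ is given by Eq. (18) and $\tilde{\Theta}$ denotes the set of variables $P^{0}_{j\to i}$ and $\varepsilon_i$. Taking the logarithm of Eq. (19), we have

$$L(\tilde{\Theta}) = \sum_{m} \left[ \Psi_i^{t_m+1} \ln \langle \Psi_i^{t_m+1} \rangle - \langle \Psi_i^{t_m+1} \rangle - \ln \left( \Psi_i^{t_m+1}! \right) \right]. \tag{20}$$

Using the EM method to maximize the likelihood function, we obtain the final parameters $\tilde{\Theta}$ as

$$P^{0}_{j\to i} = \frac{\sum_{m} \Psi_i^{t_m+1} \rho_j^{t_m}}{\sum_{m} P^{i}_{j} \Psi_j^{t_m}}, \tag{21}$$

$$\varepsilon_i = \frac{1}{M} \sum_{m} \Psi_i^{t_m+1} \rho_{\varepsilon_i}^{t_m}, \tag{22}$$

where M is the number of time steps $t_m$ and

$$\rho_j^{t_m} = \frac{P^{0}_{j\to i} P^{i}_{j} \Psi_j^{t_m}}{\langle \Psi_i^{t_m+1} \rangle}, \tag{23}$$

$$\rho_{\varepsilon_i}^{t_m} = \frac{\varepsilon_i}{\langle \Psi_i^{t_m+1} \rangle}. \tag{24}$$

Given initial values for $P^{0}_{j\to i}$ and $\varepsilon_i$, the final values can be obtained by iterating Eqs. (21)-(24) until convergence is achieved. It is worth noting that $P^{0}_{j\to i}$ is a probability, whereas what we need is a decision on the "approximate" neighbors of the node under reconstruction. Theoretically, the "approximate" neighbors can be determined by testing whether $P^{0}_{j\to i}$ is nonzero. In practice, however, this is not feasible because of noise or deviations from the assumptions. For example, as shown in Fig. 5(f), nodes 6 and 14 are not neighbors of node 13 even though $P^{0}_{6\to 13} = 0.0002$ and $P^{0}_{14\to 13} = 0.0006$ are nonzero. To overcome this difficulty, we articulate a truncation method for determining the neighbors of node i, as follows.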
First, note that the time complexity of the second step can be significantly reduced when fewer neighbors are predicted, but too few predicted neighbors can lead to missing neighbors. Conversely, too many predicted neighbors would increase the time complexity. A solution is to use a reasonable truncation to determine the "approximate" neighbors of each node. To this end, we re-rank the probabilities $P^{0}_{j\to i}$ (∀ j ≠ i) in descending order and place a threshold $\Delta_i$ in the maximum gap between consecutive ranked values, as defined in Eq. (25) [14]. Next, we use Eq. (25) again to find a new threshold $\tilde{\Delta}_i$ that is smaller than $\Delta_i$. Finally, node j is regarded as an "approximate" neighbor of node i if $P^{0}_{j\to i} > \tilde{\Delta}_i$. The truncation method can ensure the detection of all real neighbors and 2-simplexes.
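The truncation can be sketched as follows under two explicit assumptions that go beyond what is stated here: the gap is measured as the difference between consecutive ranked values with the threshold placed at the midpoint of the largest gap, and the second threshold is obtained by applying the same rule to the values that fall below the first threshold. The function names and the example probabilities are illustrative only; the exact definition used in the paper is that of Eq. (25) and Ref. [14].

```python
import numpy as np

def max_gap_threshold(values):
    """Midpoint of the largest gap between consecutive values ranked in descending order."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    if v.size < 2:
        return 0.0
    gaps = v[:-1] - v[1:]                      # assumed gap measure: plain difference
    g = int(np.argmax(gaps))
    return 0.5 * (v[g] + v[g + 1])

def approximate_neighbors(p0):
    """p0 maps candidate node j to P0_{j->i}; returns (neighbor set, lower threshold)."""
    first = max_gap_threshold(list(p0.values()))
    below = [v for v in p0.values() if v < first]
    # Assumed second pass: apply the gap rule again to the values below the first threshold.
    second = max_gap_threshold(below) if len(below) >= 2 else first
    return {j for j, v in p0.items() if v > second}, second

p0_example = {1: 0.9, 2: 0.8, 3: 0.3, 4: 0.25, 5: 0.0006, 6: 0.0002}
nbrs, thr = approximate_neighbors(p0_example)
```

In this toy example the first pass separates {1, 2} from the rest, and the second pass lowers the threshold so that nodes 3 and 4 are also kept while the near-zero values are discarded, which mirrors the intent of retaining a generous neighborhood.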
Once the "approximate" neighbors of node i have been obtained, the time series of these neighbors can be extracted [Figs. 5(f-g)]. The neighbors of node i and its 2-simplexes can then be quickly re-predicted in the second step, i.e., by iterating Eqs. (8)-(10) and Eqs. (15)-(17) on the compressed data matrix. For example, the prediction results for node 13 are shown in Fig. 5(h), and the values of $P_{j\to i}$ (∀ j ≠ i) and $P_{jk\to i}$ (∀ j ≠ k ≠ i) for each node are presented in Fig. 5(i). The actual two- and three-body connections of each node can then be determined based on the results in Fig. 5(i). Because the identification of two-body connections has been refined in the second step, we simply assume that node j is a neighbor of node i if $P_{j\to i} > 0$. Following previous work [14,51], we assume that nodes i and j are connected when $P_{j\to i} > 0$ or $P_{i\to j} > 0$.
The case of three-body interactions is more complicated, and the solution is sensitive to noise or errors. In fact, using the condition $P_{jk\to i} > 0$ as a criterion to detect (i, j, k) as a 2-simplex can lead to many false positives. Our solution is to re-rank $P_{jk\to i}$ (∀ j ≠ k ≠ i) in descending order and obtain a new threshold $\tilde{\Delta}_i$ by using Eq. (25) again. As a result, node i identifies (i, j, k) as a 2-simplex when $P_{jk\to i} \geq \tilde{\Delta}_i$. To remove conflicts among the predictions of the three nodes, we assume that a 2-simplex (i, j, k) exists when at least two of the three conditions hold, e.g., $P_{jk\to i} \geq \tilde{\Delta}_i$, $P_{ik\to j} \geq \tilde{\Delta}_j$ and $P_{ij\to k} < \tilde{\Delta}_k$, whereas a three-body connection is not formed when only one holds, e.g., $P_{jk\to i} \geq \tilde{\Delta}_i$, $P_{ik\to j} < \tilde{\Delta}_j$ and $P_{ij\to k} < \tilde{\Delta}_k$. Implementing the two-step strategy, we can reconstruct the whole 2-simplicial complex. As shown in Fig. 5(a), the 2-simplicial complex has been accurately reconstructed. Overall, our two-step strategy not only greatly reduces the computational time but also significantly improves the reconstruction accuracy.
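The two-out-of-three rule can be written as a simple vote count. In the sketch below, `p2[i]` is assumed to map a pair (j, k) to the estimated three-body probability for target node i, and `thr[i]` is the threshold of node i; these container layouts, the function name, and the example values are illustrative assumptions.

```python
def confirm_two_simplexes(p2, thr):
    """Keep a triangle if at least two of its three nodes rate it at or above their threshold."""
    votes = {}
    for i, table in p2.items():
        for (j, k), prob in table.items():
            if prob >= thr[i]:
                tri = tuple(sorted((i, j, k)))
                votes[tri] = votes.get(tri, 0) + 1
    return {tri for tri, count in votes.items() if count >= 2}

p2_example = {
    0: {(1, 2): 0.90, (1, 3): 0.10},
    1: {(0, 2): 0.80, (0, 3): 0.70},
    2: {(0, 1): 0.05},
    3: {(0, 1): 0.02},
}
thr_example = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
simplexes = confirm_two_simplexes(p2_example, thr_example)
```

Here the triangle (0, 1, 2) is supported by nodes 0 and 1 and is kept, while (0, 1, 3) is supported only by node 1 and is rejected.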
B. Construction of synthetic and real-world 2-simplicial complexes

Synthetic 2-simplicial complexes
Here we describe the main steps of constructing synthetic 2-simplicial complexes of size N with average degrees of two-body and three-body interactions $k_1$ and $k_2$, respectively.
Random simplicial complex (ERSC). First, a random graph is generated by connecting any two nodes with probability $p_1$. We then add 2-simplexes between any three nodes with probability $p_2$, where $p_1$ and $p_2$ are given by [33]

$$p_1 = \frac{k_1 - 2k_2}{(N-1) - 2k_2}, \tag{26}$$

$$p_2 = \frac{2k_2}{(N-1)(N-2)}. \tag{27}$$

A random 2-simplicial complex with the specified average degrees can then be constructed using the probabilities $p_1$ and $p_2$.
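A minimal construction sketch is shown below. It assumes the connection probabilities of the random simplicial complex model of Ref. [33] in the form p1 = (k1 − 2k2)/((N − 1) − 2k2) and p2 = 2k2/((N − 1)(N − 2)), and it adds the three edges of every sampled triangle so that the closure requirement holds; the function name, the clamping of p1 at zero, and the parameter values in the example are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def random_simplicial_complex(n, k1, k2, seed=0):
    """Sample edges and 2-simplexes with target average degrees k1 and k2."""
    rng = np.random.default_rng(seed)
    p1 = max(0.0, (k1 - 2 * k2) / ((n - 1) - 2 * k2))   # assumed formula from Ref. [33]
    p2 = 2 * k2 / ((n - 1) * (n - 2))                   # assumed formula from Ref. [33]
    edges = {e for e in combinations(range(n), 2) if rng.random() < p1}
    triangles = {t for t in combinations(range(n), 3) if rng.random() < p2}
    for t in triangles:
        edges.update(combinations(t, 2))                # closure: a triangle implies its edges
    return edges, triangles

edges, triangles = random_simplicial_complex(n=60, k1=10, k2=2, seed=1)
mean_k1 = 2 * len(edges) / 60          # realized average two-body degree
mean_k2 = 3 * len(triangles) / 60      # realized average three-body degree
```

The realized averages fluctuate around the targets for a finite network, so in practice one may resample or accept a tolerance around k1 and k2.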
Scale-free simplicial complex (SFSC). First, a scale-free network is generated, in which each new node connects m edges to the existing nodes with degree preference [52]. We then add 2-simplexes between any three nodes according to the probability $p_2$ in Eq. (27). The resulting average degree of 1-simplexes is given by Eq. (28).
Small-world simplicial complex (SWSC). First, a small-world network [53] is generated from a regular lattice (all the nodes have the same degree 2m) with rewiring probability p. We then add 2-simplexes between any three nodes according to the probability $p_2$ in Eq. (27). The average degree of 1-simplexes is given by Eq. (28).

2-simplicial complexes from real-world data
In each real-world data set, the face-to-face interactions have been measured with a temporal resolution of 20 seconds. First, we generate a weighted network from the data, where a weight represents the number of interactions between a pair of nodes in the whole time window. Second, we remove any link whose weight is less than a given threshold ζ and set the weights of the retained links to one to generate an unweighted network. Finally, we cut the data into multiple segments with a temporal window of 5 minutes and record all the 2-simplexes. In particular, if three nodes communicate with each other within a short time, they are regarded as constituting a three-body connection. We record the frequencies of the 2-simplexes in each segment. According to the total frequency over all segments, we retain the 50% of the 2-simplexes with the highest frequencies and count them as the actual 2-simplexes. The four real-world 2-simplicial complexes are visualized in Figs. 3(a-d).

DATA AVAILABILITY
All relevant data are available from the authors upon request.

Fig. 5. Schematic illustration of reconstructing a 2-simplicial complex based on binary time-series data of social contagion dynamics. (a) A 2-simplicial complex of size N = 30, where the black links represent 1-simplexes (i.e., edges) and the orange shadows represent 2-simplexes. (b) Data matrix S that stores all nodes' states at different time steps, where each row is a time string representing all nodes' states at that time step and each column is a node's state at different time steps. The black and blank squares denote the 1 and 0 states, respectively. Take node 13 as an example (highlighted by the red frame). The values of $P^{13}_{j}$ (∀ j ≠ 13) and $P^{13}_{jk}$ (∀ j ≠ k ≠ 13) can be calculated from the data matrix S: $P^{13}_{7} = 1/2$ (highlighted by the green frame) and $P^{13}_{28,30} = 1$ (highlighted by the purple frame). (c) The values of $P_{j\to 13}$ and $P_{jk\to 13}$ obtained through the EM algorithm, where only nonzero values of $P_{j\to 13}$ and the top 10 values of $P_{jk\to 13}$ are shown. (d) The values of $P_{j\to i}$ (each column in the left subgraph) and $P_{jk\to i}$ (each column in the right subgraph) for each node i based on the method described in Sec. IV A 1, where the blue and red dots denote the actual and nonexistent two-body or three-body connections, respectively. (e) The 2-simplicial complex inferred from the probabilities in (d), in which the two-body connections are exactly predicted but two 2-simplexes, (5,20,22) and (13,18,29) in (a), cannot be predicted (marked by the light-yellow shadows). (f) The values of $P^{0}_{j\to 13}$ for node 13 obtained by iterating Eqs. (21)-(24); only nonzero values are shown. (g) Compressed data matrix that records only the columns in the data matrix S giving $P^{0}_{j\to 13} > 0$. (h) The values of $P_{j\to 13}$ and $P_{jk\to 13}$ for node 13 based on the compressed data in (g). (i) The values of $P_{j\to i}$ and $P_{jk\to i}$ for each node i. Finally, the full 2-simplicial complex in (a) can be exactly reconstructed by determining whether $P_{j\to i} > 0$ or $P_{jk\to i} > \tilde{\Delta}_i$.