Full reconstruction of simplicial complexes from binary contagion and Ising data

Previous efforts on data-based reconstruction focused on complex networks with pairwise or two-body interactions. There is a growing interest in networks with higher-order or many-body interactions, raising the need to reconstruct such networks based on observational data. We develop a general framework combining statistical inference and expectation maximization to fully reconstruct 2-simplicial complexes with two- and three-body interactions based on binary time-series data from two types of discrete-state dynamics. We further articulate a two-step scheme to improve the reconstruction accuracy while significantly reducing the computational load. Through synthetic and real-world 2-simplicial complexes, we validate the framework by showing that all the connections can be faithfully identified and the full topology of the 2-simplicial complexes can be inferred. The effects of noisy data or stochastic disturbance are studied, demonstrating the robustness of the proposed framework.

I n network science and engineering, a subfield of research is to find the network topology and nodal dynamical equations from data 1 . This is important because networks are ubiquitous in the real world but the details of their connection topology and the intrinsic dynamical systems governing the properties and physical observables of the network are often unknown. The details are desired not only for understanding but also for protecting, disabling, or controlling the network dynamical behaviors (depending on the specific applications), and a viable way is to solve the inverse problem of determining the network details through observational data if they are available. As for any inverse problems in mathematics and physical sciences, the network inverse problem is challenging. Previous works in this area focused on "conventional" networks with pairwise interactions only [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16] . Existing methods include those which are based on drive-response 3,5 , adaptive synchronization 2,11 , noise correlation 6,15 , compressive sensing 7,9,17 , maximum likelihood estimation 13,14,16 , and Granger causality 4,8 . The data can be from continuous-or discrete-time dynamical processes. For example, the drive-response and adaptive synchronization methods use data from continuous-time nonlinear coupled systems 2,3,5,11 , while the maximum likelihood estimation method is suitable for data from discrete-time dynamics 13,14,16 . In this paper, motivated by the fact that higher-order networks have become a state-of-the-art subfield of research in network science [18][19][20][21][22][23][24] , we develop a reconstruction framework for finding from time-series data network topology with higher-order interactions.
While pairwise or node-to-node interactions are the familiar type in networks, it has been recognized that higher-order interactions are also ubiquitous and important. For example, in a social network, the collective recommendation of multiple friends can often be more persuasive than the recommendation of a single friend to convince the individual to buy a new product. In a rumor spreading process, a piece of false news is likely to be accepted by an individual if it is shared or promoted simultaneously by many people [25][26][27] . A similar situation occurs in neuronal networks, where a firing event is often the result of excitatory and inhibitory interactions among many neurons. In all these cases, the interaction arises simultaneously among a group of nodes in the network, and to describe the network by the conventional pairwise interactions is no longer adequate 28 : higher-order interactions beyond the pairwise relationship must be taken into account. Mathematically, higher-order interactions can be described as hypergraphs or simplicial complexes 29 , i.e., networks containing higher-order simplexes. In particular, a ksimplex describes the simultaneous interaction among (k + 1) nodes, where a zero-simplex specifies an isolated node (i.e., without any interaction), a 1-simplex represents the conventional pairwise interaction, a 2-simplex underlies the simultaneous interaction among three nodes, and so on.
The past three years have witnessed a growing interest in higher-order networks 30 . For example, random walks on hypergraphs were studied, where a walker chooses the next destination depending on the number and the size of the shared hyperedges 31 . A family of random walks on hypergraphs with a parameter controlling the bias of the dynamics towards hyperedges of small or large size was constructed and the impacts of walk strategy and walk time on community detection were elucidated 32 . The stability conditions of the general dynamical processes on hypergraphs were found 18 , and a social contagion model on hypergraphs was constructed which presents dynamical phenomena such as firstand second-order transitions, bistability and hysteresis 33 . A simplicial model of social contagion was proposed and it was demonstrated that the reinforcement mechanisms in 2-simplex can lead to a discontinuous phase transition 34 . The impacts of the heterogeneity of simplicial complexes on the SIS (susceptible-infected-susceptible) spreading model with collective and individual contagion were analyzed 35 , and a pair approximation theory to study the SIS dynamics in simplicial complexes was developed, which was argued to be more accurate than the Markov-chain and mean field methods 36 . A social communication model including idea integration and information transmission in simplicial complexes was proposed and the critical condition leading to the outbreak of information was identified 37 . A simplicial activity driven model was proposed and the impact of both simplicial and temporally evolving interactions were analyzed 38 . In terms of network reconstruction, a statistical method to detect higher-order interactions from network data of pairwise links has recently been developed 21 .
In this paper, we develop a framework to reconstruct complex networks with higher-order interactions from data. To be concrete, we focus on networks with 2-simplexes and assume that the dynamical processes on the network are social contagion and simplicial Ising dynamics that generate binary time-series data. Our method is of the statistical inference type pivoted on maximum likelihood estimation, with the aim to fully reconstruct both pairwise interactions (links) and 2-simplexes at the same time, thereby distinguishing our work from the recent method based on link data 21 . In particular, the central task is to estimate the probabilities of each node connecting to the reconstruction or target node (pairwise interaction) and of any two nodes forming a three-body 2-simplex with the target node. We articulate a twostep process to greatly enhance the computational efficiency and an effective truncation process to determine the final reconstructed structure of the simplicial complex. Using three synthetic and four real-world simplicial complexes, we demonstrate the accuracy of our reconstruction method and establish its robustness with respect to variations in the average degree of the network and stochastic fluctuations. Our work represents an initial effort in reconstructing complex networks with higher-order interactions based on observed time-series data.

Results
Simplicial complexes. A k-simplex σ is formed by a filled clique of a set of k + 1 nodes v 0 ; Á Á Á ; v k Â Ã , which defines a (k + 1)-body interaction 39 . A 1-simplex is two nodes connected by an edge, a 2-simplex is three nodes connected pairwisely by edges and with an additional single face, i.e., a triangle, and a 3-simplex is four vertices connected pairwisely by edges and joined by four faces, which are filled in to form a solid tetrahedron, and so on. A simplicial complex K composed of a set of nodes V is a collection of simplexes, with the additional requirement 39,40 that if a simplex is in K (σ 2 K), then any simplex ϱ composed of subsets of simplex σ should also be included in K. For example, a 2-simplicial complex K is a collection of 0-, 1-and 2-simplexes.
Social contagion dynamics. Peer influence and reinforcement mechanisms are ubiquitous in the dynamical process of social contagion 41 , from which higher-order interactions in the network are originated. A social contagion model taking reinforcement into account on 2-simplicial complexes was proposed 34 , which exploits the SIS type of spreading dynamics with binary-state dynamical variables. In particular, let S t i be the state of node i at time t. Each node has two possible states: susceptible (S t i ¼ 0) or infected (S t i ¼ 1). At the initial time, a fraction ρ 0 of nodes is infected. A susceptible node i can get infection from an infected neighbor j through their pairwise interaction (i, j) with probability β 1 . Node i can also be infected through a 2-simplex (i, j, k), where both j and k have already been infected, with the probability β 2 , and this event can be understood as a synergistic reinforcement effect. For convenience, we set β 1 = α/k 1 and β 2 = ω/k 2 , where α and ω are two non-zero positive constants, k 1 and k 2 are the average degrees of the two-body and three-body interactions in a 2-simplicial complex, respectively. In general, we have α < ω so as to ensure that the role of 2-simplex is embodied in the spreading dynamics. Each infected node recovers to the susceptible state with the probability μ. In our work, the values of β 1 and β 2 are selected near their respective thresholds to facilitate efficient and accurate reconstruction of 2-simplicial complexes. The effects of varying the values of β 1 and β 2 on the reconstruction accuracy are also studied (Sec. I of Supplementary Information (SI)).
Simplicial Ising dynamics. The Ising model arises in many fields due to its fundamental role in phase transitions in statistical physics. It has also been applied to many social systems 42,43 . While the Ising dynamics on networks have been extensively studied [44][45][46] , previous studies were exclusively conducted for networks with pairwise interactions only. To our knowledge, Ising dynamics on networks with higher-order interactions have not been studied.
To address the synergistic reinforcement effect of 2-simplex, we define a simplicial Ising dynamics on 2-simplicial complexes. Each node has two possible states: spin-down (S t i ¼ À1) or spinup (S t i ¼ þ1). At the initial time, the state of each node i is randomly assigned as +1 or −1 with equal probability. Defining the Hamiltonian as: where J 1 and J 2 are the strengths of two-body and three-body interactions, and (i, j) and (i, j, k) denote the two-body and the three-body connections in the 2-simplicial complex, respectively. The first term in the Hamiltonian characterizes the interaction between the edges (i.e., two-body connections) and the second term contains three-body interactions from the 2-simplex. At each time step, the spin-flipping probability of each node i is given by ð1 þ e δΔE t i Þ À1 , where δ is the inverse temperature. The represents the change in the energy caused by a flipping of node i at time t, where ∂ i and ∇ i are the 1-simplex set and the 2-simplex set containing node i, respectively.
Statistical inference framework. For SIS and Ising processes taking place on a 2-simplicial complex of size N, the available time-series data representing the states of nodes at different time steps can be stored in a data matrix S, where each row is a time string representing all nodes' states at that time step and each column is a node's state at different time steps. We reconstruct 2-simplicial complexes from the data matrix S with our statistical inference framework. This task consists of three steps: (1) establishing the likelihood function based on the available data matrix S; (2) obtaining the connection probabilities of two-and three-body interactions by maximizing the likelihood function according to the idea of the expectation maximization (EM) method, and (3) executing an improved two-step reconstruction strategy to significantly increase the computational efficiency. The details of the framework are described in Methods.
Quantification of reconstruction performance. We use F1 score to quantify the reconstruction accuracy 47 , a global performance indicator defined as where P = TP/(TP + FP) and R = TP/(TP + FN), with TP, FP, TN, FN being the numbers of true positive, false positive, true negative and false negative classes, respectively. A larger value of F1 corresponds to a higher accuracy and F1 = 1 indicates that the original network structure has been fully reconstructed with zero error.
Reconstructing synthetic and real-world simplicial complexes. For readability, the results from the social contagion dynamics are presented in the main text, while those from the simplicial Ising dynamics are presented in Sec. III of SI. Figure 1 presents results of reconstructing three synthetic 2-simplicial complexes (see Sec. "Construction of synthetic and real-world 2-simplicial complexes" in Methods on how these networks are constructed), where squares, diamonds and circles denote the performance of reconstructing the two-body connections while triangles with different orientations demonstrate the performance of reconstructing three-body connections. Several features can be seen from Fig. 1. First, the reconstruction accuracy increases with the length T of the time series and can reach the unity value for T ≳ 8000. Second, the average degrees k 1 and k 2 of two-body and three-body simplexes, respectively, have different effects on the reconstruction accuracy. In particular, as shown in Fig. 1a-c, a small value of k 1 tends to increase the reconstruction accuracy of both types of simplexes. This can be understood by noting that a small value of k 1 means that there are fewer twobody connections that need to be reconstructed, thereby enhancing the accuracy of the two-body connections for the same length of the time series. At the same time, fewer two-body connections reduce the complexity in reconstructing three-body connections and thereby improving the reconstruction accuracy. Regarding the effects of k 2 , Fig. 1d-f reveal that its value affects only the reconstruction accuracy of three-body connections and has little effect on the accuracy of reconstructing two-body connections that have no dependence on the three-body connections in a 2-simplicial complex. Third, the reconstruction accuracy of three-body interactions is lower than that of twobody interactions owing to the complicated structure of former and its dependence on the latter. Figure 2 shows the results of reconstructing four real-world 2simplicial complexes: Hypertext2009 48 , Thiers12 49 , InVS15 50 , and LyonSchool 51,52 (see Sec. "Construction of synthetic and realworld 2-simplicial complexes" in Methods for the details of these real-world networks). The basic parameters of these 2-simplicial complexes constructed from the datasets are listed in Table 1. It can be seen from Fig. 2 that, as for the real-world networks, the reconstruction accuracies for both the two-body and three-body interactions increase with the length of the time series. Remarkably, these network structures are quite irregular, complicating the reconstruction. Nonetheless, for T = 20,000, the F1 score can exceed 80%.
An issue of practical significance is the robustness of our reconstruction framework against random perturbations. To address this issue, we randomly flip a fraction f of infected states and the same number of susceptible states in the data matrix S (see Sec. "Details of the statistical inference framework" in Methods) and investigate the effect of f on the reconstruction accuracy as characterized by F1. The results are shown in Fig. 3 for three synthetic 2-simplicial complexes and three real-world 2simplicial complexes. It can be seen that increasing the fraction f of flipping leads to a reduction of F1. In particular, the value of F1 for the two-body connections can be as high as 50% even when 30% of the infected states have been flipped (f = 0.3), attesting to the robustness of our framework in reconstructing pairwise links against stochastic fluctuations in the data.

Discussion
To find the network structure from observational data has been an active research field for more than fifteen years 1 . In previous studies, the term "network structure" is largely referred to as the collection of pairwise connections as characterized by the adjacency matrix of the network. Since the goal is to figure out whether there is a link between any two nodes, the existing methods focused on measures that are suitable for ascertaining the "two-body" interactions, such as those based on pairwise correlation or synchronization. From the beginning of modern network science and engineering slightly over two decades ago, networks with only pairwise connections have represented the standard setting of study. Likewise, the inverse problem of databased discovery of the network structure has been exclusively carried out in this setting. To our knowledge, in the current literature, the problem of finding higher-order connections in complex networks from time-series data has not been addressed.
Higher-order interactions are nonetheless ubiquitous in complex networks and its importance has been gradually recognized  Thiers12 InVS15 LyonSchool with an accumulating interest, eventually generating an explosive growth of research recently [18][19][20][21][22][23][24] . The structure of networks with higher-order connections, also known as simplicial complexes, are represented by tensors of high orders. For example, threebody interactions or 2-simplexes in a network can be described by a tensor of rank 3. Structurally, simplicial complexes are significantly more sophisticated than the conventional networks with pairwise links only, and richer dynamics can be anticipated in the former, which have begun to be studied. From the point of view of inverse problem, to reconstruct simplicial complexes from time-series data represents a great challenge.
We have taken the first step to address this inverse problem. Focusing on complex networks with 2-simplexes, we have developed a statistical inference framework by which all two-body and three-body interactions in the network can be found simultaneously from binary time-series data only, i.e., no prior knowledge about the network to be reconstructed is required. The backbone of our reconstruction framework is maximum likelihood estimation that yields the probabilities of all the possible pairwise and three-body connections and a criterion to associate the probabilities with the actual interactions. To significantly increase the computational efficiency, we have proposed and tested a two-step process and a truncation process to determine the true structure of the simplicial complexes. The reconstruction framework has withstood tests on synthetic and real-world simplicial complexes with respect to accuracy and robustness against random fluctuations.
Many open problems remain. For example, our reconstruction framework is formulated in terms of binary time-series data from social contagion dynamics and simplicial Ising dynamics (Sec. III of SI). How to reconstruct higher-order networks from data generated by different dynamical processes needs to be studied. Also, our statistical inference method is developed for 2-simplicial complexes that are perhaps the "simplest" network structure beyond the conventional networks with pairwise interactions. To reconstruct networks with higher-order interactions such as 3-simplicial complexes and hypergraphs is worth pursuing. It is also necessary to develop methods to improve the reconstruction accuracy with shorter time series. We hope our work will stimulate further research in this emerging subfield of data-based reconstruction of complex networks with higher-order interactions.

Methods
Details of the statistical inference framework. We describe the details of our statistical inference framework through an illustrative example, as shown in Fig. 4, where a 2-simplicial complex with N = 30 nodes and its data matrix are illustrated N is the number of nodes, k 1 and k 2 are the average degrees of two-body and three-body connections, respectively, ζ is a threshold to filter out certain connections with low interaction frequency. in Fig. 4a, b, respectively. For such a network hosting SIS dynamics, the probability of a susceptible node i (i.e., S t i ¼ 0) being infected (i.e., S t þ 1 i ¼ 1) is determined only by the infected neighbors and the infected 2-simplexes in which two other nodes in the 2-simplex are both infected, at time t. The transition probability from the infected state to the susceptible state does not depend on the states of the neighbors, so it is only necessary to consider the transition probability from the susceptible state to the infected state for constructing the network. We stress that the details of the dynamical process, such as the infection probabilities β 1 and β 2 as well as the recovery probability μ, are assumed to be unknown but only the binary time series of the nodal states are available. Figure 4 presents an illustrative example to describe the details of our method.
Establishing the likelihood function. Let j → i denote the event that node j has a direct impact on the state of node i. For example, node j can directly spread the virus or send a piece of information to node i, which means that node j is one of immediate neighbors of node i. Nodes i and j thus form a 1-simplex, a property b that is independent of time t. Similarly, let jk → i denote the event that the synergistic reinforcement effect coming from nodes j and k has a direct impact on the state of node i, which is also independent of time. In the following, we determine the probabilities of node i and node j being connected and of three nodes i, j, k forming a three-body connection (i, j, k).
The conditional probability of S t þ 1 i ¼ 1 and j → i given S t j ¼ 1 and S t i ¼ 0 can be written as indicates that node j is a neighbor of node i, and P i which can be estimated from the data matrix S. Take the matrix in Fig. 4b as an example and suppose we wish to estimate the value of P 13 7 , where nodes 13 (i.e., node 13 is the target node) and 7 are highlighted by red and green frames, respectively. It is necessary to extract each pair including the time string with S t 13 ¼ 0 and its next time strings at t + 1. It can be seen that seven pairs of time strings can be extracted: (t, t + 1), (t + 1, t + 2), (t + 2, t + 3), (t + 3, t + 4), (t + 4, t + 5), (t + 6, t + 7), and (t + 8, t + 9). It can also be seen that two time moments: t + 1 and t + 8, satisfy the conditions that node 13 is in the susceptible state and node 7 is in the infected state. The only time at which node 13 can be infected at the next time step is t + 8. As a result, we have P 13 7 ¼ 1=2. Similarly, the conditional probability of S tþ1 i ¼ 1 and jk → i given S t j S t k ¼ 1 and S t i ¼ 0 can be written as i ¼ 1Þ is the probability of node i being infected through the synergistic interaction from nodes j and k, under the conditions S t i ¼ 0, S t j S t k ¼ 1, and S t þ 1 i ¼ 1, and P jk→i > 0 indicates that the three 1Þ can be estimated from the data matrix S in a similar way. Again, take the three nodes 13, 28, and 30 in Fig. 4b as an example. It can be seen that the time instants at which S t 13 ¼ 0, S t 28 ¼ 1 and S t 30 ¼ 1 are fulfilled are t + 6 and t + 8. Because S t þ 7 13 ¼ 1 and S t þ 9 13 ¼ 1, we have P 13 28;30 ¼ 1. According to Eqs. 3, 4, the expected number of susceptible node i being infected at t m + 1 is given by where Ψ otherwise, Ψ t m j ¼ 0 when it is not infected at time t m . The quantity ε i represents the noise due to the errors from the collected data.
In general, the probability of a given number of events occurring in a fixed interval of time is characterized by the Poisson distribution, so we use it to capture the random nature of the times that node i is infected. An advantage of the Poisson distribution is that it makes a mathematical analysis and computations with the EM algorithm feasible [53][54][55][56] . Specifically, the likelihood function can be described as where Θ denotes the set of variables P j→i , P jk→i and ε i . We have Ψ is either zero or one.
Maximizing the likelihood function based on EM algorithm. We next use the EM method to maximize the likelihood function 57 for determining the parameter Θ in Eq. 6. Taking the logarithm form of Eq. 6, we get Applying the Jensen's inequality to the logarithmic term on the right side of Eq. 7 yields Fig. 4 Schematic illustration of reconstructing a 2-simplicial complex based on binary social contagion. a A 2-simplicial complex of size N = 30, where the black links represent 1-simplexes and the orange shadows represent 2-simplexes. b Data matrix S that stores all nodes' states at different time steps, where each row is a time string representing all nodes' states at that time step and each column is a node's state at different time steps, where the black and blank squares denote the 1 and 0 states, respectively. Take node 13 as an example (the red frame), the values of P 13 j 8j ≠ 13 ð Þand P 13 jk 8j ≠ k ≠ 13 ð Þcan be calculated from data matrix S: P 13 7 ¼ 1=2 (the green frame) and P 13 28;30 ¼ 1 (the purple frame). c The values of P j→13 and P jk→13 are obtained through the EM algorithm, where only non-zero values of P j→13 and the top 10 values of P jk→13 are shown. d The values of P j→i (each column in the left subgraph) and P jk→i (each column in the right subgraph) for each node i based on the method described in Secs. 4.1.1 and 4.1.2, where the blue and red dots denote the actual and nonexistent two-body or three-body connections, respectively. e The 2-simplicial complex is inferred based on the probabilities in (d), in which the two-body connections are exactly predicted, but two 2-simplexes (5,20,22) and (13,18,29) in (a) cannot be predicted (marked by the light-yellow shadows). f The values of P 0 j!13 for node 13 obtained by iterating Eqs. 22-25, and only non-zero values are shown. g Compressed data matrix that records only the columns in S giving P 0 j!13 > 0. h The values of P j→13 and P jk→13 for node 13 based on the compressed data in (g). i The values of P j→i and P jk→i for each node i. Finally, the full 2-simplicial complex in (a) can be exactly reconstructed by determining whether P j→i > 0 or P jk!i >Δ i .
The maximization problem of Eq. 7 can then be transformed into maximizing the following equation: Calculating the partial derivative ofL Θ ð Þ with respect to P j→i , P jk→i and ε i and setting them to zero, we get The six equations Eqs. 9-11 and Eqs. 16-18 can be used to solve P j→i , P jk→i and ε i . In particular, by initializing all values of P j ! i ;P jk ! i ;ε i 8j ≠ k ≠ i À Á to be one and then calculating the values of ρ t m j , ρ t m jk and ρ t m ε i in Eqs. 9-11, we substitute them into Eqs. 16-18 to find the values of P j→i , P jk→i , and ε i . We repeat this process until convergence is achieved. Since a single iterative process does not ensure global optimization, we carry out the above iteration process a number of times and choose the proper values that give the maximum of the quantity in Eq. 12.
As an example, as shown in Fig. 4c, the values of P j→13 and P jk→13 are given according to this iteration process, where P j→13 > 0 and the top 10 values of P jk→13 are demonstrated. Similarly, all the values of P j→i and P jk→i can be calculated for each node i. As presented in Fig. 4d, each column above the abscissa corresponds to the predicted 1-simplex probabilities (the left subgraph of Fig. 4d) and 2-simplex probabilities (the right subgraph of Fig. 4d) of a node, and the blue and red dots denote the actual and nonexistent two-body or three-body connections, respectively.
An improved two-step reconstruction strategy. For a 2-simplicial complex structure with N nodes, when predicting the 2-simplexes of a node i, we randomly choose two nodes (e.g., j and k) and calculate the probability P jk→i , which requires calculating 0:0ptN À 12 À Á values. To reduce the computational load and increase the reconstruction accuracy, we articulate an improved two-step strategy. The particularity of simplicial complexes stipulates that the other two nodes forming a 2-simplex with node i must be the neighbors of node i, so it is not necessary to calculate the probability P jk→i if node j or node k is not a neighbor of node i. The reconstruction process can then be divided into two steps. At the first step, the "approximate" neighborhood of each node is predicted and their corresponding columns in the data matrix S are extracted, leading to a compressed data matrix. At the second step, based on the compressed data matrix, the values of P j→i and P jk→i for each node i are predicted by iterating Eqs. 9-11 and 16-18. Our two-step method was not designed for the general challenging task of consistently inferring all the subfaces for arbitrarily higher-order simplices. In fact, our method requires the closure condition of simplicial complexes: it is necessary to know in advance that the network under reconstruction is a 2-simplicial complex. Given this premise, the two-step strategy infers first the two-body and then the three-body interactions (i.e., 2-simplex) from the inferred two-body interactions. While the two-step method is efficient to reconstruct 2-simplicial complexes, at the present it cannot be used to reconstruct the hypergraphs because its second step is to find the triangles from the neighbors (i.e., edges).
For the first step, the predicted neighbors are not accurate because the threebody interactions have been ignored. In fact, the main purpose of this step is to determine an approximate range of neighbors to reduce the time for calculating P jk ! i 8j ≠ k ≠ i À Á . Without taking into account three-body interactions, the expected number of susceptible nodes being infected at t m + 1 can simply be expressed as where the notation P 0 j!i is used to emphasize that node j is only an "approximate" neighbor of node i. Assuming that the number Ψ i of times of node i being infected in each time period obeys the Poisson distribution, we obtain the likelihood function as whereΘ denotes the set of variables P 0 j!i and ε i . Taking the logarithm of Eq. 20, we have ; ð24Þ With the initial conditions for P 0 j!i and ε i , the values of P 0 j!i and ε i can be obtained by iterating Eqs. 22-25 until convergence is achieved. It is worth noting that P 0 j!i is a probability and we need to determine the "approximate" neighbors of the node under reconstruction. Theoretically, the "approximate" neighbors can be determined by testing whether P 0 j!i is non-zero. However, practically this is not feasible due to noise or deviations from the assumptions. For example, as shown in Fig. 4f, nodes 6 and 14 are not neighbors of node 13 even though P 0 6!13 ¼ 0:0002 and P 0 14!13 ¼ 0:0006. To overcome this difficulty, we articulate a truncation method for determining the neighbors of node i, as follows.
First, note that the time complexity of the second step can be significantly reduced when fewer neighbors are predicted, but too few predicted neighbors can lead to missing neighbors. On the contrary, too many neighbors would increase the time complexity. A solution is to use a reasonable truncation to determine the "approximate" neighbors of each node. To this end, we re-rank the probability P 0 j ! i 8j ≠ i À Á in a descending order and place a threshold Δ i in the maximum gap defined as 14 : Next, we use Eq. 26 again to find a new threshold Δ i which is smaller than Δ i . Finally, node j is regarded as an "approximate" neighbor of node i if P 0 j ! i > Δ i . The truncation method can ensure the detection of all real neighbors and 2-simplexes.
Once the "approximate" neighbors of node i have been obtained, the time series of these neighbors can be extracted (Fig. 4f, g). The neighbors of node i and its 2-simplexes can be quickly re-predicted based on the second step, i.e., by iterating Eqs. 9-11 and Eqs. 16-18 based on the compressed data matrix. For example, the prediction results for node 13 are shown in Fig. 4h and the values of P j ! i 8j ≠ i À Á and P jk ! i 8j ≠ k ≠ i À Á for each node are presented in Fig. 4i. The actual two-and threebody connections of each node can then be determined based on the results in Fig. 4i. Because the identification of two-body connections has been refined in the second step, we simply assume that node j is a neighbor of node i if P j→i > 0. Following previous work 14, 58 , we assume that nodes i and j are connected when P j→i > 0 or P i→j > 0.
The case of three-body interactions is more complicated and the solution is sensitive to noise or errors. In fact, using the condition P jk→i > 0 as a criterion to detect (i, j, k) as a 2-simplex can lead to many false positives. Our solution is to rerank P jk ! i 8j ≠ k ≠ i À Á in a descending order and obtain a new thresholdΔ i by using Eq. 26 again. As a result, an actual 2-simplex (i, j, k) is formed when P jk ! i ≥Δ i . To remove the conflicts in the prediction, we assume that there exists a 2-simplex (i, j, k) when two of three conditions hold at least, e.g., P jk ! i ≥Δ i , P ik ! j ≥Δ j , and P ij ! k <Δ k , but a three-body cannot form when P jk ! i ≥Δ i , P ik ! j <Δ j , and P ij ! k <Δ k . Implementing the two-step strategy, we can reconstruct the whole 2-simplicial complexes. As shown in Fig. 4a, the 2-simplicial complex has been accurately reconstructed. Overall, the two-step strategy not only greatly reduces the computational time but also significantly improves the reconstruction accuracy (more details in Sec. II in SI).
Construction of synthetic and real-world 2-simplicial complexes Synthetic 2-simplicial complexes. Here we describe the main steps of constructing synthetic 2-simplicial complexes of size N, average degrees of two-body and threebody interactions k 1 and k 2 , respectively.
Random simplicial complex (ERSC): First, a random graph is generated by connecting any two nodes with the probability p 1 . We then add 2-simplexes between any three nodes with the probability p 2 , where the formulas of p 1 and p 2 are 34 : A random 2-simplicial complex with the specified average degrees can then be constructed using the probabilities p 1 and p 2 .
Scale-free simplicial complex (SFSC): First, a scale-free network is generated, in which each new node connects m edges to the old nodes with degree preference 59 . We then add 2-simplexes between any three nodes according to probability p 2 in Eq. 28. The average degree of 1-simplexes can be calculated as Small-world simplicial complex (SWSC): First, a small-world network 60 is generated from a regular lattice (all the nodes have the same degree 2m) with rewiring probability p. We then add 2-simplexes between any three nodes according to probability p 2 in Eq. 28. The average degree of 1-simplexes is given by Eq. 29.
2-simplicial complexes from real-world data. In each real-world data set, the face-toface interactions have been measured with a temporal resolution of 20 s. First, we generate a weighted network according to the data, where a weight represents the number of interactions between a pair of nodes in the whole time window. Second, we remove any link whose weight is less than a given threshold ζ and set the weights of retained links to one to generate an unweighted network. Finally, we cut the data into multiple segments with a temporal window of 5 min and record all the 2-simplexes. In particular, if three nodes communicate with each other in a short time, they are regarded as constituting a three-body connection. We record the frequencies of the 2-simplexes in each segment. According to the total frequency in all segments, we retain the first 50% of the 2-simplexes with the highest frequencies and count them as the actual 2-simplexes. The visualization of four real-world 2-simplicial complexes is shown in Fig. 2a-d.