Hypergraph reconstruction from uncertain pairwise observations

The network reconstruction task aims to estimate a complex system’s structure from various data sources such as time series, snapshots, or interaction counts. Recent work has examined this problem in networks whose relationships involve precisely two entities—the pairwise case. Here, using Bayesian inference, we investigate the general problem of reconstructing a network in which higher-order interactions are also present. We study a minimal example of this problem, focusing on the case of hypergraphs with interactions between pairs and triplets of vertices, measured imperfectly and indirectly. We derive a Metropolis-Hastings-within-Gibbs algorithm for this model to highlight the unique challenges that come with estimating higher-order models. We show that this approach tends to reconstruct empirical and synthetic networks more accurately than an equivalent graph model without higher-order interactions.


I. INTRODUCTION
Networks are a convenient model for the intricate structure of complex systems, in which interactions between any pair of the system's constituent elements can be directly interpreted as edges between the corresponding vertices of a graph. In typical network analyses, these pairwise interactions will initially be unknown because we cannot observe them directly; one must instead define a model of what is and is not an interaction and fit this model to the data to identify the relevant network. For instance, we might define a pollinator and a plant species as interacting if the pollinator prefers that species over others. This definition then lets us infer a plant-pollinator interaction network by observing how often each pollinator visits each plant and processing the data with an appropriate statistical model [1,2].
Numerous methods have been proposed to perform this critical step of the network analysis process, commonly called graph reconstruction. They span a broad range of statistical and machine learning techniques and are often tailored to the specific field for which they have been developed [3]. Gene regulatory networks, for instance, have been reconstructed with methods ranging from random forests [4] to methods based on Pearson correlation in temporal windows [5] or on ordinary differential equations [6]. Bayesian frameworks based on genomic features [7] and random-walk-based algorithms [8] have been used to estimate protein-protein interaction networks, while brain networks have been measured with a broad range of methods such as cross-frequency phase synchronization [9], Granger causality [10], and matrix-regularized network learning frameworks [11]. More general methods have also been developed to reconstruct a broad range of datasets [12][13][14][15][16].
While convenient, graphs are fundamentally limited to encoding dyadic connections, and higher-order interactions are not always reducible to a set of pairwise ties [17,18]. For example, empirical evidence shows that accounting for such higher-order interactions can enhance models of cortical dynamics [19], of the diversity of species in natural communities [20][21][22], and of social group formation [23]. To reap the benefits of such representations, network science methods should be able to handle higher-order interactions whenever dyadic relationships are insufficient.
There has been significant recent progress in adapting network science methods to higher-order representations [24], but the higher-order reconstruction problem has no fully satisfactory solution to date. For instance, direct generalizations of dyadic algorithms are impractical due to the extraordinary amount of data they require [24]: a network of n vertices can support up to 2^n hyperedges, so naively measuring every hyperedge rapidly becomes infeasible as n grows. Recent work has partially addressed this issue by using pairwise data to make inferences about possible higher-order structures [25], or by filtering incomplete hyperedge data [26]. However, no method to date can simultaneously handle reconstruction and noise in the pairwise measurements.
This paper introduces a Bayesian framework to infer higher-order structural interactions from imperfect pairwise measurements. We study a minimal example of this problem, focusing on the case of hypergraphs with interactions between pairs and triplets of vertices, measured imperfectly and indirectly. Instead of providing a point estimate, this framework offers a distribution of the possible hypergraphs compatible with all the available observations. The range of structures provided by this distribution allows us to compute error bars for various network measurements and the outcomes of network processes. We also present a network modeling approach that encodes the projection of hyperedges as different types of pairwise interactions to analyze the importance of the correlations induced by these higher-order interactions. Finally, we compare the reconstruction accuracy of these two frameworks on synthetic observations generated from various synthetic and empirical hypergraphs.

II. METHODS
Let us assume that we possess some measurements X = [x_ij]_{i,j=1,...,n} of the pairwise interactions of the units of a complex system composed of n elements. In general reconstruction problems, these observations could take on many forms, such as time series correlations of brain regions [27] or direct observations of the presence (or absence) of edges in a networked system [12], to name only two examples. To keep our presentation of the methods concrete, we will focus on the case where x_ij is an integer number of observed interactions for vertices i and j. Our objective is to infer the interactions in a hidden latent structure S under the assumption that these interactions shape the observed behavior of the system (i.e., the measurements). This latent structure could be any type of structural representation, such as graphs, simplicial complexes, or hypergraphs.
We expect the observation data to be noisy, meaning that remeasuring the system will lead to different values of X. We also expect that similar (different) interactions in S could lead to very different (similar) measurements. For instance, two pairwise observations x_ij and x_rs could be identical even if the pair (i, j) interacts in S while (r, s) does not. To account for these fluctuations, we develop a Bayesian inference framework, a fully probabilistic approach producing a probability distribution over the different structures S compatible with the data X.

A. Data model
We first specify the likelihood P(X | S, µ), which expresses how the observations X are related to the latent structure S and any additional parameters µ of the observation processes. We assume that the structure S encodes three types of symmetrical interactions: each pair (i, j) can interact weakly (ℓ_ij = 1), interact strongly (ℓ_ij = 2), or not interact (ℓ_ij = 0). For instance, measurements X of a social network could be the number of conversations recorded between acquaintances (ℓ_ij = 1), friends (ℓ_ij = 2), or strangers (ℓ_ij = 0).
For a fairly broad range of measurement processes, it will often be reasonable to model the observed number of interactions x_ij between vertices i and j with conditionally independent Poisson point processes. In such processes, the observation x_ij is determined only by its associated type of interaction ℓ_ij and average µ_{ℓ_ij}, leading to the likelihood

P(X | S, µ) = ∏_{i<j} (µ_{ℓ_ij})^{x_ij} e^{−µ_{ℓ_ij}} / x_ij! ,   (1)

where µ = (µ_0, µ_1, µ_2). Figure 1 illustrates the distribution of pairwise observations modeled by Eq. (1). This model will only be appropriate if the errors on two distinct measurements x_ij and x_rs are not correlated, and if every x_ij is the outcome of numerous independent observations of an ongoing measurement process with constant success rate. We make these assumptions to provide a simple illustration of our inference framework, but we stress that the framework is general enough to account for more diverse types of data, distributions, and structure.
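To make the data model concrete, here is a minimal sketch (our own illustration, not the authors' released code) that draws a synthetic observation matrix X from the Poisson likelihood of Eq. (1), given a matrix of interaction labels and the means µ = (µ_0, µ_1, µ_2):

```python
import numpy as np

def sample_observations(labels, mu, rng=None):
    """Draw a symmetric observation matrix X from Eq. (1).

    labels : (n, n) integer array with labels[i, j] in {0, 1, 2}
    mu     : sequence (mu0, mu1, mu2) of Poisson means, one per label
    """
    rng = np.random.default_rng(rng)
    n = labels.shape[0]
    X = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            # x_ij depends only on the label of the pair: the observations
            # are conditionally independent given the latent structure.
            X[i, j] = X[j, i] = rng.poisson(mu[labels[i, j]])
    return X
```

This is the kind of process used throughout the paper to generate synthetic datasets, e.g. with µ = (0.01, 20, 30) in the karate club case study.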

B. Structural models
The next step is to specify the latent structural model P(S | φ), which is a prior probability on each interaction ℓ_ij conditioned on some additional parameters collectively denoted by φ. This distribution encodes our hypothesis about the structure of interactions of the system before we make any measurements. For instance, we might expect person i to be more likely to develop a friendship with person j than with person k because i and j live in the same neighborhood.
To highlight the role of latent higher-order interactions in the reconstruction procedure (or lack thereof), we consider two models for the structure S: a hypergraph model (S = H) and a categorical-edge model with a simpler graph structure (S = G).
We define the hypergraph structure H = (V, E, T) as a set of vertices V with 2-edges E and 3-edges T. We limit the size of the hyperedges to 3 for the sake of simplicity, although larger hyperedges could easily be considered by adapting the data model in Eq. (1) accordingly. We opt for a simple hypergraph model in which the existence of each hyperedge is conditionally independent of the others. Denoting as p (q) the probability of existence of 3-edges (2-edges), the probability of H is

P(H | φ_H) = q^{h_1} (1 − q)^{C(n,2) − h_1} p^{h_2} (1 − p)^{C(n,3) − h_2},   (2)

where φ_H = {p, q} are the parameters, h_1 = |E| is the number of 2-edges, h_2 = |T| is the number of 3-edges, and C(n,k) denotes the binomial coefficient.

FIG. 2. Examples of structural configurations with hidden edges. The presence or absence of the (a) 2-edge and (b) 3-edge shown in red does not alter the type of interaction ℓ_ij of the vertices, which is the same for all configurations. Hence, the likelihood in Eq. (1) has the same value, and we say that these red hyperedges are hidden by the other 3-edges.
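Since every hyperedge is an independent Bernoulli variable, the log-probability of Eq. (2) can be evaluated directly from the counts h_1 and h_2; a small sketch (function name is ours):

```python
from math import comb, log

def log_prior_hypergraph(n, h1, h2, q, p):
    """Log-probability of Eq. (2): h1 2-edges among C(n, 2) candidates,
    each present with probability q, and h2 3-edges among C(n, 3)
    candidates, each present with probability p."""
    m2, m3 = comb(n, 2), comb(n, 3)
    return (h1 * log(q) + (m2 - h1) * log(1 - q)
            + h2 * log(p) + (m3 - h2) * log(1 - p))
```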
We connect this structure to the data model by assigning a type ℓ_ij to each pair of vertices as

ℓ_ij = 2 if (i, j) ∈ ∆;  ℓ_ij = 1 if (i, j) ∈ E and (i, j) ∉ ∆;  ℓ_ij = 0 otherwise,   (3)

where ∆ is the set of pairs covered by a 3-edge. To make further progress, we must make a few arbitrary choices, since the full model (the joint distribution of the data and latent structure) can be re-parametrized in ways that do not affect the distribution over labels and, therefore, over data. These symmetries will cause identifiability problems when we use the model to make inferences about latent hypergraphs, so we address them immediately.
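The projection of Eq. (3) can be sketched as follows (our own helper; hyperedges are given as vertex tuples). Note how 2-edges covered by a 3-edge are mapped to ℓ_ij = 2, i.e. hidden:

```python
from itertools import combinations

def project_labels(n, two_edges, three_edges):
    """Project a hypergraph H = (V, E, T) onto pairwise labels (Eq. (3)):
    l_ij = 2 if the pair is covered by a 3-edge, 1 if it is a 2-edge that
    is not covered (hidden 2-edges are lost), and 0 otherwise."""
    covered = {frozenset(p) for t in three_edges for p in combinations(t, 2)}
    dyads = {frozenset(e) for e in two_edges}
    labels = {}
    for i, j in combinations(range(n), 2):
        pair = frozenset((i, j))
        labels[(i, j)] = 2 if pair in covered else (1 if pair in dyads else 0)
    return labels
```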
First, since the mapping from hypergraph to labels is lossy, the presence of some edges can be hidden by others. For example, if vertices i and j are connected by both a 2-edge and a 3-edge, then the interaction will be considered of type ℓ_ij = 2, as if the 2-edge did not exist: removing it does not affect the interaction type and consequently does not change the value of the likelihood given in Eq. (1). 3-edges can also hide other 3-edges, as depicted in Fig. 2. Hence, we must bear in mind that we will only be able to make inferences about "visible" edges.
Second, the full model is susceptible to label switching and thus needs additional adjustments. Indeed, while a non-interacting pair (ℓ_ij = 0) and a pair of vertices connected by a 2-edge (ℓ_ij = 1) are associated with different distributions of observations because they have distinct means µ_0 and µ_1, it is possible to change the structure H and the parameters µ in a way that will not affect the overall likelihood of a dataset X. This can be done by replacing every non-interacting pair of H by a 2-edge and vice versa while also swapping the values of µ_0 and µ_1. We address this label-switching symmetry in the standard way by imposing µ_0 < µ_1 or, equivalently, by thinking of non-interacting pairs as associated with a smaller expected number of interactions than interacting pairs.
The label ℓ_ij = 2 can also technically be exchanged with the labels ℓ_ij = 0 and ℓ_ij = 1, but because it is inherited from a latent hypergraph that correlates multiple pairs of vertices, the problem will only manifest itself in very specific situations. Namely, every 2-edge has to belong to at least one triangle formed by two other 2-edges or projected 3-edges (this worst-case hypergraph is described in section III C). Since a vanishing fraction of hypergraphs exhibit this specific configuration, imposing µ_1 < µ_2 is unnecessary to disambiguate most configurations. That said, in practice, we found it useful to impose µ_0 < µ_2. Type-1 and type-2 interactions are typically sparse, which means that type-0 interactions are dense. Non-interacting pairs could therefore seem to form many triangles and could be interpreted as the projection of 3-edges. Imposing µ_0 < µ_2 avoids any confusion.

Categorical-edge model
Our second model involves a graph with categorical edges G = (V, E_1, E_2), defined as a set of vertices V, of weak edges E_1, and of strong edges E_2. The types of interaction are then

ℓ_ij = 2 if (i, j) ∈ E_2;  ℓ_ij = 1 if (i, j) ∈ E_1;  ℓ_ij = 0 otherwise.

Much like in the hypergraph case, we adopt an agnostic model and assume a priori that the categorical edges are placed randomly according to a simple two-step generative process: strong edges are created independently with probability q_2, and weak edges are created independently in the remaining unconnected pairs with probability q_1, so that

P(G | φ_G) = q_2^{g_2} (1 − q_2)^{C(n,2) − g_2} q_1^{g_1} (1 − q_1)^{C(n,2) − g_2 − g_1},   (6)

where g_1 = |E_1| and g_2 = |E_2| are the number of weak edges and strong edges, respectively. There are no hidden edges in this model, but the label-switching problem is now three-fold: ℓ_ij = 0 can be swapped with ℓ_ij = 1, but also ℓ_ij = 0 with ℓ_ij = 2, and ℓ_ij = 1 with ℓ_ij = 2. As for the hypergraph model, we address this issue by imposing µ_0 < µ_1 < µ_2. Hence, we suppose that non-interacting pairs are less frequently measured than interacting ones, and that weak interactions are less frequently measured than strong ones.

FIG. 3 (caption fragment). ...[Eqs. (8) and (9)], while the average structure contains the edges and hyperedges that exist in at least half of the samples of the posterior distributions. The estimator for the type of interaction, noted ℓ̂_ij, is used to build the confusion matrix. It corresponds to the most likely type of interaction for vertices i and j.
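The two-step generative process can be sketched as follows (a minimal illustration with hypothetical names, not the paper's implementation):

```python
import numpy as np

def sample_categorical_graph(n, q2, q1, rng=None):
    """Sample a label matrix for the categorical-edge model: strong edges
    (label 2) are placed first with probability q2, then weak edges
    (label 1) with probability q1 among the remaining unconnected pairs."""
    rng = np.random.default_rng(rng)
    labels = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < q2:
                labels[i, j] = labels[j, i] = 2
            elif rng.random() < q1:
                labels[i, j] = labels[j, i] = 1
    return labels
```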

C. Posterior distributions
Combining the quantities defined above, the Bayes formula yields the posterior distribution P(S, µ, φ | X) of each structural model,

P(S, µ, φ | X) = P(X | S, µ) P(S | φ) P(µ, φ) / P(X),   (7)

where P(µ, φ) is a conjugate prior distribution (see Appendix A for details) and P(X) is a normalization factor that need not be specified. Combining Eqs. (2) and (6) with (7) yields the posterior distributions

P(H, µ, φ_H | X) ∝ P(X | H, µ) P(H | φ_H) P(µ, φ_H),   (8)

P(G, µ, φ_G | X) ∝ P(X | G, µ) P(G | φ_G) P(µ, φ_G),   (9)

which both weight every structure-parameters tuple (S, µ, φ) according to its compatibility with the observations X and the prior probabilities.
Equations (8) and (9) are not closed forms of known distributions, the main complication being the presence of the edge labels ℓ_ij in the likelihood. Hence, any meaningful use of these posterior distributions requires generating samples from them, which in turn are used to estimate statistics such as percentiles, averages, and variances of various functions f(S, µ, φ). To this end, we have derived a Metropolis-within-Gibbs algorithm whose details are discussed in Appendix B. A C++/Python implementation is available at https://github.com/DynamicaLab/hypergraph-bayesian-reconstruction.
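To convey the flavor of the sampler without reproducing the full algorithm of Appendix B, here is a deliberately simplified toy (our own sketch): a Metropolis step over the label of each pair under the Poisson likelihood, with µ held fixed and a flat structural prior. The actual algorithm also resamples µ and φ from their conditionals and proposes hyperedge moves.

```python
import math
import random

def mh_within_gibbs(X_pairs, mu, n_sweeps, rng=None):
    """Toy sketch of the structure-update step: sweep over the pairs and
    re-propose each label with a Metropolis acceptance rule under the
    Poisson likelihood, mu held fixed and a flat prior over labels.
    X_pairs maps (i, j) to the observed count x_ij."""
    rng = rng or random.Random(0)

    def log_lik(x, m):
        # Poisson log-likelihood up to the label-independent -log(x!) term
        return x * math.log(m) - m

    labels = {pair: 0 for pair in X_pairs}
    for _ in range(n_sweeps):
        for pair, x in X_pairs.items():
            proposal = rng.randrange(3)
            delta = log_lik(x, mu[proposal]) - log_lik(x, mu[labels[pair]])
            if delta >= 0 or rng.random() < math.exp(delta):
                labels[pair] = proposal
    return labels
```

Even this toy version shows the core mechanic: each pair's label is repeatedly re-proposed and accepted according to a likelihood ratio.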

III. RESULTS

A. Case study: Zachary's Karate Club
We first illustrate the framework with a simple case study based on Zachary's Karate Club [28]. Our goal is to recover the latent structure of this system, encoded as a hypergraph H, given synthetic data X generated with the likelihood of Eq. (1) and µ_0 = 0.01, µ_1 = 20, and µ_2 = 30. These values make it fairly easy to discern non-interacting pairs but lead to some overlap between the two other types of interactions, which allows us to highlight the influence of higher-order interactions on the accuracy of the inference (see Fig. 1).

TABLE I. Properties of the synthetic and empirical hypergraph datasets. The table shows the number of vertices (n), the number of type-1 and type-2 interactions, the relative reconstruction error (ε) for both the categorical-edge graph model and the hypergraph model, as well as the fraction of 2-edges that are hidden under a 3-edge (E_∆). The relative reconstruction error (ε) shown here is the median of the relative reconstruction error for 10 observation matrices generated with (µ_0 = 0.01, µ_1 = 40, µ_2 = 50).

With this hypergraph structure fixed, we generate a synthetic dataset X and approximate the posterior distribution using samples generated with the Markov chain Monte Carlo (MCMC) algorithms described in Appendix B. Using these samples, we then calculate two estimators of the structure: the maximum a posteriori (MAP) estimator Ŝ_MAP,

corresponding to the latent structure that maximizes the posterior distribution, and the edge-wise estimator Ŝ_EW that only contains the weak/strong edges or 2-edges/3-edges with a marginal posterior probability above 0.5, e.g., for the hypergraph model

Ŝ_EW = { e : P(e | X) > 1/2 },

where P(e | X) is the marginal probability that edge e is present. We complement these structural estimators with an estimator of the type of each pairwise interaction, the maximum marginal estimator Ŝ_MM, where

ℓ̂_ij = argmax_k P(ℓ_ij = k | X)

is the most likely type of interaction for vertices i and j (ties are broken by choosing a type at random).
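Given a list of posterior samples, the edge-wise estimator reduces to thresholding marginal frequencies; a minimal sketch (helper name is ours):

```python
from collections import Counter

def edgewise_estimator(samples, threshold=0.5):
    """Edge-wise estimator S_EW: keep every (hyper)edge whose marginal
    posterior probability, estimated by its frequency across samples,
    exceeds the threshold. samples: list of sets of (hyper)edges, one
    set per posterior draw."""
    counts = Counter(e for s in samples for e in s)
    return {e for e, c in counts.items() if c / len(samples) > threshold}
```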
Figures 3c and 3d show Ŝ_MAP and Ŝ_EW for both models given one realization of the data X. In both cases, we see that our inference framework reconstructs the original structure quite accurately. Interestingly, we see that both estimators missed a few 3-edges. While some of them are genuine errors, quite a few missing 3-edges are simply hidden and thus unrecoverable (as defined in section II B).
Thankfully, these missing hidden 3-edges have little impact on the accuracy of our framework when it comes to predicting the interaction label ℓ_ij. To show this, Figs. 3e and 3f also display confusion matrices, a generalization of statistical errors (type I and type II errors) to multiple classes. In the confusion matrix, the element c_rs is the number of times a pairwise interaction of type ℓ_ij = r has been predicted as ℓ̂_ij = s by the maximum marginal estimator Ŝ_MM. Hence, a perfect reconstruction corresponds to a diagonal matrix. The major difference between the two confusion matrices is that the categorical-edge graph model uses weak edges and strong edges somewhat interchangeably, which results in classification errors in both directions. In contrast, the hypergraph model has no false positive 3-edges. This is due to the restrictive nature of 3-edges: each type-2 pairwise interaction must be associated with at least two other type-2 pairwise interactions (as long as the 3-edge is not hidden). As a result, our framework errs on the conservative side when assigning larger hyperedges: it will assign ℓ_ij = 1 unless there is sufficient evidence in the neighborhood of vertices i and j to support a 3-edge. This additional neighborhood information is what allows the hypergraph model to have a smaller sum of off-diagonal elements in the confusion matrix, meaning that it more accurately retrieves the interaction types.

B. Expanded dataset
Next, we study the performance of our inference framework on a broader collection of synthetic and empirical hypergraphs. The empirical datasets are a crime network [29], a network of sexual contacts [30], a plant-pollinator network [31], and a network of spoken languages [32]. The original datasets are all bipartite, so we adapt them for the experiment by interpreting one of the two vertex types as hyperedges: individuals are vertices and crimes are hyperedges; sex workers are vertices and their clients are hyperedges; pollinators are vertices and the plants they pollinate are hyperedges; countries are vertices and the languages spoken are hyperedges. We ignore hyperedges with more than five vertices to keep a sufficient number of 2-edges in the hypergraph, and we also remove any isolated vertex. We also re-use the hypergraph derived from Zachary's Karate Club above.
We complement these empirical datasets with hypergraphs generated from three synthetic models, namely (i) the superimposed stochastic block model [33] (two unequal communities of 30 and 70 vertices with connection probabilities q_11 = 0.05, q_12 = q_21 = 0.001, and q_22 = 0.02 for 2-edges, and p_1 = 0.005 and p_2 = 0.0001 for 3-edges inside communities and p_out = 0.00001 outside communities), (ii) a triangle-edge configuration model with 100 vertices [34] (with degrees drawn from independent geometric distributions of mean 2 and 3 for 2-edges and 3-edges, respectively), and (iii) the β-model for layered hypergraphs [35] (with vertex propensities of 2-edges and 3-edges drawn from normal distributions with means −4.5 and −5 and standard deviations 2.5 and 2, respectively).
As before, we generate a series of synthetic observations with the likelihood in Eq. (1) and µ = (µ_0 = 0.01, µ_1 = 40, µ_2 = 50), and then sample the posterior distribution to compute the confusion matrices of both models. We summarize our results using the fraction ε of misclassified type-1 and type-2 interactions, computed from the entries c_rs of the confusion matrix. The results are reported in Table I, where we see that the hypergraph model performs at least as well as the categorical-edge model. The following section explores the factors influencing the performance of the hypergraph model.
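From a 3×3 confusion matrix, this misclassification fraction can be computed as below. The normalization (by the total number of true type-1 and type-2 pairs) is our assumption, since the paper's exact expression is not reproduced in this excerpt:

```python
def misclassification_error(c):
    """Fraction of true type-1 and type-2 interactions assigned the wrong
    label. c[r][s] is the number of pairs of true type r predicted as
    type s. The normalization by the total number of type-1/type-2 pairs
    is an assumed convention."""
    wrong = sum(c[r][s] for r in (1, 2) for s in range(3) if s != r)
    total = sum(c[r][s] for r in (1, 2) for s in range(3))
    return wrong / total
```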

C. When are hyperedges most relevant?
To gain better insight into the factors influencing the performance of the hypergraph model, we consider two extreme cases: a "worst-case hypergraph" and a "best-case hypergraph".
In the best-case hypergraphs, groups of 3 vertices can only be connected by a 3-edge. This means that vertices (i, j, k) can form a triangle in the projected pairwise interactions only if ℓ_ij = ℓ_ik = ℓ_jk = 2. As a result, there is no ambiguity about whether triangles are a mix of 2-edges and projected 3-edges, and 3-edges can be distinguished from triangles of non-interacting pairs since they have greater pairwise measurements. This effectively makes the neighborhood of any pair of vertices very informative about its type of interaction. We generate such hypergraphs by removing the 2-edges that do not respect the imposed constraint from a hypergraph generated with the prior (2) (see Fig. 4).
The worst-case hypergraphs only contain 2-edges that form a triangle in the projection. In other words, ℓ_ij = 1 is only possible if there exists another vertex k such that ℓ_ik ℓ_jk > 0. As a result, there is no longer any difference in the observations between a 3-edge and a triangle composed of a mixture of 2-edges and projected 3-edges; the neighborhood of a pairwise observation is uninformative. To produce these worst-case hypergraphs, we generate graphs with cliques of 2-edges in which each triangle is randomly promoted to a 3-edge (see Fig. 4).
To check whether a given hypergraph resembles the best case or the worst case, we compute the proportion E_∆ of 2-edges inside triangles of the projection. The closer E_∆ is to 0, the closer the hypergraph is to a best-case hypergraph; the closer E_∆ is to 1, the closer it is to a worst-case hypergraph.
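A sketch of this diagnostic, computed directly from the projected labels (function and representation are ours): a 2-edge counts as "inside a triangle" when its endpoints share a common neighbor with nonzero labels.

```python
def edge_triangle_fraction(labels):
    """E_Delta: fraction of 2-edges (label 1) that sit inside a triangle of
    the projection, i.e. whose endpoints share a neighbor with nonzero
    labels. labels: dict mapping (i, j) with i < j to l_ij in {0, 1, 2}."""
    def l(i, j):
        return labels.get((min(i, j), max(i, j)), 0)

    vertices = {v for pair in labels for v in pair}
    two_edges = [p for p, lab in labels.items() if lab == 1]
    if not two_edges:
        return 0.0
    in_triangle = sum(
        any(l(i, k) > 0 and l(j, k) > 0 for k in vertices if k not in (i, j))
        for (i, j) in two_edges
    )
    return in_triangle / len(two_edges)
```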
Revisiting Table I, we see that E_∆ is related to the error ε and that the errors for each hypergraph range between the best case and the worst case. However, the proportions {ρ_k} also play a role in ε: when a type of interaction is observed at a rate similar to another's, models will most likely favor the type with the largest proportion, as it leads to a better fit.

FIG. 5. The hypergraph model displays (a) a smaller misclassification error, (b) a larger entropy, and (c) lower residuals than the categorical-edge graph model, which indicates a better reconstruction. Symbols represent the median, light colored shadings are percentiles 2.5 and 97.5, and dark colored shadings are percentiles 25 and 75 of the metrics for 100 synthetic observations. Residuals were evaluated using 200 predictive observation matrices, and the best-case hypergraph was generated using p = 0.00017 and q = 0.019.
Table I also shows that empirical hypergraphs are generally closer to a best-case hypergraph than to a worst-case one. This is due to the sparsity of interactions in empirical complex systems: we expect that most 2-edges are not part of projected triangles. For that reason, the hypergraph model works better than the categorical-edge graph model for the majority of systems. And when the hypergraph model errs, both models tend to err, as confirmed by the last two lines of Table I.

D. Impact of data means
To complete our analysis, we study the impact of the parameters µ on the reconstruction by varying µ_1 while keeping µ_0 = 0.05 and µ_2 = 50 fixed, for the two families of extreme hypergraphs described above (with n = 100 vertices). Doing so allows us to identify the regimes in which the hypergraph model displays a better performance. In addition to the relative reconstruction error ε, we also consider two additional summary statistics: the entropy S of the label distribution and the sums of residuals R_k.

FIG. 6. Impact of the measurement rate (µ_1) of type-1 interactions on the reconstruction of a worst-case hypergraph (see Fig. 5 for details). While the categorical-edge graph has a similar performance to the best-case hypergraph (Fig. 5), the hypergraph model cannot distinguish 3-edges from 2-edges with triangles, which results in a worse reconstruction. This is seen with (a) a larger misclassification error, (b) a smaller entropy, and (c) larger residuals. The worst-case hypergraph was constructed from 20 5-cliques in which triangles were promoted to 3-edges with probability 0.19.
We define the entropy of the label distribution as

S = − Σ_k ρ_k log ρ_k,

where ρ_k is the proportion of pairs predicted as type k. This statistic measures the effective number of interaction types predicted by the models: it is 0 if only one type of interaction exists, and it is maximal (log 3) when the three types are equally represented. Because the empirical datasets we consider are sparse, most pairs of vertices do not interact, meaning that S is small. Nevertheless, comparing entropy values allows us to detect when a model completely ignores a type of interaction.
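This is the usual Shannon entropy of the label proportions:

```python
from math import log

def label_entropy(rho):
    """Shannon entropy of the predicted label proportions rho_k: 0 when a
    single interaction type dominates, log(3) when the three types are
    equally represented."""
    return -sum(r * log(r) for r in rho if r > 0)
```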
The sums of residuals R_k compare the data to an observation matrix X̃ = [x̃_ij]_{i,j=1,...,n} generated synthetically from the posterior-predictive distribution [16,36]: for each sample point (S̃, µ̃) ∼ P(S, µ | X), we generate predictive matrices X̃ from the likelihood (1). This is a form of posterior-predictive check, which quantifies the goodness of fit of a model by verifying that the fitted model can adequately reproduce the original data. The statistics R_k reveal biases in the fitted model, with R_k ≈ 0 only when the predicted pairwise observations x̃_ij are on average equal to the pairwise observations x_ij for the interactions of type k.
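A sketch of this posterior-predictive check follows; the sign and normalization conventions are assumptions, since the paper only states that R_k ≈ 0 indicates an unbiased fit for type-k pairs:

```python
import numpy as np

def residuals(X, predictive, labels):
    """Posterior-predictive residual sums R_k: for each label k, the summed
    difference between the mean predictive observation and the data over
    the pairs predicted as type k. (Sign and normalization are assumed.)
    X : (n, n) data matrix; predictive : list of (n, n) predictive
    matrices; labels : (n, n) matrix of predicted interaction types."""
    Xbar = np.mean(predictive, axis=0)        # average predictive matrix
    iu = np.triu_indices_from(X, k=1)         # each pair counted once
    out = []
    for k in range(3):
        mask = labels[iu] == k
        out.append(float(np.sum(Xbar[iu][mask] - X[iu][mask])) if mask.any() else 0.0)
    return out
```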
Figures 5 and 6 show that the relative reconstruction error generally increases as µ_1 approaches µ_0 or µ_2. This behavior is expected because there is a greater overlap between the corresponding Poisson distributions in the observations X. When this overlap is large, interaction types are represented similarly in the observations X, which makes them difficult to infer. Figures 5 and 6 also show that the entropy generally decreases and stabilizes to a lower plateau as µ_1 approaches µ_2. This is due to a similar phenomenon: with the increasing overlap, models favor one type of interaction over the other to the point where one type of interaction disappears. Once the interaction types have "merged", the entropy remains constant.
For the best-case hypergraph, we clearly see in Fig. 5 that the hypergraph model overall outperforms the categorical-edge graph model. Figure 5a shows that the hypergraph model makes very few classification errors for all sets of parameters. This translates to a higher entropy, as seen in Fig. 5b, and to a smaller predictive bias in Fig. 5c. We conclude that the worse performance observed for the categorical-edge graph model is explained by weak and strong edges becoming interchangeable because of their pairwise nature. Without the neighborhood information that 3-edges imply, the interaction type of a pair (i, j) must be deduced from its observation x_ij alone.
For the worst-case hypergraph, Fig. 6 shows that the categorical-edge graph model slightly outperforms the hypergraph model. We believe this is due to the prior distribution of the 3-edge probability p: because there are C(n,3) possible 3-edges compared to C(n,2) possible 2-edges, there is a much larger number of 3-edges than strong edges for the same probability. In this worst-case setting, 3-edges are almost indistinguishable from 2-edges since triangles are mixtures of 2-edges and projected 3-edges. Thus, the hypergraph model brings no improvement, which suggests that the hypergraph representation is not appropriate here.

IV. CONCLUSION
Mounting evidence collected in recent years supports the idea that modeling the behavior of many complex systems requires taking higher-order interactions into account. However, many of the tools of this rapidly expanding field have yet to find practical applications, as measurements of higher-order systems remain challenging to this day. We presented a minimal Bayesian inference framework that makes progress in this direction by reconstructing hypergraphs from noisy observations of their pairwise projections. Using synthetic and empirical datasets, we illustrated the impact that taking higher-order interactions into account has on the accuracy of the reconstruction. Notably, we identified the regimes where higher-order interactions yield fewer reconstruction errors, owing to the fact that hyperedges make use of local information contained in the neighborhood of vertices.
Although the inference framework introduced here is fairly general, we illustrated it using simple data and hypergraph models to avoid obfuscating its presentation unnecessarily. Future work should therefore apply our framework to hypergraphs with hyperedges larger than 3-edges, and to non-Poissonian data models. Doing so will require treating carefully the way higher-order interactions are assumed to be encoded in the pairwise observation data; as we have shown, hidden hyperedges can hinder high-quality reconstruction. A possible solution worth investigating involves the use of simplicial complexes, a more restricted higher-order structure in which a hyperedge of size k implies every hyperedge of size k − 1. Yet, how to connect pairwise interactions to such higher-order interactions remains an open question, and is a testament to the bright future of Bayesian inference of higher-order interactions.
Combining Eqs. (2) and (B1) yields the conditional distribution for the hypergraph model, in which µ_− = min{µ_1, µ_2}. Since this step is revisited often by our algorithm, we combine three sampling methods to ensure rapid and accurate sampling in all cases [39]: rejection sampling using a gamma distribution if the rejection probability is low, a more costly inverse transform sampling using the incomplete gamma inverse function, and rejection sampling with an adjusted "linear distribution" if all other methods fail. The main interest of the linear distribution is that it provides a good approximation of the density over small intervals. The inverse transform sampling often works, but can suffer from numerical instabilities, especially for small truncation intervals.
We define the linear probability density function as a density with constant slope c over its support. A sample from this distribution is obtained by evaluating its inverse cumulative distribution function at u, where u is a continuous random variable uniformly distributed on [0, 1].
In the rejection sampling algorithm, the support of this distribution is adjusted to match the truncated gamma distribution and c is the slope of a line connecting the truncated gamma density evaluated at the lower bound to the density evaluated at the upper bound.
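A sketch of the inverse-transform sampler for such a linear density, parametrized by its endpoint values; the explicit formulas are ours, reconstructed from the description above:

```python
import math
import random

def sample_linear(a, b, fa, fb, rng=None):
    """Inverse-transform sample from a linear (unnormalized) density on
    [a, b] taking values fa and fb at the endpoints, so the slope is
    c = (fb - fa) / (b - a). Parametrization assumed: the paper describes
    the construction but not these exact formulas."""
    rng = rng or random.Random()
    u = rng.random()
    c = (fb - fa) / (b - a)
    if abs(c) < 1e-14:
        return a + u * (b - a)            # flat density: plain uniform draw
    # Unnormalized CDF: F(t) = fa*t + c*t^2/2 with t = x - a.
    # Solve F(t) = u * F(b - a) for t (positive root of the quadratic).
    area = fa * (b - a) + 0.5 * c * (b - a) ** 2
    t = (-fa + math.sqrt(fa * fa + 2.0 * c * u * area)) / c
    return a + t
```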

Sampling graphs with categorical edges
The distribution used to sample the categorical-edge graph model is derived by following a reasoning similar to that for Eq. (B1). We first observe that

P(S | θ, X) = P(S, θ | X) / P(θ | X) ∝ P(S, θ | X).   (B10)
Combining this expression with Eqs. (9) and (A1) yields the target distribution. The edge labels ℓij induce complicated interactions between the parameters, so we turn to a Metropolis-Hastings (MH) algorithm to generate samples from this distribution, as it does not appear to correspond to a well-known closed-form distribution. The MH algorithm is initialized at the ground truth hypergraph projection and the ground truth parameters, except in Table I, where it is initialized at a graph with no strong edges and with weak edges wherever xij > 0, and at parameters µ and φG set to the maximum likelihood estimators obtained from a Poisson mixture model. At each iteration, we propose to increment an interaction type with probability η and to decrement an interaction type with probability 1 − η. We use η = 0.5 in our numerical simulations.
If the algorithm reaches a point where the graph is fully connected with strong edges (or empty), then we propose to decrement (or increment) a type with probability 1. The pair (i, j) whose type is to be decremented is chosen uniformly among all pairs whose type is not zero. The pair (i, j) whose type is to be incremented is chosen proportionally to its weight. The proposal probability of a new graph G* conditioned on the current graph G is written in terms of a, with a = 1 if the label is to be incremented and a = 0 if it is to be decremented. Finally, the proposal is accepted with the usual MH probability, where Q(G|G*, X) is the probability of reverting the proposed move.
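For concreteness, the acceptance step common to both MH samplers can be written generically in log space; the function name and interface below are our own, not the paper's.

```python
import math
import random

def mh_accept(log_post_cur, log_post_prop, log_q_fwd, log_q_rev, rng=random):
    """Generic Metropolis-Hastings acceptance test in log space.

    log_q_fwd is log Q(G*|G, X), the probability of proposing the move,
    and log_q_rev is log Q(G|G*, X), the probability of reverting it.
    Returns True when the proposed state G* should be accepted.
    """
    log_alpha = (log_post_prop - log_post_cur) + (log_q_rev - log_q_fwd)
    if log_alpha >= 0.0:
        return True  # always accept moves that increase the posterior
    return math.log(rng.random()) < log_alpha
```

Working in log space avoids the numerical underflow that the raw posterior ratios would cause on graphs of realistic size.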
If the algorithm reaches a point where either no 2-edge or 3-edge can be added (removed), then one is removed (added) with probability 1. If a move in which a hidden 2-edge should be added/removed has been chosen and that move is not possible (e.g., there are no hidden 2-edges to be removed), a completely new move is randomly chosen.
The proposed move, which would transform the hypergraph H into a new one H*, is accepted with the usual MH probability. We now detail the proposal probability ratio for each of the six possible moves. When a 3-edge is to be removed, it is chosen uniformly among the existing 3-edges. When a 3-edge is to be added, the three vertices (i, j, k) are chosen in three steps: pick i ∼ P(i), pick j ∼ P(j|i), and pick k ∼ P(k|i). Since the order in which vertices are chosen does not matter, the probability that the triplet (i, j, k) is chosen is

P(i, j, k) = 2P(i)P(j|i)P(k|i) + 2P(j)P(i|j)P(k|j) + 2P(k)P(i|k)P(j|k). (B18)

If this selection process results in a degenerate triplet (i, j, j) or chooses an existing 3-edge, then the proposed move is automatically rejected, since the distribution is only supported on simple hypergraphs. Altogether, these choices determine the proposal probability ratio for moves involving 3-edges. When a 2-edge needs to be removed, it is chosen uniformly among the existing 2-edges. When a 2-edge (i, j) needs to be added, it is chosen proportionally to its weight; these choices likewise determine the proposal probability ratio for moves involving 2-edges. Our definition of the types of interactions ℓij [Eq. (3)] implies that hidden 2-edges do not contribute to the likelihood [Eq. (1)]; their addition/removal depends solely on the hypergraph model. However, the cost of removing a 3-edge depends on the number of hidden 2-edges underneath. Because of this asymmetry, we found that running the MH algorithm with the four previous moves tends to get stuck in certain configurations of hidden 2-edges. Our solution has been to propose two additional moves specifically targeting hidden 2-edges.
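To make Eq. (B18) concrete, the following sketch evaluates the symmetrized selection probability for arbitrary weight functions; `p` and `p_cond` are our own stand-ins for the paper's P(i) and P(j|i), whose exact forms are given by the appendix equations.

```python
def triplet_probability(p, p_cond, i, j, k):
    """Probability that the unordered triplet {i, j, k} is produced by the
    three-step selection: pick the first vertex from p, then the remaining
    two independently from p_cond given the first.  Each choice of first
    vertex accounts for the two orderings of the remaining pair, Eq. (B18).
    """
    return 2.0 * (p(i) * p_cond(j, i) * p_cond(k, i)
                  + p(j) * p_cond(i, j) * p_cond(k, j)
                  + p(k) * p_cond(i, k) * p_cond(j, k))
```

For a sanity check, with uniform weights over n = 4 vertices each of the C(4, 3) = 4 triplets has probability 1/6, the remaining mass corresponding to the degenerate (i, j, j) outcomes that the algorithm rejects.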
To propose the addition/removal of a hidden 2-edge, we first regroup all existing hidden 2-edges into a set [Eq. (B23)]. In the simulations, we use χ0 = 0.99 and χ1 = 0.01, as we want to remove hidden 2-edges more frequently than we add them.
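One plausible shape for this targeted move is sketched below; the function name, arguments, and the fallback behavior when no move is valid are our assumptions, not the paper's specification.

```python
import random

def propose_hidden_2edge_move(hidden_edges, candidates, rng, chi_add=0.01):
    """Targeted move for hidden 2-edges (hypothetical sketch).

    With probability chi_add (χ1 = 0.01 in the simulations) propose adding
    a hidden 2-edge drawn from `candidates`; otherwise (χ0 = 1 - χ1 = 0.99)
    propose removing one from the set `hidden_edges`.
    """
    if hidden_edges and (not candidates or rng.random() >= chi_add):
        return ("remove", rng.choice(sorted(hidden_edges)))
    if candidates:
        return ("add", rng.choice(sorted(candidates)))
    return None  # no valid move: the caller draws a new move type
```

Biasing χ0 toward removal matches the asymmetry discussed above: hidden 2-edges accumulate under 3-edges without affecting the likelihood, so the chain needs removals more often than additions.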

Convergence
We stop the two previous MH algorithms whenever the likelihood stabilizes, meaning the chains have reached stationarity.We consider that this has happened when the relative change in the average likelihood of the last W iterations is smaller than a tolerance parameter δ.We use W = 20000 and δ = 0.02 in our simulations.
Further, to ensure the MH algorithm runs long enough but not too long, we set a minimum Imin and maximum Imax number of iterations. We adjust these values empirically with a test run, but they are roughly Imin = 200000 and Imax = 1000000. Finally, for each posterior distribution sample, we run four chains and keep the one with the highest average likelihood.
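A minimal sketch of this stopping rule, under one plausible reading of the criterion (comparing window averages of the likelihood trace; the function name and the exact definition of "relative change" are our assumptions):

```python
def has_converged(log_liks, window=20000, tol=0.02):
    """Declare stationarity when the relative change between the average
    likelihood of the last `window` iterations and that of the preceding
    `window` iterations falls below the tolerance `tol` (a sketch; the
    paper's exact definition of the relative change may differ)."""
    if len(log_liks) < 2 * window:
        return False  # not enough iterations to compare two windows
    recent = sum(log_liks[-window:]) / window
    previous = sum(log_liks[-2 * window:-window]) / window
    if previous == 0.0:
        return False
    return abs(recent - previous) / abs(previous) < tol
```

In a driver loop this check would be evaluated every few thousand iterations between Imin and Imax, stopping early once it returns True.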

Appendix C: Regime µ1 > µ2 and confusion matrices
We mentioned in Sec. II B that conditions such as µ1 < µ2 or µ1 > µ2 need not be imposed in the prior distributions, since 2-edges and 3-edges are fundamentally different. As a complement to the analysis presented in Sec. III C, we investigate the case where µ2 is varied between µ0 = 0.05 and µ1 = 50.
Comparison between Figs. 5 and 6 and Figs. 7 and 8 suggests that both scenarios are quite similar, as expected. In particular, note that the apparent swap in the sums of residuals for the categorical edges graph model is simply due to the redefinition of ℓij to accommodate the restriction that µ1 < µ2 in the model. Indeed, we redefine

ℓij = 1 if (i, j) ∈ E2, 2 if (i, j) ∈ E1, and 0 otherwise. (C1)
The only noteworthy difference between the two sets of simulations occurs when µ2 approaches µ0. We observe the same phenomenon as when µ1 approaches µ2 from the left: the information in the neighborhood used by the hypergraph model allows for a more accurate reconstruction (i.e., a smaller error and a larger entropy S). Interestingly, this effect is also apparent in the worst-case hypergraphs. Because the structures we consider are sparse, many non-interacting pairs form triangles, such that when µ2 approaches µ0, they can easily be confused with 3-edges. For instance, the element c̄21 is the proportion of projected 3-edges predicted as 2-edges in the hypergraph model.
When the categorical edges graph model ends up inferring only one type of interaction, there are two equivalent reconstructed graphs: one in which all interactions are weak edges and one in which all interactions are strong edges. Noting that, in less extreme cases, the model naturally favors strong edges due to the larger associated variance in the likelihood, we set all interactions to strong edges whenever the model classifies them all as weak edges.
We see that for the best-case structure in Figs. 9 and 7(d), the hypergraph model makes little to no error. As we increase µ1, we also observe a gradual increase in the number of misclassified weak edges for the categorical edges model. For the worst-case structure, the results in Figs. 10 and 8(d) show that the hypergraph model favors 3-edges and that the categorical edges model favors strong edges.

FIG. 1. Illustration of a typical distribution of pairwise interactions X produced by the data model. The frequencies of the pairwise interactions are shown in gray. The contribution of each type of interaction to the likelihood is shown in red.

FIG. 3. Example of the method's inference process on a small dataset. (a) Original network of Zachary's karate club [28]. (b) Hypergraph representation of Zachary's karate club; see main text. (c) Illustration of the structure corresponding to the estimators ŜMAP and ŜEW for the categorical-edge model. Strong edges are shown in orange. (d) Same as (c) but using the hypergraph model. (e) Confusion matrix built using the ŜMM estimators for the interaction types. (f) Same as (e) but using the hypergraph model. The inference was done on synthetic observations generated using µ0 = 0.01, µ1 = 20 and µ2 = 30. The maximum a posteriori (MAP) structure maximizes the posterior distributions [Eqs. (8) and (9)], while the average structure contains the edges and hyperedges that exist in at least half of the samples of the posterior distributions. The estimator for the type of interaction, noted ℓ̂ij, is used to build the confusion matrix. It corresponds to the most likely type of interaction for vertices i and j.

FIG. 4. Illustration of the generation of the best-case and worst-case hypergraphs. (a) The best-case hypergraphs are obtained by first generating a random hypergraph using Eq. (2) and then removing any 2-edge that creates a triangle when the hypergraph is projected onto the pairwise interactions. (b) The worst-case hypergraphs are generated from a graph with cliques of 2-edges, in which each triangle can be promoted to a 3-edge with a given probability.

FIG. 5. Impact of the measurement rate (µ1) of type-1 interactions on the reconstruction of a best-case hypergraph. (a) Relative reconstruction error. (b) Entropy S. (c) Sums of residuals Rk. The observations were generated with µ0 = 0.01, µ2 = 50 and various µ1, using the hypergraph model (blue) and the categorical edges graph model (orange). The hypergraph model displays (a) a smaller misclassification error, (b) a larger entropy and (c) lower residuals than the categorical edges graph model, which indicates a better reconstruction. Symbols represent the median, light colored shadings are percentiles 2.5 and 97.5, and dark colored shadings are percentiles 25 and 75 of the metrics for 100 synthetic observations. Residuals were evaluated using 200 predictive observation matrices, and the best-case hypergraph was generated using p = 0.00017 and q = 0.019.

FIG. 7. Impact of the measurement rate (µ2) of type-2 interactions on the reconstruction of a best-case hypergraph. (a) Relative reconstruction error. (b) Entropy S. (c) Sums of residuals Rk. (d) Normalized confusion matrix. The observations were generated with µ0 = 0.01, µ1 = 50 and various µ2, using the hypergraph model (blue) and the categorical edges graph model (orange). The hypergraph model displays (a, d) fewer misclassification errors, (b) a larger entropy and (c) lower residuals than the categorical edges graph model, which indicates a better reconstruction. See the caption of Fig. 5 for details on the numerical experiment.

FIG. 8. Impact of the measurement rate (µ2) of type-2 interactions on the reconstruction of a worst-case hypergraph. (a) Relative reconstruction error. (b) Entropy S. (c) Sums of residuals Rk. (d) Normalized confusion matrix. The observations were generated with µ0 = 0.01, µ1 = 50 and various µ2, using the hypergraph model (blue) and the categorical edges graph model (orange). The hypergraph model displays (a, d) more misclassification errors, (b) a smaller entropy and (c) greater residuals than the categorical edges graph model, which indicates a worse reconstruction. See the captions of Figs. 5 and 6 for details on the numerical experiment.

FIG. 9. Normalized confusion matrix associated with the simulation of Fig. 5. The categorical edges graph model favors strong edges when µ1 approaches µ2, which leads to an inferior reconstruction compared to the hypergraph model, which makes little to no error.

FIG. 10. Normalized confusion matrix associated with the simulation of Fig. 6. While the categorical edges model still favors strong edges over weak edges, the hypergraph model favors 3-edges more strongly and displays a worse performance for the worst-case hypergraph.

Figures 9, 10, 7(d) and 8(d) show the normalized confusion matrices for the best-case and worst-case hypergraphs. The entries c̄rs of the normalized confusion matrix are the proportion of interactions of type ℓij = r that were predicted as ℓ̂ij = s by the model,

c̄rs = crs / (cr0 + cr1 + cr2). (C2)
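Equation (C2) amounts to row-normalizing the 3 × 3 matrix of counts; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalize_confusion(counts):
    """Row-normalize a confusion matrix of counts c_rs so that entry
    (r, s) becomes the proportion of true type-r interactions that the
    model predicted as type s, as in Eq. (C2)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)
```

Each row of the result sums to one, so entries are directly comparable across interaction types regardless of how many interactions of each type the hypergraph contains.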