Reconstructing direct and indirect interactions in networked public goods game

Network reconstruction is a fundamental problem for understanding many complex systems with unknown interaction structures. In many complex systems, there are indirect interactions between two individuals without immediate connection but with common neighbors. Despite recent advances in network reconstruction, we continue to lack an approach for reconstructing complex networks with indirect interactions. Here we introduce a two-step strategy to resolve the reconstruction problem, where in the first step, we recover both direct and indirect interactions by employing the Lasso to solve a sparse signal reconstruction problem, and in the second step, we use matrix transformation and optimization to distinguish between direct and indirect interactions. The network structure corresponding to direct interactions can be fully uncovered. We exploit the public goods game occurring on complex networks as a paradigm for characterizing indirect interactions and test our reconstruction approach. We find that high reconstruction accuracy can be achieved for both homogeneous and heterogeneous networks, and a number of empirical networks in spite of insufficient data measurement contaminated by noise. Although a general framework for reconstructing complex networks with arbitrary types of indirect interactions is yet lacking, our approach opens new routes to separate direct and indirect interactions in a representative complex system.

contrast to the networked two-player game, such as the prisoner's dilemma game, snowdrift game and ultimatum game 5,6 , in the networked PGGs, players play the PGGs with not only their immediate neighbors but also their neighbors' neighbors 23 . However, only direct interactions are associated with links among nodes, whereas indirect interactions are not. Thus, the key for achieving network reconstruction lies in distinguishing between direct and indirect interactions. We accomplish this goal by developing a two-step strategy. Firstly, we reconstruct a combined matrix composed of both direct and indirect interactions in terms of the mapping of the reconstruction problem into a sparse signal reconstruction problem and employ the Lasso to solve the problem 34,35 . Secondly, we identify all direct interactions (links) from the combined matrix by virtue of matrix transformation and optimization method. For homogeneous networks, full reconstruction can be achieved by using our method from a small amount of data. For heterogeneous networks, full reconstruction of networks can be achieved as well, but requires much larger amounts of data because of the existence of hubs. To better mimic the real situation, data measurement that is contaminated by noise has been used to implement reconstruction. We find that high reconstruction accuracy can be still achieved, demonstrating the robustness of our approaches against noise. Moreover, our method can be used to identify hidden node without any accessible information and reconstruct the connections among the rest nodes.
It is noteworthy that quite recently two methods, namely the silencing method and network deconvolution are proposed to separate direct correlation and indirect correlation to infer direct connections 36,37 . However, the indirect interactions are different from indirect correlation that naturally presents in any complex networks. Indirect correlation implies similar behavior between two disconnected nodes, whereas indirect interactions mean that there are physical interactions between two nodes without immediate connection. In other words, two nodes can indirectly interact with each other through a common neighbor of them in spite of the lacking of a direct connection between them. It is imperative to distinguish between direct and indirect interactions; otherwise, fake connections corresponding to indirect interactions will arise, accounting for the failure of network reconstruction. Although a general framework for reconstructing networked systems with arbitrary dynamics and any types of indirect interactions is still an open question at present, our work as the first attempt to deal with the PGG systems opens a route towards eventually resolve the problem in a comprehensive manner.

Results
Networked public goods games. In the original PGG with m ≥ 2 players, at each round, every player is allocated with an endowment of e points, and is required to contribute c i (0 ≤ c i ≤ e) points to a common pool. The total contribution is multiplied with an enhancement factor b (1 < b < m) and the result is distributed among all m members. Thus, the payoff of player i in the original PGG is In the networked PGGs, a player, say i, can take part in k i + 1 original PGGs (one self-centered and k i neighbor-centered) simultaneously, and contribute the same c i to all k i + 1 public pools. Thus, the total payoff of player i obtained from the networked PGGs at a given round t can be formulated as where Γ i is the set of i's neighbors and N is the number of players in the networked PGGs, adjacency matrix A ≡ {a ij } N×N represents direct links among players (a ij = 1 if i and j are connected and a ij = 0 otherwise). In Eq. (1), the first term on the right-hand side is the payoff of participating in the self-centered PGG, and the second term is the payoffs of participating in k i neighbor-centered PGGs. In the self-centered PGG, player i interacts with his/her direct neighbors, whereas in the neighbor-centered PGGs, player i interacts with not only his/her direct neighbors but also the neighbors' neighbors, as shown in Fig. 1(a). Figure 1(b,c) illustrate the need for distinguishing between direct and indirect interactions in the PGG. As shown in Fig. 1(b), the payoff of focal player 1 stems from two PGGs, one self-centered PGG and the other PGG centered on player 2. The indirect interaction (the dashed line) between player 1 and 3 stems from the fact that they both participate in the PGG centered on player 2 and their payoff is affected by the action of each other, although there is no direct link between them. It is necessary to accurately discern whether the interaction between player 1 and 3 is direct or indirect interactions; otherwise, if we fail to distinguish the interaction and treat the indirect interaction as a direct interaction (see Fig. 1(c)), the payoff of the focal player 1 will become quite different from the actual payoff. In this fake scenario, the focal player 1 participates in the self-centered PGG, the PGG centered on player 2 and the PGG centered on player 3, which apparently leads to a different payoff from the actual payoff of player 1 in Fig. 1(b). Thus, it is imperative to discern indirect interactions to successfully reconstruct network structure. The payoff difference between the actual scenario and the fake scenario offers sufficient information to achieve full reconstruction.
In the evolutionary PGGs, players are allowed to update their strategies by referring to their direct neighbors' information according to, for example, the learn-from-best rule, random-selection rule, Fermi rule and so on. Here, without loss of generality, we choose the Fermi rule as our strategy-updating rule. The strategy updating probability W i←j of player i is where player i choose his/her neighbor j randomly in i's neighborhood and adopts player j's strategy with probability W, κ is characterizes the stochastic uncertainties in the game dynamics. Indeed, when κ = 0, player i always adopts the strategy of the player with better payoff with probability 1, while as κ → ∞ , player i updates his/her strategy with probability 1/2 regardless of the payoff difference. For simplicity, we set κ = 0.1 by following existent research in the literature 38 .
Reconstructing combined matrix. As the first step of the two-step reconstruction strategy, we aim to reconstruct a combined matrix C consisting of both direct and indirect interactions. The key to the accomplish of the first step lies in the establishment of formula Y i = Φ i · X i (i = 1, 2, … , N), in which vector X i includes all direct and indirect interactions of node i, and vector Y i and matrix Φ i can be constructed exclusively from the time series of individual payoff p i and strategy c i . To simplify our description of networked PGGs, we denote G ≡ A + I, where I is an identity matrix. Thus, Eq. (1) can be expressed as . The reconstruction formula can be established by focusing on the payoff of player i, determined by Eq. (3), which for convenience can be expressed in the matrix form (see Supplementary Note 1)  The focal player 1 (red node) has one direct link with player 2 (blue node) and one indirect interaction (dashed line) with player 3 (gray node) due to their common neighbor (player 2). In this situation, the focal player 1 belongs to 2 different groups of PGGs (one self-centered PGG and the other PGG centered on player 2) and the payoff of player 1 will be affected by the action of player 3. (c) If the indirect interaction between player 1 and 3 in (b) is changed to a direct interaction, the focal player 1 will participate in 3 different groups of PGGs (one selfcentered PGG and the other two groups centered on player 2 and 3, respectively), rendering the player 1's payoff different from that in (b).
We can simplify the payoff of player i in an arbitrary round as follows In the evolutionary PGG, all players play M-round games from t 1 to t M with their neighbors, providing sufficient information for building the reconstruction formula In the formula, matrix Φ i and vector Y i can be obtained from individual strategies and payoffs of players, respectively, allowing us to reconstruct X i directly from Φ i and Y i by using the Lasso, a convex optimization method (see Methods for details). In a similar fashion, we can reconstruct vector X j for node j and for all nodes as well, giving rise to the combined matrix Note that C is similar to second-order transfer matrix 39 .

Separation between direct and indirect interactions.
Insofar as the combined matrix C is successfully inferred, we can distinguish between direct and indirect interactions in matrix C, giving rise to the adjacency matrix A, namely, the whole network structure. Specifically, according to Eqs. (10) and (11), we have where C has been obtained and matrix D can be obtained in virtue of ∑ X i = d i (see Supplementary Note 2). Therefore, in principle, matrix G = A + I can be derived based on C and D and consequently, yields network structure A. However, deriving G directly from Eq. (12) is still challenging, calling for mathematical techniques and optimization to solve G. First, we rewrite Eq. (12) to be = .
( )( ) We perform similarity transformation on matrix D GD We can thus formulate where P, Λ and P T can be obtained by similarity transformation and the optimization based on the linear least squares method (see Methods for detailed derivation and optimization). Then the network matrix can be directly inferred via Scientific RepoRts | 6:30241 | DOI: 10.1038/srep30241 = − . A G I (16) An intuitive illustration of our two-step reconstruction process is shown in Fig. 2. As shown in Fig. 2(a), node 1 (the red node) has one direct interaction (solid line) with node 2 and two indirect interactions (dashed lines) with node 3 and node 7. Moreover, node 1 has a virtual self-loop (dashed lines) due to intrinsic dynamics of the PGGs. In fact, all nodes interact with themselves, and the self-interaction captured by the virtual self-loop is subject to indirect interactions, because of the absence of the self-loop in the network structure. By using the Lasso to optimize the reconstruction formula ( Fig. 2(b)), we can reconstruct both direct and indirect interactions of node 1, included in vector X 1 . By repeating the reconstruction process shown in Fig. 2(a,b), we can obtain the combined matrix C (Fig. 2(c)) that includes both direct and indirect interactions of all the nodes. As shown in Fig. 2(d), direct interactions (solid lines) and indirect interactions (dashed lines) cannot be distinguished in this stage. To separate the two types of interactions and identify direct links, we decompose the combined matrix C by exploiting similarity transformation and the linear least squares method to solve the Eq. (15) (Fig. 2(e)), allowing us to obtain adjacency matrix A (Fig. 2(f)) from matrix C eventually. Compared to Fig. 2(d), the indirect interactions captured by dashed lines are removed in Fig. 2(g), recovering the original network in Fig. 2(a) accurately.
Network reconstruction performance. We numerically simulate the PGGs occurring on both homogeneous and heterogeneous networks, including Erdös-Rényi (ER) random networks 40 , Watts-Strogatz (WS) small-world networks 41 , Newman-Watts (NW) small-world networks 42 and Barabási-Albert (BA) scale-free networks 43 , and several real networks. Without loss of generality, we set b = 1.5 and e = 10 in our simulations. Initially, each node is occupied by a player with a random contribution c i , where 0 ≤ c i ≤ e is an integer for simplicity. In each round, all players engage in the PGGs simultaneously and update their strategies according to the Fermi rule (see Methods). The strategies and payoffs of all players are recorded for network reconstruction by employing our method. Different amounts of data (Data ≡ M/N) are considered, where M is the length of time series (the number of observable rounds) and N is the network size. As shown in Fig. 3(a), for a small amount of insufficient data, e.g., Data = 0.4, elements C ij in the combined matrix C corresponding to direct interactions (actual links), indirect interactions and null elements (zeros in combined matrix) disperse in a wide range, rendering the complete separation of the three types of elements in C and the accurate identification of actual links impossible. By exploiting similarity transformation and the linear least squares method, we can eliminate indirect interactions, giving rise to reconstructed adjacency matrix A. However, because of insufficient amounts of data, A ij in the reconstructed adjacency matrix A corresponding to direct interactions (actual links) and null connections (null elements) still overlap a little bit, demonstrating that a full reconstruction of network structure was not achieved yet. In contrast, for a relatively more data, e.g., Data = 0.6, as shown in Fig. 3(b), a vast gap arises between actual links and null elements in the reconstructed adjacency A, although the direct and indirect interactions are mixed in the reconstructed matrix C. Hence, full reconstruction of networks can be ensured by our method from sufficient amount of but sparse data that could be less than the network size N. Although direct and indirect interactions cannot be distinguished in matrix C, an accurate reconstruction of matrix C is the prerequisite for implementing similarity transformation and optimization to precisely reconstruct matrix A. In this regard, we introduce a data-based index to measure the precision of reconstructing matrix C. To be specific, we define 1 , where ⋅ 1 represents L 1 entrywise norm and matrix C(M) is obtained with M time series. The fact that Θ approaches zero indicates that the reconstructed matrix C becomes stable and close to the actual C with small difference. As shown in Fig. 3(c), we see that as the amount of data increases, the value of Θ decreases rapidly. When the value of Θ is very small, e.g., Θ < 0.1, the gap between the two types of interactions and null elements in reconstructed C emerges (e.g., Fig. 3(b)), and thus matrix C can be considered to be accurately reconstructed. Insofar as C is accurately reconstructed, we can derive the degree of each node from k i = ∑ C i − 1. Consequently, the top k i values of i's row in adjacency matrix A can be regarded as actual links. The predicted node degree from C thus offers criterion for determining the neighborhood size of each node, without relying on an additional threshold to separate actual links from null connections for each node, and the use of the area under the receiver operating characteristic curve (AUROC) (see Supplementary Figure S1) 6,8 . To verify our method, we use the true positive rate (TPR) and true negative rate (TNR) as explicit indices to quantify the reconstruction performance, where TPR is defined as the average ratio of the number of successfully inferred links to the number of actual links for all nodes, and TNR is similarly defined for null elements in the adjacency matrix. If and only if both TPR and TNR reach 1, the network is said to be fully reconstructed. As shown in Fig. 3(d), for WS small-world networks, when the amount of data exceeds 0.3, 80% success rates for both TPR and TNR are achieved, which implies that most of links can be successfully inferred even for a quite small amount of data, e.g., Data= 0.3. When Data exceeds 0.6, full reconstruction with 100% success rates is achieved from a small amount of data that are still less than the network size N.
Systemic reconstruction results for different types of networks, average degree 〈 k〉 and variance of measurement noise σ are shown in Table 1. We see for all the considered cases, accurate reconstruction is achieved. In particular, for relatively large homogeneous networks, e.g., N = 500 for ER, WS and NW networks, only a small amount of data is required to ensure full reconstruction. However, compared with homogeneous networks, larger amounts of data are required for reconstructing heterogeneous networks, e.g., BA networks. For heterogeneous networks, there are hubs with much more neighbors compared with other nodes. The hubs immediately induce much more indirect interactions in their neighborhoods than that in homogeneous networks. As a result, the combined matrix C of heterogeneous networks is usually much denser than homogeneous networks. Note that the reconstruction method based on the Lasso needs less data for reconstructing a sparse vector X i . Thus, the much denser matrix C in heterogeneous networks accounts for the requirement of a larger amount of data to fully reconstruct heterogeneous networks by using the Lasso. In the presence of small measurement noise, e.g., , full reconstruction can be achieved by using slightly larger amounts of data compared to the results without noise, as displayed in Table 1. When data are contaminated by strong noise, e.g., , we can still reconstruct networks quite accurately from a relatively large amount of data for both homogeneous and heterogenous networks. These results suggest that our method is of both high efficiency and strong robustness against noise for reconstructing complex networks with indirect interactions. Table 2 shows the performance of our method in reconstructing several real social networks.  We see that precise reconstruction can be achieved for all the real-world networks, which offers additional evidence for the practical applicability of our reconstruction approach.
Identifying the hidden node. In the real situation, we may miss the information of some nodes because the nodes are inherently inaccessible or we are not aware of their existence. We explore the robustness of our reconstruction method against the presence of such hidden nodes to test the applicability of our method. Specifically, we assume a hidden node whose strategies and payoffs can not be recorded exists in the network, and our purpose it to identify the nodes who interact with the hidden node and reconstruct the rest nodes and their connections.
The basic idea of ascertaining and identifying the hidden node is based on missing information from the hidden node when attempting to reconstruct direct and indirect interactions of the network by using the first step in our reconstruction framework. In particular, to reconstruct the direct and indirect interactions belonging to the hidden node accurately, time series from the hidden node are needed to generate the matrix Φ i and the vector Y i . However, no time series from the hidden node are available, leading to reconstruction inaccuracy and, consequently, anomalies in the predicted interaction patterns of the nodes who interact with the hidden node. It is then possible to detect the nodes who interact with the hidden node by identifying any abnormal interaction patterns 3,44,45 , which can be accomplished by using different data segments. If the inferred interactions of a node are stable with respect to different data segments, the node can be deemed to have no interaction with the hidden node; otherwise, if the result of inferring a node's interactions varies significantly with respect to different data segments, the node is likely to interact with the hidden node. To identify the nodes who interact with the hidden node, we can define the standard variance of predicted interactions with respect to different data segments as follows: where C ij q ( ) denotes the element value in the combined matrix C of the rest nodes inferred from the qth group of the data, 〈 〉 = ∑ = C C ij z q z ij q 1 1 ( ) is the mean value of C ij , N is the network size, and z is the number of data segments. Figure 4(a) illustrates that a hidden node interacts with five nodes in the exemplified network. Applying Eq. (17) to the reconstructed combined matrices yields the results shown in Fig. 4 (b), where the values of α i associated with the nodes who interact with the hidden node are much larger than those of the other nodes (that are essentially zero).
In the presence of a small number hidden nodes, their influence of missing their data to the reconstruction of the whole network is negligible. Thus, we can directly using the two-step reconstruction method by ignoring the hidden node to reconstruct the connections among the nodes other than the hidden node. As shown in Fig. 4(c), all true existent links are inferred successfully. However, there are two redundant (fake) links (marked as red) that stems from the influence of the hidden nod. In Fig. 4(d), as the measurable data increase, the TPR and TNR of reconstructing the connections among the rest nodes approach unit, indicating that the network can be reconstructed quite accurately in spite of the existence of a hidden node in the network. The reconstruction performances for different types of networks with hidden nodes are shown in Table 1. It is worth noting that in the presence of a large fraction of hidden nodes, ignoring the influence of the hidden nodes will lead to many fake links, accounting for the failure of directly using the reconstruction method. At present, how to tackle a network with an arbitrary fraction of hidden node is still an extremely challenging question.

Discussion
We have developed an approach with a two-step strategy to reconstruct interaction structure in networked PGGs with indirect interactions among nodes from measurable time series of individual strategies and payoffs. We first reconstruct both direct and indirect interactions among nodes by transferring the network reconstruction problem into a sparse signal reconstruction problem, which is solved by using the Lasso in an efficient and robust manner. Subsequently, we distinguish between direct and indirect links by virtue of similarity transformation and an optimization method based on the linear least squares. Moreover, our framework is able to locate hidden node and reconstruct the network structure except the hidden node. We have validated our method in terms of both homogeneous and heterogeneous networks, finding that high reconstruction accuracy can be achieved for all the studied cases. In general, less amounts of data are required for reconstructing homogeneous networks than that  for heterogeneous networks, due to the existence of hubs. Our method is also resilient to measurement noise and hidden nodes, accounting for its practical importance in real networked systems. Our work also raises a number of questions, answers to which could improve our ability to reconstruct complex networks with indirect interactions. First, with respect to a variety of indirect interactions in the real world, our approach cannot offer a general solution to the reconstruction problem at present, and only complex systems characterized by networked PGGs can be reconstructed by our framework, prompting us to wonder if a general reconstruction framework can be developed for complex networks with any indirect interactions? Second, relatively large amounts of data are required to reconstruct heterogeneous networks. A practically significant question is how can we reduce the data amount based on the current method? Third, although we can locate a hidden node by identifying all its interactions, how to distinguish direct and indirect interactions between the hidden node and the other nodes accurately and how to locate a large fraction of hidden nodes are still open questions. Fourth, the empirical test of our method is still lacking in spite of our systematic numerical investigations. Our approach is expected to be available in laboratory experiments of the networked PGGs by recruiting subjects. From the information of subjects in the game, the network structure can be reconstructed accurately. Analogously, our method is also applicable to networked climate game experiments 25 that is closely related with the PGGs. Since usually direct and indirect interactions play a joint role in the dynamics of the whole system and their effects are hidden in measurable data in a complicated manner, we anticipate that it is challenging to answer (a) Illustration of a hidden node. The hidden node (node 50 in red) interacts with five nodes (in orange), including direct interactions (solid lines) and indirect interactions (dashed lines). For clarity, the virtual selfloop of each node and interaction weights are not shown. The direct (red solid lines) and indirect (red dashed lines) interactions between the hidden node and its neighbors and neighbors' neighbors can be inferred in terms of the standard variance (Equation (17)). However, the direct and indirect interactions cannot be distinguished because of missing the data of the hidden node. The time series of the other nodes except the hidden node are measurable. The interactions (orange lines) between the neighbors of the hidden node and the other nodes except the hidden node can be reconstructed accurately by simply ignoring the hidden node. However, indirect (orange dashed lines) and direct interactions (orange solid lines) still cannot be distinguished. The other connections (gray lines) can be accurately reconstructed and classified. (b) The standard variance of reconstructed interactions α i of each node. The five nodes who interact with the hidden node exhibits much larger values of α than the other nodes. (c) The reconstructed network except the hidden node. The gray lines are true existent links which are reconstructed successfully, and the red lines are false positive (fake) links which do not exist in the original network. (d) The success rate of reconstructing the WS small-world network in (c). In the WS small-world network, the network size N is 50 (including the hidden node), the average degree 〈 k〉 = 2 and the rewiring probability is 0.1. The results are obtained by averaging over 10 independent realizations.
Scientific RepoRts | 6:30241 | DOI: 10.1038/srep30241 these questions. Nonetheless, our approach as the first attempt opens a new route to solve the difficult problem with implications for understanding many social networked systems in a wide range of fields.

Methods
The Lasso for sparse signal reconstruction. The Lasso is a convex optimization method to reconstruct a vector X i ∈ R N from linear measurements Y i and Φ i in the form where Y i ∈ R N and Φ i is an M × N matrix. X i can be reconstructed by applying the Lasso for solving where λ is a nonnegative regularization parameter. The sparsity of solution is assured by X i 1 in the Lasso according to the compressed sensing theory 46 . Meanwhile, the least square term − Φ Y X i i i 2 makes the solution more robust against noise in time series than L 1 -norm-based optimization method. T via similarity transformation where Λ = diag(λ 1 , λ 2 , … , λ n ), and P is an N × N nonsingular orthogonal matrix, which can be derived from similarity transformation of D CD Although we can not ascertain each element in matrix D GD where L is the number of zeros in matrix D CD 1 2 1 2 . The above equation also satisfy the formation β = Ψ · α, in which α = (λ 1 , λ 2 , , λ N ) T , β =   ( , , , ,0, , 0) , and Ψ is the corresponding matrix. In this situation, most of values in α are not zero, thus we can obtain α via the linear least squares method 2 where vector α ∈ R N from vector β ∈ R K and matrix Ψ (N+L)×N . The optimization also known as L 2 norm minimization, a basic optimization paradigm for solving an overdetermined system of linear equations. Due to the sparsity of adjacency matrix, the combined matrix C also contains lots of zeros, leading to +  N L N in the matrix Ψ , thus Eq. (22) can be well solved by the linear least squares method. Now we can get Λ via this method, then get D GD