Finding and analysing the minimum set of driver nodes required to control multilayer networks

It is difficult to control multilayer networks in situations with real-world complexity. Here, we first define the multilayer control problem in terms of the minimum dominating set (MDS) controllability framework and mathematically demonstrate that simple formulas can be used to estimate the size of the minimum dominating set in multilayer (MDSM) complex networks. Second, we develop a new algorithm that efficiently identifies the MDSM in networks of up to 6 layers, with several thousand nodes in each layer. Interestingly, the findings reveal that the MDSM size for similar networks does not significantly differ from that required to control a single network. This result opens future directions for controlling, for example, multiple species by identifying a common set of enzymes or proteins for drug targeting. We apply our methods to 70 genome-wide metabolic networks across major plant lineages, unveiling some relationships between controllability in multilayer networks and metabolic functions at the genome scale.


Analysis of the Size of MDSM
In this section, we analyze the size of MDS for multilayer networks (i.e., the size of MDSM).

Formal Definition of MDSM
Recall that for a graph G(V, E), a set U ⊆ V is called a dominating set (DS) of G if (∀v ∈ V)(v ∈ U ∨ (∃u ∈ U)({u, v} ∈ E)) holds. A DS with the minimum cardinality is called a minimum dominating set (MDS).
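MDSM is defined by extending this definition of MDS. Let G = {G_i(V_i, E_i) | i = 1, ..., N} be a set (multiset) of undirected networks; that is, G corresponds to a multilayer network. Let V = ∪_{i∈{1,...,N}} V_i. A set U ⊆ V is called a dominating set for the multilayer network G (DSM for G) if (∀i ∈ {1, ..., N})(∀v ∈ V_i)(v ∈ U ∨ (∃u ∈ U)({u, v} ∈ E_i)) holds, that is, if U dominates every layer. A DSM with the minimum cardinality is called a minimum dominating set for multilayer networks (MDSM). Let S_MDS(·) denote the size of an MDS of a single network or of an MDSM of a multilayer network, and let V_MDS(G_i) denote an MDS of G_i. Clearly, S_MDS(G) ≥ S_MDS(G_i) holds for each G_i because an MDSM is also a DS for G_i. Conversely, ∪_{i∈{1,...,N}} V_MDS(G_i) is a DSM for G. Therefore, we have the following.
Proposition 1 max_{i∈{1,...,N}} S_MDS(G_i) ≤ S_MDS(G) ≤ Σ_{i∈{1,...,N}} S_MDS(G_i).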
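To make the definition concrete, the following is a minimal Python sketch, assuming that each layer is stored as a dictionary mapping a node to the set of its neighbours. The helper names (`make_layer`, `is_dsm`, `minimum_dsm`) are illustrative and do not come from the original work, and the exhaustive search is usable only for very small examples; it is not the algorithm developed in the paper.

```python
from itertools import combinations


def make_layer(nodes, edges):
    """Build an undirected layer as {node: set of neighbours}."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_dominating(adj, subset):
    """True if every node of the layer is in `subset` or adjacent to it."""
    subset = set(subset)
    return all(v in subset or adj[v] & subset for v in adj)


def is_dsm(layers, subset):
    """True if `subset` dominates every layer (the DSM condition)."""
    return all(is_dominating(adj, subset) for adj in layers)


def minimum_dsm(layers):
    """Exhaustive search for an MDSM; exponential time, tiny inputs only."""
    nodes = sorted(set().union(*(adj.keys() for adj in layers)))
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_dsm(layers, candidate):
                return set(candidate)


# Two layers on the same six nodes: a 6-cycle and a perfect matching.
nodes = range(6)
cycle = make_layer(nodes, [(i, (i + 1) % 6) for i in range(6)])
matching = make_layer(nodes, [(0, 3), (1, 4), (2, 5)])

single = [len(minimum_dsm([g])) for g in (cycle, matching)]
multi = len(minimum_dsm([cycle, matching]))
print(single, multi)  # [2, 3] 3
```

In this toy example the MDSM size (3) lies between the largest single-layer MDS size (3) and the sum of the single-layer sizes (5), in line with the bounds derived from the two observations above.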

Simple Upper and Lower Bounds
Although this result is almost obvious, it gives tight bounds in the worst case: if the G_i are identical, S_MDS(G_i) = S_MDS(G) holds for all G_i, so the lower bound is attained, whereas if the node sets V_i are pairwise disjoint, each layer must be dominated separately and S_MDS(G) equals the sum of the S_MDS(G_i). This fact suggests that we should consider special families of networks in order to derive meaningful bounds. Furthermore, as discussed later, the results of computational experiments suggest that the size of the MDSM for k-regular random networks is much smaller than this upper bound. In the following subsections, we theoretically explain this empirical finding.

Estimation of MDSM Size for k-Regular Random Networks
We assume that graphs are given uniformly at random on the same set of nodes V with |V| = n under the constraint that every node has degree k for a constant k. We utilize the recursive probabilistic estimation technique that was recently developed for the analysis of the size of an MDS in k-regular random networks and maximally assortative scale-free networks [5]. It is shown in [5] that this technique yields very accurate estimates of the MDS size for random graphs. Note that this technique is not a rigorous analysis but an approximate one, as in many studies on complex networks that use mean-field approximation and other approximate analysis techniques.
Let G(U) denote the subgraph of G that is induced by a set of vertices U, and let N_G(U) denote the set of neighbors of U in G excluding U (i.e., N_G(U) = {v | {u, v} ∈ E, u ∈ U, v ∉ U}). We consider the following virtual procedure that outputs a DSM for G = {G_1, ..., G_N}.
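(i) Let DS_1 be the dominating set for G_1 obtained by the method in [5]. Let V_1 ← V.
(ii) For i = 2 to N, repeat steps (iii) and (iv).
(iii) Let V_i ← V − D − N_{G_i}(D), where D = DS_1 ∪ ... ∪ DS_{i−1}. (V_i is the set of vertices in G_i that are not dominated by the combined dominating set for G_1, ..., G_{i−1}.)
(iv) Let DS_i be the dominating set for G_i(V_i) obtained by the method in [5].
(v) Output DS_1 ∪ ... ∪ DS_N.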
We estimate the size of the DSM obtained by this procedure. Although the DSM obtained by this procedure is not necessarily an MDSM, its size is expected to give an approximate upper bound on the size of the MDSM.
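The layer-by-layer procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: all layers share one node set, each layer is a dictionary mapping a node to its set of neighbours, and a simple greedy heuristic stands in for the method of [5], which is not reproduced here. The function names are hypothetical.

```python
def greedy_dominating_set(adj, targets):
    """Greedy DS of the subgraph induced by `targets` (stand-in for [5])."""
    targets = set(targets)
    undominated = set(targets)
    chosen = set()
    while undominated:
        # Pick the node that newly dominates the most nodes; ties go to the
        # smallest label so that the result is deterministic.
        best = max(
            sorted(targets),
            key=lambda v: len(({v} | (adj[v] & targets)) & undominated),
        )
        chosen.add(best)
        undominated -= {best} | (adj[best] & targets)
    return chosen


def layered_dsm(layers):
    """Process layers in order, adding nodes only for what is undominated."""
    nodes = set(layers[0])  # assumption: every layer has the same node set
    combined = set()
    for adj in layers:
        dominated = set(combined)
        for u in combined:
            dominated |= adj[u]
        remaining = nodes - dominated  # plays the role of V_i
        combined |= greedy_dominating_set(adj, remaining)  # adds DS_i
    return combined


def dominates(adj, subset):
    """Check that `subset` is a dominating set of one layer."""
    return all(v in subset or adj[v] & subset for v in adj)


# Toy input: a 6-cycle and a perfect matching on nodes 0..5.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
matching = {0: {3}, 3: {0}, 1: {4}, 4: {1}, 2: {5}, 5: {2}}
dsm = layered_dsm([cycle, matching])
print(sorted(dsm), all(dominates(g, dsm) for g in (cycle, matching)))
```

On this toy input the sketch returns four nodes, whereas {0, 2, 4} already dominates both layers, which illustrates why the output of the procedure is an upper bound rather than an MDSM.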
As shown in [5], the size of DS_1 is estimated as n/(k + 1). Since G_2 is a random k-regular graph, DS_1 can be regarded as a random subset for G_2, which gives an estimate of the size of N_{G_2}(DS_1) and therefore of the size of V_2 = V − DS_1 − N_{G_2}(DS_1). The size of DS_2 is then estimated by applying the method in [5] to G_2(V_2). Here, we let α_i n denote the estimated size of the DSM obtained for the first i networks; for example, α_1 = 1/(k + 1). By repeatedly applying the above argument, we obtain an estimate of α_i for each i. It seems difficult to obtain a simple analytical form of α_i. Therefore, we computed α_N n numerically and compared the values with the sizes of MDSM for randomly generated k-regular networks. Table S4 suggests that the theoretical estimates are very accurate.
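The random-subset argument can be checked numerically. The sketch below rests on our own simplifying assumptions and is not taken from the original work: each sampled node receives k distinct neighbours chosen uniformly at random (a local approximation of a random k-regular graph), and the quantity compared against is (1 − α)^k, the probability that none of k independent neighbours falls in a random subset containing a fraction α of the nodes. The function name and parameters are hypothetical.

```python
import random


def undominated_fraction(n, k, alpha, trials, seed=0):
    """Estimate the chance that a node outside a random subset of size
    alpha * n has none of its k random neighbours inside the subset."""
    rng = random.Random(seed)
    subset = set(rng.sample(range(n), int(alpha * n)))
    outside = [v for v in range(n) if v not in subset]
    missed = 0
    for _ in range(trials):
        v = rng.choice(outside)
        # Draw k distinct neighbours uniformly among the other nodes.
        neighbours = set()
        while len(neighbours) < k:
            u = rng.randrange(n)
            if u != v:
                neighbours.add(u)
        if not neighbours & subset:
            missed += 1
    return missed / trials


estimate = undominated_fraction(n=2000, k=3, alpha=0.25, trials=3000)
print(round(estimate, 3), round((1 - 0.25) ** 3, 3))
```

With these settings the simulated fraction should fall close to 0.75^3 ≈ 0.42; the agreement is only as good as the local approximation used to draw the neighbours.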

Estimation of MDSM Size for Maximally Assortative Scale-Free Networks
The above estimation method can be modified for the analysis of maximally assortative scale-free networks, in which the degree distribution follows a power law ∝ k^{−γ}. A network is called maximally assortative if no exchange of a pair of edges increases the assortative coefficient [5]. The assortative coefficient r for a network G(V, E) is given by
r = ( M^{−1} Σ_{e∈E} e_i e_j − [M^{−1} Σ_{e∈E} (e_i + e_j)/2]^2 ) / ( M^{−1} Σ_{e∈E} (e_i^2 + e_j^2)/2 − [M^{−1} Σ_{e∈E} (e_i + e_j)/2]^2 ),
where e_i and e_j denote the degrees of the endpoints of an edge e, and M = |E|.
In order to approximately analyze an upper bound of the MDSM size, we employ the same virtual procedure as in Section 1.3. It is shown in [5] that a maximally assortative network is approximately regarded as a collection of k-regular networks. Using this property as a base case, an upper bound β_1 of the MDS size ratio for a single maximally assortative network is estimated. Assume that β_i has already been obtained as the size ratio of the DSM DS_i for i networks. Then, there exist (1 − β_i)n vertices in G_{i+1} that do not belong to DS_i. The probability that a node v with degree k in V − DS_i is not dominated by DS_i is estimated as (1 − β_i)^k, because each edge from v is connected to a node in DS_i with probability β_i. This gives the expected number of nodes with each degree k in V − DS_i − N_{G_{i+1}}(DS_i). Here we assume that V − DS_i − N_{G_{i+1}}(DS_i) is a maximally assortative power-law network in which the original degrees are preserved (we ignore the effect of degree change due to removal of edges to N_{G_{i+1}}(DS_i)). Since the size of the MDSM is upper bounded by the size of the DSM obtained by the procedure, an upper bound β_{i+1} of the MDSM size ratio for i + 1 networks is estimated from β_i.
Again, it seems difficult to obtain a simple analytical form of β_i. Therefore, we computed β_N n numerically and compared the values with the sizes of MDSM for artificially generated maximally assortative networks. It is seen from Table S5 that the theoretical estimates are larger than the MDSM sizes of the artificially generated networks, but both show similar tendencies. This result seems reasonable because this analysis gives estimates of upper bounds.
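For reference, the assortative coefficient can be computed directly from an edge list. The sketch below assumes that the coefficient meant here is the standard degree assortativity (the Pearson correlation of the degrees at the two endpoints of an edge, with M = |E|), and that the graph is simple and undirected with each edge listed once. The function name is illustrative.

```python
def assortativity(edges):
    """Degree assortativity r of a simple undirected graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    # Edge averages of: product, half-sum, and half-sum of squares of the
    # endpoint degrees.
    prod = sum(degree[u] * degree[v] for u, v in edges) / m
    half_sum = sum((degree[u] + degree[v]) / 2 for u, v in edges) / m
    half_sq = sum((degree[u] ** 2 + degree[v] ** 2) / 2 for u, v in edges) / m
    denom = half_sq - half_sum ** 2
    if abs(denom) < 1e-12:
        raise ValueError("r is undefined when all endpoint degrees are equal")
    return (prod - half_sum ** 2) / denom


star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
print(assortativity(star), assortativity(path))
```

A star is perfectly disassortative (r = −1), and a path on four nodes gives r = −0.5. For a k-regular graph the denominator vanishes, so r is undefined there and the sketch raises an error.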

Convergence of MDSM Size for Multilayer Networks
The upper bound estimates and the results of computational experiments suggest that the MDSM size converges to n as the number of networks grows. We show below that this speculation is true.
Proposition 2 Suppose that each graph in a multilayer network has minimum degree d_min. Then, the MDSM size is at least n − d_min if a sufficient number of distinct graphs are given and n is sufficiently large.
(Proof) Since any number of graphs can be given and n is sufficiently large (e.g., n ≥ 2d_min for d-regular graphs), we can assume w.l.o.g. that every set of d_min + 1 nodes constitutes a star as an induced subgraph of some input graph. We prove the proposition by contradiction. Suppose that there exists an MDSM DS with size less than n − d_min. Then, there must exist d_min + 1 nodes v_{i_1}, ..., v_{i_{d_min+1}}, none of which belongs to DS. From the assumption, the star centered at v_{i_1} with leaves v_{i_2}, ..., v_{i_{d_min+1}} appears as an induced subgraph of some input graph G_j. However, v_{i_1} is then neither in DS nor dominated by any of its neighbors in G_j, which contradicts the assumption that DS is an MDSM. □
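A small instance makes the statement tangible. The sketch below uses an example of our own choosing, not one from the original work: n = 4 nodes and d_min = 1, with the three perfect matchings of the complete graph on four nodes as layers, so that every pair of nodes forms a star (a single edge) in some layer. It checks by enumeration that no set of size n − d_min − 1 dominates all layers, while sets of size n − d_min do.

```python
from itertools import combinations

n, d_min = 4, 1
# Three 1-regular layers on nodes 0..3: the perfect matchings of K4.
matchings = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]
layers = []
for edges in matchings:
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    layers.append(adj)


def dominates_all(subset):
    """True if `subset` is a dominating set of every layer."""
    s = set(subset)
    return all(v in s or layer[v] & s for layer in layers for v in layer)


too_small = [c for c in combinations(range(n), n - d_min - 1) if dominates_all(c)]
just_enough = [c for c in combinations(range(n), n - d_min) if dominates_all(c)]
print(too_small, len(just_enough))  # [] 4
```

Any two nodes left outside a candidate set are matched to each other in one of the layers, so neither is dominated there; this is the contradiction used in the proof, specialized to d_min = 1.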

Hardness of Computation of MDSM
It is known that the maximum bipartite matching for networks of up to two layers can be computed in polynomial time, whereas computing such a matching for networks with three or more layers is NP-hard [6]. This fact suggests that the minimum set of driver nodes under linear structural controllability [3] can be obtained in polynomial time only for networks of up to two layers.
On the other hand, it is known that the computation of an MDS is NP-hard even for one network [7]. However, the situation changes if we consider special graph classes: it is known that MDS can be computed in polynomial time if a given network is a partial k-tree (for a constant k) [7]. As a special case, it is seen that an MDS can be computed in polynomial time if networks are forests or consist of cycles and stars. We show that computation of MDSM is NP-hard even for such simple networks.
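For the single-layer forest case, a standard three-state dynamic program computes the MDS size in linear time. The sketch below is a textbook-style construction written for illustration; it is not claimed to be the algorithm of [7], it assumes that the input really is a forest (it does not terminate on graphs with cycles), and the function name is hypothetical.

```python
def forest_mds_size(adj):
    """Size of a minimum dominating set of a forest {node: set of neighbours}."""
    inf = float("inf")
    total = 0
    seen = set()
    for root in adj:
        if root in seen:
            continue
        # Iterative DFS: parent map and an order with parents before children.
        parent = {root: None}
        order = []
        stack = [root]
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if w != parent[v]:
                    parent[w] = v
                    stack.append(w)
        seen.update(order)
        # state[v] = (v in the set,
        #             v outside the set and dominated by a child,
        #             v outside the set and still waiting for its parent)
        state = {}
        for v in reversed(order):  # children are processed before parents
            children = [w for w in adj[v] if w != parent[v]]
            in_set = 1 + sum(min(state[c]) for c in children)
            waiting = sum(state[c][1] for c in children)
            base = sum(min(state[c][0], state[c][1]) for c in children)
            # Extra cost of forcing at least one child into the set.
            extra = min(
                (state[c][0] - min(state[c][0], state[c][1]) for c in children),
                default=inf,
            )
            state[v] = (in_set, base + extra, waiting)
        total += min(state[root][0], state[root][1])
    return total


path7 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(forest_mds_size(path7), forest_mds_size(star))  # 3 1
```

The contrast with the results below is that this per-layer computation is easy, whereas requiring one set to dominate several such layers at once is what makes the problem hard.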
Theorem 3 The MDSM problem for two-layer networks is NP-hard even if a graph in each layer consists of cycles and at most one star.
(Proof) We use a reduction from a special case of 3-SAT in which each variable occurs at most three times, at least once as a positive literal and at least once as a negative literal (see also Fig. S5). It is known that this special case remains NP-hard [8]. Let {x_1, ..., x_n} and {c_1, ..., c_m} be the sets of variables and clauses, respectively, in an instance of this restricted 3-SAT.
We construct G_1(V_1, E_1) from the variables and clauses (see Fig. S5). Each literal l_{j_k} in a clause is associated with one of the following vertices:
v^1_{j_k}, if l_{j_k} is positive and the first occurrence of x_{j_k},
v^4_{j_k}, if l_{j_k} is positive and the second occurrence of x_{j_k},
v^2_{j_k}, if l_{j_k} is negative and the first occurrence of x_{j_k},
v^5_{j_k}, if l_{j_k} is negative and the second occurrence of x_{j_k}.
We also construct G_2(V_2, E_2) (see Fig. S5). The reduction can be done in polynomial time. We now prove the correctness of this reduction. First, observe that an MDSM must contain v_0, and either {v^1_i, v^4_i} or {v^2_i, v^5_i} for each i ∈ {1, ..., n}. Therefore, the size of an MDSM is at least 2n + 1. Next, suppose that the given 3-SAT instance is satisfied by an assignment x_i = b_i (i = 1, ..., n, b_i ∈ {0, 1}). We begin with S = {v_0}. If b_i = 1, we add v^1_i and v^4_i to S. Otherwise (i.e., b_i = 0), we add v^2_i and v^5_i to S. Then, the resulting S is clearly a dominating set for both G_1 and G_2, and its size is 2n + 1. Conversely, suppose that (G_1, G_2) has an MDSM of size 2n + 1. As mentioned above, it must contain v_0, and either {v^1_i, v^4_i} or {v^2_i, v^5_i} for each i. If {v^1_i, v^4_i} is contained, we let x_i = 1. Otherwise, we let x_i = 0. Then, the resulting assignment clearly satisfies all clauses. Therefore, the given 3-SAT instance is satisfiable iff (G_1, G_2) has an MDSM of size 2n + 1. □

Theorem 4 The MDSM problem for three-layer networks is NP-hard even if the graph in each layer does not contain any cycle.
(Proof) We use a reduction from the same special case of 3-SAT as in the proof of Theorem 3 (see also Fig. S6).
We construct G_1(V_1, E_1), G_2(V_2, E_2), and G_3(V_3, E_3) from the variables and clauses (see Fig. S6). As before, each literal l_{j_k} in a clause is associated with a vertex according to whether l_{j_k} is positive or negative and whether it is the first or second such occurrence of x_{j_k}: v^3_{j_k} if l_{j_k} is positive and the second occurrence of x_{j_k}, v^2_{j_k} if l_{j_k} is negative and the first occurrence of x_{j_k}, and v^4_{j_k} if l_{j_k} is negative and the second occurrence of x_{j_k}.
The reduction can be done in polynomial time. We now prove the correctness of this reduction. First, observe that, for each i ∈ {1, ..., n}, an MDSM must contain one of two alternative sets of vertices associated with x_i, from which it follows that the size of an MDSM is at least 6n. Next, suppose that the given 3-SAT instance is satisfied by an assignment, and let S be the union of the vertex sets selected according to this assignment. Then, the resulting S is clearly a dominating set for each G_i (i = 1, ..., 3), and its size is 6n. Conversely, suppose that (G_1, G_2, G_3) has an MDSM of size 6n. As mentioned above, it must contain one of the two alternative sets for each i. If the first is contained, we let x_i = 1. Otherwise, we let x_i = 0. Then, the resulting assignment clearly satisfies all clauses. Therefore, the given 3-SAT instance is satisfiable iff (G_1, G_2, G_3) has an MDSM of size 6n. □
The complexity of the MDSM problem for this special case with two-layer networks is open.
(Tables S1-S3 are given as Excel files.)
Table caption: As in Table S1, but considering a 6-layer network analysis in which each group consists of up to three species. This leads to a set of 25 groups for the 6-layer analysis. The computational time is small in spite of the large size of the networks. The file also shows results for the 3-layer, 4-layer, and protein networks.

Table S4
Comparison of theoretical estimates and computational results on the size of MDSM for multilayer k-regular random networks, where n = 50 and the average size over 5 trials is shown for each of the computational results.
Figure caption: Black edges and red edges represent those in G_1 and G_2, respectively.
Figure caption: In these examples, all three graphs lead to the same |MDS| = 3 (filled nodes).
Supplementary Material: Tables S4-S5 and Figures S1-S7