Input node placement restricting the longest control chain in controllability of complex networks

The minimum number of inputs needed to control a network is frequently used to quantify its controllability. Control of linear dynamics through a minimum set of inputs, however, often has prohibitively large energy requirements, and there is an inherent trade-off between minimizing the number of inputs and minimizing the control energy. To better understand this trade-off, we study the problem of identifying a minimum set of input nodes such that controllability is ensured while restricting the length of the longest control chain. The longest control chain is the maximum distance from input nodes to any network node, and recent work found that reducing its length significantly reduces control energy. We map the longest control chain-constrained minimum input problem to finding a joint maximum matching and minimum dominating set. We show that this graph combinatorial problem is NP-complete, and we introduce and validate a heuristic approximation. Applying this algorithm to a collection of real and model networks, we investigate how network structure affects the minimum number of inputs, revealing, for example, that for many real networks reducing the longest control chain requires few or no additional inputs, only a rearrangement of the input nodes.


S1. COMPUTATIONAL COMPLEXITY OF THE LCC-CONSTRAINED MINIMUM INPUT PROBLEM
In the following, we prove that the LCC-constrained minimum input problem is NP-complete by showing that its corresponding decision problem is both in NP and NP-hard [1]. To prove that it is in NP, we provide a polynomial-time algorithm that verifies, for any node set S, whether S is a valid input node set.
Proving that it is NP-hard involves reducing the minimum dominating set problem, a known NP-complete problem, to the LCC-constrained minimum input problem, where the reduction is carried out in polynomial time.
Theorem 1. For a given directed network G(V, E) and positive integer ℓ > 0, a node set S ⊆ V is a valid input node set if (i) there exists a matching in the bipartite representation B of network G such that S is the set of unmatched nodes, and (ii) S is a dominating set in the accessibility graph G_ℓ of network G.
Determining whether a valid input node set S ⊆ V exists such that |S| = M > 0 is an NP-complete problem.
Proof. To prove that the problem is in NP, we present the following algorithm to check the validity of an input node set S. To check whether S satisfies the matching condition (i), we obtain a pruned bipartite network B′ by taking B and removing all vertices from V^- that correspond to vertices in S. If there exists a perfect matching in B′, then there exists a matching in B such that only nodes in S are unmatched. We can check whether a perfect matching exists in a bipartite network using, for example, the Hopcroft-Karp algorithm, which has worst-case runtime O(√|V| |E|) [2].
To check the accessibility condition (ii), we first construct the accessibility graph G_ℓ by calculating the distance between all node pairs in G, which can be done using breadth-first search in at most O(|V|(|V| + |E|)) steps [1]. Verifying that S is a dominating set in G_ℓ can be done, for example, by iterating over all edges in G_ℓ, which takes at most O(|V|^2) steps.
Consequently, verifying a solution can be done in polynomial time.
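The verification procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the network is given as a hypothetical successor dictionary `succ`, the matching check uses simple augmenting paths (Kuhn's algorithm) in place of Hopcroft-Karp, "perfect matching in B′" is interpreted as a matching that covers every remaining in-copy, and the dominating-set check is done by a depth-limited multi-source breadth-first search instead of building G_ℓ explicitly.

```python
from collections import deque


def saturating_matching_exists(succ, targets):
    """Check condition (i): can every node in `targets` be matched?

    Left side: out-copies of all nodes. Right side: in-copies of `targets`.
    Uses simple augmenting paths (Kuhn's algorithm), not Hopcroft-Karp.
    """
    matched_from = {}  # in-copy v -> out-copy u currently matched to v

    def augment(u, seen):
        for v in succ[u]:
            if v in targets and v not in seen:
                seen.add(v)
                if v not in matched_from or augment(matched_from[v], seen):
                    matched_from[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return len(matched_from) == len(targets)


def within_distance(succ, sources, ell):
    """Nodes reachable from any source in at most `ell` steps."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == ell:
            continue  # do not expand beyond the allowed chain length
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def is_valid_input_set(succ, S, ell):
    """True if S satisfies both the matching and the accessibility condition."""
    S = set(S)
    nodes = set(succ)
    return (saturating_matching_exists(succ, nodes - S)
            and within_distance(succ, S, ell) == nodes)


# Directed path 0 -> 1 -> 2 -> 3
path = {0: [1], 1: [2], 2: [3], 3: []}
print(is_valid_input_set(path, {0}, 3))     # one input suffices if chains may be long
print(is_valid_input_set(path, {0, 2}, 1))  # a shorter chain needs a second input
```

Both checks run in polynomial time, which is all the membership argument requires.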
To prove NP-hardness, we reduce the NP-hard minimum dominating set (MDS) problem to our problem [3]. For a given directed network G(V, E), a set of nodes D ⊆ V is a dominating set if each node in V is either in D or has a link pointing at it that starts from a node in D.
To reduce the MDS to our problem, we first create an augmented network G′ by adding a self-loop to all nodes in G. Solving the LCC-constrained minimum input problem with ℓ = 1 in G′ is then equivalent to solving the MDS in G. To see this, note that the set of self-loops provides a perfect matching in G′; therefore, we only have to check that the minimum input set satisfies the accessibility condition. The accessibility graph G′_1 is the same as G′, and self-loops do not affect the dominating sets; therefore, the minimum input node set in G′ for ℓ = 1 is also a minimum dominating set in G.
Therefore, the LCC-constrained minimum input problem is NP-complete.
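The reduction can be checked by brute force on a small example. The sketch below is illustrative only: the five-node graph, the successor-dictionary representation, and all function names are assumptions introduced here, and the matching test uses simple augmenting paths. It compares the minimum dominating set of G with the smallest valid input set of the self-loop-augmented network G′ for ℓ = 1.

```python
from itertools import combinations


def add_self_loops(succ):
    """Build G' from G by adding a self-loop to every node."""
    return {u: sorted(set(vs) | {u}) for u, vs in succ.items()}


def is_dominating(succ, D):
    """Every node is in D or has an in-link starting from a node in D."""
    covered = set(D)
    for u in D:
        covered.update(succ[u])
    return covered == set(succ)


def leaves_only_unmatched(succ, S):
    """Matching condition: all nodes outside S can be matched simultaneously."""
    targets = set(succ) - set(S)
    matched_from = {}

    def augment(u, seen):
        for v in succ[u]:
            if v in targets and v not in seen:
                seen.add(v)
                if v not in matched_from or augment(matched_from[v], seen):
                    matched_from[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return len(matched_from) == len(targets)


def is_valid_ell1(succ, S):
    """Valid input set for ell = 1, where the accessibility graph is the graph itself."""
    return leaves_only_unmatched(succ, S) and is_dominating(succ, S)


def smallest(nodes, predicate):
    """Smallest subset satisfying `predicate`, found by exhaustive search."""
    for k in range(len(nodes) + 1):
        for S in combinations(nodes, k):
            if predicate(S):
                return set(S)
    return None


G = {0: [1, 2, 3], 1: [], 2: [], 3: [4], 4: []}
G_prime = add_self_loops(G)
nodes = sorted(G)

mds = smallest(nodes, lambda S: is_dominating(G, S))
inputs_prime = smallest(nodes, lambda S: is_valid_ell1(G_prime, S))
inputs_plain = smallest(nodes, lambda S: is_valid_ell1(G, S))
print(mds, inputs_prime, inputs_plain)
```

In this example the matching condition forces a third input in G itself, whereas in G′ the self-loops remove that constraint and the smallest valid input set has the size of the minimum dominating set.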

S2. EFFICIENT INTEGER LINEAR PROGRAMMING FORMULATION OF THE LCC-CONSTRAINED MINIMUM INPUT PROBLEM
Analyzing the performance of the integer linear programming (ILP) formulation (8) in the main text shows that a naïve implementation of the matching results in poor performance, and the run-time quickly increases with the number of links in the network (Fig. S2).
The performance can be improved by a constant factor using the graph-cycling formulation introduced in Ref. [4].
The basic idea of this approach is that a matching partitions the digraph into disjoint paths (Fig. S1). Thus, Ref. [4] finds a matching by partitioning the digraph into disjoint cycles.
We start from a directed graph G(V, E) (Fig. S1a). We begin by creating an augmented graph G′(V′, E′) by adding an auxiliary node x to G, representing the external control signals. We connect this node to all network nodes by a pair of out-going and in-coming links (Fig. S1b). Then, we partition G′ into cycles such that the cycles are only allowed to overlap at node x (Fig. S1c). Links that participate in cycles in G′ form a matching in G, and the disjoint cycles correspond to disjoint paths in the matching. Therefore, the links that exit the auxiliary node as part of a cycle point at input nodes, which are the unmatched nodes in the matching (Fig. S1d). Note that there are different ways to partition G′ into disjoint cycles.
To find a maximum matching, we aim to partition G′ into a minimum number of cycles.
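The correspondence between a cycle partition of G′ and a matching of G can be illustrated with a short Python sketch. The edge-list representation, the label `"x"` for the auxiliary node, and the function names are assumptions made for this example; the cycle partition itself is supplied by hand here, not computed.

```python
AUX = "x"  # hypothetical label for the auxiliary node


def augment_graph(nodes, edges):
    """Links of G': the links of G plus x -> v and v -> x for every node v."""
    return list(edges) + [(AUX, v) for v in nodes] + [(v, AUX) for v in nodes]


def is_cycle_partition(nodes, aug_edges, selected):
    """Each network node has exactly one selected in-link and one selected out-link."""
    if not set(selected) <= set(aug_edges):
        return False
    for n in nodes:
        n_out = sum(1 for (u, v) in selected if u == n)
        n_in = sum(1 for (u, v) in selected if v == n)
        if n_out != 1 or n_in != 1:
            return False
    return True


def read_cycle_partition(selected):
    """Split the selected links into the matching in G and the input nodes."""
    matching = [(u, v) for (u, v) in selected if AUX not in (u, v)]
    inputs = sorted(v for (u, v) in selected if u == AUX)
    return matching, inputs


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 3)]
aug_edges = augment_graph(nodes, edges)

# Two cycles through x: x -> 0 -> 1 -> 2 -> x and x -> 3 -> x
selected = [(AUX, 0), (0, 1), (1, 2), (2, AUX), (AUX, 3), (3, AUX)]
matching, inputs = read_cycle_partition(selected)
print(matching, inputs)
```

Each cycle through x contributes one link leaving x, so counting those links counts the unmatched nodes of the associated matching.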
To define the associated ILP formulation, we introduce binary variables $y_{i\to j} \in \{0, 1\}$ corresponding to each link $(i, j)$ in the augmented graph. Here, $y_{i\to j} = 1$ means that $(i, j)$ is part of the cycle partition; otherwise, $y_{i\to j} = 0$. To ensure that the solution is a cycle partition, each node $v$ in the network is forced to have exactly one in-coming and one out-going link:

$$\sum_{i:(i,v)\in E'} y_{i\to v} = 1, \qquad \sum_{j:(v,j)\in E'} y_{v\to j} = 1, \qquad \forall v \in V. \tag{S1}$$

Then, among all possible cycle partitions, we select the one that creates the minimum number of cycles. The number of cycles is given by the number of out-going links from the auxiliary node that participate in cycles. Thus, we minimize the objective function $\sum_{j\in V} y_{x\to j}$. Note that the number of out-going and in-coming links of the auxiliary node must be equal; thus, they have to satisfy the constraint $\sum_{j\in V} y_{x\to j} = \sum_{i\in V} y_{i\to x}$. Next, we ensure accessibility: for each node $i$, the auxiliary node must connect to at least one node of the set $V^{\ell}_i$, where $V^{\ell}_i$ is the set of nodes from which node $i$ can be reached in at most $\ell$ steps. This is enforced by the constraint $\sum_{j\in V^{\ell}_i} y_{x\to j} \geq 1$. Putting it all together, we obtain the following ILP formulation:

$$\begin{aligned}
\min \quad & \sum_{j\in V} y_{x\to j} \\
\text{s.t.} \quad & \sum_{i:(i,v)\in E'} y_{i\to v} = 1, && \forall v \in V, \\
& \sum_{j:(v,j)\in E'} y_{v\to j} = 1, && \forall v \in V, \\
& \sum_{j\in V} y_{x\to j} = \sum_{i\in V} y_{i\to x}, \\
& \sum_{j\in V^{\ell}_i} y_{x\to j} \geq 1, && \forall i \in V, \\
& y_{i\to j} \in \{0, 1\}, && \forall (i, j) \in E'.
\end{aligned} \tag{S2}$$

Figure S2 compares the performance of the naïve ILP formulation (8) and the graph-cycling implementation in Eq. (S2). We measure the run-time by the number of nodes searched in the branch-and-bound tree to solve the problem. We find that although the run-time for both formulations grows exponentially, the graph-cycling implementation is significantly faster, allowing us to explore larger network instances.
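For very small instances, the graph-cycling formulation in Eq. (S2) can be checked by exhaustively enumerating all binary assignments. The sketch below is only a sanity check, not a substitute for a branch-and-bound solver: the edge-list representation, the label `"x"` for the auxiliary node, and the function names are assumptions, and each node is taken to belong to its own set of admissible sources (it reaches itself in zero steps).

```python
from itertools import product

AUX = "x"  # hypothetical label for the auxiliary node


def reach_sets(nodes, edges, ell):
    """For each node i, the set of nodes that reach i in at most `ell` steps."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    sets = {i: set() for i in nodes}
    for s in nodes:
        seen, frontier = {s}, {s}
        for _ in range(ell):
            frontier = {v for u in frontier for v in succ[u]} - seen
            seen |= frontier
        for i in seen:
            sets[i].add(s)
    return sets


def solve_cycle_ilp(nodes, edges, ell):
    """Enumerate all 0/1 assignments of the link variables; return best input set."""
    links = list(edges) + [(AUX, v) for v in nodes] + [(v, AUX) for v in nodes]
    v_ell = reach_sets(nodes, edges, ell)
    best = None
    for y in product((0, 1), repeat=len(links)):
        chosen = [e for e, bit in zip(links, y) if bit]
        # Degree constraints: one in-coming and one out-going link per network node
        if any(sum(1 for (u, v) in chosen if u == n) != 1 or
               sum(1 for (u, v) in chosen if v == n) != 1 for n in nodes):
            continue
        x_out = {v for (u, v) in chosen if u == AUX}
        x_in = sum(1 for (u, v) in chosen if v == AUX)
        if len(x_out) != x_in:
            continue  # auxiliary node must be balanced
        if any(not (v_ell[i] & x_out) for i in nodes):
            continue  # accessibility constraint violated
        if best is None or len(x_out) < len(best):
            best = sorted(x_out)
    return best


nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
print(solve_cycle_ilp(nodes, edges, ell=2))
print(solve_cycle_ilp(nodes, edges, ell=1))
```

The enumeration visits 2^|E′| assignments, so it is usable only on graphs with a handful of links.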

S3. REAL NETWORKS
The descriptions of the datasets, together with the resulting core sizes and the costs required under different LCC constraints, are summarized in Tables SI and SII.
[Tables SI and SII: column headers Networks, n_i(ℓ), n_core, C(ℓ), with the last three repeated three times; table body not recovered.]

[...] in G_ℓ. Therefore, a naïve approach to find a valid input node set is to identify a maximum matching in B and a minimum dominating set in G_ℓ independently using the greedy algorithms described in Sec. 4.1 and 4.2, respectively. A valid input node set is then provided by the union of the unmatched nodes in B and the dominating nodes in G_ℓ. We compare the performance of the naïve greedy algorithm to the exact solution in Fig. S4, finding that the naïve algorithm performs poorly in most cases except for networks that require a very large number of inputs. In the worst case, the estimated number of input nodes is 3.6 times larger than the optimal, while the approximation algorithm in the main text estimates N_i(ℓ) within 5% error for the same analyzed networks.
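The naïve combination can be sketched as follows. The greedy algorithms of Sec. 4.1 and 4.2 are not reproduced in this section, so the sketch substitutes stand-ins that are assumptions of this example: a maximum matching found by augmenting paths, and the standard greedy rule for dominating sets that repeatedly picks the node covering the most still-uncovered nodes within ℓ steps. The successor-dictionary representation and all function names are likewise hypothetical.

```python
def unmatched_nodes(succ):
    """Nodes left unmatched by a maximum matching (augmenting-path stand-in)."""
    matched_from = {}

    def augment(u, seen):
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in matched_from or augment(matched_from[v], seen):
                    matched_from[v] = u
                    return True
        return False

    for u in succ:
        augment(u, set())
    return set(succ) - set(matched_from)


def ball(succ, s, ell):
    """Nodes reachable from s in at most `ell` steps, including s itself."""
    seen, frontier = {s}, {s}
    for _ in range(ell):
        frontier = {v for u in frontier for v in succ[u]} - seen
        seen |= frontier
    return seen


def greedy_dominating_set(succ, ell):
    """Greedy dominating set of the accessibility graph for chain length `ell`."""
    cover = {s: ball(succ, s, ell) for s in succ}
    uncovered, chosen = set(succ), set()
    while uncovered:
        # Ties are broken by node order to keep the result deterministic
        best = max(sorted(succ), key=lambda s: len(cover[s] & uncovered))
        chosen.add(best)
        uncovered -= cover[best]
    return chosen


def naive_input_set(succ, ell):
    """Union of the two independently computed node sets."""
    return unmatched_nodes(succ) | greedy_dominating_set(succ, ell)


# Node 0 feeds node 1, which fans out to nodes 2, 3 and 4
G = {0: [1], 1: [2, 3, 4], 2: [], 3: [], 4: []}
print(naive_input_set(G, ell=1))
print(naive_input_set(G, ell=2))
```

Because the two sets are computed independently, nothing encourages them to overlap, which is one plausible reason the union can be much larger than the optimum.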