Zero-determinant strategy in stochastic Stackelberg asymmetric security game

In a stochastic Stackelberg asymmetric security game, the strong Stackelberg equilibrium (SSE) strategy is a popular option for the defender to get the highest utility against an attacker with the best response (BR) strategy. However, the attacker may be a boundedly rational player, who adopts a combination of the BR strategy and a fixed stubborn one. In such a condition, the SSE strategy may not maintain the defensive performance due to the stubbornness. In this paper, we focus on how the defender can adopt the unilateral-control zero-determinant (ZD) strategy to confront the boundedly rational attacker. First, we verify the existence of ZD strategies for the defender. We then investigate the performance of the defender's ZD strategy against a boundedly rational attacker, in comparison with the SSE strategy. Specifically, when the attacker's strategy is close to the BR strategy, the ZD strategy admits a bounded loss for the defender compared with the SSE strategy. Conversely, when the attacker's strategy is close to the stubborn strategy, the ZD strategy can bring higher defensive performance for the defender than the SSE strategy does.


I. INTRODUCTION
Stochastic security games have attracted increasing attention in many fields such as cyber-physical systems (CPS), unmanned aerial vehicles (UAV), and moving target defense (MTD) [1][2][3][4]. The stochastic Stackelberg asymmetric security game is one of the important categories to characterize players' behaviors when the defender faces persistent threats from the attacker. As a fundamental model discussed in [5,6], the attacker tends to choose the best response (BR) strategy after observing the defender's strategy, while the defender aims to maximize its utility considering the attacker. Actually, the defender, as a leader, has an advantage in guiding the attacker's decision, and the defender picks the optimal strategy based on predicting the attacker's BR strategy. The corresponding equilibrium is defined as the strong Stackelberg equilibrium (SSE) [5,7,8].
However, in practice, players may not always be completely rational due to subjective or objective factors [9,10]. As a typical case, a boundedly rational attacker does not strictly adopt the BR strategy. This may result from the limitation of the attacker's observation, the disturbance of the environment, or the imitative behavior of the attacker. For instance, in MTD problems, the attacker may not directly observe the exact defense strategy because of the disturbance from the administrator (defender) [2,11]. In UAV systems, the malicious UAV may not observe the location or the flight attitude of the legitimate UAV (defender) due to obstructions in the wild [3]. In CPS, the attacker may design a stealthy attack scheme instead of the BR strategy to avoid the fault detection of the defender [1,12].
When a boundedly rational attacker loses the ability or interest to achieve the BR strategy, it is likely to turn to a fixed stubborn strategy. For example, a player prefers a stubborn strategy to avoid being induced to an unsatisfactory outcome in CPS security [13], and may choose a fixed credible strategy when it cannot calculate the BR strategy in time in MTD problems [14]. Against a stubborn attacker, the SSE strategy can no longer be regarded as the optimal solution for the defender. In fact, a boundedly rational attacker may choose a mixed strategy, composed of the BR strategy and the stubborn strategy, and such boundedly rational players are common in security problems. For instance, in CPS, the attacker hesitates between adopting a stubborn strategy and moving as a follower, since it needs to consider the failure probability of its own data acquisition system to avoid potential loss [15]. In UAV security problems, a UAV also faces different choices in different stages, since the UAV may lose the location of the defender when going through complex terrain such as a forest, but fully observes the defender on plains [3]. Thus, various factors, including the potential preference, inherent cognition, and available resources, make the boundedly rational attacker nonnegligible to the defender.
Clearly, due to the stubborn element within a boundedly rational attacker, the original SSE strategy may no longer be suitable for the defender. Thus, it is important to consider other strategies to help the defender maintain its defensive performance. Fortunately, zero-determinant (ZD) strategies provide a powerful idea to unilaterally enforce an advantageous relation between players' expected utilities, no matter what strategy the opponent selects. Proposed by [16] in the iterated prisoner's dilemma (IPD), a ZD strategy means that one player can unilaterally enforce a linear relation between the two players' expected utilities. Afterward, various ZD strategies have been widely studied to promote cooperation or unilateral extortion in public goods games (PGG), human-computer interaction (HCI), evolutionary games, etc. [17][18][19][20][21][22][23].
Besides, asymmetric matrix games are more realistic than symmetric ones, and solving asymmetric games is challenging due to the players' different preferences [24,25]. Currently, there are not many breakthroughs in applying ZD strategies to asymmetric games. For example, some works adopt ZD strategies to persuade the service provider to cooperate in iterated data trading dilemma games [26] and to deploy them as special active defense strategies in IoT devices [27]. Considering the universality and importance of asymmetric games in security, it is necessary to explore the performance of ZD strategies in asymmetric games under security scenarios, since the original analysis of ZD strategies in IPD cannot be directly applied to the asymmetric security game.
In this paper, we investigate whether the defender can adopt the ZD strategy against a boundedly rational attacker in stochastic Stackelberg asymmetric security games, in order to make up for the deficiencies of SSE strategies. To this end, we show when the ZD strategy gives a better performance than the SSE strategy does. The main contributions of this work are summarized as follows.
• We apply ZD strategies in asymmetric security games. We verify the existence of ZD strategies, in order to ensure that the defender is able to adopt a ZD strategy. Besides, against the two special attackers, we investigate the defensive performance of ZD strategies compared with SSE strategies. Specifically, against an attacker with the BR strategy, the ZD strategy admits a bounded loss in the utility compared with the SSE strategy, while against a stubborn attacker, the ZD strategy performs well and brings the defender a higher utility than the SSE strategy does.
• We further analyze a general case where the boundedly rational attacker adopts mixed strategies. The extension yields analogous tolerable results. When the attacker's strategy is close to the BR strategy, we provide the defender with appropriate ZD strategies to maintain a bounded loss in defensive performance compared with the SSE strategy, while saving computing resources. Also, when the attacker is close to a stubborn attacker, we show suitable ZD strategies for the defender to get higher defensive performance than SSE strategies.
• We verify our results in two experiments by providing the defender with proper ZD strategies to compare with an SSE strategy [5]. First, we show the performance in MTD problems, where the boundedly rational attacker can directly observe the defender's strategy and derive its explicit BR strategy [2,11]. Then we show the performance in CPS problems. This setting is more complicated but more practical than the considered MTD problems: the attacker can only observe players' action history and calculates the BR strategy based on certain mechanisms, like fictitious play and Q-learning [7,28].

II. STOCHASTIC STACKELBERG ASYMMETRIC SECURITY GAME
It is known that, in a stochastic asymmetric security game with the memory of the last stage, an attacker aims to invade two targets and a defender aims to prevent the attack in each stage [2,29]. Consider the stochastic Stackelberg asymmetric security game $G = \{S, N, D, A, r, P\}$. $S = \{11, 12, 21, 22\}$ is the set of states, which is composed of the previous attack and defense targets. $N = \{d, a\}$ is the set of players. $D = \{1, 2\}$ and $A = \{1, 2\}$ are the defender's action set and the attacker's action set, respectively. $r = \{r_d, r_a\}$ is the reward set of players, where $r_i$ is the reward function of player $i \in N$, and $P$ is the transition function.

In this security game, since the state represents the players' previous actions, $P(s'|s, d, a) = 1$ if and only if $s' = (da)$ for any $s \in S$. Thus, the next state depends on players' strategies and the current state. For convenience, denote $P(s'|s)$ as the state transition probability to state $s'$ from state $s$, where $s, s' \in S$. Furthermore, in the game $G$, each player's strategy depends on the current state, and is therefore a memory-one strategy. The strategy of the defender is a probability distribution $\pi_d$, where $\pi_d(\cdot|s) \in \Delta D$ with $\Delta D$ denoting the probability simplex defined on the space $D$. Similarly, the strategy of the attacker is $\pi_a$ with $\pi_a(\cdot|s) \in \Delta A$. Thus, $P(s'|s) = \pi_d(d|s)\pi_a(a|s)$, where $s' = (da)$. Set $M = \{P(s'|s)\}_{s, s' \in S}$ as the state transition matrix of this security game. As discussed in [16,30], we carry forward the investigation with a regular matrix $M$.

At stage $t$ in $G$, each player observes the current state $s_t$ and adopts an action according to its strategy. The defender chooses an action $d_t \in D$, while the attacker chooses an action $a_t \in A$. The reward of the defender in stage $t$ is denoted by $r_d(d_t, a_t) = U^d_{d_t a_t}$, where $U^d_{d_t a_t}$ is the defender's utility when the defender protects target $d_t$ and the attacker invades target $a_t$. Similarly, the reward of the attacker in stage $t$ is denoted by $r_a(d_t, a_t) = U^a_{d_t a_t}$. The utility matrix in each stage is shown in Table I.
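As an illustration of how the transition matrix $M$ arises from two memory-one strategies, the following minimal sketch builds $M$ row by row. The function name `transition_matrix`, the dictionary layout, and the numerical strategies are assumptions made only for this example; they are not taken from the paper.

```python
STATES = ["11", "12", "21", "22"]  # state = (previous defender target, previous attacker target)


def transition_matrix(pi_d, pi_a):
    """Return M with M[i][j] = P(STATES[j] | STATES[i]).

    pi_d[s] / pi_a[s] is the probability that the defender / attacker
    chooses target 1 in state s (hypothetical encoding of a memory-one strategy).
    """
    M = []
    for s in STATES:
        prob_d = {"1": pi_d[s], "2": 1.0 - pi_d[s]}
        prob_a = {"1": pi_a[s], "2": 1.0 - pi_a[s]}
        # The next state is exactly the pair of actions just played, s' = (da).
        M.append([prob_d[nxt[0]] * prob_a[nxt[1]] for nxt in STATES])
    return M


# Hypothetical strategies, chosen only for illustration.
pi_d = {"11": 0.7, "12": 0.4, "21": 0.6, "22": 0.2}
pi_a = {"11": 0.5, "12": 0.9, "21": 0.3, "22": 0.8}
M = transition_matrix(pi_d, pi_a)
for s, row in zip(STATES, M):
    print(s, [round(x, 3) for x in row])
```

Each row of `M` is a probability distribution over the next state, which is what the regularity assumption on $M$ is imposed on.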
The expected long-term utilities in the repeated security game are denoted by

$$U_i(\pi_d, \pi_a) = \lim_{T \to \infty} \mathbb{E}\left[\frac{\sum_{t=0}^{T} r_i(d_t, a_t)}{T}\right], \quad i \in \{d, a\},$$

where the expectation is taken over the states and actions generated by $\pi_d$, $\pi_a$, and $P$.

Different from IPD [16,31], the asymmetry in this security game comes from the actual security mechanism. Specifically, the defender tends to resist attacks, that is, to protect the vulnerable target, and the attacker tends to implement invasions on the unprotected target. The above represents a wide class of asymmetric games in security scenarios, which is summarized as Assumption 1. Similar investigations have been broadly discussed in the literature of various security games [6,32,33].
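For a regular $M$, the long-run average utility can be evaluated as the stationary distribution of $M$ weighted by the stage utilities. The sketch below does this by power iteration. The helper names, the state ordering 11, 12, 21, 22, and the utility values in `S_D` and `S_A` are hypothetical and serve only as an illustration.

```python
def transition_matrix(pi_d, pi_a):
    # pi_d[i], pi_a[i]: probability of choosing target 1 in the i-th state,
    # with states ordered 11, 12, 21, 22. Entries are P(next | current).
    return [[p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
            for p, q in zip(pi_d, pi_a)]


def stationary_distribution(M, iters=5000):
    # Power iteration v <- v M, started from the uniform distribution.
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_utility(pi_d, pi_a, S):
    # Long-run average utility = stationary distribution dotted with the
    # vector S of stage utilities (same state ordering).
    v = stationary_distribution(transition_matrix(pi_d, pi_a))
    return sum(vi * si for vi, si in zip(v, S))


# Hypothetical utility vectors (U_11, U_12, U_21, U_22), for illustration only.
S_D = [2.0, -3.0, -1.0, 1.0]
S_A = [-1.0, 3.0, 1.0, -3.0]

pi_d = [0.7, 0.4, 0.6, 0.2]
pi_a = [0.5, 0.9, 0.3, 0.8]
print("U_d =", long_run_utility(pi_d, pi_a, S_D))
print("U_a =", long_run_utility(pi_d, pi_a, S_A))
```

Power iteration is used here only to keep the example dependency-free; solving the linear system for the stationary distribution would serve equally well.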

III. BOUNDEDLY RATIONAL ATTACKER
In the stochastic Stackelberg asymmetric security game, the defender is a leader and declares a strategy in advance, while the attacker is a follower and chooses its strategy after observing the defender's strategy. In most cases, the attacker may choose the BR strategy when it obtains the defender's strategy.
After observing the defender's strategy $\pi_d$, the attacker may choose the BR strategy [7] as follows:

$$BR(\pi_d) = \arg\max_{\pi_a} U_a(\pi_d, \pi_a).$$

Without loss of generality, the follower breaks ties in favor of the leader if there are multiple options. In this case, the defender aims to maximize its utility considering the attacker, and the equilibrium is defined as the strong Stackelberg equilibrium (SSE) [5,7,8].
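The best response with leader-favorable tie-breaking can be approximated numerically. The sketch below searches only over the 16 near-deterministic memory-one attacker strategies with probabilities in $\{\epsilon, 1-\epsilon\}$, which keeps $M$ regular; this finite candidate set, the value of `EPS`, the helper names, and the utility values are assumptions of the example, not the procedure used in the paper.

```python
from itertools import product

EPS = 0.01  # hypothetical trembling level that keeps the transition matrix regular


def long_run_utilities(pi_d, pi_a, S_d, S_a, iters=2000):
    """Long-run average utilities (U_d, U_a); states ordered 11, 12, 21, 22."""
    M = [[p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
         for p, q in zip(pi_d, pi_a)]
    v = [0.25] * 4
    for _ in range(iters):  # power iteration towards the stationary distribution
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    u_d = sum(x * s for x, s in zip(v, S_d))
    u_a = sum(x * s for x, s in zip(v, S_a))
    return u_d, u_a


def approx_best_response(pi_d, S_d, S_a, eps=EPS):
    """Best attacker strategy within the candidate set {eps, 1 - eps}^4."""
    best_key, best_pi = None, None
    for cand in product((eps, 1.0 - eps), repeat=4):
        u_d, u_a = long_run_utilities(pi_d, cand, S_d, S_a)
        key = (u_a, u_d)  # maximize U_a first, then break ties in the defender's favor
        if best_key is None or key > best_key:
            best_key, best_pi = key, list(cand)
    return best_pi, best_key[0], best_key[1]


# Hypothetical utilities and defender strategy, for illustration only.
S_D = [2.0, -3.0, -1.0, 1.0]
S_A = [-1.0, 3.0, 1.0, -3.0]
PI_D = [0.7, 0.4, 0.6, 0.3]

br, u_a, u_d = approx_best_response(PI_D, S_D, S_A)
print("approximate BR:", br, "U_a =", u_a, "U_d =", u_d)
```

Computing an SSE strategy would wrap this inner search in an outer optimization over $\pi_d$, which is the bilevel structure that makes SSE strategies expensive to derive.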
When the attacker chooses the BR strategy after observing the defender's strategy, the SSE strategy $\pi^{SSE}_d$ is optimal for the defender, and the defender has an advantage in guiding the attacker's strategy decision. Besides, if the attacker only observes players' action history instead of the defender's strategy directly, the attacker can also choose the BR strategy by methods such as fictitious play and Q-learning [7,28].
However, the attacker may not always choose the BR strategy in security problems, due to subjective or objective factors such as the limitation of the attacker's observation, the disturbance of the environment, and the imitative behavior of the attacker [1,2,11]. In practice, the attacker may turn to other strategies. A fixed stubborn strategy, which is not influenced by the defender, is one of the most likely options for the attacker due to its potential preference, inherent cognition, and available resources [13,14].
Denote the stubborn strategy in this security game by $\pi^*_a$, and call the corresponding attacker a stubborn player. In fact, the attacker intends to keep its action once it finds the most attractive target. For instance, in MTD, there always exists a most vulnerable target for the hacker, and the hacker has no intention to change its attack target once it finds that target [34]. Besides, a UAV tends to keep attacking the current optimum target when it has limited vision and lacks resources to detect others [35]. Without loss of generality, we consider that there exists a target which is more attractive than the other for the attacker, and summarize the above in the following assumption, which was also broadly considered in [13,34,35].
In fact, neither the BR strategy nor the stubborn strategy may be the single optimal option for the attacker. The attacker may adopt a mixed strategy composed of both strategies. The attacker, in this case, is called a boundedly rational player, which is not unusual in reality. For instance, the attacker may hesitate between BR strategies and stubborn strategies due to the errors of the data acquisition system in MTD, and missing data in UAV [3,15,36]. Thus, we formulate the mixed strategy as follows.
For the defender's strategy $\pi_d$, we consider that the boundedly rational attacker adopts the BR strategy $\pi^{BR}_a(\pi_d) \in BR(\pi_d)$ with probability $\lambda$ and the stubborn strategy $\pi^*_a$ with probability $1 - \lambda$ [10,37]. Therefore, when the attacker selects the stubborn strategy, the defender loses the advantage in guiding the attacker's strategy decision, and the SSE strategy may not maintain the defensive performance due to the stubborn elements therein. Thus, the SSE strategy is no longer suitable for the defender against a boundedly rational attacker. It is important to study other strategies to help the defender maintain its defensive performance.
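The mixture of the BR strategy and the stubborn strategy can be sketched in a few lines. The example assumes that the mixture acts state by state on the probability of attacking target 1, i.e., $\pi^\lambda_a(\cdot|s) = \lambda \pi^{BR}_a(\cdot|s) + (1-\lambda)\pi^*_a(\cdot|s)$; this reading, the function name `mixed_attacker`, and the numerical strategies are assumptions of the illustration.

```python
def mixed_attacker(pi_br, pi_stubborn, lam):
    """State-wise mixture of a BR strategy and a stubborn strategy.

    Each argument lists the probability of attacking target 1 in the
    states 11, 12, 21, 22 (hypothetical encoding); lam is the weight
    placed on the BR strategy.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * b + (1.0 - lam) * s for b, s in zip(pi_br, pi_stubborn)]


# Hypothetical strategies: the stubborn attacker always attacks target 1.
pi_br = [1.0, 0.0, 1.0, 0.0]
pi_stubborn = [1.0, 1.0, 1.0, 1.0]

for lam in (0.0, 0.25, 1.0):
    print(lam, mixed_attacker(pi_br, pi_stubborn, lam))
```

With `lam = 1` the attacker is fully rational, and with `lam = 0` it is purely stubborn, matching the two special cases studied separately in the paper.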

IV. PERFORMANCE OF ZD STRATEGY
In this section, we introduce the ZD strategy for the defender in the stochastic Stackelberg asymmetric security game. First, we give the definition of the ZD strategy for the defender and analyze its existence. Then, we explore the performance of the ZD strategy compared with the SSE strategy.

A. ZD Strategy for the Defender
Proposed by [16], ZD strategies mean that one player can unilaterally enforce a linear relation between the two players' expected utilities, and they have been widely studied to promote cooperation or unilateral extortion in public goods games (PGG), human-computer interaction (HCI), and evolutionary games [17][18][19][20]. For this stochastic Stackelberg asymmetric security game $G$, the defender's ZD strategy $\pi^{ZD}(\eta, \beta, \gamma)$ [2,16,31] is defined in terms of the utility vectors $S^l$, $l \in \{d, a\}$, and the set of all feasible ZD strategies of the defender is denoted by $\Xi$. The strategy is called zero-determinant (ZD) because, if the defender adopts a ZD strategy satisfying (3), then the players' expected utilities are subject to the linear relation $\eta U_d + \beta U_a + \gamma = 0$.

With the help of the ZD strategy's unilateral enforcement of players' utilities, we aim to investigate whether the defender can adopt the ZD strategy to better maintain its defensive performance than the original SSE strategy against a boundedly rational attacker.
In what follows, we investigate the existence of ZD strategies to guarantee the availability for the defender.
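To make the unilateral enforcement concrete, the sketch below assumes the standard form introduced by Press and Dyson [16]: the defender's probability of protecting target 1 in the states 11, 12, 21, 22 is $(1, 1, 0, 0)^T + \eta S^d + \beta S^a + \gamma \mathbf{1}$. This form, the utility values, and the coefficients are assumptions of the example and may differ in scaling from the paper's definition. The code checks numerically that $\eta U_d + \beta U_a + \gamma$ vanishes for several attacker strategies.

```python
# Hypothetical utility vectors (U_11, U_12, U_21, U_22), for illustration only.
S_D = [2.0, -3.0, -1.0, 1.0]
S_A = [-1.0, 3.0, 1.0, -3.0]
REPEAT = [1.0, 1.0, 0.0, 0.0]  # "repeat the previous defense target"


def zd_strategy(eta, beta, gamma):
    """Assumed Press-Dyson form: pi_d(1|s) = REPEAT + eta*S_D + beta*S_A + gamma."""
    p = [e + eta * sd + beta * sa + gamma for e, sd, sa in zip(REPEAT, S_D, S_A)]
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("coefficients do not yield valid probabilities")
    return p


def long_run_utilities(pi_d, pi_a, iters=5000):
    M = [[p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
         for p, q in zip(pi_d, pi_a)]
    v = [0.25] * 4
    for _ in range(iters):  # power iteration towards the stationary distribution
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return (sum(x * s for x, s in zip(v, S_D)),
            sum(x * s for x, s in zip(v, S_A)))


def relation_residual(eta, beta, gamma, pi_a):
    u_d, u_a = long_run_utilities(zd_strategy(eta, beta, gamma), pi_a)
    return eta * u_d + beta * u_a + gamma


ETA, BETA, GAMMA = -0.2, -0.28, 0.1  # hypothetical feasible coefficients
print("ZD strategy:", zd_strategy(ETA, BETA, GAMMA))
for pi_a in ([0.5] * 4, [0.9, 0.1, 0.3, 0.7], [1.0, 1.0, 1.0, 1.0]):
    print(pi_a, relation_residual(ETA, BETA, GAMMA, pi_a))
```

The residual stays at numerical zero whatever the attacker does, because in the stationary regime the probability of protecting target 1 in the next stage equals the stationary mass of the states 11 and 12.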

B. Existence of ZD Strategy
Actually, the ZD strategy cannot enforce an arbitrary linear relation between two players' utilities, since it must belong to the implementer's strategy set. Thus, a feasible linear relation enforced by ZD strategies is fundamental for further analysis. The following lemma provides a necessary and sufficient condition for the feasibility of a linear relation between players' utilities, whose proof can be found in Appendix B.
Lemma 1 Under Assumption 1, there exists a ZD strategy which enforces $\eta U_d + \beta U_a + \gamma = 0$ if and only if either of the following two inequalities is satisfied.
Lemma 1 implies that the defender can adopt the ZD strategy $\pi^{ZD}(\eta, \beta, \gamma)$, where $\eta$, $\beta$, and $\gamma$ satisfy (4) or (5), to enforce an ideal linear relation between players' expected utilities. Actually, Lemma 1 extends the application of ZD strategies, since the payoff matrix in security games is not as symmetric as that in IPD games [38]. Moreover, Lemma 1 covers the following cases.
• The defender can unilaterally restrict the attacker's utility if $U^a_{11} > U^a_{12}$. If the defender takes $\pi^{ZD}(\eta, \beta, \gamma)$ with $\eta = 0$, $\beta \neq 0$, and $\gamma$ chosen so that the condition of Lemma 1 holds, then the defender can unilaterally fix the attacker's utility at $U_a = -\gamma/\beta$, which is the same as the equalizer strategy in IPD games [16,39].
• A further case covered by Lemma 1 is consistent with the result in MTD problems [27].
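Whether a given linear relation can be enforced can be checked geometrically. Under the assumed Press and Dyson form of a ZD strategy (an assumption of this sketch, not the paper's inequalities (4) and (5)), a relation is enforceable when the line $\eta U_d + \beta U_a + \gamma = 0$ has the utility pairs of the states 11 and 12 on one side and those of the states 21 and 22 on the other, as in Fig. 1. The function name and utility values below are hypothetical.

```python
def enforcing_strategy(eta, beta, gamma, S_d, S_a):
    """Return a defender strategy enforcing eta*U_d + beta*U_a + gamma = 0, or None.

    Assumes pi_d(1|s) = (1, 1, 0, 0) + c * f(s) with f(s) the value of the line
    at the utility pair of state s and c a nonzero scaling; states are ordered
    11, 12, 21, 22.
    """
    f = [eta * sd + beta * sa + gamma for sd, sa in zip(S_d, S_a)]
    scale = max(abs(x) for x in f)
    if scale == 0.0:
        return None  # all four utility pairs lie on the line: degenerate case
    if all(x <= 0 for x in f[:2]) and all(x >= 0 for x in f[2:]):
        sign = 1.0
    elif all(x >= 0 for x in f[:2]) and all(x <= 0 for x in f[2:]):
        sign = -1.0
    else:
        return None  # the line does not separate the two groups of states
    # Dividing by the largest magnitude keeps every probability inside [0, 1].
    return [e + sign * x / scale for e, x in zip([1.0, 1.0, 0.0, 0.0], f)]


# Hypothetical utility vectors (U_11, U_12, U_21, U_22), for illustration only.
S_D = [2.0, -3.0, -1.0, 1.0]
S_A = [-1.0, 3.0, 1.0, -3.0]

print(enforcing_strategy(-0.2, -0.28, 0.1, S_D, S_A))  # a separating line
print(enforcing_strategy(0.0, 1.0, 0.0, S_D, S_A))     # equalizer attempt with U_a = 0
```

For these particular utilities the equalizer attempt is rejected, since $U^a_{11} < U^a_{12}$ here, which agrees with the condition $U^a_{11} > U^a_{12}$ stated in the first case above.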
Based on the existence condition of feasible linear relations enforced by ZD strategies, we can further analyze whether there exists at least one ZD strategy in the security game $G$. For simplification, for any two points $(x_1, y_1)$ and $(x_2, y_2)$, $\Gamma^-$ ($\Gamma^+$) denotes the region above (below) the line going through $(x_1, y_1)$ and $(x_2, y_2)$. Then the following theorem shows a sufficient condition for the existence of a ZD strategy in $G$, whose proof can be found in Appendix C. Hence, in this paper, we focus on the situation where at least one ZD strategy exists, since selecting ZD strategies for the defender relies on their existence.
In fact, the two conditions are interchangeable. For ease of understanding, we start with two special cases: $\lambda = 1$, where the attacker takes BR strategies [7], and $\lambda = 0$, where the attacker is a stubborn player with stubborn strategies [13,14].
When $\lambda = 1$, the attacker takes the BR strategy. Lemma 2 reveals that the SSE strategy always brings the highest utility for the defender when facing an attacker with the BR strategy. In this case, the performance of ZD strategies cannot surpass the defender's utility under an SSE strategy. In spite of this, the following theorem shows that ZD strategies admit a bounded loss compared with SSE strategies; its proof can be found in Appendix D.
Theorem 2 Under Assumption 1, The corresponding ZD strategy is , and

Theorem 2 shows that the defender can adopt ZD strategies to get a tolerable loss in the utility compared with SSE strategies. On the one hand, when $U^a_{11} \ge U^a_{21}$, the ZD strategy in Theorem 2 is an SSE strategy and brings the defender the same utility as SSE strategies. On the other hand, if the defender can endure the bounded loss, then adopting the corresponding ZD strategy is also a good choice to avoid the complex calculation of SSE strategies, since deriving SSE strategies requires solving a bilevel optimization problem.
When $\lambda = 0$, the attacker chooses the stubborn strategy $\pi^*_a$. The SSE strategy may not bring the defender a tolerable utility, since the stubborn attacker does not choose the BR strategy as the defender expects. In this case, due to the ZD strategy's unilateral enforcement of players' utilities, a ZD strategy can play an essential role in enforcing desired utilities for the defender and can even bring a higher utility than the original SSE strategy does. The proof of the following theorem can be found in Appendix E.

Theorem 3 Under Assumptions 1 and 2, there exists a ZD strategy

In fact, the defender's utility under the ZD strategy $\pi^{ZD}(-k, 1, kU)$ is not lower than that under any other ZD strategy, including the ones which can unilaterally set the defender's utility [27]. Moreover, according to its proof, when facing the stubborn attacker, this ZD strategy $\pi^{ZD}_d$ brings an increase in utility for the defender compared with the SSE strategy.

D. ZD Strategy in General Case
We now consider the general case where $\lambda \in [0, 1]$. Here, the boundedly rational attacker chooses the BR strategy with probability $\lambda$ and the stubborn strategy $\pi^*_a$ with probability $1 - \lambda$, i.e., $\pi^\lambda_a(\pi_d, \pi^*_a)$ in Definition 2. Intuitively, the ZD strategy may bring the defender a performance similar to that shown in Theorem 2 when $\lambda$ is close to 1, and a performance similar to that given in Theorem 3 when $\lambda$ is close to 0. In fact, one main result of this subsection is given in the following theorem, whose proof can be found in Appendix F.
Theorem 4 provides the set $\Gamma_1$ for the defender, in which the ZD strategy brings a bounded and tolerable loss in defensive performance, even though the ZD strategy cannot surpass the SSE strategy. Thus, for the boundedly rational attacker with $\lambda \in \Gamma_1$, if the defender can accept losing a little utility, then the defender can adopt the corresponding ZD strategy, since the ZD strategy in Theorem 4 avoids spending vast resources on solving a bilevel optimization problem for the SSE strategy. Moreover, if $U^a_{11} \ge U^a_{21}$ and a second condition on the utilities holds, the ZD strategy can bring the defender the same utility as an SSE strategy, which means that the defender can still adopt ZD strategies.
Although $\Gamma_1$ seems complicated to verify, it is easy to confirm whether some typical values of $\lambda$ belong to $\Gamma_1$. For instance, $\lambda = 1$ is always in $\Gamma_1$. In this case, the attacker takes the BR strategy, which is consistent with Theorem 2. Actually, $\lambda$ is in $\Gamma_1$ if $\lambda$ is close to 1, which means that the attacker tends to choose the BR strategy. Also, we provide a subset of $\Gamma_1$, which can be verified easily by the defender.

Corollary 1 Under Assumptions 1 and 2, if λ ∈ [ then , π*_a, λ), otherwise.

Finally, we analogously consider the situation when $\lambda$ is close to 0, i.e., the attacker tends to take the stubborn strategy, in the following theorem, whose proof can be found in Appendix G.
Theorem 5 Under Assumptions 1 and 2, if $\lambda \in \Gamma_2$ in (1), then there exists a ZD strategy where $k =$

Theorem 5 provides the set $\Gamma_2$ in which the defender can adopt the ZD strategy to get a higher utility than with the original SSE strategy. If $\lambda \in \Gamma_2$, the defender can confidently select the corresponding ZD strategy, since it brings the defender higher defensive performance than an SSE strategy does.
Clearly, $\lambda = 0$ is always in $\Gamma_2$, which is consistent with the results in Theorem 3. Actually, $\lambda$ is in $\Gamma_2$ if $\lambda$ is close to 0, which means that the attacker tends to be a stubborn attacker. Also, we provide a subset of $\Gamma_2$, which can be verified easily by the defender.

V. APPLICATIONS
For illustration, we provide experiments to verify that the ZD strategy can help the defender maintain its defensive performance against a boundedly rational attacker, where the baseline is the original SSE strategy.

A. In MTD Problems
Let us consider an MTD problem, where the attacker can directly observe the defender's strategy and take its explicit BR strategy [2,11]. Take $Y_i$ as the cost of the defender moving the defense resource from target $i$ to the other target, and take $C_i$ as the cost of the attacker invading target $i$. Similar to [27], we also use the average to approximate the transfer cost. Also, take $R^d_s$ and $R^a_s$ as the reward and loss for the two players in the state $s \in S$. Thus, the utility matrix in MTD problems is shown in Table II. Take $R^d_s = d^1_s \theta + d^0_s$ and $R^a_s = a^1_s \theta + a^0_s$ for any $s \in S$, where $\theta \in [0, 1]$, and $d^k_s$ and $a^k_s$ are parameters in players' rewards and losses, respectively, for any $s \in S$ and $k \in \{0, 1\}$.
As shown in Fig 2(a), for an attacker with the BR strategy, the expected utility of the defender with an SSE strategy is always higher than that with a ZD strategy. Besides, in Fig 2(b), for an attacker with the stubborn strategy, the expected utility of the defender adopting the ZD strategy in Theorem 3 is always higher than that adopting an SSE strategy. Moreover, in Fig 2(c), $U^{ZD}_d$ is always higher than $U^{SSE}_d$ when $\lambda < 0.21$, which is consistent with Theorem 5. Also, $U^{SSE}_d$ is always higher than $U^{ZD}_d$ when $\lambda > 0.78$, which is consistent with Theorem 4, and the difference between the two utilities is bounded and tolerable.

B. In CPS Problems
Here we consider a CPS problem with a defender as a system administrator and an attacker as a jammer or an eavesdropper. Different from the previous experiment, the attacker can only observe players' action history without directly receiving the defender's strategy. The attacker adopts the BR strategy based on fictitious play [28] or the Q-learning method [7], whose details are provided in Appendix H.
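As a rough illustration of how an attacker can respond using only the action history, the sketch below implements a simplified fictitious-play attacker: it estimates $\pi_d(1|s)$ from empirical counts and then plays a myopic (single-stage) best response with some exploration. The myopic response, the exploration rate, the Laplace smoothing, and all numerical values are assumptions of this example and are not the algorithm of Appendix H.

```python
import random

STATES = [(1, 1), (1, 2), (2, 1), (2, 2)]  # (previous defender target, previous attacker target)
# Hypothetical attacker utilities U^a_{da}, for illustration only.
U_A = {(1, 1): -1.0, (1, 2): 3.0, (2, 1): 1.0, (2, 2): -3.0}
# Hypothetical defender strategy: probability of protecting target 1 in each state.
PI_D = {(1, 1): 0.7, (1, 2): 0.4, (2, 1): 0.6, (2, 2): 0.3}


def estimate_defender(counts, s):
    """Laplace-smoothed empirical estimate of pi_d(1|s)."""
    n1, n2 = counts[s]
    return (n1 + 1) / (n1 + n2 + 2)


def myopic_best_action(p1):
    """Attack target maximizing the expected stage utility when the defender
    protects target 1 with probability p1 (a single-stage simplification)."""
    value = {a: p1 * U_A[(1, a)] + (1 - p1) * U_A[(2, a)] for a in (1, 2)}
    return max((1, 2), key=lambda a: value[a])


def simulate(pi_d, steps=20000, explore=0.2, seed=0):
    rng = random.Random(seed)
    counts = {s: [0, 0] for s in STATES}
    s = rng.choice(STATES)
    for _ in range(steps):
        d = 1 if rng.random() < pi_d[s] else 2   # the defender takes d_t ~ pi_d
        if rng.random() < explore:               # occasional exploration
            a = rng.choice((1, 2))
        else:                                    # respond to the empirical estimate
            a = myopic_best_action(estimate_defender(counts, s))
        counts[s][d - 1] += 1                    # record the observed defense action
        s = (d, a)                               # the next state is the action pair
    return counts


counts = simulate(PI_D)
for s in STATES:
    print(s, "estimate:", round(estimate_defender(counts, s), 3), "true:", PI_D[s])
```

The exploration step is included so that every state keeps being visited; without it, the estimates for rarely reached states would stay at their prior value.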
As shown in Fig 3, when $\lambda$ is close to 1, as in Fig 3(d), (h), although the average utility with the ZD strategy is lower than that with the SSE strategy, the loss is small enough to tolerate. Thus, the defender can also adopt ZD strategies to maintain its defensive performance with a bounded and tolerable loss, and avoid the complex computation required by SSE strategies.

VI. DISCUSSION
We have focused on stochastic Stackelberg asymmetric security games in this paper. Due to the stubborn elements within the boundedly rational attacker, we have investigated the defensive performance of ZD strategies, and have analyzed whether ZD strategies can make up for the deficiencies of SSE strategies in such circumstances. Also, we have provided experiments to support our methodology by employing proper ZD strategies for the defender.
Actually, our results can be extended to some security problems with multiple targets. For example, consider the case where the targets in the security game can be divided into two categories. Each player's utilities are the same when choosing any two targets belonging to one category, while the player's utilities are different when choosing any two targets belonging to different categories. In such a situation, the defender can still adopt ZD strategies similar to those in this paper to improve its defensive performance.
Indeed, in general multi-target asymmetric stochastic security games, there are challenges in applying the ZD strategy against boundedly rational attackers. We show the barrier from a simple viewpoint. With multi-target settings, the expected utility of the defender is composed of complex polynomials in $\lambda$, and each polynomial is the determinant of an $n^2 \times n^2$ matrix with a very high degree. Applying ZD strategies in general multi-target security games is still an open problem and deserves further exploration.
For convenience, we take and to simplify the writing. Moreover, take

Actually, for any defender's strategy $\pi_d$, when the attacker chooses the strategy $\pi_a$ with $\pi_a$ The attacker always has a strategy to get a utility no lower than $U^a_{12}$, which conflicts with ) is feasible for the defender according to Lemma 1, where . Thus, after observing the defender's strategy, the optimal utility for the attacker is $U^a_{12}$. In this case, the defender's corresponding utility is $U^d_{12}$, and

Moreover, according to Lemma 1, for any ZD strategy $\pi_d$ that enforces $\eta U_d + \beta U_a + \gamma = 0$, $\eta \cdot \beta > 0$ always holds when $U^a_{11} < U^a_{21}$. Thus, the attacker's BR strategy also minimizes the defender's utility, and max Then min Therefore, min

Appendix E: Proof of Theorem 3

According to [16], $U_d(\pi_d, \pi_a) = \frac{D(\pi_d, \pi_a, S^d)}{D(\pi_d, \pi_a, \mathbf{1})}$ and $U_a(\pi_d, \pi_a) = \frac{D(\pi_d, \pi_a, S^a)}{D(\pi_d, \pi_a, \mathbf{1})}$. For the stubborn attacker with $\pi^*_a(1|11) = \pi^*_a(1|21) = 1$, we have , and we

) is feasible for the defender according to Lemma 1. Moreover, according to Theorem 2, this ZD strategy is also an SSE strategy. Therefore, this ZD strategy brings the defender the same utility as the SSE strategy against a boundedly rational attacker. Thus, min

Otherwise, the ZD strategy ) is feasible for the defender by Lemma 1. According to Theorem 2, As a result, . Similarly, .
According to [16], $C(\pi^{ZD}_d, \pi^{SSE}_d, \pi^*_a, \lambda) = 0$, which was defined in Appendix A. Without loss of generality, we consider Actually, for any where Based on the above equation, we obtain Similarly, (F4)

Subtracting the above two equations, Recall the definitions of $B_1$, $B_2$, $B_3$, and $B$ in Appendix A, and we have Recall (F1) and (F2), and we obtain

Appendix H: Algorithms
We show the details of the algorithms mentioned in Applications. Here, we utilize the fictitious play method based on [28] and the Q-learning method based on [7] to obtain the BR strategy of the attacker according to players' action history. In each stage $t$ of both algorithms, the defender takes $d_t \sim \pi_d$. The game then reaches the state $s(t)$, and the players get the payoffs $r_d(d_t, a_t)$ and $r_a(d_t, a_t)$.
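The Q-learning variant can be sketched with a tabular update over the four states. The discounted update rule, the step size, the exploration rate, and the utility values below are assumptions of this illustration; the learning rule actually used in the experiments follows [7] and may differ.

```python
import random

STATES = [(1, 1), (1, 2), (2, 1), (2, 2)]  # (previous defender target, previous attacker target)
# Hypothetical attacker utilities U^a_{da}, for illustration only.
U_A = {(1, 1): -1.0, (1, 2): 3.0, (2, 1): 1.0, (2, 2): -3.0}


def q_update(Q, s, a, reward, s_next, alpha=0.1, discount=0.9):
    """One tabular Q-learning step for the attacker."""
    target = reward + discount * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def train_attacker(pi_d, steps=20000, explore=0.1, seed=0):
    """Learn attacker Q-values from the action history against a fixed pi_d.

    pi_d[s] is the probability that the defender protects target 1 in state s.
    """
    rng = random.Random(seed)
    Q = {s: {1: 0.0, 2: 0.0} for s in STATES}
    s = rng.choice(STATES)
    for _ in range(steps):
        d = 1 if rng.random() < pi_d[s] else 2        # the defender takes d_t ~ pi_d
        if rng.random() < explore:                    # epsilon-greedy exploration
            a = rng.choice((1, 2))
        else:
            a = max((1, 2), key=lambda x: Q[s][x])
        s_next = (d, a)                               # reach the state s(t)
        q_update(Q, s, a, U_A[s_next], s_next)        # the attacker receives r_a(d_t, a_t)
        s = s_next
    return Q


# Hypothetical defender that always protects target 1.
always_one = {s: 1.0 for s in STATES}
Q = train_attacker(always_one)
for s in [(1, 1), (1, 2)]:
    print(s, {a: round(q, 2) for a, q in Q[s].items()})
```

Against this defender the learned values favor attacking the unprotected target 2 in the visited states, which is the behavior a best-responding attacker would show.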
The transition function $P$ takes values in $[0, 1]$, where $P(s'|s, d, a)$ is the probability of moving to the next state $s' \in S$ from the current state $s$ when the players take $d$ and $a$, and $\sum_{s' \in S} P(s'|s, d, a) = 1$ for $s \in S$, $d \in D$, and $a \in A$.

FIG. 1. $(U^d_{11}, U^a_{11})$ and $(U^d_{12}, U^a_{12})$ lie on one side of the line $\eta U_d + \beta U_a + \gamma = 0$, while $(U^d_{21}, U^a_{21})$ and $(U^d_{22}, U^a_{22})$ lie on the other side.
When $\lambda = 1$, the attacker chooses $BR(\pi_d)$ after observing the defender's strategy $\pi_d$. Recall the definition of $(\pi^{SSE}_d, \pi^{SSE}_a)$ as an SSE and $U^{SSE}_d$ as the defender's utility from an SSE strategy.

Lemma 2 Under Assumption 1, for any $\pi^{ZD}_d \in \Xi$ and $\pi_a \in BR(\pi^{ZD}_d)$, $U_d(\pi^{ZD}_d, \pi_a) \le U_d(\pi^{SSE}_d, \pi^{SSE}_a)$.

FIG. 2. Performance of the ZD strategy compared with the SSE strategy in MTD problems. Red dotted lines describe the defender's expected utilities adopting an SSE strategy, while blue solid lines describe them adopting the corresponding ZD strategy. In (c), the red (blue) region shows the bound of $\Gamma_1$ ($\Gamma_2$) according to Theorem 4 (Theorem 5).

FIG. 3. Performance of the ZD strategy compared with the SSE strategy in CPS with different mechanisms. Red dotted lines show the defender's average utility adopting an SSE strategy, while blue solid lines show the defender's average utility adopting the corresponding ZD strategy in Theorem 4 or Theorem 5.
As shown in Fig 3(a)-(f), when $\lambda$ is close to 0, like 0.1 in Fig 3(a), (e), and 0.2 in Fig 3(b), (f), the average utility of the defender with the ZD strategy is higher than that with the SSE strategy. The cases where $\lambda$ is close to 1 are 0.8 in Fig 3(c), (g), and 0.9 in Fig 3(d), (h).

TABLE I. Utility matrix

In the expected long-term utilities, $\{d_t \sim \pi_d(\cdot|s_t)\}_{t \ge 0}$, $\{a_t \sim \pi_a(\cdot|s_t)\}_{t \ge 0}$, and $\{s_t \sim P(\cdot|s_{t-1}, d_{t-1}, a_{t-1})\}_{t > 0}$ describe the evolution of states and actions over stages. Additionally, $s_0$ is the initial state randomly sampled from $S$, and the expected utility of each player is the same for any $s_0$ since $M$ is convergent.
Theorem 1 Under Assumption 1, there exists at least one ZD strategy of the defender in G if either of the following two relations is satisfied.

TABLE II. Utility matrix in MTD problems