Strategically influencing an uncertain future

Many of today’s most pressing societal concerns require decisions that take into account a distant and uncertain future. Recent developments in strategic decision-making suggest that individuals, or a small group of individuals, can unilaterally influence the collective outcome of such complex social dilemmas. However, these results do not account for the extent to which decisions are moderated by uncertainty in the probability or timing of future outcomes that characterises the valuation of a (distant) uncertain future. Here we develop a general framework that captures interactions among uncertainty, the resulting time-inconsistent discounting, and their consequences for decision-making processes. In deterministic limits, existing theories can be recovered. More importantly, new insights are obtained into the possibilities for strategic influence when the valuation of the future is uncertain. We show that in order to unilaterally promote and sustain cooperation in social dilemmas, decisions of generous and extortionate strategies should be adjusted to the level of uncertainty. In particular, generous payoff relations cannot be enforced during periods of greater risk (which we term the “generosity gap”), unless the strategic enforcer orients their strategy towards a more distant future by consistently choosing “selfless” cooperative decisions; likewise, the possibilities for extortion are directly limited by the level of uncertainty. Our results have implications for policies that aim to solve societal concerns with consequences for a distant future and provide a theoretical starting point for investigating how collaborative decision-making can help solve long-standing societal dilemmas.


Beta discounting in repeated games
We start by describing in detail the probabilistic discounting framework presented in the main text. Let us assume the discount factors are described by a random variable x, whose probability density function f(x, α, β), defined for all x ∈ [0, 1], is of the beta form

f(x, α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β),

where B(α, β) is the Beta function, B(α, β) = Γ(α)Γ(β)/Γ(α + β).
The effective discount function d(t) is the expected weight attached to a payoff received at round t, that is,

d(t) = E[x^t] = B(α + t, β)/B(α, β).    (S3)

Observe that for t = 0 it holds that d(t) = 1. This is expected, because payoffs received now should not be discounted. Subsequently, the average discounted payoff of player i in the repeated game with beta discounting is

π_i = ( Σ_{t=0}^∞ d(t) π_i(t) ) / ( Σ_{t=0}^∞ d(t) ),    (S4)

where π_i(t) denotes the expected payoff of player i at round t.
Interestingly, in the case of beta discounting with β > 1, the series of the effective discount function has an explicit solution and evaluates as

Σ_{t=0}^∞ d(t) = Σ_{t=0}^∞ B(α + t, β)/B(α, β) = (α + β − 1)/(β − 1).    (S5)

Now, observe that after some simplification we have

d(t+1)/d(t) = B(α + t + 1, β)/B(α + t, β) = Γ(α + t + 1) Γ(α + β + t) / ( Γ(α + t) Γ(α + β + t + 1) ).

Using the property of the Gamma function that for z > 1 it holds that Γ(z + 1) = zΓ(z), we can write

d(t+1)/d(t) = (α + t)/(α + β + t).

Because β > 0, it holds that d(t+1)/d(t) < 1, and thus the effective discount function is decreasing over time, as expected. Interestingly, as time increases, the fraction d(t+1)/d(t) is itself increasing and approaches 1 asymptotically. This implies that two distant future payoffs are discounted with a similar discount factor. Hence beta discounting also exhibits the characteristic feature of hyperbolic discounting.
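For concreteness, a small numerical sketch (assuming, as above, that the effective discount function is d(t) = E[x^t] for x ∼ Beta(α, β); the parameter values are illustrative only) checks the closed forms for d(t), the ratio d(t+1)/d(t), and the series sum (S5):

```python
import numpy as np
from scipy.special import betaln  # log of the Beta function, numerically stable

alpha, beta_ = 2.0, 3.0  # illustrative shape parameters (beta_ > 1)

def d(t, a=alpha, b=beta_):
    """Effective discount function d(t) = E[x^t] = B(a + t, b) / B(a, b)."""
    return np.exp(betaln(a + t, b) - betaln(a, b))

# d(0) = 1: payoffs received now are not discounted
assert np.isclose(d(0), 1.0)

# The ratio d(t+1)/d(t) equals (a + t)/(a + b + t): it stays below 1, increases
# with t, and approaches 1, i.e. distant payoffs are discounted similarly.
for t in [0, 1, 5, 50, 500]:
    assert np.isclose(d(t + 1) / d(t), (alpha + t) / (alpha + beta_ + t))

# Series sum (S5): sum_t d(t) = (a + b - 1)/(b - 1) for b > 1
ts = np.arange(200_000)
print(d(ts).sum(), (alpha + beta_ - 1) / (beta_ - 1))  # both close to 2.0
```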

Risk-adjusted strategies in repeated games
Now, let p^t_σ ∈ [0, 1] denote the probability that the strategic player cooperates in the next round given that the current action profile is σ ∈ {C, D}^n. By stacking the conditional probabilities for all possible outcomes into a vector, we obtain a time-dependent memory-one strategy p^t = (p^t_σ)_{σ∈{C,D}^n} that determines the probability for the strategic player to cooperate in the next round. Accordingly, the repeated memory-one strategy p^rep = (p^rep_σ)_{σ∈{C,D}^n} determines the probability to cooperate when the current decision is simply repeated. Let us denote this strategic player by i. We will use the common notation σ = (σ_i, σ_{−i}) ∈ {C, D}^n. Then the entries of the repeated memory-one strategy are given by p^rep_{(C,σ_{−i})} = 1 and p^rep_{(D,σ_{−i})} = 0 for all σ_{−i} ∈ {C, D}^{n−1}.

Now, let v_σ(t) denote the probability that the outcome at round t is σ ∈ {C, D}^n, and let v(t) = (v_σ(t))_{σ∈{C,D}^n} be the vector of outcome probabilities at round t. Using the limit of the series of the effective discounting function in (S5), the mean distribution of the action profiles is

v = ( (β − 1)/(α + β − 1) ) Σ_{t=0}^∞ d(t) v(t).

In order to relate the average discounted payoff to the mean distribution, we introduce some additional notation adopted from [1]. Let g^i_σ denote the one-shot payoff that i receives when the action profile is σ ∈ {C, D}^n. By stacking the possible outcomes into a vector, one obtains g^i = (g^i_σ)_{σ∈{C,D}^n}, which contains all possible payoffs that player i can obtain in a round of play.
The expected payoff of player i at round t can then be expressed by multiplying this one-shot payoff vector with the vector of outcome probabilities, that is, π_i(t) = g^i · v(t). Consequently, the average discounted payoff of i can be written as π_i = g^i · v, and thus the expected payoffs of the repeated game follow from the mean distribution v and the one-shot payoffs. To obtain the relation between p^t and v, let us define q_C(t) as the probability that the strategic player cooperates at round t. For real β > 1 and α > 0 the relevant limit exists, and dividing by the series in (S5) we obtain the relation in (S13), where p_0 = q_C(0) is the strategic player's initial probability to cooperate.
Remark 1. The relation in (S13) can be seen as a probabilistic-discounting extension of Akin's result on the relation between a memory-one strategy and the mean distribution of an infinitely repeated game without discounting [2, Theorem 1.3].
We are now ready to formulate a risk-adjusted strategy for repeated games with beta discounting. For this, let 1 = (1)_{σ∈{C,D}^n}.

Definition 1 (Risk-adjusted strategy). A time-dependent memory-one strategy p^t with entries in the closed unit interval for all t ≥ 0 is a risk-adjusted strategy for a symmetric n-player game if there exist shape parameters α > 0, β > 1, constants (s, l) ∈ ℝ^2, weights w_j for 1 ≤ j ≤ n, and φ such that p^t takes the form in (S15), under the requirement that w_i = 0, Σ_{j=1}^n w_j = 1 and φ ≠ 0.

Proposition 1 (Enforcing a linear payoff relation). Assume the probabilistic discount factor has a fixed beta distribution with real parameters α > 0 and β > 1. If a player applies a fixed risk-adjusted strategy as in Definition 1, then, independent of the fixed strategies of the n − 1 group members, the average discounted payoffs obey the equation

π_i = s Σ_{j≠i} w_j π_j + (1 − s) l.

Proof. Substituting the expression for p^t into (S13), we obtain (S17). From the distributive and commutative properties of the dot product, and using the fact that v · 1 = 1, this reduces to a single linear relation among the average discounted payoffs, multiplied by φ. Finally, because φ ≠ 0, the claimed payoff relation follows.

Remark 2. At time t = 0, in the deterministic limit α → ∞ and β < ∞, the risk-adjusted strategy in (S15) recovers a zero-determinant strategy for infinitely repeated multiplayer games [1].
If we additionally let β = α(1 − δ)/δ, then the limit α → ∞ recovers a zero-determinant strategy for a deterministically discounted multiplayer social dilemma.
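A quick numerical illustration of these deterministic limits (a sketch with illustrative values, not part of the derivation): under the substitution β = α(1 − δ)/δ, the beta distribution keeps mean δ while its variance vanishes as α → ∞, so the random discount factor degenerates to the constant value δ.

```python
from scipy.stats import beta as beta_dist

delta = 0.9  # target deterministic discount factor (illustrative value)

# alpha > delta/(1 - delta) = 9 guarantees beta > 1 in this parameterisation
for alpha in [10, 100, 1_000, 10_000]:
    b = alpha * (1 - delta) / delta  # substitution used in Remarks 2 and 3
    dist = beta_dist(alpha, b)
    # Mean stays at delta while the variance shrinks towards zero
    print(f"alpha={alpha:6d}  mean={dist.mean():.4f}  var={dist.var():.2e}")
```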
To obtain a well-defined risk-adjusted strategy, the parameters of the linear payoff relation (s, l) ∈ ℝ^2 cannot be chosen arbitrarily. This is formalized in the following definition.
Definition 2 (Enforceable linear payoff relations). A payoff relation (s, l) ∈ ℝ^2 with weights w ∈ ℝ^{n−1} is enforceable by p^t if there exist shape parameters α > 0, β > 1 such that for all t ≥ 0 the elements of p^t are in the closed unit interval.

Now, let us consider the following useful lemma concerning the time-varying elements of the risk-adjusted strategy.

Lemma 1 (Monotonically increasing upper bounds).
If the entries of p^0 are in the closed unit interval, then the entries of p^t remain in the closed unit interval for all t ≥ 0.
Proof. The requirement that the entries of p^t lie in the closed unit interval yields an upper bound for every round t. Since this inequality needs to hold for all t ≥ 0, it suffices that it holds at the minimum of the upper bound over t.
We continue by showing that this minimum occurs at t = 0. The corresponding condition is satisfied for any t ≥ 0 and β > 0, which completes the proof.

The existence of risk-adjusted strategies
Lemma 1 implies that the existence of risk-adjusted strategies can be established by examining the implications of the inequality at t = 0, which leads to the following proposition.
Proposition 2. When future payoffs are discounted by (S3), generous payoff relations with 0 < s < 1 and l = a_{n−1} are not enforceable.
Proof. Suppose all players cooperate, i.e., σ = (C, C, . . . , C); then all players receive the one-shot payoff a_{n−1}. By plugging these payoffs into the risk-adjusted strategy in Definition 1, and using the fact that Σ_{j≠i} w_j = 1, one obtains the corresponding entry of the strategy. Using Lemma 1, the requirement that at t = 0 the entries of the risk-adjusted strategy are in the unit interval implies a lower and an upper bound. Now, for the generous strategy it is required that l = a_{n−1}, and from the above equation we obtain the corresponding requirement. Clearly, for any p_0 ∈ [0, 1], the lower bound is satisfied. However, the upper bound reads as in (S26). Because β > 1 and α > 0, we have (β − 1)/(α + β − 1) > 0, and because p_0 ∈ [0, 1], a necessary condition for this inequality to hold for some p_0 ∈ [0, 1] is that it holds for p_0 = 1, i.e., (S27). We proceed to show that (S27) cannot be satisfied for α > 0, β > 1. To this end, we rewrite (S27) equivalently; because β > 1 and α > 0, the left-hand side of the rewritten inequality is positive, and because α and β are real and thus finite, we arrive at a contradiction. This completes the proof.

Remark 3 (Deterministic limits).
In the deterministic limit α → ∞ and β < ∞, we obtain 1 ≤ 1, which is always satisfied; thus, in the deterministic limit of infinitely repeated games without discounting, generous strategies can exist, which is consistent with the results in [1,3]. If we additionally let β = α(1 − δ)/δ, then the condition is again satisfied in the limit α → ∞, and thus generous strategies also exist in the deterministic limit of exponential discounting.

The generosity gap
In the previous section we have seen that, under beta discounting, generous strategies are not well-defined. However, from Lemma 1 we know that the upper bound on the strategy entries increases monotonically over time. This implies that at some time in the future the generous risk-adjusted strategy can become well-defined. We refer to the time before the generous strategy becomes well-defined as the generosity gap. To investigate the length of the generosity gap, one can start from (S25), but instead of evaluating it at time t = 0, one uses the general bound with variable t, which yields the inequality (S28). First note that when p_0 = 0, this inequality cannot be satisfied for any t ≥ 0. Therefore, assume p_0 > 0. After some basic manipulation one obtains that (S28) is satisfied if and only if t exceeds a threshold that depends on p_0. Because β > 1 and α > 0, this threshold is non-negative and decreasing in p_0. Thus the minimum occurs at p_0 = 1, which yields the minimum generosity gap τ.
To obtain an understanding of how the generosity gap τ behaves with respect to increasing uncertainty, we fix the mean discount factor to a constant µ ∈ (0, 1). That is,

α/(α + β) = µ,   i.e.   β = α(1 − µ)/µ.

By substitution of β, the expression for the minimum generosity gap becomes (S32), which is well defined for α > µ/(1 − µ), i.e., β > 1, and decreasing in α. By substituting the expression for β into the variance of the beta distribution we obtain

σ² = αβ / ( (α + β)² (α + β + 1) ) = µ²(1 − µ)/(α + µ).    (S33)

The upper bound on the variance is obtained by using the requirement α > µ/(1 − µ). More importantly, observe that (S33) is a positive function that is decreasing for increasing α. For a fixed mean, the generosity gap in (S32) is also decreasing for increasing α. Hence, one can conclude that the generosity gap becomes shorter as the discount rate becomes more certain. Indeed, for deterministic discount rates, generous strategies exist in social dilemmas (see also Remark 3).
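A short sketch of this behaviour (illustrative parameter values; the closed-form expression (S32) for τ is not reproduced here) verifies the fixed-mean substitution and the variance formula (S33), and shows that uncertainty shrinks as α grows:

```python
import numpy as np
from scipy.stats import beta as beta_dist

mu = 0.9  # fixed mean discount factor (illustrative value)

for alpha in [10, 20, 50, 100, 500]:  # all satisfy alpha > mu/(1 - mu) = 9
    b = alpha * (1 - mu) / mu                          # enforces mean alpha/(alpha + b) = mu
    dist = beta_dist(alpha, b)
    var_closed_form = mu**2 * (1 - mu) / (alpha + mu)  # equation (S33)
    assert np.isclose(dist.var(), var_closed_form)
    print(f"alpha={alpha:4d}  mean={dist.mean():.3f}  var={dist.var():.2e}")
```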

Characterizing enforceable payoff relations of risk-adjusted strategies
We now continue to characterize the enforceable payoff relations of risk-adjusted strategies. Previously, this was done for ZD strategies in infinitely repeated multiplayer social dilemmas without discounting [1]. We begin by formulating necessary conditions.

Proposition 3. In n-player symmetric social dilemma games with payoffs as in the main text, the enforceable payoff relations of a risk-adjusted strategy with beta discount factors require the following necessary conditions: b_0 ≤ l < a_{n−1}.
Proof. In the following, we refer to the ZD strategist as player i. Let t = 0 and suppose all players cooperate, i.e., σ = (C, C, . . . , C). In this case, every player receives the one-shot payoff a_{n−1}. By plugging these payoffs into the risk-adjusted strategy in Definition 1, and using the fact that Σ_{j≠i} w_j = 1, one obtains the corresponding entry of the strategy. Now suppose, on the contrary, that all players defect; then all players receive the one-shot payoff b_0, and the entry of the risk-adjusted strategy follows analogously. The requirement that (S37) and (S38) belong to the closed unit interval results in two inequalities, where the strict upper bound in (S39) follows from the fact that (S26) cannot be satisfied for α > 0, β > 1 and p_0 ∈ [0, 1]. Multiplying (S39) by −1 gives (S41), and adding (S40) and (S41) yields a combined condition. Combining this with the assumption in the main text that a_{n−1} > b_0, it follows that in order for the payoff relation to be enforceable the condition in (S43) is necessary.

Now suppose there is a single defecting player, i.e., σ = (C, C, . . . , D) or any of its permutations. In this case, the cooperators receive a_{n−2} and the single defector obtains b_{n−1}. When the single defector is some j ≠ i, the entry of the risk-adjusted strategy is given by (S44); if the unique defector is i, the entry of p^rep equals zero and thus the entry of the risk-adjusted strategy is given by (S45). The requirement that (S44) and (S45) belong to the closed unit interval results in two further inequalities. Combining these two conditions in a similar manner as was done for the homogeneous action profile yields another necessary condition. From the assumption b_{z+1} > a_z in the main text, the inequality (S49) follows. Together with (S43), this implies an intermediate bound, which in combination with (S49) yields (S52). Moreover, because Σ_{j=1}^n w_j = 1, it follows that min_{j≠i} w_j ≤ 1/(n−1). Hence, the necessary condition turns into a bound that no longer depends on the particular weights.

Let us now investigate the bounds on the baseline payoff l. From (S40) we obtain a lower bound; thus, in order for (S52) to hold, it is required that b_0 ≤ l. It then must hold that l < a_{n−1}.
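As a small illustration of Proposition 3 (a sketch only; the payoff vectors are supplied by the caller, for example from the game definitions in the last section of this document), the following hypothetical helper checks the necessary condition b_0 ≤ l < a_{n−1} for a candidate baseline payoff l:

```python
from typing import Sequence

def satisfies_necessary_condition(a: Sequence[float],
                                  b: Sequence[float],
                                  l: float) -> bool:
    """Check the necessary condition b_0 <= l < a_{n-1} from Proposition 3.

    a[z] and b[z] are the one-shot payoffs of a cooperator and a defector,
    respectively, when z co-players cooperate (z = 0, ..., n-1).
    """
    return b[0] <= l < a[-1]

# Example: an extortionate baseline l = b_0 is admissible whenever full
# cooperation pays more than full defection (a_{n-1} > b_0); the numbers
# below are the public goods payoffs computed in the last section.
print(satisfies_necessary_condition(a=[-0.6, -0.2, 0.2],
                                    b=[0.0, 0.4, 0.8], l=0.0))  # True
```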
Let us now formulate the result that fully characterizes the enforceable payoff relations of risk-adjusted strategies. Towards this, let w = (w_j) ∈ ℝ^{n−1} denote the vector of weights, let ŵ_z denote the sum of the z smallest weights w_j with j ≠ i, and finally let ŵ_0 = 0.
Proposition 4. For the repeated n-player game with beta discount factors such that α > 0 and β > 1, and payoffs that satisfy the social dilemma assumptions in the main text, the payoff relation (s, l) ∈ ℝ^2 with weights w ∈ ℝ^{n−1} is enforceable by the risk-adjusted strategy in (S15) if and only if −1/(n−1) < s < 1 and the baseline payoff l satisfies the bounds in (S55).

Proof. Let t = 0. In the following we refer to the key player, who is employing the ZD strategy, as player i. Let σ = (σ_1, . . . , σ_n) with σ_k ∈ {C, D}, let σ_C be the set of i's co-players that cooperate, and let σ_D be the set of i's co-players that defect. Also, let |σ| be the total number of cooperators in σ, including player i. Using this notation, and using the fact that α, β > 0, for some action profile σ we may write the ZD strategy in (S15) entrywise; because Σ_{j≠i} w_j = 1, the weighted sums over co-players simplify accordingly. Additionally, note that because of the symmetric one-shot payoffs, for all h ∈ σ_C it holds that g^h_σ = a_{|σ|−1}, and for all k ∈ σ_D it holds that g^k_σ = b_{|σ|}. It follows that (S57) can be written in terms of these payoffs, and accordingly the entries α/(α+β) · p^0_σ of the ZD strategy are given by (S59).

For all σ ∈ {C, D}^n we require that these entries lie in the closed unit interval. This leads to the inequalities in (S60) and (S61). Because φ > 0 can be chosen arbitrarily small, the inequalities in (S60) can be satisfied for some α > 0, β > 1 and p_0 ∈ [0, 1] if and only if, for all σ such that σ_i = C, the inequalities in (S62) are satisfied. Substituting into this requirement, we obtain the inequalities that must hold for all σ such that σ_i = C; on the other hand, when σ_i = D, the corresponding inequalities must hold as well. The inequality in (S62), together with the necessary condition that (1 − s) > 0, provides an upper bound on the enforceable baseline payoff l. We now turn our attention to the inequalities in (S61), which can be satisfied if and only if, for all σ such that σ_i = D, the condition in (S64) holds. Combining (S64) and (S63), we obtain a bound involving a maximum over all σ with σ_i = D. Because b_{|σ|} − a_{|σ|−1} > 0 and (1 − s) > 0, the minima and maxima of the bounds in (S65) are achieved by choosing the weights w_j as small as possible. That is, the extrema of the bounds on l are achieved for those action profiles σ with σ_i = C in which Σ_{l∈σ_C} w_l is minimal, and for those σ with σ_i = D in which Σ_{k∈σ_D} w_k is minimal. By the above reasoning, (S65) can be equivalently written as in the proposition. This completes the proof.

Uncertainty and the level of extortion
Define w̄_z := max_{w_h ∈ w} Σ_{h=1}^{z} w_h to be the maximum sum of z weights, i.e., the maximum total weight over action profiles σ with z cooperating co-players. Additionally, for some given payoff relation (s, l) ∈ ℝ^2 and w ∈ ℝ^{n−1}, define the quantities ρ_C and ρ_D used below.

Remark 4. In the applications in the main text, we assume all weights are equal, such that w_j = 1/(n−1) for all j ≠ i. In this case, the expressions for ρ_C and ρ_D simplify accordingly.

Proposition 5. Assume p_0 = 0 and that (s, b_0) ∈ ℝ^2 satisfies the conditions in Proposition 4. Then ρ_C > 0 and ρ_D + ρ_C > 0. Moreover, the threshold mean µ above which the extortionate payoff relation can be enforced can be expressed explicitly in terms of these quantities.

Proof. For α > 0, β > 1 and for the existence of extortionate payoff relations with l = b_0, we know that p_0 = 0 is required (this fact immediately follows from the lower bound in (S40)). By substituting this into (S60), it follows that in order for the payoff relation to be enforceable a corresponding condition must hold for all σ such that σ_i = C; hence, (S60) with p_0 = 0 implies, for all σ such that σ_i = C, the inequality (S68). Naturally, ρ_C(σ) ≥ ρ_C. In the special case in which equality holds, it follows from (S68) that µ ≥ 0, which is satisfied for any α, β > 0. We continue to investigate the case in which ρ_C(σ) > ρ_C. In this case, a solution to (S68) for some φ > 0 exists if and only if the first expression in the proposition holds. Now, from (S61) with p_0 = 0, it follows that in order for the payoff relation to be enforceable, the condition in (S70) is necessary. Because φ > 0 is necessary for the payoff relation to be enforceable, it follows that ρ_D(σ) ≥ 0 for all σ such that σ_i = D. Let us first investigate the special case in which ρ_D(z, w̄_z) = 0. Then (S70) is satisfied for any φ > 0 and µ ∈ (0, 1). Now, assume ρ_D(z, w̄_z) > 0. Then, (S70) and (S68) imply a condition on φ, and in order for such a φ to exist the stated threshold on µ must hold. This completes the proof.
Corollary 1 (Symmetric or negatively skewed beta distribution). Enforceable extortionate payoff relations (s, b_0) ∈ ℝ^2 require α/(α + β) ≥ 1/2 if one or both of the following conditions holds.

Definitions of n-player games in main text

The n-player snowdrift game
The n-player snowdrift game describes a situation in which cooperators need to clear a snowdrift so that everyone can go on their merry way. By clearing the snowdrift together, the cooperators share the cost c required to create a fixed benefit b. If a player cooperates together with z cooperating group members, their one-shot payoff is a_z = b − c/(z+1). If there is at least one cooperating co-player (z > 0) who clears the snowdrift, defectors obtain the benefit b_z = b. If no one cooperates, the snowdrift is not cleared and everyone's payoff is b_0 = 0. By applying the bounds in (S55) to these one-shot payoffs with l = 0, one can conclude that extortionate strategies can enforce any slope s ≥ 1 − c/(b(n−1)). Application of Proposition 5 with equal weights w_j = 1/(n−1) for all j ≠ i, b = 5/4, c = 1 and n = 3 leads to the left plot in Figure 4 of the main text.
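A minimal numerical sketch of this example (assuming the cost-sharing rule a_z = b − c/(z+1) as written above; the values are those used for Figure 4):

```python
b, c, n = 5/4, 1.0, 3  # parameter values of the main-text example

# One-shot payoffs of the n-player snowdrift game, indexed by the number z
# of cooperating co-players (z = 0, ..., n-1)
a = [b - c / (z + 1) for z in range(n)]  # cooperator payoffs a_z
b_vec = [0.0] + [b] * (n - 1)            # defector payoffs b_z (with b_0 = 0)

# Lower bound on the enforceable extortionate slope, s >= 1 - c/(b(n-1))
s_min = 1 - c / (b * (n - 1))
print(a, b_vec, s_min)  # s_min = 0.6 for these values
```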

The n-player linear public goods game
In the n-player linear public goods game, cooperators contribute an amount c > 0 to a publicly available good that grows linearly with the number of cooperators. The sum of the contributions is scaled by a public goods multiplier 1 < r < n and then distributed evenly among all players. For cooperators, this results in the one-shot payoffs a_z = rc(z+1)/n − c, while defectors receive b_z = rcz/n. The bounds in (S55) with l = 0 characterize the enforceable extortionate slopes s. An application of Proposition 5 with equal weights (w_j = 1/(n−1) for all j ≠ i) and the parameter values r = 12/10, c = 1, and n = 3 leads to the right plot in Figure 4 of the main text.
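A corresponding sketch for the public goods example (again a quick illustration with the Figure 4 parameter values, not a reproduction of the full bounds in (S55)):

```python
r, c, n = 12/10, 1.0, 3  # parameter values of the main-text example

# One-shot payoffs of the linear public goods game, indexed by the number z
# of cooperating co-players (z = 0, ..., n-1)
a = [r * c * (z + 1) / n - c for z in range(n)]  # cooperator payoffs a_z
b_vec = [r * c * z / n for z in range(n)]        # defector payoffs b_z

# Social dilemma assumptions used throughout the supplement
assert a[n - 1] > b_vec[0]                              # a_{n-1} > b_0
assert all(b_vec[z + 1] > a[z] for z in range(n - 1))   # b_{z+1} > a_z
print(a, b_vec)
```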