Optimal nonlinear information processing capacity in delay-based reservoir computers

Reservoir computing is a recently introduced brain-inspired machine learning paradigm capable of excellent performances in the processing of empirical data. We focus in a particular kind of time-delay based reservoir computers that have been physically implemented using optical and electronic systems and have shown unprecedented data processing rates. Reservoir computing is well-known for the ease of the associated training scheme but also for the problematic sensitivity of its performance to architecture parameters. This article addresses the reservoir design problem, which remains the biggest challenge in the applicability of this information processing scheme. More specifically, we use the information available regarding the optimal reservoir working regimes to construct a functional link between the reservoir parameters and its performance. This function is used to explore various properties of the device and to choose the optimal reservoir architecture, thus replacing the tedious and time consuming parameter scannings used so far in the literature.


B The reservoir map and the connectivity matrix
The reservoir map F : R N × R N × R K −→ R N introduced in (4) is uniquely determined by the recursions (3) obtained out of the Euler discretization of the time-delay differential equation (TDDE) and organized in neuron layers parametrized by t ∈ Z. The reservoir map is obtained by using (3) in order to write down the neuron values of the layer for time t in terms of those for time t − 1 and the current input signal value. More specifically: which corresponds to a description of the form that uniquely determines the reservoir map F : be the partial derivative of F with respect to the first argument computed at the point (x 0 , 0 N , θ). We will refer to A(x 0 , θ) := D x F (x 0 , 0 N , θ) as the connectivity matrix of the reservoir at the point x 0 . It is easy to show that A(x 0 , θ) has the following explicit form where Φ := (1 − e −ξ )∂ x f (x 0 , 0, θ) and ∂ x f (x 0 , 0, θ) is the first derivative of the nonlinear kernel f in (B.1) with respect to the first argument and computed at the point (x 0 , 0, θ). We will also use the symbol f x0 to denote ∂ x f (x 0 , 0, θ).

C The approximating vector autoregressive system and information processing capacity estimations
The goal of this appendix is providing details on the construction of the approximating VAR(1) model for the TDR obtained after a partial linearization of the reservoir map on the dynamical variables and respecting the nonlinearity on the input signal. Let x 0 ∈ R and x 0 := x 0 i N ∈ R N be an equilibrium of (B.1) and a fixed point of (B.3), respectively (see Proposition D.9). These solutions are chosen with respect to the autonomous regime, that is, we set I(t) = 0 in the time-delay differential equation (B.1) and I(t) = 0 N in the associated recursion (B.3). In practice, we choose stable solutions; Appendix D contains conditions that ensure this dynamical feature.
The VAR model setup and the construction of the approximating VAR system. The vector autoregressive (VAR) model is of much use in multivariate time series analysis and is a natural extension of the univariate linear autoregressive (AR) model. See [2] for an extensive introduction to multivariate time series methods and the details on the VAR processes. The N -dimensional VAR(1) process (VAR model of order 1) is defined in mean-adjusted form as the solution to the recursions where x(t) = (x 1 (t), . . . , x N (t)) ∈ R N is a random vector, A ∈ M N is a fixed coefficient matrix, µ x ∈ R N , and (t) = ( 1 (t), . . . , N (t)) ∈ R N is such that { (t)} ∼ WN(0 N , Σ ) is a N -dimensional white noise (stochastic process that presents no autocorrelation) or innovation process with mean 0 N and covariance matrix Σ ∈ S N . We are particularly interested in stable VAR models, that is, models of the type (C.1) where the autoregression matrix A is chosen such that det (I N − Az) = 0 for all z ∈ C such that |z| ≤ 1.
It can be proved (see Proposition 2.1 in [2]) that stable models have a unique second order stationary solution {x(t)} t∈Z for which µ x = E[x(t)] and the autocovariance function Let now x 0 = x 0 i N ∈ R N be the stable fixed point of the reservoir map (B.3) in autonomous regime, that is, F (x 0 , 0 N , θ) = x 0 . We now approximate (B.3) by its partial linearization at x 0 with respect to the delayed self feedback and by the Rth-order Taylor series expansion on the variable that determines the input signal injection. We obtain the following expression: where D x F (x 0 , 0 N , θ) is the first derivative of F with respect to its first argument, computed at the point (x 0 , 0 N , θ). We recall that D x F (x 0 , 0 N , θ) = A(x 0 , θ) is the connectivity matrix introduced in (B.4). Additionally, ε(t) in (C.3) is obtained out of the Taylor series expansion of F (x(t), I(t), θ) in (B.2) on I(t) up to some fixed order R ∈ N and is given by where c i , i ∈ {1, . . . , N } are the entries of the input mask c ∈ R N and (∂ (i) is the ith order partial derivative of the nonlinear reservoir kernel f in (B.1) with respect to the second argument I(t) computed at the point (x 0 , 0, θ).
If we now use as input signal z(t) independent and identically distributed random variables with mean 0 and variance σ 2 z , that is, {z(t)} t∈Z ∼ IID(0, σ 2 z ), then the recursion (C.3) makes the reservoir layer dynamics {x(t)} t∈Z into a discrete time random process that, as we show in what follows, is the solution of a N -dimensional VAR(1) model. Indeed, it is easy to see that the assumption {z(t)} t∈Z ∼ IID(0, σ 2 z ) implies that {I(t)} t∈Z ∼ IID(0 N , Σ I ), with Σ I := σ 2 z c c, and that {ε(t)} t∈Z is a family of N -dimensional independent and identically distributed random variables with mean µ ε and covariance matrix Σ ε given by the following expression: where the polynomial q R is the same as in (C.5) and where we use the convention that the powers µ i z := E z(t) i , for any i ∈ {1, . . . , R}. For example, if the variables z(t) are normal, that is, Additionally, Σ ε := E (ε(t) − µ ε )(ε(t) − µ ε ) has entries determined by the relation: where the first summand stands for the multiplication of the polynomials q R (·, c 1 , . . . , c i ) and q R (·, c 1 , . . . , c j ) and the subsequent evaluation of the resulting polynomial at µ z , and the second one is made out of the multiplication of the evaluation of the two polynomials. With these observations it is clear that we can consider (C.3) as a VAR(1) model driven by the independent noise {ε(t)} t∈Z . If the nonlinear kernel f satisfies the generic condition that the polynomial in z given by det (I N − A(x 0 , θ)z), does not have roots in and on the complex unit circle, then (C.3) has a unique second order stationary solution {x(t)} t∈Z with time-independent mean that can be used to rewrite (C.3) in mean-adjusted form In the presence of stationarity we can recursively compute the time independent autocovariance function Γ(k) := E (x(t) − µ x ) (x(t − k) − µ x ) at lag k ∈ Z by using the Yule-Walker equations [2]. Indeed, Γ(0) is given by the vectorized equality: which determines the higher order autocovariances via the relation and the identity Γ(−k) = Γ(k) .
The nonlinear memory capacity estimations. We now concentrate on the computation of the quantitative measures of the reservoir performance introduced in the paper. In particular, we will provide details on the computation of the nonlinear memory capacity formula in (13). Recall that a h-lag memory task is determined by a function H : R h+1 → R (in general nonlinear) that is used to generate a one-dimensional signal y(t) := H(z(t), z(t − 1), . . . , z(t − h)) out of the reservoir input {z(t)} t∈Z . Consider now a TDR computer with N neurons. The optimal linear readout W out adapted to the memory task H is given by the solution of a ridge (or Tikhonov [3]) linear regression problem with regularization parameter λ ∈ R (usually tuned during the training phase via cross-validation) in which the covariates are the neuron values corresponding to the reservoir output and the explained variables are the values {y(t)} of the memory task function. More explicitly, W out is given by the solution of the following optimization problem (W out , a out ) := arg min where the expectation is taken thinking of y t and x(t) as random variables due to the stochastic nature of the input signal {z(t)} t∈Z and hence that of the {I(t)} t∈Z . In order to obtain the explicit solution of (C.12), we first define g(W, a) := E (W · x(t) + a − y(t)) 2 + λ W 2 and set or, equivalently, These equations yield the pair (W out , a out ) that minimizes g(W, a). We now use the fact that {x(t)} t∈Z is the unique stationary solution of VAR(1) approximating system (C.9) for the TDR (C.9) and hence obtain where µ x is provided in (C.8), Γ(0) ∈ S N is determined by the generalized Yule-Walker equations in (C.10) and Cov(y(t), x(t)) is a vector in R N that has to be determined for every specific memory task H. Additionally, it is easy to verify that the error committed by the reservoir when using the optimal readout is The H-memory capacity C H (θ, c, λ) of a reservoir computer constructed using a nonlinear kernel f with parameters θ, an input mask c, and regularizing ridge parameter λ, is defined as one minus the normalized mean square error committed at the time of accomplishing the memory task H. Expression (C.15) shows that when the RC is approximated by the VAR(1) model (C.9), the corresponding H-memory capacity can be approximated by Since the normalized error coming from the expression (C.15) is clearly bounded between zero and one, it is also clear that: We emphasize that in order to evaluate (C.16) for a specific memory task, only Cov(y(t), x(t)) and var (y(t)) need to be computed since the autocovariance Γ(0) is fully determined by (C.10) once the reservoir and the equilibrium x 0 around which we operate have been chosen. Once a specific reservoir and task H have been fixed, the capacity function C H (θ, c, λ) can be explicitly written down and it can hence be used to find reservoir parameters θ opt and an input mask c opt that maximize it, by solving the optimization problem (θ opt , c opt ) := arg max θ∈R K ,c∈R N C H (θ, c, λ). (C.17) Two specific memory tasks. In the following paragraphs we spell out the computation of Cov(y(t), x(t)) and var (y(t)) necessary to evaluate the memory capacity formula (C.16) for the two most basic memory tasks, namely, the linear and the quadratic ones.
(i) The h-lag linear memory task. The linear memory task is determined by the linear task functions H : R h+1 → R that we now describe. First, let z h (t) := (z(t), z(t − 1), . . . , z(t − h)) and let L ∈ R h+1 . We then set H(z h (t)) := L z h (t). In order to evaluate the h-lag memory capacity using formula (C.16), we need to evaluate var (y(t)) and Cov(y(t), x(t)) with y(t) := H(z h (t)).
First, since {z(t)} t∈Z ∼ IID(0, σ 2 z ), we then immediately obtain that var(y(t)) = σ 2 z ||L|| 2 . (C.18) Next, we use the so called MA(∞)-representation of the VAR(1) in (C.9), namely, which immediately yields that (C.21) where the polynomial p R on the variable x is defined by p R (x, c 1 , . . . , c r ) := x · q R (x, c 1 , . . . , c r ) and its evaluation follows the same convention as in (C.6). The expressions (C.18) and (C.21) can be readily substituted in (C.16) in order to obtain an explicit expression for capacity C H (θ, c, λ) associated to the h-lag linear memory task as a function of the reservoir parameters θ and the input mask c. This expression can be subsequently treated as in (C.17) in order to determine optimal architecture parameters for this particular task.
(ii) The h-lag quadratic memory task. In this case we use a quadratic task function H : R h+1 → R of the form for some symmetric matrix Q ∈ S h+1 . Analogously to the linear task case, in order to evaluate the memory capacity associated to H, we have to derive explicit expressions for var (y(t)) and Cov(y(t), x(t)) with y(t) := H(z h (t)). The same computations as in the case of the linear task apply.
Hence, if we put together (C.24) and (C.26), we obtain Recall that for Gaussian variables, that is {z(t)} t∈Z ∼ IN(0, σ 2 z ), we have that µ 4 z = 3σ 2 z , and hence in that case var(y(t)) = 2σ 4 Regarding the computation of the covariance and analogously to the case of the linear h-lag memory task, we use the MA(∞) representation of the VAR(1) model of the TDR in (C.9) and write which leads to the following result: . . , c r ) and is evaluated following the same convention as in (C.6) but taking x 2 instead of x.
Again, we conclude by noticing that the expressions (C.29) and (C.27) substituted in (C.16) provide an explicit formula for capacity C H (θ, c, λ) associated to the h-lag quadratic memory task as a function of the reservoir parameters θ and the input mask c and hence it can be readily used to solve the optimization problem in (C.17).
An observation that is worth to be pointed out is that only the diagonal elements in Q intervene in the covariance (C.29) while all its entries are present in the variance (C.27). When these two quantities are substituted in the memory capacity formula (C.16) it can be seen that by choosing sufficiently high off-diagonal entries in Q, the capacity of the reservoir can be made arbitrarily small which shows a structural limitation of the architecture that we are considering that can only be fixed by using alternative signal feeding schemes.

D Equilibria of the continuous and the discrete time models for the TDR and their stability
As we already explained, the linearization of the reservoir map at a stable fixed point is at the core of the developments in this paper. That is why in this section we carry out a detailed study of the stability properties of the equilibria of the time-delay differential equation (B.1) and of the fixed points of its corresponding discrete-time approximation (B.3). More specifically, we provide sufficient stability conditions and we show that our results exhibit a remarkable consistence regardless of the use of the continuous or of the discrete time schemes.

D.1 Stationary solutions of time-delay differential equations and their stability
We start by recalling some basic facts about the properties of the solutions of the time-delay differential equations and their stability. Let τ ∈ R + be a fixed delay and consider a time-delay map Additionally, for any t ∈ R define the shift operator where λ t is the translation operator by t ∈ R, that is, λ t (s) := s + t, for any s ∈ R. Let now γ ∈ C 1 ([−τ, +∞), R) be a differentiable curve. We say that γ is a solution of the time-delay differential equation (TDDE) determined by X when the equalitẏ holds for any t ∈ [0, +∞). Note that the TDDE (B.1) that is at the core of this paper, namelẏ can be encoded as in (D.3) by using the time-delay map X given by Definition D. 1 We say that the time-delay map X is locally Lipschitzian on the open set Theorem D.2 (Existence and uniqueness of solutions) Let X be a continuous and locally Lips- We say that Γ φ is the solution of the time-delay differential equation determined by X with initial condition φ, or simply the solution through φ. The associated flow is defined as the map We now recall also some basic notions of stability of common use in the TDDE context; see [4] and [5] for details. Let x 0 ∈ R and let φ x0 ∈ C 1 ([−τ, 0], R) be the constant curve at x 0 . We say that the point x 0 is an equilibrium of the TDDE determined by the time-delay map and with flow F whenever F t (φ x0 ) = x 0 , for any t ∈ [−τ, +∞). The equilibrium x 0 is said to be stable (respectively asymptotically stable) if for any > 0 there exists a δ( ) > 0 such that for any The following stability criterion is an extension of Lyapunov's Second Method to the TDDE context due to Krasovskiy [6]. We state it using our notation since it will be used in the sequel.
such that for any φ ∈ C 1 ([−τ, 0], R)) and any t ∈ [0, +∞) satisfies that then x 0 is asymptotically stable. If w(t) ≥ 0 then x 0 is just stable. A functional V that satisfies these conditions is called a Lyapunov-Krasovskiy functional.

D.2 Equilibria of the reservoir time-delay equation and their stability
We now use Theorem D.3 to establish sufficient conditions for the stability of the equilibria of the TDDE (B.1) at the core of the paper, namely, where f is the nonlinear kernel of the TDR. The main tool in the application of that result is the use of a Lyapunov-Krasovskiy functional of the form . See [6], [4] and [5] for the extensive discussion.
Theorem D.4 Let x 0 be an equilibrium of the time-delay differential equation (D.10) in autonomous regime, that is, when I(t) = 0, and suppose that there exists ε > 0 and k ε ∈ R such that one of the following conditions holds If |k ε | < 1 then x 0 is asymptotically stable. If |k ε | ≤ 1 then x 0 is stable.
Proof. Notice first that the equilibria x 0 in the statement are characterized by the equality f (x 0 , 0, θ) = x 0 . Consider now the Lyapunov-Krasovskiy functional introduced in (D.11). It is easy to see that since V (x φ , t) is positive it satisfies condition (i) in Theorem D.9. We will now show that any of the two conditions in the statement imply that condition (ii) in Theorem D.9 are satisfied and hence guarantee the stability of x 0 . We start by writing We now distinguish two cases, namely, when x 0 = 0 and when x 0 = 0.
Consider (D.12) again, now under the hypothesis (ii) of the statement of the theorem. In the case of the trivial solution it implies that there exists ε > 0 and k ε > 0, such that f (x, 0, θ) x ≤ k ε for all x ∈ (−ε, ε). In order to ensure the asymptotic stability of the trivial solution using Theorem D.3, we need to find conditions under which the expression (D.12) is negative, that is d dt V (x φ , t) < 0. We proceed by first multiplying both sides of this inequality by the positive quantity 1 Then due to the hypothesis (ii) of the theorem, a sufficient condition for this inequality to hold is (D.14) Notice that when x φ (t) = x φ (t − τ ) = x 0 , this inequality is always satisfied provided that k ε ≤ 1. Hence in order for (D.14) to hold, it suffices that the polynomial on z has no real roots, which happens, as in point (i) of the statement when k 2 ε < −4µ(µ − 1). Proceeding analogously as under assumption (i) of the theorem, we obtain |k ε | < 1 (respectively |k ε | ≤ 1) as the sufficient condition for the asymptotic stability (respectively stability) of x 0 = 0, as required.
where the function g is defined as g(y(t), 0, θ) := f (y(t) + x 0 , 0, θ) − x 0 . The equation (D.2) has an equilibrium at y 0 = 0 whose stability can be easily studied by mimicking the case x 0 = 0 discussed above. More specifically, it can be shown following the same arguments that in this case the hypothesis (i) of the statement of the theorem can be written as g(y, 0, θ) ≤ k ε y (D.17) and y 0 = 0 is stable or asymptotically stable whenever |k ε | ≤ 1 or |k ε | < 1, respectively. The inequality (D.17) is equivalent to which guarantees that a non-trivial equilibrium x 0 of (D.10) is stable or asymptotically stable when the same conditions on k ε as in the trivial case are satisfied.
Finally, the hypothesis (ii) of the statement of the theorem in the case of (D.16) has the form g(y, 0, θ) y ≤ k ε (D.18) and y 0 = 0 is stable or asymptotically stable when |k ε | ≤ 1 or |k ε | < 1, respectively. It is easy to verify that the inequality (D.18) is equivalent to which provides the same corresponding sufficient conditions on k ε for stability or asymptotic stability of a non-trivial equilibrium x 0 of (D.10), as required.
Corollary D.5 Let x 0 be an equilibrium of the TDDE (D.10) and suppose that the nonlinear reservoir kernel function f is continuously differentiable at , then x 0 is asymptotically stable (respectively, stable).
We now study the equilibria and the parameter values that ensure their stability when Corollary D.5 is applied to the two nonlinear kernels that are most used in our work, that is, the Mackey-Glass [7] and the Ikeda [8] parametric families. We recall that the Mackey-Glass nonlinear kernel is given by the expression where the parameter θ := (γ, η, p) ∈ R 3 is a three tuple of real values. The Ikeda nonlinear kernel corresponds to f (x, I, θ) = η sin 2 (x + γI + φ), (D.23) where the parameter vector θ := (γ, η, φ) ∈ R 3 . In both cases the parameter γ is called the input gain and η the feedback gain. (i) The trivial solution x 0 = 0, for any η ∈ R. The equilibrium x 0 = 0 is asymptotically stable (respectively, stable) if |η| < 1 (respectively, |η| ≤ 1).
Proof. First, in order to characterize the equilibria of the time-delay differential equation (D.10) with the nonlinear kernel in (D.24), we solve 0 = −x + f (x, 0, θ) or, equivalently, A straightforward computation shows that this equality is equivalent to x(x 2 − (η − 1)) = 0 which immediately yields the two families of equilibria in the statement, namely, x 0 = 0, ∀η ∈ R and x 0 = ± √ η − 1 for any η > 1. We now use Corollary D.5 of Theorem D.4, in order to provide the sufficient conditions for stability and asymptotic stability of these two families. Using (D.24), we obtain that Then, when we evaluate this expression at the equilibria under study, we obtain: (i) for x 0 = 0, we have that ∂ x f (x 0 , 0, θ) = η and hence by Corollary D.5 the trivial solution x 0 is asymptotically stable (respectively, stable) if |η| < 1 (respectively, |η| ≤ 1).
The Ikeda nonlinear TDDE exhibits two families of equilibria: (i) The trivial solution x 0 = 0 for any η ∈ R and φ = πn, n ∈ Z. The equilibium x 0 = 0 is asymptotically stable for any η ∈ R.
Proof. The equilibria of the time-delay differential equation (D.10) with the Ikeda kernel (D.26), are characterized by the roots x 0 of the equation 0 = −x + f (x, 0, θ) or, equivalently, We divide the solutions of this equation into two families, namely, the trivial equilibrium x 0 = 0, for any η ∈ R and φ = πn, n ∈ Z, and the non-trivial ones obtained when φ = πn, n ∈ Z. Using (D.26), we compute and evaluate it at the two families of equilibria under study.
(ii) For non-trivial equilibria x 0 , the expression (D.29) amounts to ∂ x f (x 0 , 0, θ) = η sin(2x 0 + 2φ) and hence by Corollary D.5 the non-trivial solutions x 0 are asymptotically stable (respectively, stable) . We now consider the case |η| < 1 (respectively, |η| ≤ 1); in that situation the stability inequalities (D.27) always hold true but it remains to be shown that only one equilibrium exists. That claim is a consequence of the following lemma.
Proof of Lemma. Consider the function g(x) := η sin 2 (x + φ) − x. As g (x) = η sin(2x + 2φ) − 1, we have that if |η| ≤ 1, then g (x) ≤ η − 1 ≤ 0 for any x ∈ R. The function g(x) is hence a monotonously decreasing function and intersects the OX axis in at most one point. Since g(0) > 0 (recall that in this case φ = πn, n ∈ Z) and for any x > η we have that g(x) < 0, we conclude that g(x) intersects the OX axis in exactly one point, as required. Figure 1 illustrates the statement of Corollary D.7.

D.3 Fixed points of the reservoir map and their stability
In this section we consider the discrete time TDR, we characterize its fixed points and establish sufficient conditions for their stability which, as we will show, are analogous to the ones that we obtained for the continuous time case. We place ourselves in the autonomous regime, that is, I(t) = 0 in the time-delay differential equation (B.1) and I(t) = 0 N in the associated recursion (B.3). In this case, we can state the following proposition that shows that there is a bijective correspondence between the equilibria of the discrete and continuous time TDRs.
Proposition D. 9 The point x 0 ∈ R is an equilibrium of the time-delay differential equation (D.10) in autonomous regime, that is when I(t) = 0, if and only if the vector x 0 := x 0 i N is a fixed point of the N -dimensional discretized nonlinear time-delay reservoiṙ in autonomous regime, that is, when I(t) = 0 N . Proof. Suppose first that x 0 is an equilibrium of the time-delay differential equation (D.10) and hence satisfies x 0 = f (x 0 , 0, θ). In order to show that F (x(t − 1), 0 N , θ) = x 0 , we evaluate the components of the right hand side of (B.2) at x 0 and obtain as required. The proof of the converse implication is also straightforward using (B.2).
We now provide sufficient conditions for the stability of the fixed points of the type x 0 = x 0 i N described in Proposition D.9. The asymptotic stability (respectively, stability) of those fixed points is guaranteed whenever the connectivity matrix A(x 0 , θ) in (B.4) satisfies ρ(A(x 0 , θ)) < 1 (respectively, ρ(A(x 0 , θ)) ≤ 1)). Since it is not possible to compute the eigenvalues of A(x 0 , θ) for an arbitrary number of neurons N , we proceed by finding upper bounds of its spectral radius ρ (A(x 0 , θ)). This can be done with the help of the Gershgorin disks theorem (see for instance Corollary 6.1.5 in [9]) or by using a matrix norm ||| · ||| and noting that ρ(A(x 0 , θ)) ≤ |||A(x 0 , θ)|||. After a detailed study using all these possibilities we found that the best result is obtained by using the maximum row sum matrix norm |||A(x 0 , θ)||| ∞ defined in Section A, which allows us to formulate the following result.
Proof. We first recall that the connectivity matrix (B.4) of the discretized nonlinear TDR with N virtual nodes is given by is the first derivative of the nonlinear kernel f in (B.1) with respect to the first argument and computed at the point (x 0 , 0 N , θ), with ξ = log(1 + d) and d ∈ (0, 1] the Euler discretization step or, equivalently, the separation between the virtual neurons. We will use the notation f x0 := ∂ x f (x 0 , 0, θ) in what follows. We proceed by finding sufficient conditions on f x0 that guarantee that the spectral radius of A(x 0 , θ) is bounded above by 1. These conditions will be obtained by enforcing |||A(x 0 , θ)||| ∞ < 1 and by recalling that In the view of the rows of the matrix A(x 0 , θ), it is clear that |||A(x 0 , θ)||| ∞ is given by the sum of the absolute values of one of the rows with numbers 1, N − 1, or N . We can hence write It can be easily verified that this expression can be split into two cases, namely Additionally, by definition of the absolute value, we have two cases: and hence We now consider in detail all the possible combinations of cases that provide the conditions on f x0 that ensure stability by enforcing that |||A(x 0 , θ)||| ∞ < 1.
Notice that since f x0 ∈ [0, 1), then these cases reduce to: We finally write that in this case stability is guaranteed whenever On the other hand, the condition that defines (D.34a) amounts to Notice now that at the same time we require that |||A(x 0 , θ)||| ∞ < 1. Hence for the case (D.42a) we have We now consider the case (D.42b) and in an analogous way using (D.3) we have that If we put together the conditions for f x0 in (D.42b), (D.45), and (D.41) we obtain Case IV (D.34a), (D.35b), (D.36b).
On one hand, (D.34a), (D.35b), (D.36b) imply that (D.47) On the other hand, the case (D.34a) amounts to It can be easily verified that the condition defining (D.48b) is incompatible with (D.47) since the relation ξ holds for any ξ ∈ (0, 1] and N ∈ N. We hence conclude using the case (D.48a) that Hence, together with (D.47) this amounts to On the other hand, the case (D.34b) amounts to Hence, for the case (D.55a) we have We now consider the case (D.55b) and in an analogous way we have that Hence by (D.57), (D.55b) and (D.54) we can write where the last equality follows from the fact that − On the other hand, the case (D.34b) amounts to Hence for the case (D.60a) we have We now consider the case (D.60b) and in an analogous way we have that Finally, we put together all the non-empty intervals provided by the Cases I-VIII that guarantee that when f x0 belongs to them, then the fixed point x 0 is stable: Now, it is easy to see that and hence the condition (D.64) reduces to |f x0 | < 1.
Notice that the proof remains valid for the case of (not necessary asymptotic) stability in the statement of the theorem. In this case we require which results in the condition |f x0 | ≤ 1, as required.
The following theorem provides the characteristic polynomial of the connectivity matrix A(x 0 , θ) and an explicit expression for its spectral radius under some conditions. Another way to find upper bounds for this spectral radius would consist of using the Cauchy bound [10] of this polynomial. In our experience this approach produces mediocre results in comparison with the statement in Theorem D.10.
Proof. Define first B := 1 Φ A(x 0 , θ). We then write Let v = (v 1 , . . . , v N ) be an eigenvector of B with eigenvalue λ, that is, Bv = λv. This equality can be rewritten as If we assume that b N = 0 and λ = 1, this amounts to Consequently, the eigenvalues of A are given by the roots {λ 1 , . . . , λ N } of the polynomial equation If there exists some λ > 1, the expressions (D.66) show that the eigenvector v of B (or of −B if Φ < 0) can be chosen positive. In that situation Corollary 8.1.30 in [9] guarantees that ρ(B) = λ with λ > 1 the eigenvalue corresponding to v and hence ρ(A(x 0 , θ)) = |Φλ| as required.

E Robustness of the empirical tests with respect to the choice of nonlinear kernel
In this section we show the robustness of the empirical results in Sections 1.1 and 2 of the paper with respect to the choice of the nonlinear kernel used in the construction of the TDR. First, in Section 1.1 we carried out an experiment using the Ikeda kernel and a quadratic memory task that showed that optimal performance is obtained when the input mean and variance are tuned so that the dynamics of the reservoir takes place in the neighborhood of a stable steady state and making sure that multimodality is avoided. We have repeated here the same experiment but, this time, using the Mackey-Glass kernel. More specifically, we nonsider a TDR with N = 20 neurons d = 0.943, γ = 4.7901, η = 1.3541, and p = 2. As we explained in Corollary D.6, with these parameter values 5951 is an equilibrium that satisfies the sufficient conditions for asymptotic stability. In order to verify that the optimal performance is obtained when the RC operates in a neighborhood of that stable equilibrium, we study the normalized mean square error (NMSE) exhibited by a TDR initialized at x 0 = −0.5951 when we present to it a quadratic memory task. More specifically, we inject in a TDR under study an independent and identically normally distributed signal z(t) with mean zero and variance 10 −4 and we then train a linear readout W out (obtained with a ridge penalization of λ = 10 −15 ) in order to recover the quadratic function z(t − 1) 2 + z(t − 2) 2 + z(t − 3) 2 out of the reservoir output. The top left panel in Figure 2 shows how the NMSE behaves as a function of the mean and the variance of the input mask c. It is clear that by modifying any of these two parameters we control how far the reservoir dynamics separates from the stable equilibrium, which we quantitatively evaluate in the two bottom panels by representing the RC performance in terms of the mean and the variance of the resulting reservoir output. Both panels depict how the injection of a signal slightly shifted in mean or with a sufficiently high variance results in reservoir outputs that separate from the stable equilibrium and in a severely degraded performance. An important factor in this deterioration seems to be the multi modality, that is, if the shifting in mean or the input signal variance are large enough then the reservoir output visits the stability basin of the other stable point placed at x 0 = √ η − 1 = 0.5951; in the top right and bottom panels we have marked with red color the values for which bimodality has occurred so that the negative effect of this phenomenon is noticeable.
Second, in Section 2 we used a Mackey-Glass based reservoir to compare the empirical performance surfaces in terms of various parameters with that coming from the formula (C.16) that was obtained as a result of modeling the reservoir with an approximating VAR(1) process. In this section we have repeated the same exercise with an Ikeda based reservoir in order to show that the formula (C.16) produces in this case results of comparable quality. The outcome of this experiment are contained in Figure 3 where we represent the normalized mean square error as a function of the distance between neurons and the feedback gain η. The other fixed parameter values used are γ = 0.523 and φ = 0.3106; the reservoir was constructed using 20 neurons and we presented to it the three-lag quadratic memory task corresponding to the diagonal matrix Q with diagonal entries given by the vector (0, 1, 1, 1). The optimal output mask W out was computed using a ridge regression with λ = 10 −15 . As it can be seen in the figure, we restrict the values of the parameter η to the interval [0, 1] which ensures, using one of the results in Corollary D.7, that the TDR exhibits for each value of η a unique equilibrium (unimodality is hence guaranteed) that is always stable. The TDR is always initialized at that stable configuration.

Bimodality Unimodality
Variance of the reservoir output Influence of the reservoir output variance on the normalized mean square error (NMSE) in the 3-lag quadratic memory task NMSE NMSE

Mean of the reservoir output
Influence of the reservoir output mean on the normalized mean square error (NMSE) in the 3-lag quadratic memory task NMSE NMSE Influence of the input mask on on the normalized mean square error (NMSE) in the 3-lag quadratic memory task Influence of the input mask on on the normalized mean square error (NMSE) in the 3-lag quadratic memory task M ea n of th e in pu t m as k V a ri a n c e o f th e in p u t m a sk M ea n of th e in pu t m as k V a r ia n c e o f t h e in p u t m a s k Figure 2: Behavior of the performance of a Mackey-Glass based reservoir in a quadratic memory task as a function of the mean and the variance of the input mask. The modification of any of these two parameters influences how the reservoir dynamics separates from the stable equilibrium. The top panels show how the performance degrades very quickly as soon as the mean and the variance of the input mask (and hence of the input signal) separate from zero. The bottom panels depict the reservoir performance as a function of the various output means and variances obtained when changing the input means and variances. In the top right and bottom panels we have indicated with red markers the cases in which the reservoir visits the stability basin of a contiguous equilibrium hence showing how unimodality is associated to optimal performance.