## Introduction

In Greek mythology, a chimera is a hybrid creature composed of the components of more than one animal. Abrams and Strogatz1 aptly utilized the term ‘chimera’ to label a hybrid solution consisting of localized synchronized and unsynchronized dynamics to an otherwise homogeneous network of oscillators2. The systems showing chimera states are often also multi-stable, meaning that the system exhibits a synchronized or a chimera state depending on the initial conditions3,4.

The initial discovery of chimeras triggered a flurry of research into the theory behind these unusual states, ranging from heterogeneous complex networks, small and multiplex networks to networks considering non-pairwise interactions5,6,7,8,9,10 and from one-dimensional to three-dimensional systems11,12,13,14. Experimentally, chimeras have been observed in systems such as crystal light modulators, mechanical and chemical coupled oscillators15,16,17. Chimeras have been linked to different systems, ranging from biological and neuronal, to ecological and technological18,19,20.

Chimeras offer a modeling framework to study complex oscillating systems such as the brain, where coherent and incoherent dynamics are often found21,22,23. Chimeras have been previously linked to normal brain dynamics such as unihemispheric sleep in some mammals and birds, where half of the brain is active and shows asynchronous activity while the other half is asleep, in a synchronized state20,24,25,26. More pathologically, chimeras have been linked to the onset of epileptic seizures, where the brain region implicated in the onset is unsynchronized while the other brain regions are not27,28,29. Yet, whether there are truly chimeras in the brain, how such chimeras at the macroscale arise from the underlying microscopic dynamics of the neurons, and what their general relevance is in the brain, or more generally in biological systems, are all open questions.

We aim to train the connection weights of artificial recurrent neural networks (RNNs), which allow for self-sustained dynamical behavior, to output a collective chimera that is otherwise embedded into the network. In particular, the neurons of the RNN can be seen as the microscale and the output of the network as the meso- or macroscale. We use state-of-the-art techniques for mimicking dynamical systems or behaviors in RNNs30,31,32,33,34,35. These techniques stabilize an otherwise chaotic RNN so that it mimics the complex dynamics of a chimera, by determining the weights needed to achieve this. We specifically use the FORCE (First-Order Reduced and Controlled Error) method, which has been previously implemented to output different types of dynamics, ranging from human motion and chaotic systems30 to bird songs33. It has been applied to networks of continuous rate neurons30,36,37 and to networks of spiking neurons33,38,39,40,41. For example, Maslennikov and Nekorkin35 used FORCE training to train a recurrent network of rate neurons to generate spontaneous and elicited sequences similar to the ones proposed in ref. 33, which are based on discrete sequential states and differ from the classical, continuous chimera dynamics considered here.

To summarize, we use the FORCE method to train a recurrent network of continuous rate neurons to output a chimera state. Indeed, we demonstrate that a chimera can be embedded into a RNN, and we show that its emergence is generic and robust to different biological constraints, such as the excitatory/inhibitory classification of neurons (Dale’s law) and the sparsity of connections in neural circuits. The RNN can also be trained to display switching chimera states: every time the network receives a random pulse, the synchronized and the unsynchronized groups of the embedded chimera switch roles.

## Results

### Chimera state

In order to train a RNN to display a chimera, we first selected a suitable supervisor. Following ref. 4, we used a system of Kuramoto oscillators consisting of two coupled populations of n = 3 identical oscillators (Fig. 1, Methods). Each oscillator is coupled with strength μ within the subpopulation and ν between subpopulations, with μ > ν (Fig. 1B). By using chimera-like initial conditions (see Supplementary Fig. 1), a chimera state is easily achieved, in which one subpopulation synchronizes while the other oscillates asynchronously (schematically represented in Fig. 1A by six coupled metronomes). The phases generated by the oscillators, θi(t) (orange/synchronous population) and ϕi(t) (purple/asynchronous population), will be critical for constructing the supervisor for the RNN (Fig. 1C–E and Supplementary Movie 1).

Prior to RNN training, we confirmed that the simulated network was indeed displaying a chimera with two well-known measures: the mean phase velocity42, which captures the frequency of the oscillators over a given time period, and the order parameter R2,4, which characterizes the synchronization of the system: for fully synchronized systems R = 1 and for systems that are not fully synchronized 0 ≤ R < 1 (Methods). The mean phase velocity (Fig. 1F) shows distinct profiles for the two groups, with two different regimes: nodes from the asynchronous group oscillate faster than nodes from the synchronized group. The order parameter (Fig. 1G) also confirms distinct behaviors for the asynchronous group [0.4 < R(t) < 0.8] and for the synchronous group [R(t) = 1]. Thus, the two-population Kuramoto network displays a chimera state, which will become the supervisor for the RNN.

### Recurrent neural networks can be trained to display chimeras

With the chimera solution in hand, we investigated if it could serve as an m-dimensional supervisor to train a RNN to collectively display a chimera. First, as the oscillators’ phases are discontinuous and wrapped around the interval [0, 2π), the unit-circle transformations ($$\cos \theta_{i}$$, $$\sin \theta_{i}$$) and ($$\cos \phi_{i}$$, $$\sin \phi_{i}$$) were applied to the phases. With 2n oscillators, this results in an m = 4n dimensional supervisor for a RNN (throughout the paper we use n = 3 and n = 25). We use the FORCE method30 to train a RNN to autonomously mimic the time series produced by the supervisor. The training is considered successful if the network alone, unguided by the supervisor, can produce an identical chimera.
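As an illustration of the unit-circle transformation, the following minimal Python sketch maps wrapped phases to a continuous 4n-dimensional supervisor. The function name `build_supervisor` and the column ordering (cosines and sines of θ, then of ϕ) are assumptions made for this example and are not taken from the published code.

```python
import numpy as np

def build_supervisor(theta, phi):
    """Map phases of shape (T, n) per population to a (T, 4n) supervisor.

    The column order (cos theta, sin theta, cos phi, sin phi) is an
    assumption of this sketch; any fixed ordering would work.
    """
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return np.concatenate(
        [np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)], axis=1
    )

# Example: n = 3 oscillators per population, T = 5 time samples.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=(5, 3))
phi = rng.uniform(0.0, 2.0 * np.pi, size=(5, 3))
s = build_supervisor(theta, phi)  # shape (5, 12), i.e. m = 4n = 12
```

Because cosine and sine are 2π-periodic, a phase that wraps from just below 2π back to 0 produces no jump in the supervisor, which is the reason for applying the transformation.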

The RNN is an N-dimensional autonomous dynamical system:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r}$$
(1)
$$\boldsymbol{r} = \tanh(\boldsymbol{z}).$$
(2)

The network is initially sparsely connected with a set of static weights ω0, which initiates the neurons’ currents zi(t) into a high-dimensional chaotic regime43,44,45 (Fig. 2A). During learning, a low-rank perturbation ηd⊤ is added to the weight matrix ω0, where η and d are both N × m (Fig. 2B). The matrix d is used to decode the output of the network, with $$\mathbf{d}^{\top}\boldsymbol{r} = \hat{\boldsymbol{s}}$$ (Fig. 2C). This output is simultaneously fed back into the network:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + (\boldsymbol{\omega}^{0} + \boldsymbol{\eta}\mathbf{d}^{\top})\boldsymbol{r} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r} + \boldsymbol{\eta}\hat{\boldsymbol{s}}$$
(3)

The components of η are fixed and random, whereas the components of d are learned with Recursive Least Squares (RLS), which minimizes the sum-squared difference between the network output $$\hat{\boldsymbol{s}}$$ and the chimera supervisor s (see Supplementary Fig. 2). Training is considered successful when a constant value of the matrix d allows the network to mimic the dynamics of the chimera.

We found that a RNN with N = 1500 could indeed autonomously display chimera dynamics, i.e., the chimera state with n = 3 as shown in Fig. 2D, G (see also Supplementary Movie 2). We found that smaller networks (N = 500 and N = 1000) did not learn this chimera state, while a larger one (N = 2000) also did. Note that the necessary N changes depending on the supervisor’s size (n = 3 or n = 25) and on the constraints we enforce in the RNN (see section FORCE method robustness for the specific N that we used in each case). This is in line with what was discovered in ref. 30, where more complex supervisors required more neurons in a RNN for accurate dynamical fitting.

We refer to the chimera dynamics displayed by the RNN (e.g., Fig. 2D, G and Supplementary Movie 2) as an embedded chimera within the RNN. The chimera is embedded in the sense that a linear decoder (d) is used to extract the chimera state from the network as a whole with a linear combination of the firing rates (Fig. 2C, D). The phases of the individual oscillators can also be reconstituted with a Hilbert Transform (Fig. 2E). To confirm the network indeed learned the chimera shown in Fig. 1, we computed both the mean phase velocity and the order parameter (Fig. 2F, G), which are identical to those in Fig. 1.
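To illustrate how a phase can be reconstituted from a single decoded output with a Hilbert transform, the sketch below computes the analytic signal with NumPy's FFT and takes its unwrapped angle. It is only an illustration under stated assumptions: the helper names are hypothetical, the input is assumed to be a real, approximately zero-mean oscillation (such as a decoded cosine component), and the published analysis may differ in detail. `scipy.signal.hilbert` returns the same analytic signal.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of a real 1-D series (Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC component once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist component once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_phase(x):
    """Unwrapped phase of the analytic signal of x."""
    return np.unwrap(np.angle(analytic_signal(x)))

# Example: a cosine with exactly 3 periods in the window.
t = np.arange(1000) * 0.1                 # 100 time units
omega = 2.0 * np.pi * 3.0 / 100.0
phase = instantaneous_phase(np.cos(omega * t))
```

For a signal that is not periodic within the window, edge effects are expected near the start and end of the reconstructed phase.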

### FORCE method robustness

Next, we investigated if chimera states could be generically embedded within RNNs that enforce specific biological constraints. We considered three different constraints. We first enforced Dale’s law, which states that a neuron can be either excitatory or inhibitory, not both. In our model this translates to the final learned synaptic weights ω1 = ω0 + ηd⊤ being constrained by the excitatory/inhibitory nature of each neuron. If neuron i is excitatory (inhibitory), all of its outgoing connections will be positive (negative): $$\omega_{1i}^{1}, \cdots, \omega_{Ni}^{1} > 0$$ ($$\omega_{1i}^{1}, \cdots, \omega_{Ni}^{1} < 0$$) (Fig. 3A, B). See Methods for more details. We found that the chimera from Figs. 1 and 2 could be reliably trained with Dale’s law enforced (Fig. 3C) using a RNN of N = 3500. This was confirmed with the mean phase velocity (Fig. 3D) and the Kuramoto order parameter (Fig. 3E). Thus, chimeras can be generically embedded in a RNN respecting Dale’s law.

Second, biological circuits are often sparsely connected43,44,46,47. This constraint is not immediately satisfied by FORCE training, as the perturbation to ω0 is low rank, which yields all-to-all connectivity. To constrain the network to sparse connectivity, we enforced sparsity on the post-learning weight matrix ω1 (Fig. 3F) as in ref. 33 and tested if a sparsely coupled RNN could still learn the chimera state. In particular, the matrices η and d were kept at a sparsity value of 90% during and after learning. See Methods for more details on the implementation. Post learning, both weight matrices ω0 and ω1 were observed to have the same Gaussian-like distribution, and ω1 was successfully kept 82% sparse (Fig. 3G). The sparse RNN with N = 3000 was still able to learn how to embed a chimera (Fig. 3H–J).

Finally, we investigated if a RNN could learn more complex chimeras. The supervisor was changed to a larger chimera from two populations of 25 identically coupled Kuramoto oscillators (Fig. 4A and Supplementary Movie 3). The RNN could again successfully reproduce the chimera (Fig. 4B and Supplementary Movie 4). This was confirmed with mean-phase velocity and Kuramoto order parameter for the higher-dimensional supervisor (Fig. 4C, D). Collectively, these results imply that the different chimeras may be generically embedded into RNNs while satisfying some of the main biological constraints in real neural circuits.

### Characteristics of the recurrent neural network

With RNNs successfully and robustly trained to display an embedded chimera, we next investigated if such a chimera can be detected from the micro to the macro scale. To analyze the underlying dynamics of the RNN for chimera signatures, a Fast Fourier Transform (FFT) was applied to both the network output and the original chimera in the Kuramoto system, which allowed us to extract the fundamental frequencies. We observed identical frequencies in the Kuramoto network (Fig. 5A, B) as in the RNN output (Fig. 5C, D). The frequencies correspond to the frequency of the synchronous population, f1 = 0.021 and to the main frequencies of the asynchronous population (Fig. 5A, B), f1 = 0.021, f2 = 0.059 and f3 = 0.096, which can be qualitatively identified in Fig. 1C.
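The frequency extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for Fig. 5: the function `dominant_frequencies` is a hypothetical helper, peaks are taken as the largest local maxima of the amplitude spectrum, and the test signal simply superposes the three reported frequencies with arbitrary amplitudes and a unit sampling interval.

```python
import numpy as np

def dominant_frequencies(x, dt, k=3):
    """Return the k strongest spectral peaks of a real signal, sorted by frequency."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove the DC offset
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    # Indices of local maxima of the amplitude spectrum (interior bins only).
    interior = amplitude[1:-1]
    is_peak = (interior > amplitude[:-2]) & (interior > amplitude[2:])
    peaks = np.flatnonzero(is_peak) + 1
    strongest = peaks[np.argsort(amplitude[peaks])[::-1][:k]]
    return np.sort(freqs[strongest])

# Synthetic example with the three frequencies reported in the text.
t = np.arange(10000) * 1.0
signal = (1.0 * np.sin(2 * np.pi * 0.021 * t)
          + 0.6 * np.sin(2 * np.pi * 0.059 * t)
          + 0.3 * np.sin(2 * np.pi * 0.096 * t))
found = dominant_frequencies(signal, dt=1.0, k=3)
```

In practice the frequency resolution is 1/(T dt) for a window of T samples, so longer recordings are needed to separate closely spaced peaks.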

Next, we searched for these fundamental frequencies in experimentally accessible quantities across different scales. First, we considered a common reduction of high-dimensional neural data: Principal Component Analysis (PCA). This was applied to the set of individual firing rates in the network, with the 1st, 3rd, and 5th principal components shown in Fig. 5E. The three main frequencies found in the original Kuramoto system were found in the FFTs of these principal components (marked with dashed lines, Fig. 5F). Note that the 2nd, 4th and 6th components are orthogonal phase shifts of the 1st, 3rd and 5th components, since both the cosine and the sine of the oscillators’ phases were used as supervisors.
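A minimal sketch of this PCA step is given below, assuming the firing rates are stored as a T × N array (time samples by neurons). The helper `principal_components` and the synthetic data are illustrative assumptions; the example also shows why components come in phase-shifted pairs when a cosine and a sine drive the population.

```python
import numpy as np

def principal_components(rates, k):
    """First k principal-component time series and their variance fractions.

    rates: array of shape (T, N), one column per neuron.
    """
    centered = rates - rates.mean(axis=0)
    u, sing, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :k] * sing[:k]                 # projections onto the PCs
    explained = sing[:k] ** 2 / np.sum(sing ** 2)
    return scores, explained

# Synthetic rates: every neuron mixes one cosine and one sine, plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(500) * 0.2                         # 100 time units, 4 full periods
phase = 2.0 * np.pi * 4.0 * t / 100.0
rates = (np.outer(np.cos(phase), rng.normal(size=40))
         + np.outer(np.sin(phase), rng.normal(size=40))
         + 0.01 * rng.normal(size=(500, 40)))
scores, explained = principal_components(rates, k=2)
```

Here the first two components capture almost all of the variance and oscillate at the same frequency with a quarter-period offset, mirroring the pairing of components noted above.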

The three main frequencies can also be obtained through the FFT of the firing rates of single neurons in our RNN (Fig. 5G, H). Depending on the neuron, all three frequencies are present or only a subset of them. This implies that the individual neurons cannot generally be separated into a synchronized and an unsynchronized subgroup, and that characteristics of the embedded chimera can permeate the different scales. This suggests that a possible origin of chimera states in the brain is state switching of neural components: the firing rates change dynamically between low and high values, and these dynamics can collectively implement a decodable embedded chimera.

Next, we considered if truly macroscopic quantities somehow yielded information about the embedded chimera. To that end, the mean of the firing rates was computed with the resulting time series being highly irregular. Once again, the three main frequencies of the Kuramoto system were identified through the FFT of the spatial mean of the firing rates (Fig. 5I, J). These results show that if a chimera is embedded into a RNN, it may be detected from observations made at the micro-scale of neurons to the macro-scale of average activities.

### Switching chimera state

Inspired by uni-hemispheric sleep observed in some aquatic mammals and birds48,49, where one hemisphere is awake, showing asynchronous electroencephalographic (EEG) activity, while the other is asleep, showing synchronized EEG activity, we sought to determine if the RNN could be trained to switch the synchronized and unsynchronized populations with an input. In uni-hemispheric sleep, the two hemispheres switch states due to an external input48,49 (Fig. 6A). The RNN was trained to learn the switching chimera state: any time an external pulse c was provided to the network (Fig. 6, gray line), the synchronized group (from the network output) changed to unsynchronized and the unsynchronized group to synchronized. The supervisor was generated with homogeneous Kuramoto oscillators (as in Fig. 1) and the pulse was random (with some constraints, see Methods). The input pulse is nominally set to c = 0 and changes randomly to c = 10 or to c = −10 to switch the synchronized and unsynchronized populations in the embedded chimera (Fig. 6B; note that we only show the positive pulse). In particular, for c = 10, the embedded chimera would change $$\hat{\boldsymbol{\theta}}$$ to the synchronized state and $$\hat{\boldsymbol{\phi}}$$ to the unsynchronized one, and vice versa for c = −10. Two subsequent pulses could not have the same sign (see Methods for details).

The network was successfully trained to switch the synchrony profiles (Fig. 6B and Supplementary Movie 5) due to the external pulse, and thus a sufficiently strong perturbation to the system can induce a transition from one low-dimensional attractor (i.e., a low-dimensional projection of the RNN’s multidimensional attractor) to another. The RNN trained to switch chimera states showed evidence of only two different low-dimensional attractors (for more information about the underlying dynamics or attractor of the trained RNN see Supplementary Note 2 and Supplementary Fig. 3). One attractor was defined by $$\hat{\boldsymbol{\theta}}$$ being synchronized and $$\hat{\boldsymbol{\phi}}$$ being unsynchronized, and the other attractor was defined by $$\hat{\boldsymbol{\theta}}$$ being unsynchronized and $$\hat{\boldsymbol{\phi}}$$ being synchronized (Supplementary Fig. 4). However, we found that for some pulses the embedded chimera did not switch, indicating that switching success may depend on the specific state of the system at the time of a pulse50. For those pulses where the embedded chimera did not switch, the following pulse switched the chimera state regardless of the state of $$\hat{\boldsymbol{\theta}}$$ and $$\hat{\boldsymbol{\phi}}$$. In other words, any time an unsuccessful pulse occurred, the −10 pulse would act as a +10 pulse, and the +10 pulse would act as a −10 pulse (Supplementary Fig. 5). We tested the RNN with only positive pulses and with only negative pulses, and it switched states independently of the pulse sign (Supplementary Fig. 6). Altogether, these results indicate that it is the size rather than the sign of the pulse that leads to a switch in the chimera state.

Finally, we sought to determine how the low-dimensional projection of the RNN’s multidimensional attractor (see Supplementary Fig. 3) would change in the case of the switching chimera (Supplementary Fig. 4). We analyzed the underlying activity by computing the PCA of the RNN firing rates (Fig. 6C, D). The neuronal dynamics themselves readily display a change after the external input. At the network level, this change of dynamics is manifested on the temporal evolution of the 3rd and 5th principal components. The first principal component does not change after the input, and we believe this is due to the fact it accounts for the main frequency of both groups (the synchronized and unsynchronized populations), which is identical for both as we have shown in the FFT of the embedded chimera state (Fig. 5D).

## Discussion

So far, chimera states in the brain have been studied from bottom-up approaches, using networks of individual neurons described with simple mathematical models and using specific coupling schemes27,28, which are not directly applicable to real biological systems. Some exceptions apply: chimera states have been studied using biological coupling schemes51,52 and on the human brain macroscale using in silico experiments and personalized brain networks26. Yet, the question of how individual neurons could organize themselves such that a chimera state emerges on the macroscale (or, more generally, how macroscopic chimeras can arise from the underlying dynamics in large complex networks) has remained unanswered up to now.

Our work shows that chimeras can emerge on the macroscale of a network as a result of interactions between the nodes without the necessity of high-order couplings (such as non-pairwise interactions, multilayer interactions, or time-varying interactions53) and without using a specific coupling or model. We demonstrated that this result is robust to different biological constraints such as the excitatory/inhibitory neural classification known as Dale’s law and the sparsity of connections in a neural circuit. An interesting result is that the individual neurons do not show any modular structure, suggesting that chimera states can emerge on the macroscale even if no community structure is present.

Inspired by the uni-hemispheric sleep observed in some aquatic mammals and birds48,49, which has been linked to chimera states24, we showed that a RNN can learn to switch the synchronized and unsynchronized populations with an input. Previously, a switching chimera was modeled using two coupled groups of heterogeneous oscillators, coupled to an external periodic signal, which determined the period of the switching chimera25. Here, we demonstrated how the training we used is generic as the oscillators can be homogeneous and the period of the switching chimera is not tied to the period of the pulses (which were randomly applied). Regardless of the pulse sign, a sufficiently strong pulse is enough to change the state of each population by switching between two attractors.

In summary, while it has been theoretically proven that RNNs can act as universal approximators of any dynamical system54,55, our results collectively show that RNNs can embed chimeras, generically, robustly, and even in the presence of relevant biological constraints. This suggests that chimeras have a general relevance in neuroscience and, potentially, in large networks in general.

## Methods

### Two coupled Kuramoto–Sakaguchi populations

The system studied in ref. 4 was used as a supervisor. It consists of two groups of n Kuramoto oscillators each. The phases of the oscillators for group 1 and group 2 are given by $$\boldsymbol{\theta} = \{\theta_{i}\}_{i=1}^{n}$$ and $$\boldsymbol{\phi} = \{\phi_{i}\}_{i=1}^{n}$$, which are governed by the following equations:

$$\frac{d\theta_{i}}{dt} = \rho - \mu \sum_{j=1}^{n} \cos(\theta_{i} - \theta_{j} - \beta) - \nu \sum_{j=1}^{n} \cos(\theta_{i} - \phi_{j} - \beta)$$
(4)
$$\frac{d\phi_{i}}{dt} = \rho - \mu \sum_{j=1}^{n} \cos(\phi_{i} - \phi_{j} - \beta) - \nu \sum_{j=1}^{n} \cos(\phi_{i} - \theta_{j} - \beta).$$
(5)

The intrinsic frequency ρ = 1 is kept the same for all oscillators. The coupling between groups is given by $$\nu = \frac{1-A}{2n}$$ and within groups by $$\mu = \frac{1+A}{2n}$$, with ν < μ and 0 ≤ A ≤ 1. The coupling is of the Kuramoto–Sakaguchi type3,4,24: $$\sin(\varphi + \alpha) = \cos(\varphi - \beta)$$, where φ refers to the phase difference between two oscillators and β = π/2 − α. Depending on the values of β and A, the system undergoes different dynamics. To obtain a stable chimera, we simulated Eqs. (4) and (5) with appropriate initial conditions (see Note 1 in the Supplementary Material) and we fixed β = 0.025 and A = 0.1. Figure 1A, B illustrates the network architecture and Fig. 1C depicts the chimera state. The equations were integrated using the Euler method with an integration step of dt = $$10^{-3}$$. Note that all equations are dimensionless.
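A minimal Euler integration of Eqs. (4) and (5) could look as follows. This is a sketch rather than the published implementation: the function name is hypothetical, the sums and the coupling normalization are taken over the n oscillators of each population (an assumption about the intended indexing), phases are left unwrapped, and the chimera-like initial conditions from the Supplementary Material are not reproduced here.

```python
import numpy as np

def simulate_two_populations(theta0, phi0, A=0.1, beta=0.025, rho=1.0,
                             dt=1e-3, steps=1000):
    """Euler-integrate two coupled Kuramoto-Sakaguchi populations.

    Returns two arrays of shape (steps + 1, n) with unwrapped phases.
    """
    theta = np.array(theta0, dtype=float)
    phi = np.array(phi0, dtype=float)
    n = theta.size
    mu = (1.0 + A) / (2.0 * n)        # within-population coupling
    nu = (1.0 - A) / (2.0 * n)        # between-population coupling
    theta_traj = np.empty((steps + 1, n))
    phi_traj = np.empty((steps + 1, n))
    theta_traj[0], phi_traj[0] = theta, phi
    for k in range(steps):
        # Pairwise differences: entry (i, j) is x_i - y_j.
        dtheta = (rho
                  - mu * np.cos(theta[:, None] - theta[None, :] - beta).sum(axis=1)
                  - nu * np.cos(theta[:, None] - phi[None, :] - beta).sum(axis=1))
        dphi = (rho
                - mu * np.cos(phi[:, None] - phi[None, :] - beta).sum(axis=1)
                - nu * np.cos(phi[:, None] - theta[None, :] - beta).sum(axis=1))
        theta = theta + dt * dtheta
        phi = phi + dt * dphi
        theta_traj[k + 1], phi_traj[k + 1] = theta, phi
    return theta_traj, phi_traj

# Sanity check: a fully synchronized state stays synchronized and
# advances at the constant rate rho - cos(beta).
th, ph = simulate_two_populations(np.zeros(3), np.zeros(3))
```

Whether a run settles into a chimera or into full synchrony depends on the initial conditions, so this snippet only demonstrates the integration scheme.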

#### Order parameter z

Two different metrics were used to characterize the chimera state: the order parameter and the mean phase velocity. The order parameter z, whose modulus corresponds to R in the Results, is given by

$$|z| = \left|\frac{1}{n}\sum_{j=1}^{n}\exp(i\theta_{j})\right|$$
(6)

and quantifies the synchronization of any oscillatory system with phases $$\{\theta_{i}\}_{i=1}^{n}$$. For fully synchronized systems |z| = 1, and for systems that are not fully synchronized 0 ≤ |z| < 1. In a chimera state, the oscillators of one group are completely synchronized while the other group is not synchronized.
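Eq. (6) translates directly into a short NumPy function. The sketch below uses the hypothetical helper name `order_parameter` and evaluates two limiting cases: identical phases and phases spread evenly around the circle.

```python
import numpy as np

def order_parameter(phases):
    """Modulus of the Kuramoto order parameter over the last axis.

    phases: array of shape (..., n); returns an array of shape (...).
    """
    phases = np.asarray(phases, dtype=float)
    return np.abs(np.exp(1j * phases).mean(axis=-1))

# Identical phases give |z| = 1; evenly spread phases give |z| = 0.
r_sync = order_parameter(np.full(3, 0.7))
r_spread = order_parameter(2.0 * np.pi * np.arange(3) / 3.0)
```

Applied to a (T, n) phase trajectory, the same function returns the time series |z|(t) for one population.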

#### Mean phase velocity Ωθ

The mean phase velocity42 for a given oscillator with phase θi is defined as

$$\Omega_{i} = \frac{2\pi M_{i}}{\Delta t},$$
(7)

where Mi is the number of complete rotations around the origin performed by the ith oscillator during the time interval Δt = 1000. Having different mean phase velocities for the two populations is typical for chimera states11,56,57.
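The mean phase velocity of Eq. (7) can be estimated from sampled phases as sketched below. The helper name is hypothetical, and the sketch assumes the sampling is fine enough that the phase advances by less than π between consecutive samples, so that `np.unwrap` can undo the wrapping to [0, 2π).

```python
import numpy as np

def mean_phase_velocity(phases, dt):
    """Mean phase velocity per oscillator from phases of shape (T, n)."""
    phases = np.asarray(phases, dtype=float)
    unwrapped = np.unwrap(phases, axis=0)          # remove 2*pi jumps in time
    total_time = dt * (phases.shape[0] - 1)
    # Number of complete rotations performed during the window.
    rotations = np.floor(np.abs(unwrapped[-1] - unwrapped[0]) / (2.0 * np.pi))
    return 2.0 * np.pi * rotations / total_time

# Example: two oscillators rotating at constant rates 1.0 and 2.5.
dt = 0.01
t = np.arange(10001) * dt                          # window of 100 time units
true_rates = np.array([1.0, 2.5])
wrapped = np.mod(np.outer(t, true_rates), 2.0 * np.pi)
omega = mean_phase_velocity(wrapped, dt)
```

Because only complete rotations are counted, the estimate lies at most 2π/Δt below the true rate, which is why a long window such as Δt = 1000 gives a sharp value.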

### Recurrent neural network equations and the FORCE method

To train a chimera state, the following RNN was used:

$$\begin{aligned}\mathbf{z}^{\prime} &= -\mathbf{z} + G\boldsymbol{\omega}^{0}\mathbf{r},\\ \mathbf{r} &= \tanh(\mathbf{z}).\end{aligned}$$
(8)

The network serves as the basis to output the m-dimensional dynamics $$\hat{s}_{i}$$. The neuronal dynamics zi(t) can be seen as a neuronal current with firing rate ri(t). The parameter G scales ω0, an N × N static and sparse weight matrix drawn from a normal distribution with mean 0 and variance $$\frac{1}{Np^{2}}$$ (Supplementary Fig. 2A), where p is the sparsity degree (set to 90%). This sets the initial network dynamics into high-dimensional chaos. The parameter G controls the chaotic behavior, with G < 1 being the subcritical regime and G > 1 being the supercritical regime58. The coupling was set in the supercritical regime (G = 1.5). The network was trained with the FORCE method30, which trains a second set of weights ω1 = Qηd⊤ (Supplementary Fig. 2B) such that Eq. (8) can be rewritten as

$$\mathbf{z}^{\prime} = -\mathbf{z} + (G\boldsymbol{\omega}^{0} + Q\boldsymbol{\eta}\mathbf{d}^{\top})\mathbf{r},$$
(9)

where the parameter Q scales η, an N × m matrix drawn randomly and uniformly from $$[-1, 1]^{m}$$. By increasing Q, the feedback applied to the network is strengthened. A value of Q = 1 was used for all simulations. The output of the network (9) is defined as

$$\hat{\mathbf{s}} = \mathbf{d}^{\top}\mathbf{r}.$$
(10)

The FORCE method enforces

$$\hat{\mathbf{s}} \approx \mathbf{s},$$
(11)

by determining d in an online fashion with Recursive Least Squares (RLS)30. Online here means that d is computed as the network is being simulated. The RLS algorithm has an online solution for the optimal d, the one that minimizes the squared error e between the network output $$\hat{\mathbf{s}}$$ and the supervisor s (Supplementary Fig. 2D). The RLS updates to d at each time step n are

$$\mathbf{d}_{n+1} = \mathbf{d}_{n} - \boldsymbol{P}_{n+1}^{-1}\mathbf{r}_{n}\mathbf{e}_{n}^{\top}$$
(12)
$$\boldsymbol{P}_{n+1}^{-1} = \boldsymbol{P}_{n}^{-1} - \frac{\boldsymbol{P}_{n}^{-1}\mathbf{r}_{n}\mathbf{r}_{n}^{\top}\boldsymbol{P}_{n}^{-1}}{1 + \mathbf{r}_{n}^{\top}\boldsymbol{P}_{n}^{-1}\mathbf{r}_{n}}$$
(13)

where $$\mathbf{e}_{n} = \hat{\mathbf{s}}_{n} - \mathbf{s}_{n}$$ is the m-dimensional error (so that $$\mathbf{r}_{n}\mathbf{e}_{n}^{\top}$$ is N × m, like d), d0 = 0 and $$\boldsymbol{P}_{0}^{-1} = \boldsymbol{I}_{N}/\lambda$$. The parameter λ controls the rate of the error30 and we set it to λ = 1. The matrix $$\boldsymbol{I}_{N}$$ is the N × N identity matrix.

As soon as the RLS algorithm is switched on, the firing rates go from chaotic to regular dynamics (Supplementary Fig. 2C). The FORCE method is successful if the network is able to reproduce the supervisor once RLS is off. The network under these conditions is completely autonomous, with the weight matrix Gω0 + Qηd⊤.
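The RLS update of Eqs. (12) and (13), embedded in an Euler simulation of Eq. (9), can be sketched as follows. This is an illustrative toy version and not the published code: the function names, the Euler step `dt`, the default network size, and the reading of p as a connection probability of 0.1 (that is, 90% of the entries of ω0 set to zero, with nonzero entries of variance 1/(Np²)) are all assumptions of this sketch. The variable `P` in the code stands for the matrix written as P⁻¹ in Eqs. (12) and (13).

```python
import numpy as np

def rls_step(d, P, r, s):
    """One RLS update of the decoder d (N x m) and the matrix P (N x N)."""
    e = d.T @ r - s                                   # error before the update, shape (m,)
    Pr = P @ r
    P = P - np.outer(Pr, Pr) / (1.0 + r @ Pr)         # Eq. (13); P stays symmetric
    d = d - np.outer(P @ r, e)                        # Eq. (12), using the updated P
    return d, P, e

def force_train(s, N=200, G=1.5, Q=1.0, p=0.1, lam=1.0, dt=0.05, seed=0):
    """Euler-integrate Eq. (9) while updating d with RLS at every step.

    s: supervisor of shape (T, m). Returns the decoder and the error history.
    """
    rng = np.random.default_rng(seed)
    T, m = s.shape
    mask = rng.random((N, N)) < p                     # keep roughly a fraction p of entries
    omega0 = mask * rng.normal(0.0, 1.0 / (np.sqrt(N) * p), size=(N, N))
    eta = rng.uniform(-1.0, 1.0, size=(N, m))         # fixed random feedback weights
    d = np.zeros((N, m))
    P = np.eye(N) / lam
    z = rng.normal(size=N)
    errors = np.empty((T, m))
    for k in range(T):
        r = np.tanh(z)
        d, P, errors[k] = rls_step(d, P, r, s[k])
        z = z + dt * (-z + G * omega0 @ r + Q * eta @ (d.T @ r))
    return d, errors
```

A useful property for checking an implementation: for a fixed rate vector r, a single update shrinks the output error by the factor 1/(1 + r⊤Pr), where P is the matrix before the update. In a full experiment, training would be followed by a test phase with RLS switched off, which is what decides whether the embedding succeeded.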

### Enforcing biologically motivated constraints

#### Dale’s law

The initial weight matrix is generated to satisfy the Dale’s law constraint, where the first half of the population of neurons only projects positive weights (excitatory), $$\omega_{ij}^{0} \ge 0\ \forall j \in [0, N]$$, and the second half only negative weights (inhibitory), $$\omega_{ij}^{0} \le 0\ \forall j \in [0, N]$$. The second set of weights ω1 = ηd⊤ is constrained to project either positive or negative weights as well. We achieve that by first defining η as η = η− + η+, where η− and η+ have only negative or positive entries, respectively. Second, we set d such that $$d_{ij} \ge 0\ \forall i \in [0, \frac{N}{2}]$$ and $$d_{ij} \le 0\ \forall i \in [\frac{N}{2}, N]$$. For more details refer to ref. 33 and to Additional Information, where the link to the code is available.

#### Sparsity

The learned weight matrix ω1 is made sparse by setting a random subset of weights (90%) to 0, so that the connectivity is not all-to-all. We also keep the auxiliary matrix P needed to perform the RLS algorithm sparse (by setting a random subset of its entries to 0). For more details, refer to ref. 30 and to Additional Information, where the link to the code is available.

#### Switching Chimera

In order to train the RNN to learn the switching chimera, an external input pulse was added to Eq. (9):

$$\mathbf{z}^{\prime} = -\mathbf{z} + (G\boldsymbol{\omega}^{0} + Q\boldsymbol{\eta}\mathbf{d}^{\top})\mathbf{r} + \boldsymbol{\omega}^{in}c,$$
(14)

where ωin is an N × 1 vector of input weights drawn randomly and uniformly from [−1, 1], so that the scalar pulse c is projected onto all N neurons. The external pulse c is set to c = 0 except when it randomly changes to either 10 or −10 for 100 time steps (dt = 0.1) of the simulation. The inputs are configured such that two subsequent pulses with the same sign never occur. Also, any subsequent pulse has to occur after a time window of at least 5000 time steps. Depending on the value of c, the chimera state changes as follows:

$$\text{if } c = 10 \to \begin{cases}\hat{\boldsymbol{\theta}} \text{ changes to synchronized}\\ \hat{\boldsymbol{\phi}} \text{ changes to unsynchronized}\end{cases}$$
(15)
$$\text{if } c = -10 \to \begin{cases}\hat{\boldsymbol{\theta}} \text{ changes to unsynchronized}\\ \hat{\boldsymbol{\phi}} \text{ changes to synchronized}\end{cases}$$
(16)
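A pulse train obeying the constraints above (amplitude ±10, 100 time steps per pulse, alternating signs, at least 5000 time steps between pulses) could be generated as in the following sketch. The function name, the uniformly drawn extra waiting time `max_extra_gap`, and the choice to measure the gap from the end of the previous pulse are assumptions of this example, since the text does not specify how the random timing was drawn.

```python
import numpy as np

def make_pulse_train(total_steps, amplitude=10.0, width=100, min_gap=5000,
                     max_extra_gap=5000, seed=0):
    """Return the input c for every time step and the pulse start indices."""
    rng = np.random.default_rng(seed)
    c = np.zeros(total_steps)
    starts = []
    sign = rng.choice([-1.0, 1.0])                    # random sign of the first pulse
    start = min_gap + int(rng.integers(0, max_extra_gap + 1))
    while start + width <= total_steps:
        c[start:start + width] = sign * amplitude
        starts.append(start)
        sign = -sign                                  # consecutive pulses alternate in sign
        # Wait at least min_gap steps after the pulse ends (hypothetical random extra).
        start += width + min_gap + int(rng.integers(0, max_extra_gap + 1))
    return c, starts

c, starts = make_pulse_train(60000, seed=3)
```

The resulting array can be fed step by step into Eq. (14) as the scalar input c.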