In our recent Review (Control of synaptic plasticity in deep cortical networks. Nat. Rev. Neurosci. 19, 166–180 (2018))1, we reviewed factors that influence synaptic plasticity in sensory and association cortex during reinforcement learning. We asked how the brain can implement powerful learning rules such as error-backpropagation (BP), a rule that is broadly used to train artificial neural networks to perform complex tasks. We described how feedback signals originating from the response-selection stage might tag those synapses at deeper network levels that are responsible for the outcome of actions, thereby gating their plasticity. In their Correspondence (Can neocortical feedback alter the sign of plasticity? Nat. Rev. Neurosci. (2018))2, Richards and Lillicrap suggest that feedback connections might not only gate but also change the sign of synaptic plasticity. Here we address this important assertion with a closer examination of the proposed learning rules. Our analysis indicates that the proposed learning rules indeed predict that feedback connections also influence the sign of plasticity.

Previous studies3,4 demonstrated that the changes in the strengths of synapses (∆wi,j) required by BP can be split into separable factors that are available locally at cortical synapses (equation 1):

$$\Delta w_{i,j}=\beta \cdot f_{i}(a_{i})\cdot f_{j}(a_{j})\cdot \mathrm{RPE}\cdot \mathrm{FB}_{j}$$

Here, β is a learning rate, fi(ai) is a function of the presynaptic activity level ai, and fj(aj) is a function of the postsynaptic activity level aj. RPE is the reward prediction error, which can be mediated by neuromodulators such as dopamine5,6. Previous studies suggested that the RPE steers plasticity: it is positive if the outcome of an action is better than expected and negative if it is worse. In our Review, we suggested that FBj gates plasticity: it indicates the degree to which a synapse can be held responsible for the outcome of an action. If a neuron does not receive feedback from the response-selection stage, its synapses will not change in strength. An important aspect of the learning rule of equation 1 is that it supports trial-and-error learning of new tasks. Unlike several previous models7,8 that proposed a role for feedback connections in modulating synaptic plasticity, there is no need for external supervision that tells the network which units should be switched on and off. Importantly, biologically plausible plasticity rules such as AuGMEnT4 enable the learning of high-dimensional functions within a reasonable amount of time because they have better variance properties than previous models such as REINFORCE3,9. Indeed, AuGMEnT allows artificial neural networks to learn complex and nonlinear stimulus–response mappings faster than monkeys trained on the same tasks4,10. However, animals also learn the statistics of stimuli without a specific reward structure, that is, outside of reinforcement learning contexts. Future studies could investigate whether the beneficial effects of feedback connections in the gating of plasticity also generalize to other learning schemes, such as unsupervised learning, and if and how such learning schemes can train the multilayered networks of the brain within a realistic time frame.
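The separability of equation 1 can be made concrete with a minimal numerical sketch. This is an illustration only, assuming identity activity functions for fi and fj and illustrative parameter values; the function and variable names are ours, not part of the original models.

```python
# Minimal sketch of the four-factor learning rule (equation 1), assuming
# identity activity functions f_i and f_j. All names and numerical values
# are illustrative, not taken from AuGMEnT or any fitted model.

def delta_w(a_pre, a_post, rpe, fb, beta=0.1):
    """Weight change for one synapse: beta * f_i(a_i) * f_j(a_j) * RPE * FB_j.

    The sign of the update is set jointly by the reward prediction error
    (rpe) and the feedback signal (fb); the activity factors only scale it.
    """
    return beta * a_pre * a_post * rpe * fb

# Gating: with no feedback (fb = 0) the synapse does not change,
# regardless of pre- and postsynaptic activity or the RPE.
print(delta_w(a_pre=1.0, a_post=1.0, rpe=0.5, fb=0.0))   # 0.0

# Steering: with excitatory feedback, a positive RPE strengthens the
# synapse and a negative RPE weakens it.
print(delta_w(a_pre=1.0, a_post=1.0, rpe=0.5, fb=1.0))   # positive
print(delta_w(a_pre=1.0, a_post=1.0, rpe=-0.5, fb=1.0))  # negative
```

Because every factor is available locally at the synapse (or broadcast globally, in the case of the RPE), no explicit backpropagation pass is needed to evaluate this rule.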

In their Correspondence, Richards and Lillicrap propose to also consider learning rules in which feedback connections not only gate plasticity but can also change its sign. Figure 1 illustrates a neural network that selected action ‘a’ in the output layer. We ask how the learning rule affects the plasticity of the inputs onto the association neurons y1–y4, which project to the output layer through excitatory (for example, y1) or inhibitory (for example, y2) connections. Previous work3,4 on the learning rule of equation 1 suggested that feedback connections should be (or become) proportional in strength to the feedforward connections. Hence, the feedback connection from a back to y1 should be excitatory, and the feedback connection from a to y2 inhibitory (there are no long-range inhibitory connections between remote cortical areas, but long-range excitatory projections can provide inhibition to a cortical column by activating local interneurons; see below). Let us now examine whether this learning rule is compatible with a sign-changing effect of feedback connections.

Fig. 1: Feedback connections have different influences on the plasticity of inputs onto excitatory and inhibitory neurons.

Neural network with three layers, in which action a has been selected in the output layer. Excitatory neurons and connections are shown in black and red, and inhibitory neurons and connections in cyan and green. FB+, excitatory feedback connection; FB–, inhibitory feedback connection.

First, we consider the plasticity of connections onto excitatory neuron y1. When the outcome of action a is better than expected, all factors in equation 1 are positive, such that ∆w1,1 is also positive: the synapse strengthens. By contrast, if the RPE is negative, the connection weakens. Units x3 and y3 are also active in the example, but unit y3 does not receive feedback from a and thus w3,3 is unaltered. Indeed, y3 did not provide input to a and is not responsible for the outcome. This is what we mean by a ‘gating effect’: feedback connections switch plasticity on or off.

The plasticity rule works differently, however, for neurons that inhibit the selected action. Unit y2 in Fig. 1 inhibits action a, and the selected action therefore sends a negative feedback signal back to y2. If the RPE is positive, this negative feedback signal causes ∆w1,2 and ∆w2,2 to be negative, and the excitatory connections w1,2 and w2,2 therefore weaken. However, w4,4 does not change, because y4 does not receive feedback, representing another instance of the gating effect. Hence, the learning rule of equation 1 conforms to the suggestion by Richards and Lillicrap at the level of the cortical column, because the plasticity rules for connections onto excitatory and inhibitory neurons differ.
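The worked example of Fig. 1 can be summarized numerically. The sketch below is illustrative only: it assumes the four-factor rule of equation 1 with identity activity functions, unit activity levels, and feedback signals proportional to the feedforward weights (+1 excitatory, –1 inhibitory, 0 for units that do not project to action a). All names and values are ours.

```python
# Toy numerical summary of the Fig. 1 example, assuming equation 1 with
# identity activity functions and feedback proportional to the feedforward
# weight: +1 (excitatory), -1 (inhibitory), 0 (no projection to action a).
# Values are illustrative, not fitted to data.

BETA = 0.1

def delta_w(a_pre, a_post, rpe, fb):
    return BETA * a_pre * a_post * rpe * fb

# Feedback that each association unit receives from the selected action a:
#   y1 excites a  -> excitatory feedback (+1): plasticity follows the RPE sign
#   y2 inhibits a -> inhibitory feedback (-1): plasticity opposes the RPE sign
#   y3, y4 do not project to a -> no feedback (0): plasticity is gated off
feedback = {"y1": +1.0, "y2": -1.0, "y3": 0.0, "y4": 0.0}

rpe = 0.5  # outcome better than expected
for unit, fb in feedback.items():
    dw = delta_w(a_pre=1.0, a_post=1.0, rpe=rpe, fb=fb)
    print(f"input synapse onto {unit}: dw = {dw:+.3f}")
```

With a positive RPE, the input synapses onto y1 strengthen, those onto y2 weaken, and those onto y3 and y4 are unchanged, which is exactly the combination of gating and sign reversal discussed above.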

The neural network model in Fig. 1 is an abstraction of the complex interconnectivity between cortical neurons, in which virtually all long-range connections between brain areas are excitatory. These long-range connections target pyramidal cells to excite a cortical column, particular inhibitory cell types to inhibit the column and other inhibitory cell types to cause disinhibition. It will be of great interest to test whether the predicted learning rules for inhibitory and excitatory neurons can be confirmed experimentally.