Meta-learning biologically plausible plasticity rules with random feedback pathways

Backpropagation is widely used to train artificial neural networks, but its relationship to synaptic plasticity in the brain is unknown. Some biological models of backpropagation rely on feedback projections that are symmetric with feedforward connections, but experiments do not corroborate the existence of such symmetric backward connectivity. Random feedback alignment offers an alternative model in which errors are propagated backward through fixed, random backward connections. This approach successfully trains shallow models, but learns slowly and does not perform well with deeper models or online learning. In this study, we develop a meta-learning approach to discover interpretable, biologically plausible plasticity rules that improve online learning performance with fixed random feedback connections. The resulting plasticity rules show improved online training of deep models in the low data regime. Our results highlight the potential of meta-learning to discover effective, interpretable learning rules satisfying biological constraints.


Introduction
Error-driven learning in multilayer neural networks was revolutionized by the error backpropagation algorithm [1], or backprop for short. In backprop, gradients or "errors" are propagated backward through auxiliary feedback pathways to compute parameter updates.
While practical, backprop has strong structural constraints that make it biologically implausible [2,3]. A major limitation, known as the weight transport problem [4], states that transmitting gradients to upstream layers requires feedback connections that are symmetric with feedforward connections. Such symmetric connectivity is not known to exist in the brain. In an attempt to depart from the symmetry assumption, Lillicrap et al. [5] showed that even random backward connections can transmit effective teaching signals to train the upstream layers. In this scenario, while the backward connections are fixed, the forward weights evolve to align the teaching signals with those prescribed by the backprop algorithm. In this work, we address the weight alignment problem by meta-learning plasticity rules that operate with fixed random feedback connections. Our analysis of the meta-learned plasticity rules demonstrates how they overcome the weight alignment challenge. Our approach further advances the use of meta-plasticity to understand how effective learning can emerge in biological neural circuits.

Feedback Alignment does not learn effectively in deep networks
Consider a fully-connected deep neural network $f_W$ parameterized by weights $W$, representing a non-linear mapping $f_W : x \mapsto y_L$ from the network's input $y_0 = x$ to the output $y_L$, with $L$ denoting the depth of the network. Each network layer is defined by

$$z_\ell = W_{\ell-1,\ell}\, y_{\ell-1}, \tag{1}$$

$$y_\ell = \sigma(z_\ell), \tag{2}$$

where $y_\ell$ is the activation for layer $\ell$ and $\sigma$ stands for the non-linear activation function.
Given a dataset $D_{train} = (X_{train}, Y_{train})$, the model is trained in an attempt to find the set of weight parameters $W = \{W_{\ell-1,\ell} \mid 0 < \ell \le L\}$ that minimize a loss function $\mathcal{L}(y_L, Y_{train})$. Each weight matrix $W_{\ell-1,\ell}$ is modulated by a teaching signal $e_\ell$ derived from $\mathcal{L}$. A commonly used method to compute $e_\ell$ is to analytically calculate the modulatory signal $e_L$ in the output layer and then use a backward auxiliary network to transmit it to the upstream layers. This backward projection follows the relation

$$e_\ell = \left(B_{\ell+1,\ell}\, e_{\ell+1}\right) \odot \sigma'(z_\ell), \tag{3}$$

where $\odot$ denotes element-wise multiplication and $B = \{B_{\ell+1,\ell} \mid 0 < \ell < L\}$ is the set of feedback connections.
In a gradient-based optimization algorithm, $e_L$ is defined as the derivative of the loss function $\mathcal{L}$ with respect to $z_L$. This teaching signal is propagated backward up to the initial layer to modulate the weight parameters. A widely used scheme, backprop, uses feedback weights $B^{BP}_{\ell+1,\ell}$ that are the transposes of the forward path's weights to transport these modulating signals using Eq. 3. Subsequently, the forward weight parameters are updated by

$$\Delta W_{\ell-1,\ell} = -\theta\, e_\ell\, y_{\ell-1}^T, \tag{4}$$

which represents a shared plasticity rule for all forward connections $W_{\ell-1,\ell}$, where $\theta$ is the associated learning rate.
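To make Eqs. 1-4 concrete, here is a minimal NumPy sketch of the forward pass, the backward error propagation, and the shared weight update. The layer sizes, learning rate, and squared-error loss are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: y_0 (input) ... y_L (output).
sizes = [4, 8, 6, 3]
L = len(sizes) - 1
W = [rng.normal(0, 0.5, (sizes[l + 1], sizes[l])) for l in range(L)]

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2       # sigma'(z)

def forward(x):
    """Eqs. 1-2: z_l = W_{l-1,l} y_{l-1}, y_l = sigma(z_l)."""
    ys, zs = [x], []
    for Wl in W:
        zs.append(Wl @ ys[-1])
        ys.append(sigma(zs[-1]))
    return ys, zs

def backward(zs, e_top, Bs):
    """Eq. 3: e_l = (B_{l+1,l} e_{l+1}) * sigma'(z_l); es[l] modulates W[l]."""
    es = [None] * L
    es[-1] = e_top
    for l in range(L - 2, -1, -1):
        es[l] = (Bs[l + 1] @ es[l + 1]) * dsigma(zs[l])
    return es

x = rng.normal(size=sizes[0])
y_target = np.zeros(sizes[-1]); y_target[0] = 1.0

ys, zs = forward(x)
loss_before = float(np.sum((ys[-1] - y_target) ** 2))

theta = 1e-2                                   # learning rate (illustrative)
for _ in range(20):
    ys, zs = forward(x)
    e_top = (ys[-1] - y_target) * dsigma(zs[-1])   # output teaching signal
    Bs = [Wl.T for Wl in W]                    # backprop: B_{l+1,l} = W_{l,l+1}^T
    es = backward(zs, e_top, Bs)
    for l in range(L):                         # Eq. 4: W -= theta * e_l y_{l-1}^T
        W[l] -= theta * np.outer(es[l], ys[l])
```

For feedback alignment (introduced next), `Bs` would instead be fixed random matrices drawn once at initialization rather than recomputed from `W`.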
To alleviate the biologically undesirable characteristics of the backprop algorithm, Ref. [5] proposed the "Random Feedback Alignment" approach, which departs from the assumption of symmetric feedback connections and instead uses fixed random backward connections $B^{FA}$ that are not bound to the forward weights. To distinguish between the two learning algorithms, we hereafter use "feedback alignment" to refer to the learning rule in Eq. 4 with fixed random $B^{FA}_{\ell+1,\ell}$, and "backprop" to refer to Eq. 4 with $B^{BP}_{\ell+1,\ell} = W^T_{\ell,\ell+1}$. For feedback alignment, the teaching signal $e^{FA}$ is not an exact gradient but an approximating pseudo-gradient term. The resulting learning algorithm performs well on simple tasks and shallower networks. However, feedback alignment fails to reach good accuracy in deeper networks and is not as robust in the small data regime. In our empirical test with an online stream of data, feedback alignment only begins to learn effectively after about 2000 iterations, while backprop learns much more quickly (Fig. 1a). An alternative to feedback connections that link consecutive layers is to create direct backward pathways [8]. This change allows errors to be transmitted directly from the output layer to the upstream layers. This modification leads to improved performance compared to the feedback alignment method, speeding up the learning process and improving accuracy. However, it still falls short of the performance level of backpropagation (see Supplementary Fig. S1). In addition, Fig. 1b shows that the teaching signals transmitted through fixed feedback connections, $e^{FA}$, are not aligned with the true gradients, $e^{BP}$, computed by backpropagation at this stage of training.

Figure 1: Feedback alignment learns poorly in deep models: Performance of benchmark learning schemes while training a 5-layer fully-connected classifier network on MNIST digits [27] with online learning. (a) Accuracy versus the number of training data for Feedback Alignment (FA) [5] and backprop (BP) [1], compared to the discovered biologically plausible plasticity rule (bio) in Sec. 2.2.2. (b) The angle $\alpha$ between the teaching signal $e^{FA}$ transmitted by the Feedback Alignment method and the corresponding backpropagated signal $e^{BP}$. In all figures, each plot illustrates the mean over multiple trials; the shaded area represents the 98% confidence interval (see Methods).
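The alignment diagnostic plotted in Fig. 1b reduces to the angle between two teaching-signal vectors; a small sketch (the random vectors here are stand-ins for $e^{FA}$ and $e^{BP}$):

```python
import numpy as np

def alignment_angle(e_fa, e_bp):
    """Angle (degrees) between two teaching signals; ~90 deg for unrelated
    random vectors, ~0 deg when the pseudo-gradient matches the true gradient."""
    cos = e_fa @ e_bp / (np.linalg.norm(e_fa) * np.linalg.norm(e_bp) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=1000), rng.normal(size=1000)
angle_random = alignment_angle(e1, e2)   # near 90 deg in high dimensions
```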
These limitations indicate that the backward flow of information through fixed feedback is insufficient for online training in deeper models. This paper investigates modified plasticity rules to improve the trained model's performance. To that end, we adopt a meta-learning framework to explore a parameterized space of plasticity rules.
2.2 A meta-learning approach for discovering interpretable plasticity rules
Meta-learning is a machine learning paradigm that aims to learn elements of a learning procedure. This framework consists of a two-level learning scheme: an inner adaptation loop that learns parameters $W$ of a model $f_W$ using a parameterized plasticity rule $F(\Theta)$, and an outer meta-optimization loop that modifies the plasticity meta-parameters $\Theta$. The meta-training dataset contains a set of tasks $\{T_\varepsilon\}_{0 \le \varepsilon \le E}$, each consisting of $K$ training data $(X_{train}, Y_{train})$ and $Q$ query data $(X_{query}, Y_{query})$ per class. The former is used to train the model $f_W$, while the latter optimizes the meta-parameters $\Theta$. Algorithm 1 details the meta-learning framework presented in this work.

In each meta-iteration, also known as an episode, a randomly initialized model $f_W$ is trained on an online training data sequence. In other words, each adaptation iteration uses a single data point $(x_{train}, y_{train})$ to update $W$. It is worth emphasizing that reinitializing the weights $W$ at each episode removes the learning rule's dependence on the weight initialization. The meta-learned plasticity rules are therefore optimized to learn a task starting from a randomly initialized weight matrix. In contrast, meta-optimizing the initial weights would adapt the meta-parameters $\Theta$ to the later stages of learning, which no longer extrapolates across the training lifetime. Moreover, when meta-learning a weight initialization in conjunction with a plasticity rule (e.g., [17]), it is not clear to what extent improvements in learning can be attributed to the weight initialization versus the meta-learned plasticity rule itself.
Each episode $\varepsilon$ follows two objectives. The first is to optimize the model parameters $W$ using a loss function $\mathcal{L}$, iteratively, on each data point sampled from task $T_\varepsilon$'s training set. Given a set of $R$ candidate terms $\{F_r\}_{0 \le r \le R-1}$, a parametrized plasticity rule is defined as a linear combination of individual plasticity terms,

$$F(\Theta) = \sum_{r=0}^{R-1} \theta_r F_r, \tag{5}$$

where $\Theta = \{\theta_r \mid 0 \le r \le R-1\}$ is the set of learning parameters shared across layers. This rule is used to update the forward weights $W$ in the network. The second objective, dubbed meta-loss, assesses the meta-parameters $\Theta$ by evaluating the loss function $\mathcal{L}$ on the query set of the same task $T_\varepsilon$ using the updated model $f_W$. While meta-learning over the pool of plasticity terms $F(\Theta)$ yields an optimized set of meta-parameters $\Theta$, the resulting plasticity rule consists of too many terms, which are difficult to interpret and whose underlying mechanisms may overlap. Therefore, following Occam's razor, we introduce an L1 penalty on the plasticity coefficients to select a sparser set of plasticity terms. Mathematically, the meta-loss is defined as

$$\mathcal{L}_{meta} = \mathcal{L}\big(f_W(X_{query}),\, Y_{query}\big) + \lambda \lVert \Theta \rVert_1, \tag{6}$$

where $f_W$ is the model updated in the adaptation loop and $\lambda$ is a predefined hyperparameter. The regularization term in Eq. 6 is the L1 norm of the meta-parameters, leading the algorithm to favor simplicity in the plasticity model. While the weights $W$ are optimized using $F(\Theta)$, the meta-parameters $\Theta$ are updated by a gradient-based approach. Figure 2 summarizes the problem's configuration.
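The two-loop structure of Algorithm 1 can be sketched end to end. This is a deliberately tiny stand-in: a toy regression task instead of the paper's classification tasks, two hypothetical plasticity terms (a pseudo-gradient term and a decay term), and finite-difference meta-gradients in place of the paper's gradient-based meta-optimizer:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_task():
    """Toy regression task (a stand-in for the paper's classification tasks)."""
    w_true = rng.normal(size=3)
    X = rng.normal(size=(40, 3))
    y = X @ w_true
    return (X[:20], y[:20]), (X[20:], y[20:])          # (train, query)

def inner_train(theta, train, w0, steps=20):
    """Adaptation loop: online updates with F(Theta) = theta_0 F_0 + theta_1 F_1,
    where F_0 is a pseudo-gradient term and F_1 a (hypothetical) decay term."""
    X, y = train
    w = w0.copy()
    for t in range(steps):
        i = t % len(y)                                  # online: one sample per step
        err = w @ X[i] - y[i]
        w = w - theta[0] * (err * X[i]) - theta[1] * w
    return w

def meta_loss(theta, task, w0, lam=1e-3):
    """Eq. 6-style objective: query loss plus an L1 penalty on Theta."""
    train, (Xq, yq) = task
    w = inner_train(theta, train, w0)
    return float(np.mean((Xq @ w - yq) ** 2) + lam * np.sum(np.abs(theta)))

theta = np.array([0.01, 0.01])                          # meta-parameters Theta
for episode in range(100):
    task = make_task()
    w0 = rng.normal(size=3) * 0.1                       # reinitialized every episode
    g = np.zeros_like(theta)
    for r in range(len(theta)):                         # finite-difference meta-gradient
        d = np.zeros_like(theta); d[r] = 1e-4
        g[r] = (meta_loss(theta + d, task, w0) -
                meta_loss(theta - d, task, w0)) / 2e-4
    theta -= 0.05 * np.clip(g, -1.0, 1.0)               # outer meta-optimization step
```

Note that, as in the paper, the inner weights are reinitialized at every episode, so the meta-parameters are optimized for learning from scratch rather than for a particular initialization.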

In the meta-optimization phase, the solution $W$ of the inner loop is used to compute the loss on the query set of task $T_\varepsilon$. Then, a gradient-based strategy explores the meta-parameter space to optimize the plasticity meta-parameters $\Theta^{(\varepsilon)}$. The plasticity rule $F(\Theta)$ is then reconstructed using the updated meta-parameters $\Theta^{(\varepsilon+1)}$ to guide the weight optimization in the next episode. This procedure is repeated until the meta-parameters converge. In the initial episodes, the unoptimized $F(\Theta)$ is unlikely to direct $W$ to a solution. However, as $\Theta$ converges, $F(\Theta)$ discovers a new direction that may only partially adhere to the direction of the gradient.

Meta-learning the learning coefficients via backprop and feedback alignment establishes a benchmark
Before introducing new plasticity rules, it is necessary to establish the baseline performance of the current learning models on the learning task considered here. To this end, we use the meta-learning framework to optimize the learning rate, $\theta$, in Eq. 4 for backprop and feedback alignment. Since, in these examples, the meta-learning model seeks to optimize the meta-parameter rather than selecting one term over another, the regularization coefficient $\lambda$ in Eq. 6 is set to zero. Figures 3a-3c compare the performance of the two plasticity rules over 600 episodes. First, the reinitialized models $f_W$ are trained at each episode using an online stream of $M \times K = 250$ data points. Then, the meta-accuracy and meta-loss are evaluated with the query data. Tracing the evolution of the plasticity coefficients in Fig. 3c shows that the meta-learning model converges after ~100 episodes. After convergence, the model trained with feedback alignment is, on average, about 25% accurate in its predictions, whereas the model backpropagated via symmetric feedback reaches an accuracy of about 70% (Fig. 3a). In addition, the backpropagated model reaches considerably lower loss values, as shown in Fig. 3b. The comparison shows that feedback alignment is not adequately trained with an online data stream in the small data regime. This outcome is further supported by Fig. 3d, which illustrates the poor alignment of the modulating signals in feedback alignment with their backprop analogs.

Biologically plausible plasticity rules
The analysis in section 2.2.1 indicated a substantial performance gap between the backprop model and the pseudo-gradient rule with random feedback pathways early in the learning process. However, with the interrupted backward flow as the only distinction between the two rules, the error in the last layer and the activations still carry proper information. Intuitively, introducing new local combinations of these terms to the plasticity rule may restore information flow and improve performance. To that end, we define a set of candidate plasticity terms and use meta-learning to uncover combinations that enhance learning. Meta-learning helps in two ways: finding the optimized set of meta-parameters for the linear combination of candidate terms and selecting the dominant plasticity terms. While the former avoids cumbersome hand-tuning of the coefficients, the latter provides a tool for systematically studying the space of learning rules. We began by examining a set of $R = 10$ plasticity terms and combined them according to Eq. 5 to form the learning rule $F_{pool}$ (see Methods and below for definitions of these rules). Figures 4a-c illustrate the performance of the model. We set the initial values of the meta-parameters $\{\theta_r\}_{1 \le r < R}$ to 0. As seen in Fig. 4a, the model's accuracy initially resembles that of the FA model, but as the meta-optimization continues, the accuracy improves, starting at around 10 episodes. By about 300 meta-iterations, the accuracy approaches that of the BP model. This trend is also echoed in Fig. 4b, where the loss initially follows that of the FA learning model but then declines and eventually becomes similar to that of the BP method. Fig. 4c demonstrates that the alignment angles of the teaching signals with their BP counterparts are improved compared to the FA model, seen in Fig. 3d.
Figure 4d shows that the coefficients for all but 3 terms converge toward zero after about 600 episodes. Those three terms are a pseudo-gradient rule ($F_0$), a Hebbian-like plasticity rule ($F_2$), and Oja's rule ($F_9$). Selecting these three terms and omitting the others gives a simpler plasticity rule of the form

$$F_{bio}(\Theta) = \theta_0 F_0 + \theta_2 F_2 + \theta_9 F_9, \tag{7}$$

where $\Theta = \{\theta_0, \theta_2, \theta_9\}$ is the set of plasticity meta-parameters. $F_{bio}$ performs similarly to $F_{pool}$ (see Supplementary Fig. S2) and significantly improves on the performance of the feedback alignment method in the low data regime (Fig. 1).
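A per-layer update implementing Eq. 7 might look as follows. The functional forms used here follow standard conventions (the outer product of errors for the Hebbian-style error term, Oja's subspace rule for $F_9$); the sign conventions and coefficient values are illustrative, since the learned coefficients can absorb them:

```python
import numpy as np

def f_bio_update(W, y_pre, y_post, e_pre, e_post, theta=(1e-3, 1e-3, 1e-3)):
    """One F_bio update for a layer with forward weights W of shape (n_post, n_pre).

    F_0: pseudo-gradient term        e_l y_{l-1}^T
    F_2: Hebbian-style error term    e_l e_{l-1}^T
    F_9: Oja's rule                  y_l y_{l-1}^T - (y_l y_l^T) W
    """
    t0, t2, t9 = theta
    F0 = np.outer(e_post, y_pre)
    F2 = np.outer(e_post, e_pre)
    F9 = np.outer(y_post, y_pre) - np.outer(y_post, y_post) @ W
    return W - t0 * F0 - t2 * F2 - t9 * F9

rng = np.random.default_rng(4)
W = rng.normal(0, 0.1, size=(4, 6))
y_pre, e_pre = rng.normal(size=6), rng.normal(size=6)
y_post, e_post = rng.normal(size=4), rng.normal(size=4)
W_new = f_bio_update(W, y_pre, y_post, e_pre, e_post)
```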
While the meta-learning successfully discovers $F_{bio}$, it is important to interpret the plasticity rule and understand how it leads to improved learning. $F_{bio}$ consists of three components: a pseudo-gradient term, a Hebbian-style error term, and Oja's rule. In what follows, we study the latter two terms separately, each in combination with the pseudo-gradient term, to unveil the underlying reasons behind their performance.

Hebbian-style error term
Motivated to understand the Hebbian-style error-based learning term in Eq. 7, we rerun the model using a plasticity rule that only includes the modified Hebbian term and the pseudo-gradient term, but omits the third term:

$$F_{eHebb}(\Theta) = \theta_0 F_0 + \theta_2 F_2. \tag{8}$$

In Fig. 5, the meta-learning algorithm is used to optimize the coefficients $\theta_0$ and $\theta_2$, which are initialized to $10^{-3}$ and zero, respectively. Comparing the accuracy and loss plots to $F_{bio}$'s performance (Fig. S2) shows that while $F_{eHebb}$ demonstrates a significant improvement over $F_0$ via feedback alignment, it does not reach the performance of $F_{bio}$. Despite this, the teaching signals of $F_{eHebb}$ are better aligned with the backprop direction than $F_{bio}$'s (Fig. S2), which indicates that the Hebbian error term is the driving force behind aligning the teaching signals in $F_{bio}$.

Figure 6a shows a model trained solely with $F_0$ via feedback alignment. In this scenario, the information from $B_{2,1}$ flows to $W_{0,1}$ through Eq. 3, which is then propagated to $W_{1,2}$ after the forward pass. This configuration updates $W_{1,2}$ to align the modulator vector $e_1$ with its backprop counterpart. Nonetheless, this machinery does not sufficiently align the modulating signals when applied to deeper networks with fewer training iterations. In the diagram on the right, the last layer is updated with an additional Hebbian-style plasticity term $F_2$, while the first layer is trained with the vanilla $F_0$ rule via feedback alignment. Once again, information from $B_{2,1}$ flows into $W_{0,1}$. However, this time, $F_{eHebb}$ introduces an auxiliary channel to carry the information from $B_{2,1}$ to $W_{1,2}$. Finally, the forward propagation through the network implicitly transmits the information from $B_{2,1}$ to $W_{1,2}$. The modified rule $F_{eHebb}$ establishes an explicit supplementary means of communication between $B_{2,1}$ and $W_{1,2}$, boosts the alignment of $e_1$, and improves the model's performance. Note that the mechanism in $F_0$ needs two learning iterations to transmit information from $B_{2,1}$ to $W_{1,2}$; information from $W_{0,1}$ propagates to $W_{1,2}$ only after $y_1$ is computed with the updated $W_{0,1}$. Meanwhile, $F_2$ does this in the same iteration, carrying out expedited learning.
Figure 6: The first layer is updated with the rule $F(\Theta) = \theta_0 F_0$ via feedback alignment, while the second layer uses $F_{eHebb}$. The blue arrows depict information propagation through the forward and backward paths. The communication between feedback and feedforward pathways is represented with red arrows.
To corroborate the argument above, we consider a 3-layer network trained with the $F_0$ rule via feedback alignment and inspect the effect of adding the error-based Hebbian-style plasticity term $F_2$ on the alignment angles in different layers. To that end, rather than sharing the same learning rule across the network, each layer is updated using either the $F_0$ rule via feedback alignment or the $F_{eHebb}$ rule. Table 1 shows that adding the Hebbian error term to the weight update reduces the alignment angle $\alpha$ between the pre-synaptic error and its backprop analog. A more detailed discussion can be found in Supplementary Notes. Since $e_0$ is a synthetic error, the effect of $F_{eHebb}$ on $W_{0,1}$ alone has been excluded. The model is trained for 500 episodes, and the computed angles are averaged after a burn-in period of 100 episodes.
For a more precise, mathematical intuition of the effects that $F_{eHebb}$ has on the weights, we show in Supplementary Notes that, in a linear network model under reasonable approximating assumptions, the $F_2$ component of the update satisfies

$$\Delta W_{\ell-1,\ell} \propto e_\ell\, e_{\ell-1}^T = e_\ell e_\ell^T\, B_{\ell,\ell-1}^T$$

for layers $\ell = 1, 2, \ldots, L-1$. Thus, the term $e_\ell e_{\ell-1}^T$ in $F_{eHebb}$ pushes $W_{\ell-1,\ell}$ toward the transpose of $B_{\ell,\ell-1}$, resulting in faster alignment of the modulatory signals with the backprop algorithm's error vectors and more efficient learning.
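The identity behind this argument can be checked numerically in the linear case ($\sigma' = 1$), where Eq. 3 gives $e_{\ell-1} = B_{\ell,\ell-1}\, e_\ell$; the dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_post, n_pre = 5, 7                    # dims of layers l and l-1 (illustrative)

B = rng.normal(size=(n_pre, n_post))    # fixed random feedback B_{l,l-1}
e_post = rng.normal(size=n_post)        # teaching signal e_l
e_pre = B @ e_post                      # Eq. 3 with sigma' = 1: e_{l-1} = B e_l

# The eHebb outer product factors through the feedback matrix:
#   e_l e_{l-1}^T = (e_l e_l^T) B^T,
# so the F_2 update moves W_{l-1,l} along directions spanned by B^T.
lhs = np.outer(e_post, e_pre)
rhs = np.outer(e_post, e_post) @ B.T
```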

Oja's rule
Eq. 7 proposes a plasticity rule to train deep networks using fixed feedback matrices. Above, we demonstrated that the Hebbian-style learning term improves the trained model's performance by improving the alignment of the modulatory signals with their backpropagated analogs. Here, we look at the remaining plasticity term in Eq. 7: Oja's rule, a purely local learning rule that updates the weights based on their current state and the local activations in the forward path.
To this end, we redefine the plasticity rule as a linear combination of the pseudo-gradient term and Oja's rule:

$$F_{Oja}(\Theta) = \theta_0 F_0 + \theta_9 F_9. \tag{9}$$

We initialize $\theta_0$ to $10^{-3}$ and $\theta_9$ to zero and employ Alg. 1 to optimize the set of meta-parameters $\Theta$. Figures 7a and 7b illustrate that adding Oja's rule to the pseudo-gradient term enhances the model's accuracy when the backward connections are fixed. Figure 7c presents the angles between the teaching signals produced by Eq. 9 and the corresponding backpropagated ones. While the accuracy and loss are significantly improved, contrary to expectations, Oja's rule does not substantially reduce the alignment angles (Fig. 7c). In fact, the alignment angles are only slightly smaller when using Oja's rule compared to using pure FA, as seen by comparing Fig. 7c to Fig. 3d. This contrasts with the alignment angles for $F_{eHebb}$ and $F_{bio}$, which are greatly reduced in deeper layers compared to $F_{Oja}$ (compare Fig. 7c to Figs. 5c and S2c). Inspecting Fig. 7 suggests that rather than helping to align the modulating signals, Oja's rule helps by entirely circumventing the backward path. Oja's rule implements a Hebbian learning rule subject to an orthonormality constraint on the weights [18]. In Eq. 9, $y_{\ell-1}$ and $y_\ell$ denote post-nonlinearity activations (as stated in Eq. 2), so the $F_9$ plasticity rule implements a non-linear version of Oja's rule. When trained iteratively, this non-linear variation implements a recursive non-linear algorithm for Principal Component Analysis [28,29]. Previous studies on the convergence of Oja's rule have shown that for a compression layer, where $\dim(y_{\ell-1}) > \dim(y_\ell)$, the rows of the weight matrix, $(W_{\ell-1,\ell})_1, \ldots, (W_{\ell-1,\ell})_{\dim(y_\ell)}$, will tend to a rotated basis in the $\dim(y_\ell)$-dimensional subspace spanned by the principal directions of the input $y_{\ell-1}$ [30].
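The orthonormalizing behavior of Oja's rule can be reproduced in a small simulation. This sketch uses the linear subspace form of the rule on Gaussian inputs (the paper's $F_9$ acts on post-nonlinearity activations instead) and tracks the proximity measure $\lVert W W^T - I \rVert$:

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_out = 8, 3                      # compression layer: dim(y_{l-1}) > dim(y_l)
W = rng.normal(0, 0.1, size=(n_out, n_in))

def orthonormality_error(W):
    """Proximity to a stable fixed point of Oja's rule: ||W W^T - I||_F."""
    return float(np.linalg.norm(W @ W.T - np.eye(W.shape[0])))

err_init = orthonormality_error(W)

eta = 0.01                              # illustrative step size
for _ in range(5000):
    x = rng.normal(size=n_in)           # pre-synaptic activations y_{l-1}
    y = W @ x                           # linear response (the paper's F_9 uses
                                        # post-nonlinearity activations instead)
    W += eta * (np.outer(y, x) - np.outer(y, y) @ W)   # Oja's subspace rule

err_final = orthonormality_error(W)
```

As training proceeds, the rows of `W` approach an orthonormal set spanning the principal subspace of the input, which is the behavior studied in Fig. 8.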
We demonstrate that incorporating Oja's rule into Feedback Alignment improves feature map extraction in the forward path through unsupervised learning, despite $F_{Oja}$ not recursively applying pure Oja's rule. By analyzing the continuous-time differential equation corresponding to Oja's learning rule, Williams [30] and Oja [29] established the stability limits for this rule. In a compression layer, the fixed point of Oja's rule is a stable solution if $W_{\ell-1,\ell} W_{\ell-1,\ell}^T = I$. This conclusion can be used to derive a proximity measure [31,32,33] of the estimated $W_{\ell-1,\ell}$ to a stable solution of Oja's rule in the presence of non-linear activations, defined by the error

$$\epsilon_\ell = \big\lVert W_{\ell-1,\ell}\, W_{\ell-1,\ell}^T - I \big\rVert.$$

Figure 8 studies this orthonormality measure in models trained with different plasticity rules. The results show that using Oja's rule renders the weight matrices increasingly orthonormal, reducing the correlation between weight rows and improving the feature extraction in these layers. These findings indicate that introducing Oja's rule alone can help with the problem of slow learning caused by random feedback connections. The architecture of a classifier network includes initial layers that act as feature extractors, creating hidden representations for the final layer. This last layer, dubbed the predictor, maps the hidden feature representations to the target class for the given input image. To improve the classifier's performance, a plasticity rule that enhances feature extraction in the earlier layers is beneficial. However, this rule has no grounds to positively impact the predictor layer's performance. Despite this, for comprehensiveness, we also applied the plasticity rule $F_{Oja}$ to the final layer and found no detrimental effect on the model's performance. Ultimately, rather than improving the alignments, $F_{Oja}$ provides embeddings that facilitate more effective learning in the last layer.

Discussion
Despite the dominance of the backpropagation algorithm as the primary technique for training deep neural networks, its biological plausibility remains a significant point of contention [2,3].
In particular, the presence of feedback synaptic projections that are precisely symmetric to the forward projections is not biologically realistic. Previous work [5] showed that learning can be achieved without this symmetry using feedback connections that are randomly sampled, not tied to the forward path, and fixed throughout the training process. While a breakthrough, this method is susceptible to diminished performance when training deeper networks or using smaller batch sizes [6,7]. The latter is a challenge for online learning.
A recent body of work attempts to improve learning through asymmetric feedback connections. These approaches either rewire fixed feedback connections, use plastic feedback connections that are updated through an auxiliary plasticity rule, or impose partial symmetry in the backward network [8,17,20,19,9]. Our work accelerates the learning process by enhancing the rules that govern neural plasticity while transmitting teaching signals through fixed connections. Our proposed plasticity rules are based on biologically motivated learning principles, like Oja's rule, or have been inspired by them, such as the error-based Hebbian rule. A linear combination of these terms yields a parameterized learning rule. To overcome the arduous hand-tuning of these hyper-parameters, we use a meta-learning approach that systematically explores the pool of candidate plasticity rules. This approach consists of an inner loop that learns a task and an outer loop that updates the plasticity coefficients. The inner loop always starts from randomly initialized weights, so the model must learn to learn from scratch. Moreover, the inner loop learns from an online stream of training data, simulating real-time learning in the brain.
To ensure the interpretability of our meta-learned learning rule, we expressed the rule as a linear combination of individual plasticity terms, imposed an L1 penalty on the coefficients, and shared the meta-parameters between all update rules. Many terms in the pool of plasticity rules can be redundant, employing identical or overlapping mechanisms and differing only in their efficiency, i.e., computational cost or the number of learning iterations required to operate. Employing an L1-penalized meta-loss decreases the number of plasticity terms that operate in parallel. Additionally, while sharing the same meta-parameters across layers may limit the model's freedom in learning, it is a vital component for discovering a global learning rule, leaving the door open to investigating the revealed terms.
Using this meta-learning approach, we discovered two plasticity rules that accelerate learning through fixed feedback connections. The first, an error-based Hebbian rule, combines the errors of the pre- and post-synaptic layers to update forward-projecting weights. The second, known as Oja's rule, combines pre- and post-synaptic activations with the connection's current state to update the weights. We investigated each plasticity rule, its underlying mechanism, and how it contributes to learning, revealing two distinct mechanisms. First, the Hebbian-like error term improves performance by modifying the flow of information through the backward path. It introduces an auxiliary channel to communicate information about the backward connections to the forward weights. As a result, it accelerates learning by better aligning the modulating signals with the ones transmitted through symmetric feedback connections. Ultimately, the modified plasticity alters the training to resemble backpropagation. Unlike the Hebbian-like rule, Oja's rule does not directly affect the flow of the feedback signals. Instead, it acts only on the forward path, implementing an unsupervised learning scheme that extracts feature maps independently of the labels and loss. The updated weight rows approximate an orthonormal basis in the subspace spanned by the PCA eigenvectors of the pre-synaptic activations [29]. The strengthened signal-separation capabilities in the earlier layers improve the predictions made by the output layer.
While synaptic plasticity in the brain is mediated by a vast array of biophysical processes, the changes to a single synaptic weight largely depend on the activity of its pre-synaptic and post-synaptic neurons and on the current weight, a property known as "local" plasticity. For the plasticity rules used in our study (with the exception of Oja's rule), weight updates depend on activations from a forward pass and error signals from a backward pass. Since these quantities were used to update the forward-projecting weights, this raises the question of whether the plasticity rules are truly local. The answer depends on the biological interpretation of the forward and backward passes.
Under one interpretation, separate populations of neurons encode the forward and backward passes, i.e., the neurons encoding $e_\ell$ are distinct from those encoding $y_\ell$. Under this interpretation, the plasticity rules used in this study are not strictly local.
Under another interpretation, forward activations and backward errors are represented by the same neural populations, i.e., the same neurons encode $e_\ell$ and $y_\ell$. Under this interpretation, all of the plasticity rules used in this study are local. There are several models for how this multiplexing of forward and backward signals could be achieved (see [2] for a review). For example, activations and errors could be represented at separate points in time by the same neurons.
Alternatively, recent work hypothesizes that activations and errors are encoded separately in the basal and apical dendrites of the same cortical pyramidal neurons [34]. Along similar lines, a growing body of work posits that activations and errors are multiplexed by the distinction between bursts and single action potentials, which are communicated separately by synaptic projections onto the soma versus the apical dendrites of pyramidal neurons [35,36,37]. The dependence of synaptic plasticity on the morphological site of the synaptic contact and on the type of spiking (bursts versus individual spikes) is well established in experiments [38,39,40,41,42]. Under these models, established biophysical properties of cortical synapses can produce plasticity rules like ours that multiplex forward- and backward-propagating information to update weights. Networks in [37] rely on weight decay to approximately align forward and backward weights [11], while some networks in [34] rely on random feedback alignment. Hence, our meta-learned plasticity rules could improve learning in those models.
Our meta-learning approach isolated three plasticity terms: a backprop-like rule ($F_0$), Oja's rule [18] ($F_9$), and a rule we refer to as eHebb ($F_2$). Possible biological implementations of Oja's rule and the backprop-like rule have been studied in great depth in previous work [34,2,3,37]. The eHebb rule could be implemented in a similar way to the backprop-like rule. For example, under the model in [37], eHebb would change synaptic weights in response to the co-occurrence of pre- and post-synaptic bursts. Plasticity is strongly mediated by firing rates and intracellular calcium [43,44], both of which are elevated during bursts.
Because eHebb's mechanism tends to align the modulating signals with their symmetric counterparts, its performance can at best match that of backprop. However, because Oja's rule does not aim to imitate backprop, its performance is not bounded by that of backprop, and hence it can also be used to enhance learning in symmetric feedback models. For instance, we found that adding Oja's plasticity rule to the gradient-based learning term accelerates learning for poorly initialized networks. This observation explains why the fixed feedback model with the improved plasticity rule can even outperform learning in the symmetric case. A similar concept was used in earlier work to initialize the internal representations of neural networks [33]. However, that work used weights preprocessed by Oja's rule to start gradient-based learning rather than using both terms simultaneously as the plasticity rule. Hence, our results demonstrate the utility of the proposed meta-learning approach as a tool for combining different learning terms into a single parameterized learning rule.
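To make the stabilizing effect concrete, Oja's rule for a single linear unit can be sketched as follows: the weight vector converges to unit norm along the leading principal component of the input. This is a minimal numpy sketch with illustrative data and learning rate, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs drawn from a zero-mean Gaussian whose leading principal axis is e_1.
cov = np.diag([3.0, 1.0])
X = rng.multivariate_normal(np.zeros(2), cov, size=20000)

w = rng.normal(size=2)  # randomly initialized weight vector
eta = 0.005             # illustrative learning rate

for x in X:
    y = w @ x           # post-synaptic activation of the linear unit
    # Oja's rule: Hebbian term y*x stabilized by the decay term -y^2 * w
    w += eta * y * (x - y * w)

# The weight norm converges toward 1, and w aligns with the leading eigenvector.
print(np.linalg.norm(w), abs(w[0]) / np.linalg.norm(w))
```

The stabilizing term prevents the blowup of pure Hebbian learning: whenever the norm of w exceeds one, the decay term dominates and shrinks it back.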
We used meta-learning to find plasticity rules that can learn effectively under the biologically relevant setting where forward and backward weights are not explicitly aligned. But our meta-learning technique can be applied more broadly to identify plasticity rules that overcome other biological constraints in various contexts and models. For instance, our study only focused on plasticity in forward connections; however, backward projections in the brain can also exhibit plasticity. Our meta-learning approach can be extended to discover plasticity rules for backward connections in such settings. Another interesting future direction is to meta-learn the architecture of the feedback pathways instead of (or in addition to) the plasticity parameters.
That is, one could simultaneously provide both direct [8] and regular [5] feedback pathways and allow the meta-learning algorithm to pick the most efficient path to carry the teaching signals to each layer.
In another direction, our meta-parameter sharing approach could be partially relaxed without learning a new plasticity rule for each connection.For example, one could consider a network with several neural populations and a shared plasticity rule for each pair of populations.This approach could help understand the role of distinct neuron types and populations in biological circuits.
We focused on meta-learning biologically plausible plasticity rules, but our approach can also be applied to discover learning rules that satisfy other constraints or optimize other meta-loss functions.For example, the approach can be used to find learning rules that can be implemented in non-standard hardware like neuromorphic chips or optical networks, or to discover learning rules that minimize energy consumption or other factors.
In summary, we developed and tested a meta-learning approach designed to produce simple, interpretable plasticity rules that can learn effectively on new data. First, using randomly initialized weights on each iteration of the outer loop (instead of meta-learning the initialization) and using online learning in our inner loop encouraged plasticity rules that can perform online learning from scratch. Second, meta-parameter sharing yielded a vastly smaller set of learned plasticity rules compared to learning a plasticity rule for each synapse. Finally, an L1 penalty on plasticity coefficients promoted sparsity within the learning rule, ultimately yielding a small set of plasticity terms that are more readily interpreted. Our results demonstrate the utility of this approach for discovering and interpreting plasticity rules. Taken together, our work opens new avenues for the application of meta-learning to discover interpretable plasticity rules that satisfy biological or other constraints.

Models
In Figs. 4, 5, 7, and 8, and Tab. 1, we set the initial value for the learning rate θ 0 of the term F 0 to 10 −3 and set all other hyper-parameters to zero.
All plots depict the mean outcome over 20 trials, each with different initial weights and feedback matrices.The shaded region in the loss, accuracy, and meta-parameters plots illustrates the 98% confidence interval, determined through bootstrapping with 500 samples.

Candidate learning terms
Section 2.2.2 presented a plasticity rule that improves the model's performance in the presence of fixed random feedback connections (Eq. 7). We employed the meta-learning framework described in section 2.2 to explore a set of local learning rules to discover such a plasticity term. The rules in this set are local in the sense that the updates to the (j, k)th entry of W ℓ−1,ℓ depend only on the kth entries of e ℓ−1 and y ℓ−1 , the jth entries of y ℓ and e ℓ , and the (j, k)th entry of W ℓ−1,ℓ itself. This notion of locality assumes that errors and activations are encoded in the same neurons (see Discussion). Even under this constraint of locality, there is an unlimited number of possible plasticity rules to choose from. To form the list above, we first considered all quadratic combinations of activations and errors, except that we omitted pure Hebbian plasticity (y ℓ y T ℓ−1 ) because we found that it leads to unstable network dynamics (a blowup of activations). Instead, we replaced it with Oja's rule F 9 , which adds a stabilizing term onto pure Hebbian plasticity. Additional terms were added to test the viability of higher order plasticity terms.
Computing the learning terms F 1 , F 2 , F 4 , F 6 , F 7 , and F 8 requires a pre-synaptic error term. In order to update the weights in the first layer W 0,1 , where there is no pre-synaptic error, we define a synthetic error e 0 using Eq. 3 and the activation function in Eq. 11, such that e 0 := B 1,0 e 1 (1 − exp(−βy 0 )).
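A literal reading of this definition can be sketched as follows. The layer widths mirror the 5-layer model described in the Methods, but the sampled feedback matrix and error vector are placeholders used only for shape-checking, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 10.0                   # softplus smoothness parameter (Eq. 11)

dim_y0, dim_y1 = 784, 170     # input and first-hidden-layer widths
y0 = rng.random(dim_y0)       # input-layer activation
e1 = rng.normal(size=dim_y1)  # error at the first hidden layer
B10 = rng.normal(size=(dim_y0, dim_y1)) / np.sqrt(dim_y1)  # fixed feedback

# Synthetic pre-synaptic error for the first layer:
# e_0 := B_{1,0} e_1 * (1 - exp(-beta * y_0)), applied elementwise.
e0 = (B10 @ e1) * (1.0 - np.exp(-beta * y0))

print(e0.shape)
```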

Meta-Training
Section 2.2 presented a meta-learning framework for swiftly exploring a pool of plasticity terms and uncovering combinations that exceed the performance of the existing plasticity rule. We demonstrate this by training a classifier network, which performs a 5-way classification on 28 × 28 images. The cross-entropy function evaluates the loss in the adaptation loop, whereas the meta-loss is determined by Eq. 6. While, in principle, any optimization algorithm, such as evolutionary methods, can be used to optimize Θ, the algorithm presented in Alg. 1 uses ADAM [46], a gradient-based optimization technique, with a meta-learning rate of 10 −3 .
In the meta-optimization phase, this gradient-based optimizer differentiates through the unrolled computational graph of the adaptation phase. Thus, the non-linear layers are differentiated twice: once to compute e L and a second time by the meta-optimizer. This arrangement requires the non-linearity to be twice differentiable, which rules out the Rectified Linear Unit (ReLU) as the activation function σ. Instead, we use the softplus function (Eq. 11), a continuous, twice-differentiable approximation of ReLU. In Eq. 11, the parameter β controls the smoothness of the function. Furthermore, the L1 norm used in the meta-loss (Eq. 6), defined by the absolute value function, is not continuously differentiable at every point. However, it is commonly used in deep learning in conjunction with stochastic gradient descent (SGD) [47]. In PyTorch and other deep learning frameworks, the derivative of the absolute value function is typically defined as zero at zero.
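The twice-differentiability of softplus can be checked directly. The sketch below compares the analytic second derivative, σ''(z) = β s(βz)(1 − s(βz)) with s the logistic sigmoid, against a central finite difference; this is a pure-numpy illustration, not the paper's autodiff code.

```python
import numpy as np

beta = 10.0

def softplus(z):
    # sigma(z) = (1/beta) * log(1 + exp(beta * z))  (Eq. 11)
    return np.log1p(np.exp(beta * z)) / beta

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d2_softplus(z):
    # Analytic second derivative: beta * s(beta*z) * (1 - s(beta*z))
    s = sigmoid(beta * z)
    return beta * s * (1.0 - s)

z, h = 0.1, 1e-4
# Central finite-difference approximation of the second derivative.
fd = (softplus(z + h) - 2.0 * softplus(z) + softplus(z - h)) / h**2

print(d2_softplus(z), fd)  # the two values agree closely
```

In contrast, ReLU's second derivative is zero almost everywhere and undefined at the origin, which is why differentiating twice through the unrolled graph fails there.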
In the present examples, each task contains M = 5 labels. Consequently, assembling a diverse set of 5-way classification tasks requires a database with a large number of classes. Thus, databases such as MNIST [27], which only has ten classes, are unsuitable for proper meta-training. On the other hand, in each episode, the classifier f W is reinitialized with random weights W . Therefore, each task should contain enough data points per class to train f W adequately. Hence, databases such as Omniglot [48], with only 20 data points per character and designed for few-shot learning (e.g., with meta-optimized W ), are impractical in the present framework. In the current work, meta-training tasks are made from the EMNIST database [49]. This database contains 47 classes, making it a good candidate for the meta-learning framework. Each task contains K = 50 training and Q = 10 query data points per class.
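The episode construction described above can be sketched as follows. The sampling routine and the synthetic stand-in data are our illustration, not the paper's code; only the counts (M = 5, K = 50, Q = 10, 47 classes) come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_task(X, y, M=5, K=50, Q=10):
    """Sample an M-way task with K training and Q query points per class."""
    classes = rng.choice(np.unique(y), size=M, replace=False)
    train, query = [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(y == c))[: K + Q]
        train += [(X[i], new_label) for i in idx[:K]]
        query += [(X[i], new_label) for i in idx[K:]]
    return train, query

# Synthetic stand-in for EMNIST: 47 classes, 100 examples each, 28x28 images.
n_classes, per_class = 47, 100
X = rng.random((n_classes * per_class, 28 * 28))
y = np.repeat(np.arange(n_classes), per_class)

train, query = sample_task(X, y)
print(len(train), len(query))  # 250 training and 50 query points
```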
Notably, the use of K = 50 training data points per class with M = 5 classes in each episode means that the meta-learned plasticity rule needs to train a randomly initialized network with only 250 training data points. Hence, our models are in a low data regime without the benefit of the pre-trained weights that are often used for few-shot learning.

Code Availability
The PyTorch-based implementation and script files used to generate the results in this paper will be publicly accessible at https://github.com/NeuralDynamicsAndComputing/ upon publication.

Supplementary Material
Performance of DFA
Figure 1 illustrates that the Feedback Alignment model [5] is less effective than the backprop model when training deep networks with a continuous data stream. More precisely, the backprop model begins learning immediately at the start of training, while the Feedback Alignment model takes around 2000 training data points before it starts to learn. Additionally, the rate of learning for the Feedback Alignment model is slower.
In an attempt to improve the Feedback Alignment model's performance, the Direct Feedback Alignment (DFA) method [8] proposed altering the backward connections to directly transmit errors from the output layer y L to the upstream layers y ℓ . The modulating signals in this modified model are calculated as e ℓ = B L,ℓ e L σ′(z ℓ ), with e L = ∂L/∂z L .
In this formulation, B L,ℓ ∈ R dim(y ℓ )×dim(y L ) , where dim(y ℓ ) represents the dimensionality of the activation y ℓ . As shown in Fig. S1, incorporating direct feedback connections into the Feedback Alignment method speeds up learning, and the model's accuracy improves after 1000 training data points. However, even with this modification, the network's performance is still lower than that of the backprop model. Figure S1 further compares the DFA model with the Feedback Alignment model trained with the F bio plasticity rule (Eq. 7) and shows that the improved plasticity rule outperforms the DFA model.
Figure S1: A fully connected classifier network trained on MNIST digits [27] for a 10-way classification task. The plot demonstrates accuracy versus the number of training data for Feedback Alignment (FA) [5], Direct Feedback Alignment (DFA) [8], and backprop (BP) [1] methods, compared to the discovered biologically plausible plasticity rule F bio (Eq. 7).
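The DFA error computation above can be sketched as follows, using the softplus derivative σ′(z) = s(βz) for the hidden layers. The layer widths follow the 5-layer model in the Methods; the fixed matrices B L,ℓ are sampled here only for shape-checking.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 10.0

def dsoftplus(z):
    # sigma'(z) = logistic sigmoid of beta*z (derivative of Eq. 11)
    return 1.0 / (1.0 + np.exp(-beta * z))

dims = [784, 170, 130, 100, 70, 47]        # 5-layer network widths
z = [rng.normal(size=d) for d in dims[1:]] # pre-activations per layer
eL = rng.normal(size=dims[-1])             # output error dL/dz_L

# Fixed direct feedback matrices B_{L,l} mapping the output error to layer l.
B = {l: rng.normal(size=(dims[l + 1], dims[-1])) for l in range(len(dims) - 2)}

# DFA modulating signals: e_l = (B_{L,l} e_L) * sigma'(z_l) for hidden layers.
e = {l: (B[l] @ eL) * dsoftplus(z[l]) for l in B}

print([e[l].shape for l in sorted(e)])
```

Unlike regular feedback alignment, where errors pass backward layer by layer, every hidden layer here receives its teaching signal in one step from the output error.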

Performance of the F bio
Fig. S2 demonstrates the classifier's performance with F bio within 600 iterations of the meta-optimizer. Comparing the loss and accuracy of F bio with F 0 via feedback alignment in Fig. S2a and Fig. S2b, respectively, shows a significant boost in learning through F bio . Figure S2c further shows improvement in the alignment of the modulating signals with those of backprop. These angles are reduced the most in the deeper layers. Lastly, Fig. S2d illustrates the progress of the meta-parameters. We observe that the plasticity coefficients converge in about 200 episodes.
Figure S2: Comparison between (a) meta-accuracy and (b) meta-loss of the F bio rule with F 0 via feedback alignment (FA) and backprop (BP), (c) alignment angles between modulating signals of F bio and backprop, and (d) convergence of the plasticity meta-parameters. While the term F bio was discovered by regularizing the meta-loss with the penalty term in Eq. 6 (see Methods), λ is set to zero in this figure for the illustrations of the uncovered rule.

Data flow in F eHebb
Table 1 demonstrates the effect of the Hebbian-like error plasticity term (Eq. 8) on the alignment angles of the modulator signals. Here, we explain these improvements by illustrating F eHebb 's influence on the feedback pathway's interactions with the forward path. To set the baseline, Fig. S3a employs the plasticity rule F 0 to train the network (row 1 in Tab. 1), where e ℓ is transmitted through random feedback pathways. First, information from backward connections B 2,1 and B 3,2 (through B 2,1 ) flows into W 0,1 via Eqs. 3 and S.1. Similarly, information from B 3,2 flows into W 1,2 during the weight update. Then, in the forward pass, information from W 0,1 and W 1,2 is propagated forward into W 2,3 . Table 1 shows that this flow of information does not sufficiently adjust W for a good alignment of the teaching signals, particularly in online training with limited data. In Fig. S3b, we add the Hebbian-style error term to update W 2,3 using F eHebb , while the rest of the network is trained with F 0 through feedback alignment (Tab. 1, row 3). The information flow to W 0,1 and W 1,2 stays the same; however, F eHebb introduces an auxiliary information channel from B 3,2 to W 2,3 . As presented in Tab. 1, this supplementary channel results in a better alignment of e 2 with the corresponding error vector transmitted via backprop.
Figure S3c repeats this experiment with W 1,2 updated using F eHebb while other layers are updated with F 0 via feedback alignment (Tab. 1, row 2). Although W 0,1 is updated with the same flow of information as in Fig. S3b, there is a new flow from B 2,1 to W 1,2 , which improves e 1 's alignment. Note that better alignment of e 1 results in more backprop-like weight updates, which subsequently improve data propagation to the downstream layers. As a result, the alignments in the downstream layers are slightly improved as well, even with the vanilla F 0 plasticity rule with feedback alignment updating them. This behavior is similar to the reduced alignment angles in Fig. 7c, where F Oja positively affects the alignments by improving the forward data propagation.
Figure S3: Information flow between the forward and backward pathways: (a) The network is trained with the rule F(Θ) = θ 0 F 0 via feedback alignment; information from B 2,1 and B 3,2 flows into W 0,1 and W 1,2 ( 1 ○, 2 ○), which is then propagated to W 2,3 after the forward propagation. (b) W 2,3 is updated using F eHebb (Θ) = θ 0 F 0 + θ 2 F 2 , while W 0,1 and W 1,2 are trained with the rule F(Θ) = θ 0 F 0 via feedback alignment. Plasticity rule F 0 transmits information from B 2,1 and B 3,2 to W 0,1 ( 1 ○) and from B 3,2 to W 1,2 ( 2 ○). This information is propagated to their downstream layers after the forward path ( 4 ○). Concurrently, an additional channel established by F 2 explicitly propagates the information from B 3,2 to W 2,3 ( 3 ○). (c) W 0,1 and W 2,3 use the plasticity rule F(Θ) = θ 0 F 0 via feedback alignment, and W 1,2 utilizes F eHebb (Θ) = θ 0 F 0 + θ 2 F 2 . F 0 communicates information from B 2,1 and B 3,2 to W 0,1 ( 1 ○), which then is propagated to the downstream layers ( 3 ○). Meanwhile, the F 0 rule in F eHebb disseminates information from B 3,2 to W 1,2 , while F 2 in F eHebb establishes a direct route to transmit information from B 2,1 to W 1,2 ( 2 ○). The ensuing forward propagation from W 1,2 to the downstream layers continues as usual. In all graphs, blue arrows represent the propagation of data through the forward or backward path, while the red arrow represents the flow of information from the backward pathway to the forward connections.

Expectation of Hebbian-style error-based plasticity
Assume that the entries of B ℓ+1,ℓ are i.i.d. with expectation zero and independent from the entries of e ℓ+1 . Also assume that the entries of e ℓ have variance σ 2 . In this Supplementary section, we show that Eq. (S.2) holds. The last two lines of the derivation follow from the fact that whenever i ≠ j, the expectation is equal to zero. Eq. (S.2) follows directly.
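The zero-mean and independence assumptions used in this derivation can be illustrated numerically. The generic Monte-Carlo sketch below (not the paper's code) estimates E[e e T ] for i.i.d. errors with variance σ 2 , confirming that the off-diagonal entries vanish while the diagonal approaches σ 2 .

```python
import numpy as np

rng = np.random.default_rng(4)
dim, sigma, n = 5, 0.5, 200000

# i.i.d. zero-mean errors with variance sigma^2
e = rng.normal(scale=sigma, size=(n, dim))

# Monte-Carlo estimate of E[e e^T]
M = e.T @ e / n

off_diag = M - np.diag(np.diag(M))
print(np.diag(M))              # close to sigma^2 = 0.25 on the diagonal
print(np.abs(off_diag).max())  # off-diagonal entries are near zero
```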
Performance of the F Oja on FashionMNIST
Section 2.2.2 examines how Oja's rule improves learning in the Feedback Alignment model (Fig. 7). In this section, we demonstrate the effectiveness of Oja's rule on a different dataset by using FashionMNIST [50] to train a classifier model. Figure S4 illustrates that introducing Oja's rule (Eq. 9) substantially enhances learning across different datasets when the model is trained with random feedback connections.
Figure S4: A fully connected classifier network trained on the FashionMNIST dataset [50] for a 10-way classification task. The plot demonstrates accuracy versus the number of training data for Feedback Alignment (FA) [5] and backprop (BP) [1] methods, compared to F Oja (Eq. 9).

Performance of alternative penalization methods
In Sec. 2.2, we proposed using L1 regularization on the meta-loss to decrease redundancy within the update rules. As shown in Fig. 4d, this technique leads to a sparser set of meta-parameters and acts as a model selection method, identifying the most effective plasticity rules. In Fig. S5, we examine the impact of alternative regularization methods on the meta-learning algorithm by comparing the performance of models with no regularization and with L2 regularization. When no regularization is used, the meta-learning algorithm eliminates update terms that negatively impact learning. However, some plasticity terms improve the results individually yet add little once more beneficial terms are included in the set; without a penalty, the model still retains them in the final meta-optimized learning rule. As seen in Fig. S5a, the model identified seven plasticity terms, making it impractical to investigate each of these terms individually.
As an alternative, Fig. S5b shows the results of using L2 regularization, L meta (Θ) = L(f W (X query ), Y query ) + λ ||Θ|| 2 . (S.3) Unlike L1 regularization, L2 tends to shrink all parameters but does not return sparse solutions and is unsuitable for feature selection. In other words, even though L2 regularization reduces the values of all parameters, it does not eliminate the redundant or less influential plasticity terms from the final solution; they remain with nonzero meta-parameters.
Figure S5: Meta-training on EMNIST [49] with online learning. Evolution of meta-parameters for the pool of learning rules defined in section 4.2 using (a) no penalization and (b) an L2-penalized meta-loss (Eq. S.3).
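The qualitative difference between the two penalties can be sketched with their proximal operators: the L1 step (soft-thresholding) zeroes out small coefficients exactly, while the L2 step only rescales them. This proximal view is our generic illustration of why L1 acts as a model selector; the paper penalizes the meta-loss directly, and the λ and coefficient values below are arbitrary.

```python
import numpy as np

theta = np.array([0.8, 0.05, -0.6, 0.02, 0.3])  # toy plasticity coefficients
lam = 0.1

# L1 proximal step (soft-thresholding): small entries become exactly zero.
theta_l1 = np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

# L2 proximal step: every entry is shrunk multiplicatively; none becomes zero.
theta_l2 = theta / (1.0 + 2.0 * lam)

print(theta_l1)  # small coefficients are eliminated
print(theta_l2)  # all coefficients survive, merely shrunk
```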

Performance of alternative backward initialization
As mentioned in Sec. 4.1, the Xavier initialization method was used throughout the study to randomly sample forward and backward connections from a uniform distribution,

B ℓ+1,ℓ , W ℓ,ℓ+1 ∼ U( −√(6 / (dim(y ℓ ) + dim(y ℓ+1 ))), √(6 / (dim(y ℓ ) + dim(y ℓ+1 ))) ), (S.4)

where dim(y ℓ ) is the dimension of the activation y ℓ . Nevertheless, the findings presented in this work do not depend on the initialization method of the backward connections.
To illustrate this, we conducted an experiment where we employed the normal Xavier initialization method,

B ℓ+1,ℓ ∼ N( 0, 2 / (dim(y ℓ ) + dim(y ℓ+1 )) ), (S.5)

to sample initial values for the backward connections. The forward connections were initialized using a uniform distribution as before (Eq. S.4). Figure S6 shows that the proposed F bio plasticity rule can successfully train the model using different methods for initializing the backward connections.
Figure S6: A fully connected classifier network trained on MNIST [27] to perform a 10-way classification task using Feedback Alignment (FA) [5] compared to the proposed F bio plasticity rule (bio) outlined in Eq. 7. The backward connections were initialized in both tests using the normal Xavier initialization method (Eq. S.5).
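Both initializations can be sketched directly from Eqs. S.4 and S.5. This is a minimal numpy sketch; the layer widths are those of the 5-layer model, and the matrix orientation (fan_out × fan_in) is our convention.

```python
import numpy as np

rng = np.random.default_rng(5)

def xavier_uniform(fan_in, fan_out):
    # Eq. S.4: U(-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out)))
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def xavier_normal(fan_in, fan_out):
    # Eq. S.5: N(0, 2/(fan_in+fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W = xavier_uniform(784, 170)  # forward weights W_{0,1}
B = xavier_normal(170, 784)   # backward connections B_{1,0}

print(W.shape, B.shape, np.abs(W).max())
```

Note that both variants share the same variance, 2/(fan_in + fan_out), since a uniform distribution on (−a, a) has variance a²/3; only the shape of the distribution differs.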

Inter-treatment variation
Throughout the paper, we examine the variation within each plasticity rule by calculating confidence intervals. To determine whether the improvements in accuracy are statistically significant, we use the Mann-Whitney U test to compare two sets of data: the accuracies of trials using the FA method and those using the modified plasticity rule. Samples are taken at the end of each episode and represent the accuracy of the model trained with different initial weights and feedback connection values. We chose the Mann-Whitney U test over the t-test because it does not assume a Gaussian distribution within the groups. The alternative hypothesis is that the FA trial samples show lower accuracy than those of the modified plasticity rule. We use 20 samples from each group. The results, illustrated in Fig. S7, indicate that the p-value falls below 5% within fewer than 100 episodes in every example. These findings provide strong evidence against the null hypothesis, statistically supporting the performance gain of the proposed plasticity rules.
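The one-sided test can be sketched in a few lines using the normal approximation to the U statistic. This is our own generic implementation for illustration (with n = 20 per group and no ties, the normal approximation is reasonable); in practice one would use an exact implementation such as `scipy.stats.mannwhitneyu`, and the accuracy samples below are hypothetical.

```python
import math
import numpy as np

def mann_whitney_u_less(x, y):
    """One-sided Mann-Whitney U test of H1: x tends to be less than y.

    Returns the U statistic for x and a normal-approximation p-value.
    Assumes no ties between the samples.
    """
    x, y = np.asarray(x), np.asarray(y)
    n1, n2 = len(x), len(y)
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1.0
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sd
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(U <= u1) under H0
    return u1, p

rng = np.random.default_rng(6)
fa_acc = rng.normal(0.55, 0.05, size=20)   # hypothetical FA accuracies
bio_acc = rng.normal(0.75, 0.05, size=20)  # hypothetical F_bio accuracies

u, p = mann_whitney_u_less(fa_acc, bio_acc)
print(u, p)  # small p: FA accuracies are systematically lower
```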

Figure 2 :
Figure 2: Schematic depiction of the meta-learning workflow: (1) A pool of R biologically plausible plasticity terms {F r } 0≤r≤R−1 is exploited to define a plasticity rule F(Θ) that governs the weight updates of the model f W . Each term F r integrates local elements available to the weight, including pre-synaptic activation y i , post-synaptic activation y j , pre-synaptic error e i , post-synaptic error e j , and the current state of the weight W i,j . Such terms are consistent with local plasticity if y i and e i are encoded by the same neuron (see Discussion). The linear combination of these terms defines the plasticity rule F(Θ), where Θ = {θ r | 0 ≤ r ≤ R − 1} is the set of meta-parameters shared across the network. (2) The parameterized local learning rule F(Θ) is used to navigate the weight parameter space. At each episode ε, F(Θ (ε) ) iteratively searches for an optimized W starting from a random weight W (0) . A single data point sampled from T ε 's train set is used at each adaptation step for online training of the model. (3) In the meta-optimization phase, the solution W of the inner loop is used to compute the loss on the query set of task T ε . Then, a gradient-based strategy explores the meta-parameter space to optimize the plasticity meta-parameters Θ (ε) . (4) The plasticity rule F(Θ) is reconstructed using the updated meta-parameters Θ (ε+1) to guide the weight optimization in the next episode. This procedure is repeated until the meta-parameters converge. In the initial episodes, the unoptimized F(Θ) is unlikely to direct W to a solution. However, as Θ converges, F(Θ) discovers a new direction that may only partially adhere to the direction of the gradient.

Figure 3 :
Figure 3: Meta-learning coefficients for feedback alignment and backprop. (a) Meta-accuracy of feedback alignment (FA) compared to backprop (BP) trained using the meta-learning framework (Alg. 1) during 600 meta-optimization episodes and (b) the corresponding meta-loss; (c) evolution of the learning rate meta-parameter (initialized to 10 −3 ) with feedback alignment (FA) compared to backprop (BP) during 600 meta-optimization episodes. In this figure, each meta-parameter was optimized separately in a single-parameter meta-optimization problem and is superimposed for comparison. (d) Alignment angle α between modulating signals of feedback alignment e FA and backprop e BP for l = 1, 2, 3, and 4. For both approaches, e 5 is computed using ∂L/∂z L and has the same value, resulting in α 5 = 0.

Figure 4 :
Figure 4: Performance of the model trained with the pool of biologically plausible plasticity rules F pool : (a) Accuracy and (b) loss for F pool compared to F 0 via feedback alignment (FA) and backprop (BP), (c) alignment of the teaching signals of F pool with the ones for backprop, and (d) convergence of the plasticity coefficients.

Figure 5 :
Figure 5: Performance of the image classification network trained with F eHebb (Eq. 8): (a) Meta-accuracy and (b) meta-loss plots for F eHebb compared to F 0 via feedback alignment (FA) and backprop (BP), (c) alignment angles for modulating signals across the network, and (d) convergence of the plasticity coefficients using the meta-learning model.

Figure 6
Figure 6 illustrates how F eHebb alters the communication between the backward and forward pathways. The diagram in Fig. 6a shows a model trained solely with F 0 via feedback alignment.

Figure 6 :
Figure 6: Information flow between the forward and backward pathways: (a) Both layers are trained with the rule F(Θ) = θ 0 F 0 via feedback alignment. In this case, information from B 2,1 is transmitted to W 0,1 through F 0 ( 1 ○) and then propagated forward to W 1,2 ( 2 ○). (b) The first layer is updated with the rule F(Θ) = θ 0 F 0 via feedback alignment, while the second layer uses F eHebb (Θ) = θ 0 F 0 + θ 2 F 2 . Using F 0 , information from B 2,1 is communicated to W 1,2 ( 1 ○, 3 ○); meanwhile, the presence of F 2 sets up a new channel to directly communicate information from B 2,1 to W 1,2 ( 2 ○). The blue arrows depict information propagation through the forward and backward paths. The communications between feedback and feedforward pathways are represented with red arrows.

Figure 7: Performance of the model trained using F_Oja (Eq. 9) through fixed backward connections: (a) meta-accuracy and (b) meta-loss of F_Oja compared to the F_0 learning rule via feedback alignment (FA) and backprop (BP), (c) alignment of the modulating signals of F_Oja with backprop's teaching signals, and (d) evolution of the plasticity meta-parameters.

Figure 8: Orthonormality error throughout a deep network for different plasticity rules: Orthonormality errors are measured by Eq. 10 for different layers of a 5-layer deep network. The model is trained using (a) F_0 via feedback alignment (FA), (b) F_eHebb, (c) F_Oja, and (d) backprop (BP). The last layer is excluded from this comparison.
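As an illustration of how a per-layer orthonormality error might be computed: Eq. 10 is not reproduced in this excerpt, so the sketch below uses a common stand-in, the Frobenius norm of W Wᵀ − I; the function name and the normalization are our own assumptions, not the paper's exact definition.

```python
import numpy as np

def orthonormality_error(W):
    # Deviation of W's rows from orthonormality: ||W W^T - I||_F.
    # Illustrative stand-in only; Eq. 10 in the paper may normalize differently.
    return np.linalg.norm(W @ W.T - np.eye(W.shape[0]))

rng = np.random.default_rng(0)
# A matrix with orthonormal rows gives (near-)zero error ...
Q, _ = np.linalg.qr(rng.standard_normal((100, 70)))  # Q has orthonormal columns
near_zero = orthonormality_error(Q.T)
# ... while a random Xavier-style weight matrix does not.
W = rng.normal(0.0, np.sqrt(2.0 / (70 + 100)), size=(70, 100))
nonzero = orthonormality_error(W)
```

A rule that drives this error toward zero in every layer would make the forward weights behave like (partial) orthonormal maps, which is one way fixed random feedback can become a useful teaching signal.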

The model in Figure 1 performs 10-way classification on the MNIST dataset, with images resized to 28 × 28. The model is a 5-layer fully connected neural network with dimensions 784-170-130-100-70-47. Hidden layers use the softplus activation function σ(z) = (1/β) log(1 + exp(βz)) (Eq. 11) with β = 10. The output layer uses the softmax activation function. Figures 3-5 and 7-8 perform 5-way classification on the EMNIST dataset, using the same architecture as Fig. 1. For Tab. 1, the model performs 5-way classification on the EMNIST dataset with an image size of 28 × 28; here the model is a 3-layer fully connected neural network with dimensions 784-130-70-47. As in the rest of the paper, hidden layers use the softplus non-linearity with β = 10, and the output layer uses softmax. In the fixed-feedback-pathway problem, the forward weights and feedback connections are initialized to random values that differ from each other. Both the symmetric and fixed feedback models use the Xavier method [45] to re-initialize forward and backward connections at the start of each meta-learning episode. In Figs. 4, 5, 7, and 8, and Tab. 1, we set the initial value of the learning rate θ_0 of the term F_0 to 10^-3 and set all other hyper-parameters to zero. All plots depict the mean outcome over 20 trials, each with different initial weights and feedback matrices. The shaded regions in the loss, accuracy, and meta-parameter plots show the 98% confidence interval, determined through bootstrapping with 500 samples.
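The architecture and initialization described above can be sketched as follows. This is a minimal NumPy illustration under our own naming, not the authors' implementation; the feedback matrices are included only to emphasize that they are drawn independently of the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z, beta=10.0):
    # Eq. 11: sigma(z) = (1/beta) * log(1 + exp(beta * z)), computed stably
    return np.logaddexp(0.0, beta * z) / beta

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def xavier(fan_in, fan_out):
    # Xavier (Glorot) uniform initialization [45]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# 5-layer fully connected classifier: 784-170-130-100-70-47
dims = [784, 170, 130, 100, 70, 47]
forward = [xavier(m, n) for m, n in zip(dims[:-1], dims[1:])]
# Fixed random feedback pathways, initialized independently of the forward weights
backward = [xavier(n, m) for m, n in zip(dims[:-1], dims[1:])]

def predict(x):
    h = x
    for W in forward[:-1]:
        h = softplus(h @ W)
    return softmax(h @ forward[-1])

probs = predict(rng.standard_normal((2, 784)))  # rows sum to 1
```

With β = 10 the softplus closely tracks a ReLU while remaining smooth, which keeps the modulating signals differentiable for the meta-optimization.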

Figure S1: Performance of benchmark learning schemes while training a 5-layer fully-connected network.

Figure S2: Performance of the classifier network trained with the F_bio plasticity rule.


Figure S4: Performance of benchmark learning schemes while training a 5-layer fully-connected network.

Figure S5: L1 improves feature selection in the meta-learning model: Performance of different penalization methods while training a 5-layer fully-connected classifier network on EMNIST digits [49] with online learning. Evolution of the meta-parameters for the pool of learning rules defined in Section 4.2 using (a) no penalization and (b) an L2-penalized meta-loss (Eq. S.3).

Figure S6: F_bio trains effectively under different initializations of the feedback: Accuracy of a 5-layer classifier network trained on the MNIST dataset [27] to perform a 10-way classification task using feedback alignment (FA) [5] compared to the proposed F_bio plasticity rule (bio) outlined in Eq. 7. In both tests, the backward connections were initialized using the normal Xavier initialization method (Eq. S.5).

Figure S7: The performance gain obtained with the modified plasticity rules is statistically significant: The p-value of the one-sided Mann-Whitney test over 600 meta-optimization episodes, comparing samples from trials using the FA method to those using the (a) F_eHebb, (b) F_Oja, (c) F_bio, and (d) F_pool plasticity rules.
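For reference, the one-sided Mann-Whitney U test used in this comparison can be sketched with its standard normal approximation. This minimal implementation (no tie correction) is our own illustration, not the authors' code; in practice, scipy.stats.mannwhitneyu with alternative='greater' would typically be used.

```python
import numpy as np
from math import erf, sqrt

def mann_whitney_one_sided(x, y):
    # One-sided Mann-Whitney U test via the normal approximation (no tie
    # correction). Tests H1: values in x tend to be larger than values in y.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2  # U statistic for sample x
    mu, sigma = n1 * n2 / 2, sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))  # P(U >= u) under H0

# A clearly shifted sample yields a small p-value ...
p_small = mann_whitney_one_sided([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
# ... while interleaved samples yield a large one.
p_large = mann_whitney_one_sided([1, 3, 5], [2, 4, 6])
```

Being rank-based, the test makes no normality assumption about the per-trial accuracies, which is why it is a natural choice for comparing the 20-trial samples.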

Table 1: Effect of the Hebbian-like error learning rule F_eHebb on the alignment of the modulating signals α for different layers: The leftmost column lists the parameters updated using F_0 with feedback alignment, and the next column indicates the layers trained with F_eHebb (Eq. 8). Angles α represent the alignment (in degrees) between the modulatory signal e and its backpropagated counterpart e^BP at each layer.