Brain experiments imply adaptation mechanisms which outperform common AI learning algorithms

Attempting to imitate the brain's functionalities, researchers have bridged neuroscience and artificial intelligence for decades; however, experimental neuroscience has not directly advanced the field of machine learning (ML). Here, using neuronal cultures, we demonstrate that increased training frequency accelerates neuronal adaptation processes. This mechanism was implemented on artificial neural networks, where a local learning step-size increases for coherent consecutive learning steps, and was tested on a simple dataset of handwritten digits, MNIST. Based on our on-line learning results with a few handwritten examples, success rates for the brain-inspired algorithms substantially outperform those of commonly used ML algorithms. We speculate that this emerging bridge from slow brain function to ML will promote ultrafast decision making under limited examples, which is the reality in many aspects of human activity, robotic control, and network optimization.


Introduction
Machine learning is based on Donald Hebb's pioneering work; seventy years ago, he suggested that learning occurs in the brain through synaptic (link) strength modifications (1). A synaptic strength modification typically lasts tens of minutes (2), while the clock speed of a neuron (node) is around one second (3). Although the brain is comparatively slow, its computational capabilities outperform typical state-of-the-art artificial intelligence algorithms. Following this speed/capability paradox, we experimentally derive accelerated learning mechanisms based on small datasets, whose utilization on gigahertz processors is expected to lead to ultrafast decision making.
Unlike modern computers, a well-defined global clock does not govern brain dynamics; instead, they are a function of relative event timing (e.g., stimulations and evoked spikes) (4).
In neuronal computation, each neuron sums the asynchronous incoming electrical signals via decaying input summation over its ramified dendritic trees and generates a short electrical pulse (spike) when its threshold is reached. For each neuron, synaptic strength is slowly modified based on the relative timing of inputs from other synapses; if a signal is induced from a synapse without generating a spike, its associated strength is modified based on the relative timing to adjacent spikes from other synapses on the same neuron (5).
Recently, it was experimentally demonstrated that each neuron functions as a collection of independent threshold units (6): each threshold unit is activated after signals arrive via one of the dendritic trees. Additionally, a new type of adaptive rule was experimentally observed based on dendritic signal arrival timing (7), which is similar to the slow adaptation mechanism currently attributed to synapses (links). This dendritic adaptation occurs on a faster timescale: it requires approximately five minutes, while synaptic modification requires tens of minutes or more.

Results
In this study, dendritic adaptation was experimentally examined at a higher stimulation frequency, 5 Hz, using the training pattern of previous experiments run at 0.5 to 1 Hz. Neuronal cultures were plated on a multi-electrode array with added synaptic blockers, and a patched neuron was stimulated extracellularly via its dendrites (Fig. 1a and Materials and Methods).
The adaptation process consisted of a training set of 50 pairs of stimulations. After an above-threshold intracellular stimulation, an extracellular stimulation that did not evoke a spike arrived with a predefined delay, typically 1 to 4 ms (Fig. 1b). We primarily took into account differing extra- and intra-spike waveforms (Fig. 1a, right), which presumably activated the neuron from two independent dendritic trees (7).
We quantified the effect of the neuronal adaptation by comparing the amplitudes of intracellular responses to extracellular stimulation before and after the training procedure. To quantify the initial response, the extracellular stimulation amplitude was decreased until no reliable evoked spikes were observed (Fig. 1, c and d, left).
In the first type of experiment, we measured the enhanced responses one minute after the training terminated and witnessed dendritic adaptation (Fig. 1c, right). The visible adaptation time for the 5 Hz training was substantially faster than that for the 1 Hz training.
Occasionally, the visible effect of adaptation was further enhanced after more time passed (Supplementary Fig. 1), suggesting by extrapolation that adaptation might occur in much less than one minute. However, because the initialization and compilation of subsequent experiments required a minimal time lag of one minute after training, the feasibility of such ultrafast adaptation could not be examined directly.
To overcome this limitation, we introduced a second type of experiment (Fig. 1d): shortly after the end of the training procedure, the neuron was extracellularly stimulated using two predefined amplitudes with unreliable responses. Without requiring any new experimental methods or compilations (Materials and Methods), this procedure pinpointed dendritic adaptation within only 10 seconds of the training termination (Fig. 1d, right).
With the increased training frequency, the adaptation process substantially accelerated (Fig. 2a), potentially implying a time-dependent decaying adaptation step-size:

η_{t+1} = η_t · exp(−1/τ) + Δ    (1)

where the current adaptation step, η_{t+1}, is equal to the previous one with a decaying weight, t stands for a discrete time step, η_0 is a constant, 1/τ stands for the training frequency, and Δ is a constant representing the incremental effect of the current training step. This type of decay process occurs in many biological scenarios and represents, for instance, the decaying concentration of active material due to diffusion. As a generalization of Eq. (1), incoherent consecutive training steps are also allowed, decreasing the dendritic strength and resulting in a −Δ term in Eq. (1).
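The decaying step-size of Eq. (1) can be sketched in a few lines; the function name and the toy parameter values below are hypothetical, chosen only to illustrate the dynamics:

```python
import math

def next_step_size(eta_t, delta, tau, coherent=True):
    """Eq. (1): eta_{t+1} = eta_t * exp(-1/tau) + (+/-)delta.

    The previous step-size decays with weight exp(-1/tau), where 1/tau is
    the training frequency; a coherent training step contributes +delta,
    an incoherent one contributes -delta (the generalization noted above).
    """
    sign = 1.0 if coherent else -1.0
    return eta_t * math.exp(-1.0 / tau) + sign * delta

# A long run of coherent steps drives eta toward its fixed point
# delta / (1 - exp(-1/tau)): a higher training frequency (larger tau)
# yields a larger steady-state step-size, i.e. faster adaptation.
eta = 0.001
for _ in range(1000):
    eta = next_step_size(eta, delta=0.01, tau=5.0)
fixed_point = 0.01 / (1 - math.exp(-1.0 / 5.0))
```

The fixed point makes the frequency dependence explicit: as 1/τ shrinks (faster training), the denominator shrinks and the steady-state step-size grows.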
Using supervised on-line learning of realizable rules and binary classification (Fig. 2, b and c), we first examined the impact of the time-dependent adaptation steps (Eq. 1) on accelerating biological learning processes. The teacher provided the student with asynchronous-input and binary-output relations (8), where both had the same architecture of the simplest classifier, the perceptron (9), and the output nodes were represented by a leaky integrate-and-fire neuron (10).
Two scenarios were examined: synaptic adaptation and dendritic adaptation (Fig. 2, b and c and Materials and Methods). Results clearly indicate that the generalization error, ε_g, of the experimentally inspired time-dependent η (Eq. 1) substantially outperformed the fixed-η scenario (Fig. 2, b and c). This accelerated learning stems from the fact that weights in synaptic learning converge to the extreme limits, vanishing or above-threshold weights (7). Hence, the learning step-sizes in coherent dynamics increased toward these extremes, accelerating the learning scenario (Fig. 2b). Similarly, in the dendritic case (Fig. 2c), weights oscillated and synchronized via repeatedly hitting the boundary values (7). Hence, a faster decay of ε_g also resulted from learning step acceleration (Fig. 2c).
Next, we examined the experimentally inspired time-dependent learning step mechanism on the supervised learning of an unrealizable rule using the MNIST database (11) tested on a neural network. This database consists of a large number of examples of handwritten digits (Fig. 3a) and is commonly used as a prototypical problem for quantifying the generalization performance of machine learning algorithms for various image processing tasks. In this study we use a small subset of the MNIST database without any data extension methods (12-14). The commonly used trained networks consisted of 784 inputs representing the 28 x 28 pixels of a digit, one hidden layer (30 units in this study), and ten outputs representing the labels (Fig. 3a). The commonly used learning approach is the backpropagation strategy (15):

W_{t+1} = W_t − η · ∇_{W_t}C    (2)

where the weight at time-step t, W_t, is modified with a step-size η towards the minus sign of the gradient of the cost function, C. An improved approach combines the momentum strategy (5, 16, 17) with regularization of the weights (18, 19):

V_{t+1} = μ · V_t − η_0 · ∇_{W_t}C
W_{t+1} = (1 − α) · W_t + V_{t+1}    (3)

where the momentum, μ, and the regularization, α, are constants in the region [0, 1] and η_0 is a constant. We optimized the performance of the momentum strategy (Eq. 3) over (μ, α, η_0) for a limited training dataset using the cross-entropy cost function (Materials and Methods) and compared its performance with the following two experimentally inspired learning mechanisms consisting of time-dependent η.
In the first approach, acceleration, the time-dependent η and the update rule for a weight are given by:

η_{t+1} = η_t · exp(−1/τ) + A · tanh(B · ∇_{W_t}C)
W_{t+1} = W_t − η_t · ∇_{W_t}C    (4)

where τ is the positive decaying factor, A_1 and B_1 are constants representing the amplitude and the gain between the input and the hidden layers, respectively, and A_2 and B_2 represent the same between the hidden and the output layers. It is evident that coherent consecutive gradients of a weight, i.e., with the same sign, increase its conjugate η. Note that in the limit B → ∞, the equation for η is simplified to η_{t+1} = η_t · exp(−1/τ) + A · sign(∇_{W_t}C). The second approach, advanced acceleration, combines the two previous approaches (Eqs. 3 and 4):

η_{t+1} = η_t · exp(−1/τ) + A · tanh(B · ∇_{W_t}C)
V_{t+1} = μ · V_t − η_{t+1} · ∇_{W_t}C
W_{t+1} = (1 − α) · W_t + V_{t+1}    (5)

For the two experimentally inspired accelerated approaches (Eqs. 4 and 5), the changes in weights and η depend on higher moments of the gradients, in contrast with the linear dependence of the momentum approach (Eq. 3). Given a limited subset of the dataset examples, the performance of the accelerated approaches was maximized over six (Eq. 4) and seven (Eq. 5) parameters (Materials and Methods).
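A minimal sketch of the acceleration rule of Eq. (4); the vectorized form, shapes, and parameter values are assumptions for illustration, and a constant-sign gradient stands in for "coherent consecutive gradients":

```python
import numpy as np

def accelerated_update(W, eta, grad, A, B, tau):
    """One 'acceleration' step (sketch of Eq. 4).

    Each weight carries its own step-size eta.  Consecutive gradients of
    the same sign keep adding same-signed tanh contributions, so the
    conjugate eta grows in magnitude; sign flips make the additions
    cancel.  In the limit B -> infinity, tanh(B*g) -> sign(g), the
    simplified form quoted in the text.
    """
    eta = eta * np.exp(-1.0 / tau) + A * np.tanh(B * grad)
    W = W - eta * grad
    return W, eta

rng = np.random.default_rng(0)
W = rng.standard_normal(4)
eta = np.full(4, 1e-3)
# Feed a constant-sign gradient: every per-weight eta grows step by step
# toward the fixed point A*tanh(B)/(1 - exp(-1/tau)).
for _ in range(20):
    W, eta = accelerated_update(W, eta, grad=np.ones(4),
                                A=0.05, B=2.0, tau=10.0)
```

With an oscillating gradient the tanh terms alternate in sign and η stays small, which is the mechanism that distinguishes coherent from incoherent learning steps.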
The on-line training set consisted of 300 randomly chosen examples: each label appeared 30 times within a random order. After 300 learning steps, the accelerated approaches outperformed the momentum method by more than 25%, with the test accuracy increasing from about 0.43 to 0.54 (Fig. 3b). Note that the acceleration approach (Eq. 4) showed similar performance to the advanced acceleration approach (Eq. 5). These improved results were found to be robust also for on-line training based on 6000 examples (30 batches of size 200) (Fig. 3c) and 1200 examples (24 batches of size 50) (Fig. 3d). For both cases (Fig. 3, c and d), the advanced acceleration approach (Eq. 5) offered the best performance. We repeated the training using the same 60 examples 5 times (60 x 5 = 300 training steps in total); compared with using 300 examples for training once, the 60 x 5 approach yielded better performance: using the advanced acceleration approach, the test accuracy increased from 0.54 to 0.57 (Fig. 3e and Supplementary Fig. 2). For a given number of network updates, the results demonstrate that smaller example sets yield more information. This result stems from the random training order of a randomly selected small dataset, e.g., 60 or 300 examples, consisting of a balanced appearance for each label. Around equalized trained label appearances, there are more temporal fluctuations for a dataset involving 300 examples with 30 appearances for each label than for one involving 5 sets of 60 examples with 6 appearances for each label. Indeed, for a training set of 300 distinct examples composed of 5 subsets of 60 balanced examples, a test accuracy of 0.57 was achieved (Supplementary Fig. 3), and for one with 30 subsets of 10 examples where each label appears once, the test accuracy increased further to 0.67, and to 0.7 for a fixed label order (Supplementary Fig. 4). Results indicate that in order to maximize the test accuracy for on-line scenarios, and especially for small datasets, a balanced set of examples and their balanced temporal training order are important ingredients.
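The balanced-subset construction described above can be sketched as follows; the helper name and its exact splitting policy are hypothetical, illustrating only the idea that each subset contains every label equally often with a shuffled temporal order:

```python
from collections import defaultdict
import random

def balanced_subsets(examples, labels, per_label=1, seed=0):
    """Split (example, label) pairs into subsets in which every label
    appears exactly `per_label` times, shuffling the order inside each
    subset -- the balanced appearance and balanced temporal order the
    Results point to (e.g. 30 subsets of 10 examples, one per digit).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)
    for pool in by_label.values():
        rng.shuffle(pool)
    n_subsets = min(len(p) for p in by_label.values()) // per_label
    subsets = []
    for k in range(n_subsets):
        chunk = []
        for y, pool in by_label.items():
            start = k * per_label
            chunk.extend((x, y) for x in pool[start:start + per_label])
        rng.shuffle(chunk)  # balanced temporal order within the subset
        subsets.append(chunk)
    return subsets

# 30 toy examples over 10 labels -> 3 subsets, each label appearing once.
xs = list(range(30))
ys = [i % 10 for i in range(30)]
subs = balanced_subsets(xs, ys)
```

Training would then iterate over the subsets in sequence, so every window of 10 updates sees all 10 labels exactly once.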

Conclusions
Based on increased η with coherent consecutive gradients, the brain-inspired accelerated-learning mechanism outperforms existing common ML strategies for small sets of training examples (20). Consistent results occur across various cost functions, e.g., the square cost function, however with relatively diminished performance (Fig. 3f). Because the performance maximization for a given dataset depends on the selected acceleration approach (Fig. 3, b and c), adapting the learning approach during the training process may improve performance.
Nevertheless, in addition to possible advanced nonlinear functions for updating η, given the number of network updates, the optimal scheduling of acceleration approaches and the ordering of trained examples to maximize performance deserve further research. The presented bridge from experimental neuroscience to ML is expected to further advance decision making using limited databases, which is the reality in many aspects of human activity (21), robotic control (22, 23), and network optimization (24, 25).

In-Vitro Experiments:
Animals: All procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and Bar-Ilan University Guidelines for the Use and Care of Laboratory Animals in Research and were approved and supervised by the Bar-Ilan University Animal Care and Use Committee.

Culture preparation:
Cortical neurons were obtained from newborn rats (Sprague-Dawley) within 48 hours after birth using mechanical and enzymatic procedures. The cortical tissue was digested enzymatically with a 0.05% trypsin solution in phosphate-buffered saline (Dulbecco's PBS) free of calcium and magnesium, and supplemented with 20 mM glucose, at 37°C. Enzyme treatment was terminated using heat-inactivated horse serum, and cells were then mechanically dissociated, mostly by trituration. The neurons were plated directly onto substrate-integrated multi-electrode arrays (MEAs) and allowed to develop functionally and structurally mature networks over a period of 2-4 weeks in vitro prior to the experiments. The number of plated neurons in a typical network was on the order of 1,300,000, covering an area of about 5 cm². The preparations were bathed in minimal essential medium (MEM-Earle, Earle's Salt Base without L-Glutamine) supplemented with heat-inactivated horse serum (5%), B27 supplement (2%), glutamine (0.5 mM), glucose (20 mM), and gentamicin (10 μg/ml), and maintained in an atmosphere of 37°C, 5% CO2 and 95% air in an incubator.

Synaptic blockers:
Experiments were conducted on cultured cortical neurons that were functionally isolated from their network by a pharmacological block of glutamatergic and GABAergic synapses.

Intracellular threshold estimation: In order to find a threshold for the intracellular stimulation, several stimulations at 1 Hz were given. The duration of the stimulations was set to 3 ms, and the intensity typically ranged from
An extracellular electrode was selected and both intra- and extra-cellular thresholds were estimated. The neuronal response latency (NRL) and its stability for the extracellular stimulations were estimated. The NRL was used to accurately adjust the time-lag between intracellular evoked spikes or EPSPs originating from consecutive intra- and extra-cellular stimulations to be in the range of 1-4 ms. We note that an above-threshold extracellular stimulation given shortly, e.g. 2 ms, after an above-threshold intracellular stimulation does not result in an evoked spike, and can be used to enhance adaptation. The thresholds and NRL were rechecked at the end of the experiment in order to ensure their stability.

Statistical analysis:
The demonstrated results were repeated tens of times on many cultures.

Data analysis:
Analyses were performed in a MATLAB environment (MathWorks, Natick, MA, USA). The recorded data from the MEA (voltage) was filtered by convolution with a Gaussian with a standard deviation (STD) of 0.1 ms. Evoked spikes were detected by threshold crossing, typically -40 mV, using a detection window of [0.5, 30] ms following the beginning of an extracellular stimulation. In order to calculate the neuronal response latency, defined as the time-lag between the stimulation and its corresponding evoked spike, the evoked spike times were extracted from the recorded voltage.

Simulations of biological neural networks:
The perceptron: The input layer consists of N input units and an output unit functioning as a leaky integrate-and-fire (LIF) neuron (see Output production). The input units are connected to the output unit via N synaptic weights, W_m (Fig. 2b), or via K = N/5 dendritic strengths, D_i (Fig. 2c). In the synaptic scenario, {W_m} are the tunable parameters (Fig. 2b), whereas in the dendritic scenario, {D_i} are the tunable parameters while {W_m} are time-independent (Fig. 2c).
The supervised learning algorithm: The scenario of supervised learning by a biological perceptron is examined using a teacher and a student. The mission of the student is to imitate the responses, i.e. the outputs, of the teacher, where both have the same architecture; hence it is a realizable rule. For each asynchronous input the teacher produces an output. The timings and the amplitudes of the inputs, as well as the resulting teacher's firing timings, are provided to the student. These input/output relations constitute the entire information provided to the student for each asynchronous input. The algorithm is composed of three parts: output production, weight adaptation, and learning.
Output production: An identical asynchronous input (example) is given to the teacher and the student; each produces its output according to its weights and decaying input summation, O^T and O^S, respectively (see Output production - Leaky integrate-and-fire neuron).
Weight adaptation: For each input unit, the teacher performs weight adaptation alongside its output production, following its input/output relations (see Adaptation). The student performs the same adaptation as the teacher, unless otherwise stated (see Student's adaptation).
Learning: The student performs learning steps, unless otherwise stated, on weights with outputs conflicting with the teacher's, i.e. O_m^T ≠ O_m^S for the m-th input unit.

Inputs generation:
Each input was composed of N/2 randomly stimulated input units. For each stimulated unit, a random delay and a stimulation amplitude were chosen from given distributions. The delays were randomly chosen from a uniform distribution with a resolution of 1 (0.001) ms, such that the average time-lag between two consecutive stimulations was 2 (10) ms for the synaptic (dendritic) scenario. Stimulation amplitudes were randomly chosen from a uniform distribution in the range [0.8, 1.2]. Note that the reported results are qualitatively robust to the scenario where all non-zero amplitudes equal 1. In the dendritic scenario, the five W_m connected to the same dendrite were stimulated sequentially in a random order, with an average time-lag of 10 ms between consecutive stimulations.
Output production - Leaky integrate-and-fire neuron: In the synaptic adaptation scenario, the voltage of the output unit is described by the leaky integrate-and-fire (LIF) model with decaying input summation,

v(t) = v_st + Σ_m W_m · A_m · exp(−(t − t_m − d_m)/τ) for t ≥ t_m + d_m

where v(t) is the scaled voltage, τ = 20 ms is the membrane time constant, and v_st = 0 stands for the scaled stable (resting) membrane potential. W_m and d_m stand for the m-th weight and delay, respectively, A_m is the m-th stimulation amplitude, and t_m is the timing of the m-th stimulation. A spike occurs when the voltage crosses the threshold, v(t) ≥ 1, and at that time the output unit produces an output of 1; otherwise the output is 0. After a spike occurs, the voltage is set to v_st. For simplicity, we scaled the equation such that v_th = 1 and v_st = 0; consequently, v ≥ 1 is above threshold and v < 1 is below threshold. Nevertheless, the results remain the same for both the scaled and unscaled equations, e.g. v_st = -70 mV and v_th = -54 mV. The initial voltage was set to v(t = 0) = 0.
In the dendritic adaptation scenario, the voltage of each dendritic terminal is described by

v_i(t) = v_st + D_i · Σ_{m∈i} W_m · A_m · exp(−(t − t_m − d_m)/τ)

where v_i(t) and D_i stand for the voltage and the strength of the i-th dendrite, respectively, and the sum runs over the input units connected to that dendrite. The rest of the parameters are identical to the synaptic adaptation scenario.
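The scaled output production for the synaptic scenario can be sketched as a time-stepped simulation; the per-step exponential-leak update and the event handling are assumptions consistent with the decaying input summation described above (τ = 20 ms, v_th = 1, v_st = 0):

```python
import numpy as np

def lif_output(times_ms, weights, amps, tau=20.0, dt=0.01, t_max=100.0):
    """Scaled leaky integrate-and-fire sketch: each input adds
    weight * amplitude to the voltage, which decays toward v_st = 0
    with membrane time constant tau (ms); a spike is emitted when
    v >= 1 (v_th = 1) and the voltage resets to v_st.
    Returns the list of spike times in ms.
    """
    events = sorted(zip(times_ms, weights, amps))
    v, spikes, k = 0.0, [], 0
    for step in range(int(t_max / dt)):
        t = step * dt
        v *= np.exp(-dt / tau)                  # leak toward v_st = 0
        while k < len(events) and events[k][0] <= t:
            _, w, a = events[k]
            v += w * a                          # decaying input summation
            k += 1
        if v >= 1.0:                            # threshold crossing
            spikes.append(t)
            v = 0.0                             # reset to resting value
    return spikes

# Two near-coincident strong inputs cross threshold; a lone weak one
# at 60 ms does not (toy values, not fitted to the experiments).
spikes = lif_output([5.0, 5.5, 60.0], [0.7, 0.6, 0.4], [1.0, 1.0, 1.0])
```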

Adaptation:
The adaptation for the synaptic scenario is done according to

W_m^{t+1} = W_m^t · (1 + f(Δt))

where the discrete time t measures the number of asynchronous inputs and Δt is the time-lag between a sub-threshold stimulation to W_m (a stimulation that did not evoke a spike, output 0) and an evoked spike, within one asynchronous input. Similarly, the dendritic adaptation is given by

D_i^{t+1} = D_i^t · (1 + f(Δt))

where Δt now is the time-lag between a sub-threshold stimulation at D_i and an evoked spike from another dendrite. For both scenarios,

f(Δt) = sign(Δt) · A(|Δt|)    (5)

representing the strengthening/weakening of a weight conditioned on a prior/later evoked spike at a time delay Δt, respectively, where a cutoff time window of 50 ms is enforced. For simplicity, a step function,

A(|Δt|) = A    (6)

was used for all time delays Δt, unless otherwise stated. However, all results are robust to adaptation in the form of either an exponential decay or a step function.
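A minimal sketch of the step-function adaptation; the multiplicative form, the sign convention for Δt, and A = 0.05 follow the simulation description above, but the exact convention is an assumption flagged in the comments:

```python
def adapt(strength, dt_ms, A=0.05, cutoff_ms=50.0):
    """Step-function adaptation sketch: the weight (or dendritic
    strength) is strengthened for a prior evoked spike (taken here as
    dt_ms > 0) and weakened for a later one (dt_ms < 0); outside the
    50 ms cutoff window nothing changes.  The sign convention for
    dt_ms and the multiplicative update strength*(1 +/- A) are
    assumptions consistent with the text; A = 0.05 is the simulation
    value quoted in the figure parameters.
    """
    if abs(dt_ms) > cutoff_ms:
        return strength
    return strength * (1.0 + A) if dt_ms > 0 else strength * (1.0 - A)
```

Repeated coherent pairings therefore compound multiplicatively, which is what drives the weights toward the boundary values noted in the Results.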

Student's adaptation:
In order to perform the same adaptation as the teacher, the required information is the teacher's temporal input/output relations. Note that although the student performs the same adaptation steps as the teacher, this does not necessarily ensure tracking of the teacher's parameters, since the changes in the weights are relative to the current values of the student's weights.

Learning: Learning steps were performed on the student's weights with outputs conflicting with the teacher's.
This learning rule is based on gradient descent dynamics, which minimizes a cost function that measures the deviation of the student's voltage, v^S, from the teacher's voltage, v^T, in case of an error (unmatched spike timings between the teacher and the student). A spike is considered as v = 1. The change in the weights W_m is proportional to the negative derivative of the cost function with respect to the weight.
A_m denotes the stimulation amplitude of the m-th input unit. For simplicity, the weighted exponential prefactor is neglected; the qualitative results remain the same with or without it. Consequently, the learning step for the synaptic scenario is similar to the traditional perceptron learning algorithm,

W_m^{t+1} = W_m^t + η · (O^T − O^S) · A_m    (7)

and similarly for the dendritic scenario,

D_i^{t+1} = D_i^t + η · (O_m^T − O_m^S) · A_m

where η denotes the learning step, O_m^T and O_m^S are the outputs of the teacher and the student at the m-th input unit in the i-th dendrite, respectively, and A_m denotes the stimulation amplitude of the m-th input unit.
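The Eq. (7) learning step can be sketched directly; the toy weights and amplitudes below are hypothetical:

```python
def perceptron_step(W, eta, O_teacher, O_student, amps):
    """Eq. (7)-style learning step: W_m <- W_m + eta*(O^T - O^S)*A_m,
    applied only when the teacher's and student's outputs conflict.
    `amps` holds the stimulation amplitudes A_m (toy values here).
    """
    if O_teacher == O_student:
        return W                      # no error, no learning step
    return [w + eta * (O_teacher - O_student) * a
            for w, a in zip(W, amps)]

W = [0.5, 0.5]
# Teacher spiked (1), student did not (0): weights are strengthened
# in proportion to each unit's stimulation amplitude.
W2 = perceptron_step(W, 0.001, O_teacher=1, O_student=0, amps=[1.0, 0.8])
```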
Calculating the generalization error: The estimation consisted of up to 20,000 inputs presented to the teacher and the student, where each input generates about 30/200 evoked spikes for the synaptic/dendritic scenario. The generalization error, ε_g, is defined as the fraction of mismatched outputs between the teacher and the student.
Panel B: The adaptation followed equation (6) with A = 0.05, and learning followed equation (7) with η = 1/1000. η was bounded from above by 1.5 and from below by 10^-4. The fixed learning rate was compared to the accelerating method using an adaptive learning step, η_{t+1} = η_t · exp(−1/τ) + B · sign(ΔW_t), with τ = 0.1, B = 0.01, and η initiated as 1/1000.
Panel C: {D_i} were chosen from a uniform distribution in the range [0.1, 0.9] and then normalized to a mean of 0.5. {W_m} were chosen from a uniform distribution in the range [0.5, 1.5]. Stimulations with low amplitudes (0.01) were given to the N/2 unstimulated input units, resulting in non-frozen D_i. The adaptation follows equation (5) with A = 0.05 and the learning follows equation (7) with η = 1/1000. η was bounded from below by 0.1 and from above by 3. The fixed learning rate was compared to the accelerating method using an adaptive learning step, η_{t+1} = η_t · exp(−1/τ) + B · sign(ΔW_t), with τ = 0.1, B = 0.01, and η initiated as 1/1000.

Simulations of neural network:
Architecture: The feedforward neural network contains 784 input units, 30 hidden units and 10 output units in a fully connected architecture. Each unit in the hidden and output layers has an additional input from a bias unit. Weights from the input layer to the hidden layer, W1, and from the hidden layer to the output layer, W2, were randomly chosen from a Gaussian distribution with zero average and standard deviation equal to 1. All weights were then normalized such that the input weights to each hidden unit have an average of 0 and an STD of 1. The initial value of each bias was set to 1. We trained the network on the handwritten digits dataset, MNIST, using gradient descent. The inputs, examples from the training dataset, contain 784 pixel values in the range [0, 255]. We normalized the inputs such that their average and STD are equal to 0 and 1, respectively.
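The initialization and normalization above can be sketched as follows; whether the input normalization is per example or over the whole dataset is an assumption (per example is used here), and the helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Gaussian N(0,1) weights, then each unit's incoming weights are
    shifted/scaled to mean 0 and STD 1, as described; biases start at 1."""
    W = rng.standard_normal((n_in, n_out))
    W = (W - W.mean(axis=0)) / W.std(axis=0)
    b = np.ones(n_out)
    return W, b

def normalize_inputs(X):
    """Normalize each 784-pixel example to mean 0 and STD 1."""
    X = np.asarray(X, dtype=float)
    return ((X - X.mean(axis=1, keepdims=True))
            / X.std(axis=1, keepdims=True))

W1, b1 = init_layer(784, 30)   # input -> hidden
W2, b2 = init_layer(30, 10)    # hidden -> output
# Stand-in pixel data in [0, 255]; real training would load MNIST.
X = normalize_inputs(rng.integers(0, 256, size=(5, 784)))
```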

Forward propagation:
The output of a single unit in the hidden layer, o_j^1, was calculated as

o_j^1 = f(Σ_i W_ij^1 · x_i + b_j^1)

where W_ij^1 is the weight from the i-th input to the j-th hidden unit, x_i is the i-th input, b_j^1 is the bias of the j-th hidden unit, and f is the unit's activation function.
For the output layer, the output of a single unit, o_j^2, was calculated as

o_j^2 = f(Σ_i W_ij^2 · o_i^1 + b_j^2)

where W_ij^2 is the weight from the i-th hidden unit to the j-th output unit, o_i^1 is the output of the i-th hidden unit, and b_j^2 is the bias of the j-th output unit.
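The two-layer forward pass can be sketched as follows; the text does not name the activation f, so the logistic sigmoid is assumed here (a common choice with the cross-entropy cost used below), and the random weights are stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation sketch:
    o^1_j = f(sum_i W1_ij x_i + b1_j), o^2_j = f(sum_i W2_ij o^1_i + b2_j),
    with f assumed to be the logistic sigmoid."""
    o1 = sigmoid(x @ W1 + b1)   # hidden-layer outputs
    o2 = sigmoid(o1 @ W2 + b2)  # output-layer outputs (10 label units)
    return o1, o2

rng = np.random.default_rng(1)
o1, o2 = forward(rng.standard_normal(784),
                 rng.standard_normal((784, 30)), np.ones(30),
                 rng.standard_normal((30, 10)), np.ones(10))
```

The predicted label would then be the index of the largest of the 10 output units.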

Back propagation:
We used two different cost functions; the first was the cross entropy,

C = −(1/N) Σ_n Σ_j [y_j ln(o_j) + (1 − y_j) ln(1 − o_j)]

and the second was the mean square error (MSE),

C = (1/N) Σ_n Σ_j (y_j − o_j)²

where y are the desired labels and o stands for the current 10 output units of the output layer. The summation is over all N training examples.
The backpropagation method computes the gradient for each weight with respect to the chosen cost function. The weights and biases were updated according to three different methods:

(1) Momentum: The weights update is

V_{t+1} = μ · V_t − η · ∇_{W_t}C
W_{t+1} = (1 − α) · W_t + V_{t+1}

where t is the discrete time-step, W are the weights, α is a regularization constant, η is the fixed learning rate, and ∇_{W_t}C is the gradient of the cost function for each weight at time t. V was initialized as −η_0 · ∇_{W_0}C, where ∇_{W_0}C is the first computed gradient. The biases update follows the same form,

Vb_{t+1} = μ · Vb_t − η · ∇_{b_t}C
b_{t+1} = b_t + Vb_{t+1}

where ∇_{b_t}C is the gradient of the cost function with respect to each bias, b, and Vb was initialized as −η · ∇_{b_0}C, where ∇_{b_0}C is the first computed bias gradient.
(2) Acceleration: The weights update is

η_{t+1} = η_t · exp(−1/τ) + A_{1/2} · tanh(B_{1/2} · ∇_{W_t}C)
W_{t+1} = W_t − η_t · ∇_{W_t}C

where η is defined for each weight, A_1 and B_1 are constants representing the amplitude and the gain between the input and the hidden layers, respectively, and A_2 and B_2 represent the same between the hidden and the output layers. η was initialized as A_{1/2} · tanh(B_{1/2} · ∇_{W_0}C), where ∇_{W_0}C is the first computed gradient.

(3) Advanced acceleration: The weights update is

η_{t+1} = η_t · exp(−1/τ) + A_{1/2} · tanh(B_{1/2} · ∇_{W_t}C)
V_{t+1} = μ · V_t − η_{t+1} · ∇_{W_t}C
W_{t+1} = (1 − α) · W_t + V_{t+1}

and the biases update follows the same form with the bias gradients.

Testing the network: The network classification accuracy was tested on the MNIST test dataset, containing 10,000 inputs. The test inputs were also normalized to an average of 0 and an STD of 1.
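The momentum update of method (1) can be sketched as a single step; the placement of the regularization factor (1 − α) and the toy gradient below are assumptions for illustration:

```python
import numpy as np

def momentum_step(W, V, grad, mu, eta, alpha):
    """One momentum-with-regularization step: the velocity V
    accumulates the (negative) gradient, and the weights are shrunk
    by the regularization factor (1 - alpha) before the velocity is
    added, consistent with the description of method (1)."""
    V = mu * V - eta * grad
    W = (1.0 - alpha) * W + V
    return W, V

W = np.array([1.0, -1.0])
grad = np.array([0.5, -0.5])          # stand-in for a cost gradient
V = -0.1 * grad                        # initialized from the first gradient
for _ in range(3):
    W, V = momentum_step(W, V, grad, mu=0.9, eta=0.1, alpha=0.01)
```

Each step moves the weights against the gradient while shrinking them slightly, so a constant gradient produces steadily accelerating motion toward smaller cost.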

Optimization:
For each update method, the parameters were chosen to maximize the test accuracy. For optimization we first used a grid over the adjustable parameters, followed by fine tuning with higher resolution for each parameter. The optimization was performed over 3 parameters for the momentum method (μ, η_0, α), 6 parameters for the acceleration method (A_1, A_2, B_1, B_2, τ, η_0), and 7 parameters for the advanced acceleration method (A_1, A_2, B_1, B_2, τ, μ, α).
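The coarse-grid-then-fine-tuning procedure can be sketched as follows; the `score` function (standing in for test accuracy), the grid values, and the refinement schedule are all hypothetical:

```python
import itertools

def grid_then_refine(score, grids, refine_factor=0.5, steps=5):
    """Coarse-to-fine search sketch: evaluate a coarse grid over all
    adjustable parameters, then sweep each parameter one at a time
    around the best point at higher resolution."""
    best = list(max(itertools.product(*grids), key=score))
    for i, axis in enumerate(grids):
        span = (max(axis) - min(axis)) * refine_factor
        lo, hi = best[i] - span / 2, best[i] + span / 2
        fine = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
        for v in fine:                       # fine sweep on axis i
            cand = best[:i] + [v] + best[i + 1:]
            if score(tuple(cand)) > score(tuple(best)):
                best[i] = v
    return tuple(best)

# Toy quadratic "accuracy" peaked at (mu, eta) = (0.9, 0.05).
acc = lambda p: -((p[0] - 0.9) ** 2 + (p[1] - 0.05) ** 2)
best = grid_then_refine(acc, [[0.5, 0.7, 0.95], [0.01, 0.04, 0.1]])
```

In practice `score` would train the network with the candidate parameters and return the resulting test accuracy, which makes each evaluation expensive and motivates the coarse-then-fine schedule.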
For each culture, at least 20 μl of a cocktail of synaptic blockers was used, consisting of 10 μM CNQX (6-cyano-7-nitroquinoxaline-2,3-dione), 80 μM APV (DL-2-amino-5-phosphonovaleric acid) and 5 μΜ Bicuculline methiodide. After this procedure no spontaneous activity was observed in either the MEA or the patch clamp recordings. In addition, repeated extracellular stimulations did not provoke even the slightest cascades of neuronal responses (recorded extra- or intra-cellularly).

Stimulation and recording - MEA: An array of 60 Ti/Au/TiN extracellular electrodes, 30 μm in diameter and typically spaced 200 μm from each other (Multi-Channel Systems, Reutlingen, Germany), was used. The insulation layer (silicon nitride) was pre-treated with polyethyleneimine (0.01% in 0.1 M Borate buffer solution). A commercial setup (MEA2100-60-headstage, MEA2100-interface board, MCS, Reutlingen, Germany) for recording and analyzing data from 60-electrode MEAs was used, with integrated data acquisition from 60 MEA electrodes and 4 additional analog channels, an integrated filter amplifier, and a 3-channel current or voltage stimulus generator. Each channel was sampled at a frequency of 50k samples/s; thus the recorded action potentials and the changes in the neuronal response latency were measured at a resolution of 20 μs. Mono-phasic square voltage pulses were used, in the range of [−900, −100] mV and [100, 2000] μs.

Stimulation and recording - Patch Clamp: The electrophysiological recordings were performed in whole-cell configuration utilizing a Multiclamp 700B patch clamp amplifier (Molecular Devices, Foster City, CA). The cells were constantly perfused with a slow flow of extracellular solution consisting of (mM): NaCl 140, KCl 3, CaCl2 2, MgCl2 1, HEPES 10 (Sigma-Aldrich Corp. Rehovot, Israel), supplemented with 2 mg/ml glucose (Sigma-Aldrich Corp. Rehovot, Israel), pH 7.3, osmolarity adjusted to 300-305 mOsm. The patch pipettes had resistances of 3-5 MOhm after filling with a solution containing (in mM): KCl 135, HEPES 10, glucose 5, MgATP 2, GTP 0.5 (Sigma-Aldrich Corp. Rehovot, Israel), pH 7.3, osmolarity adjusted to 285-290 mOsm. After obtaining the giga-ohm seal, the membrane was ruptured and the cells were subjected to fast current clamp by injecting an appropriate amount of current in order to adjust the membrane potential to about -70 mV. Experiments were taken into account only when this adjustment current was stable during the measurements. The changes in the neuronal membrane potential were acquired through a Digidata 1550 analog/digital converter using pClamp 10 electrophysiological software (Molecular Devices, Foster City, CA). The acquisition started upon receiving the TTL trigger from the MEA setup. The signals were filtered at 10 kHz and digitized at 50 kHz. The cultures mainly consisted of pyramidal cells as a result of the mainly enzymatic and mechanical dissociation. For patch clamp recordings, pyramidal neurons were intentionally selected based on their morphological properties.

MEA and Patch Clamp synchronization: The experimental setup combines a multi-electrode array, MEA2100, and patch clamp. The multi-electrode array is controlled by the MEA interface board and a computer. The patch clamp subsystem consists of several microstar manipulators, an upright microscope (Slicescope-pro-6000, Scientifica), and a camera. Stimulations and recordings are implemented using the Multiclamp 700B and Digidata 1550A and are controlled by a second computer. The recorded MEA/patch data are saved on the respective computers. The time of the MEA system is controlled by a clock placed in the MEA interface board, and the time of the patch subsystem is controlled by a clock placed in the Digidata 1550A. The relative timings are controlled by triggers sent from the MEA interface board to the Digidata using a leader-laggard configuration.

Extracellular electrode selection: For the extracellular stimulations in the performed experiments, an extracellular electrode out of the 60 electrodes was chosen by the following procedure. While recording intracellularly, all 60 extracellular electrodes were stimulated serially at 1-2 Hz and above threshold, where each electrode was stimulated several times. The electrodes that evoked well-isolated, well-formed spikes were used in the experiments.

Extracellular threshold estimation: After choosing an extracellular electrode, its threshold for stimulation was estimated. Stimulations at 0.5-1 Hz with durations in the range [200, 2000] μs and different voltage amplitudes were given, until a response failure occurred. The threshold was defined between the stimulation voltage that resulted in a response failure and the closest value of stimulation voltage that resulted in an evoked spike. For patched neurons that were significantly close to an extracellular electrode (several micrometers), shorter stimulation durations were used in order to avoid the stimulation artifact in the voltage recordings. The stability of the extracellular threshold was confirmed during the experiments. After the training of coupled intra- and extra-cellular stimulations, the extracellular threshold was re-estimated in two ways: first, it was estimated 1 minute after training; second, it was estimated ~10 seconds after training with 2 predefined stimulation amplitudes. (Note that the MEA has three independent stimulators.)

Figure 2.
Figure 2. Acceleration of supervised realizable learning rules based on the biologically inspired mechanism. (a) The implication of the biological mechanism (Fig. 1), indicating that training scheduling with low/high frequency (a1/a2) results in a low/high learning rate, η. (b) Synaptic learning. (b1) A perceptron with 1000 asynchronous inputs and a leaky integrate-and-fire output

Figure 3.
Figure 3. Biologically inspired accelerated learning for the MNIST database in comparison with a common existing learning method. (a) The trained network, using backpropagation and cross-entropy cost function. η was initialized as A_{1/2} · tanh(B_{1/2} · ∇_{W_0}C), where ∇_{W_0}C is the first computed gradient, and the biases