Introduction

Energy constraints are believed to play a vital role in shaping the evolution of the brain1,2. Although the brain performs remarkably well in analog signal processing, it is interesting to note that it did not evolve to maximize information transfer efficiency; instead, it trades information transfer efficiency against energy efficiency3.

Although a majority of computational neuroscience models focus on neural signaling, in recent years a growing body of literature has addressed the energy dynamics that underlie neural signaling, conceptualized as a separate field known as neuroenergetics4. Neural signaling, which is associated with fluctuations in the membrane voltage, is supported by the current flowing through the ion channels. The resulting alterations in the concentrations of ionic species (Na+, K+, Ca2+) on either side of the neural membrane are corrected by the action of various pumps that consume energy in the form of adenosine triphosphate (ATP)5. This ATP is replenished by oxidative phosphorylation and glycolysis using the oxygen and glucose received from the blood vessels. Proximal blood vessels dilate as a consequence of neural activity, which ensures adequate blood flow to fuel that activity6,7,8.

The interaction between neurons and cerebral vessels is called neurovascular coupling. Many molecular pathways facilitate effective coupling between neurons and vessels. One important mechanism is the direct release of nitric oxide by neurons9,10: nitric oxide is a vasoactive substance that diffuses to nearby vessels, causing their dilation. Even though the forward influence from neuron to vessel is widely studied, particularly in the context of understanding functional neuroimaging11, the retrograde influence from the vessels to the neurons, responsible for converting oxygen and glucose to ATP, is still not completely understood. One widely accepted theory is that the glucose delivered to the neurons is taken up directly and used as a substrate for oxidative phosphorylation, the major source of ATP, the energy currency that fuels the ionic pumps12,13,14. Another intriguing, still-debated15,16,17,18,19 proposal is the astrocyte-neuron lactate shuttle theory20, which assigns an important role to astrocytes in mediating neurovascular coupling: astrocytes convert the glucose released from the cerebral vessels to lactate and supply it to the neurons, where it is converted into pyruvate and ultimately into ATP.

Though there have been efforts to model the elaborate bidirectional signaling underlying neural energetics at the single neuron level21,22,23,24,25, there is also an obvious need to study energetics at the network level. There is a growing awareness that impaired neuroenergetics is involved in several important brain disorders26,27,28,29,30. Abnormally high energy consumption levels have been linked to the idiopathic loss of cells in the Substantia nigra in Parkinson’s disease31,32,33. Mitochondrial dysfunction in the neurons of CA1 and CA3 in the hippocampus is linked to the cellular pathology underlying Alzheimer’s disease34,35. There have been bold proposals that metabolic impairment is the common underlying cause behind all forms of neurodegeneration36. Disruptions in neurovascular coupling evidently accompany brain disorders such as stroke and vascular dementia37,38,39,40,41. It would be beneficial to develop computational models that provide insights into the genesis of pathological neuroenergetics in the aforementioned diseases. But given how complicated models of neuroenergetics are at the single-unit level, it would not be pragmatic to extend them unchanged to a network level and study network neuroenergetics in the same exhaustive detail.

Therefore, to study neuroenergetics at the network level, the description at the single-unit level must be appropriately simplified. Description of neural energetics in terms of a large number of molecular metabolic substrates (e.g., ATP, pyruvate, lactate, glucose) makes the model conceptually opaque. The study of energetics in engineering is greatly facilitated by the unified view of energy that has been worked out in physics over the centuries. Behind the innumerable forms of energy found in nature (mechanical, electrical, chemical, thermal, etc.), energy is one, denoted by common units (Joules). However, such an elegant, intellectually satisfying, and unified view of energy transformations in neuroenergetics, though desirable, is a far-off goal considering the current state of neuronal modeling.

The development of simplified neural network models in the 1980s led to the connectionist revolution. Although more complex modeling approaches were available at the time (the Hodgkin-Huxley model, the cable equation, dendritic processing), networks constructed from simple sigmoidal neurons succeeded in providing insights into a wide variety of phenomena in psychology and cognitive science, in addition to engineering applications42,43.

Likewise, one may envisage that the development of simplified neuroenergetic models at the network level may give valuable insights into possible optimal energy utilization strategies of the brain. There have been efforts to construct simplified neuro-energetic models in recent times at the single neuron level. Some of these efforts assume that the effect of energy supply to a neuron can be expressed in abstract terms as regulating the neuron’s threshold of activation: higher (lower) energy leads to a smaller (higher) threshold44,45.

By adopting such a simplified depiction of the dependence of neural function on energy, we construct a novel class of neural networks known as Artificial Neuro-Vascular Networks (ANVNs), in which a separate vascular tree caters to the energy requirements of a neural network. The error gradient, which is normally used to update the weights of the neural network, is also propagated from the neural network up the vascular tree to update the “strengths” of the branches of the vascular tree. The efficiency of energy utilization of the network, evaluated in terms of the energy consumed to achieve a given level of output performance, is studied.

The outline of the paper is as follows. In the first half of the work, we bring out the importance of training the energy network of an energy-dependent network, in this case the highly energy-dependent biological neural network. The energy network here is the vascular network that provides adequate nourishment to the brain tissue. We connect a neural network with a trainable vascular network to form an ANVN. The ANVN is then studied under three training regimes of the vascular tree: (i) leaving the vascular tree untrained, (ii) training the vascular tree sequentially after first training the neural network, and (iii) training the neural and vascular networks simultaneously. The improved performance in terms of accuracy and energy efficiency during simultaneous training (regime iii) establishes the need to train the vascular network. In these regimes the energy provided to the network is variable, and the network takes up whatever energy is given to it.

In the second half of the work, we modify the ANVN to ANVN_R (ANVN with reservoir) so that it takes only the energy it requires and rejects the excess. This modification brings out the notion of an optimum network size at which energy efficiency is maximized. The importance of the network size in maintaining robustness to the initial availability of energy is also studied in the same section. We also look at how transfer learning manifests in the vascular weight modifications and observe that the weights representing the microvasculature undergo the maximum modification.

Finally, we study whether the energy consumed by a neuron reflects its contribution to the network performance. The model shows that such a correlation between energy consumption and a neuron’s contribution to the network exists only when the network size is small. We also explore how introducing an explicit energy constraint in the cost function, as a form of regularization, affects the network behavior (see supplementary material Sect. 4).

So far we have motivated our work in terms of the energetic cost of information processing. However, it is possible to approach the problem more formally using concepts from statistical thermodynamics, invoking the Jarzynski equality46, in the context of the free energy principle47. In statistical thermodynamics, the link between informational work and thermodynamic work can be understood by maximizing the marginal likelihood of any model (e.g., a neural network) of how data are generated. Specifically, in the context of the free energy principle, this translates into maximizing a variational bound on the marginal likelihood, also known as an evidence lower bound48. Interestingly, drawing an analogy with regularization theory, free energy can be decomposed into accuracy and complexity terms49. The complexity term is crucial here: it corresponds to the Kullback–Leibler divergence between the posterior and prior over the unknown model parameters, and it scores the degree of Bayesian belief updating, or information gain, associated with parameter updates. In turn, this corresponds to a computational cost that, via the Jarzynski equality, can be expressed in terms of energy (i.e., Joules).

As in regularization theory, according to the free energy principle, the network or model is trained by simultaneously maximizing classification accuracy and minimizing network complexity. In the present model, we control the network complexity by constraining the energy, which in turn restrains the growth of the weights. Specifically, the complexity cost of a weight depends on its contribution to the output error or classification accuracy. Therefore, in the proposed model we link the energy demand of a weight to the corresponding error gradients generated by the backpropagation algorithm.

In summary, when the available energy is greater than a threshold, the biases are set to low values and each neuron is more likely to respond to its inputs. Conversely, if energy is limited, the bias increases linearly, thereby reducing the propensity of the neuron to fire for the same level of inputs.

Materials and methods

The Artificial Neuro Vascular Network (ANVN) is designed to explore the characteristics of an energy-dependent neural network. In the ANVN, we establish a bidirectional coupling between a simple feedforward neural network, a Multi-Layer Perceptron (MLP), and a vascular network model. The bidirectionality here refers to the forward flow of energy from the vascular network to the neural network and the feedback flow of energy-demand error from the neural network to the vascular network. Figure 1 shows the schematic of the model; it depicts the ANVN with reservoir (ANVN_R). For the initial part of the study, we consider the ANVN, which has only the root node and no reservoir; the source and reservoir in Fig. 1 belong to the modified ANVN with reservoir (ANVN_R), used in the later part of the study. The hidden layer of the neural network is shown as the yellow box, with yellow circles indicating the neurons. As shown in the figure, each neuron receives energy from a leaf node of the vascular network (green circles), and the energy available at each leaf node depends on the weights of the vascular tree, indicated by \({U}_{y,x}\) for the connection between nodes \(x\) and \(y\). During the forward pass of the MLP, the bias of each hidden neuron is decided by the amount of energy available at that neuron. During backpropagation, the gradient in the biases of the hidden layer neurons is converted into a gradient in energy and propagated along the vascular tree from the leaf nodes to the root node (up to the source node for ANVN_R). The weights of the vascular tree are updated depending on this energy gradient. The simulations were carried out using MATLAB R2020a; the pseudocodes of the simulations are provided in supplementary material Sect. 5.

Figure 1

The schematic representation of ANVN_R.

The vascular network

The vascular network has a tree structure that begins at a root node \((R)\) and branches uniformly according to a predefined branching factor (at most \(k\) branches at every junction) until it reaches the leaf nodes. The leaf nodes \((F)\) supply energy to the hidden neurons of the MLP in a one-to-one fashion and are therefore equal in number to the hidden neurons (\(N\)); hence the number of leaf nodes is initialized as \(F=N\). The vascular tree is defined by fixing \(k\) and \(N\) a priori. The total number of levels in the tree (\(L\)) is calculated as,

$$L = 1 + \left\lceil \frac{\log(N)}{\log(k)} \right\rceil$$
(1)

The total number of nodes (\({T}_{n}\)) in the tree is calculated as follows,

$$T_{n} = \sum_{l=0}^{L-1} \left\lceil \frac{N}{k^{l}} \right\rceil$$
(2)

If the number of neurons \(N\) (and consequently the number of leaf nodes \(F\)) is not a power of the branching factor \(k\), then, taking the level of the root node (R) as \(l=1\), each level gives rise to \(c=\lceil N/k^{L-(l+1)}\rceil\) children in such a way that the first \(\lfloor c/k\rfloor\) nodes have \(k\) branches each, and the remaining \(\lceil N/k^{L-(l+1)}\rceil-k\lfloor c/k\rfloor\) nodes have \(<k\) branches. The tree structure is represented by the adjacency matrix (\(A\)), a two-dimensional matrix of size \(T_{n}\times T_{n}\); an entry \(A_{yx}\) of the matrix is unity if there exists an edge from node \(x\) to node \(y\).
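As a concrete illustration, the following minimal Python/NumPy-style sketch computes the tree dimensions of Eqs. (1) and (2). The original simulations were carried out in MATLAB (pseudocodes in supplementary material Sect. 5), so the function and variable names here are purely illustrative.

```python
import math

def tree_dimensions(n_leaves: int, k: int):
    """Levels L (Eq. 1) and total node count T_n (Eq. 2) of a vascular
    tree with n_leaves leaf nodes and branching factor k."""
    # integer-exact form of L = 1 + ceil(log(N)/log(k))
    levels = 1
    while k ** (levels - 1) < n_leaves:
        levels += 1
    total_nodes = sum(math.ceil(n_leaves / k**l) for l in range(levels))
    return levels, total_nodes

# Example: 512 leaf nodes with branching factor 8 give a 4-level, 585-node tree
print(tree_dimensions(512, 8))   # (4, 585)
```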

The input energy source is connected to the root node. Energy from the root \(({E}_{s})\) flows down the tree from the root node to the leaf nodes and from there to the hidden neurons. Each parent node \((x)\) is connected to a child node \((y)\) by a weighted connection \({U}_{yx}^{E}\), which defines the fraction of energy transferred from the parent node to the child node. Since a given parent node can have up to \(k\) branches, each weight is absolute-normalized with respect to its sibling branches (\(\le k\)) to ensure the conservation of energy.

$$U_{yx}^{E} := \frac{U_{yx}^{E}}{\sum_{i=1}^{c} \left| U_{ix}^{E} \right|}, \quad c \le k$$
(3)

The vascular weights thus represent the fraction of energy that flows through a branch relative to its sibling branches. The vascular weight matrix (\(U^{E}\)), of size \(T_{n}\times T_{n}\), is obtained by replacing the unit entries of the adjacency matrix (\(A\)) with the weights of the corresponding edges. The energies of all parent nodes of level \(l\) are projected through the weight matrix to obtain the energies of their children, which constitute the nodes of level \(l+1\). The energy vector (\(E_{L}\)), of size \(T_{n}\times 1\), represents the energy distribution across all the nodes of the tree. It is calculated recursively by updating \(E_{x}\) at each level as shown in Eq. (4), starting from the level of the root node down to the level of the terminal leaf nodes, in order to obtain the energy distribution at all the nodes. \(E_{x}\) is initialized such that the energy at the root node equals the available energy \(E_{s}\): the initial vector \(E_{1}\) (Eq. 5) has a nonzero value only at its first index, which represents the root node.

$$E_{x} = \sum_{l=2}^{x} U^{E} E_{l-1}, \quad 2 \le x \le L$$
(4)
$$E_{1}(i)=\begin{cases} E_{s}, & i=1\\ 0, & \text{otherwise}\end{cases} \qquad 1\le i\le T_{n}$$
(5)

The vascular weights are initialized randomly, resulting in a random distribution of energies at the leaf nodes. The energy available at each leaf node is used to calculate the bias of the corresponding hidden neuron, as described in Eq. (7).
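The construction above can be summarized in a short sketch, again illustrative rather than the study’s actual code: sibling-wise absolute normalization of the weights (Eq. 3), followed by level-by-level propagation of the root energy (Eqs. 4-5). The adjacency convention (\(A_{yx}=1\) for an edge from parent \(x\) to child \(y\)) follows the definition given earlier.

```python
import numpy as np

def normalize_sibling_weights(U, A):
    """Eq. (3): absolute-normalize each parent's outgoing weights over its
    sibling branches, so the fractions leaving any parent sum to one."""
    U = np.abs(U) * A                       # keep weights only on tree edges
    totals = U.sum(axis=0, keepdims=True)   # column x = edges leaving node x
    totals[totals == 0] = 1.0               # leaf nodes have no children
    return U / totals

def propagate_energy(U, E_s, L):
    """Eqs. (4)-(5): push the root energy E_s down the tree, one matrix
    product per level, accumulating the energy of every node."""
    E = np.zeros(U.shape[0])
    E[0] = E_s                              # Eq. (5): only the root is charged
    level = E.copy()
    for _ in range(L - 1):
        level = U @ level                   # energies of the next level down
        E += level
    return E

# Tiny example: N = 4 leaves, k = 2, so T_n = 7 nodes over L = 3 levels
A = np.zeros((7, 7))
for parent, child in [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]:
    A[child, parent] = 1.0
rng = np.random.default_rng(0)
U = normalize_sibling_weights(rng.random((7, 7)), A)
E = propagate_energy(U, E_s=4.0, L=3)
print(E[3:], E[3:].sum())   # leaf energies; their sum equals the root energy
```

Because every column of the normalized weight matrix sums to one, the leaf energies in the example sum to the root energy, which is precisely the conservation property that Eq. (3) enforces.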

The neural network

The neural network used in this model is an MLP with a single hidden layer. For the first half of the study, the number of neurons in the hidden layer is fixed at N = 512, and the performance of the ANVN is studied as the input energy is varied. A weighted connection \({W}_{jk}^{f}\) connects neuron ‘k’ of the input layer to neuron ‘j’ of the hidden layer, and \({W}_{ij}^{s}\) connects neuron ‘j’ of the hidden layer to output neuron ‘i’. The biases of the hidden and output neurons are represented by \({b}_{j}^{f}\) and \({b}_{i}^{s}\), respectively.

The weights connecting the input layer and the hidden layer (\({W}^{f}\)) are absolute normalized.

$${\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}:=\frac{{\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}}{\sum _{\boldsymbol{k}}\left\vert{\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}\right\vert}$$
(6)

The bias of each neuron in the hidden layer depends on the energy released to it from the leaf node with which it is associated. The bias-energy relationship (Fig. 2) is defined by Eq. (7): the higher the energy, the lower the bias, and hence the higher the neuron’s probability of firing.

Figure 2

The relation between the bias of a neuron and the energy available at the associated leaf node.

$$b_{j}^{f}=\begin{cases} f(E_{j})=1-E_{j}, & 0\le E_{j}\le 2\\ -1, & E_{j}>2\end{cases}$$
(7)

Given that the input vector is \(x\), the net input to the hidden layer is given by

$${\boldsymbol{h}}_{\boldsymbol{j}}^{\boldsymbol{f}}={\sum }_{\boldsymbol{k}}{\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}{\boldsymbol{x}}_{\boldsymbol{k}}-{\boldsymbol{b}}_{\boldsymbol{j}}^{\boldsymbol{f}}$$
(8)

The output of the hidden layer is obtained by passing the net input (\({h}_{j}^{f}\)) through a sigmoid function (\({g}_{1}\), defined in Sect. 3 of the supplementary material).

$$V_{j} = g_{1}(h_{j}^{f})$$
(9)

The net input to the output neurons can hence be written as

$${\boldsymbol{h}}_{\boldsymbol{i}}^{\boldsymbol{s}}={\sum }_{\boldsymbol{j}}{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}^{\boldsymbol{s}}{\boldsymbol{V}}_{\boldsymbol{j}}-{\boldsymbol{b}}_{\boldsymbol{i}}^{\boldsymbol{s}}$$
(10)

The output of each output neuron is obtained by passing its net input through the sigmoid function (\({g}_{2}\), defined in Sect. 3 of the supplementary material).

$$y_{i} = g_{2}(h_{i}^{s})$$
(11)
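Putting Eqs. (7)-(11) together, the forward pass can be sketched as follows, continuing the NumPy sketch above. Note that \(g_1\) and \(g_2\) are defined in the supplementary material; the plain logistic function used here is an assumption made for illustration.

```python
def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def energy_to_bias(E_leaf):
    """Eq. (7): more energy -> lower bias, saturating at -1 above 2 units."""
    return np.where(E_leaf <= 2.0, 1.0 - E_leaf, -1.0)

def forward(x, Wf, Ws, E_leaf, b_s):
    """Forward pass of the ANVN's MLP (Eqs. 8-11). The hidden biases are
    not free parameters: they are read off the vascular leaf energies."""
    b_f = energy_to_bias(E_leaf)
    V = sigmoid(Wf @ x - b_f)    # Eqs. (8)-(9), hidden layer
    y = sigmoid(Ws @ V - b_s)    # Eqs. (10)-(11), output layer
    return V, y
```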

The weights and biases are updated to minimize the network error. Given that \(d\) is the desired output, the cost function without energy regularization is given by

$$\boldsymbol{C}=\frac{1}{2}\sum_{\boldsymbol{i}}{\vert\vert{\boldsymbol{d}}_{\boldsymbol{i}}-{\boldsymbol{y}}_{\boldsymbol{i}}\vert\vert}^{2}$$
(12)

The gradients of the weights and biases are obtained as shown below in order to minimize the cost function

$$\boldsymbol{\Delta }{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}^{\boldsymbol{s}}=-\boldsymbol{\eta }\frac{\partial \boldsymbol{C}}{\partial {\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}^{\boldsymbol{s}}}$$
(13)
$${\boldsymbol{\Delta }\boldsymbol{b}}_{\boldsymbol{i}}^{\boldsymbol{s}}=-\boldsymbol{\eta }\frac{\partial \boldsymbol{C}}{\partial {\boldsymbol{b}}_{\boldsymbol{i}}^{\boldsymbol{s}}}$$
(14)
$$\boldsymbol{\Delta }{\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}=-\boldsymbol{\eta }\frac{\partial \boldsymbol{C}}{\partial {\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}}$$
(15)

Neurovascular coupling

Neurovascular coupling is implemented in this network in the form of the dependence of each neuron’s bias on the energy level of the closest vascular node. Based on conductance-based neuron models, earlier studies have suggested that the effect of energy supply to a neuron can be expressed as a change in its firing threshold44,50. During the forward pass, the bias of the hidden neurons is entirely determined by the energy available at the leaf nodes (Eq. 7). As shown in Fig. 1, the state of the vascular network hence influences the activity of the neurons. The gradients at each level are calculated using the backpropagated error estimated at that level. Since the bias of a hidden neuron depends on the leaf node energy \(E_j\) available to it, the gradient of the bias at the hidden layer is converted into a gradient of energy, as shown below.

$$\Delta {\boldsymbol{E}}_{\boldsymbol{j}}=\boldsymbol{\eta }\frac{\partial \boldsymbol{C}}{\partial {\boldsymbol{b}}_{\boldsymbol{j}}^{\boldsymbol{f}}}\frac{\partial {\boldsymbol{b}}_{\boldsymbol{j}}^{\boldsymbol{f}}}{\partial {\boldsymbol{E}}_{\boldsymbol{j}}}$$
(16)

Given that the output error is \({e}_{i}={d}_{i}-{y}_{i}\), the error terms at the output layer and hidden layer, \({\delta }_{i}^{s}\) and \({\delta }_{j}^{f}\) respectively, are defined as follows

$$\delta_{i}^{s} = e_{i}\, g_{2}'(h_{i}^{s})$$
(17)
$$\delta_{j}^{f} = \sum_{i}\delta_{i}^{s}\, W_{ij}^{s}\, g_{1}'(h_{j}^{f})$$
(18)

The gradients in terms of the partial derivative of the cost function can be rewritten using these error terms, as shown below.

$$\boldsymbol{\Delta }{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}^{\boldsymbol{s}}=\boldsymbol{\eta }{\boldsymbol{\delta }}_{\boldsymbol{i}}^{\boldsymbol{s}}{\boldsymbol{V}}_{\boldsymbol{j}}$$
(19)
$${\boldsymbol{\Delta }\boldsymbol{b}}_{\boldsymbol{i}}^{\boldsymbol{s}}=-\boldsymbol{\eta }{\boldsymbol{\delta }}_{\boldsymbol{i}}^{\boldsymbol{s}}$$
(20)
$$\boldsymbol{\Delta }{\boldsymbol{W}}_{\boldsymbol{j}\boldsymbol{k}}^{\boldsymbol{f}}={\boldsymbol{\eta }\boldsymbol{\delta }}_{\boldsymbol{j}}^{\boldsymbol{f}}{\boldsymbol{x}}_{\boldsymbol{k}}$$
(21)
$$\Delta E_{j} = -\eta\, \delta_{j}^{f}\, f'(E_{j}) = \begin{cases} \eta\, \delta_{j}^{f}, & 0\le E_{j}\le 2\\ 0, & E_{j}>2\end{cases}$$
(22)
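Continuing the sketch, the backward pass below implements Eqs. (17)-(22), again assuming logistic sigmoids so that the derivatives can be written in terms of the activations.

```python
def backward(x, V, y, d, Ws, E_leaf, eta):
    """Error terms and gradients of Eqs. (17)-(22), assuming logistic
    sigmoids so that g'(h) = g(h)(1 - g(h)) can be written via V and y."""
    e = d - y                                     # output error
    delta_s = e * y * (1.0 - y)                   # Eq. (17)
    delta_f = (Ws.T @ delta_s) * V * (1.0 - V)    # Eq. (18)
    dWs = eta * np.outer(delta_s, V)              # Eq. (19)
    db_s = -eta * delta_s                         # Eq. (20)
    dWf = eta * np.outer(delta_f, x)              # Eq. (21)
    # Eq. (22): the bias gradient becomes an energy demand. With
    # f(E) = 1 - E on [0, 2], f'(E) = -1 there and 0 beyond.
    dE = np.where(E_leaf <= 2.0, eta * delta_f, 0.0)
    return dWs, db_s, dWf, dE
```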

The energy gradient calculated using Eq. (22) is used to update the weight between a parent node \(x\) and a child node \(y\) of the vascular tree.

$$U_{yx}^{E} := U_{yx}^{E} + \eta_{v}\, \Delta E_{y}$$
(23)

The energy gradient (\(\Delta {E}_{x}\)) at each parent node \(x\) is taken as the average of the energy gradients of its child nodes.

$$\Delta E_{x}=\frac{1}{c}\sum_{y=1}^{c}\Delta E_{y}$$
(24)

where c is the total number of child nodes of the parent node x.
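A sketch of this upward sweep is given below, reusing the `normalize_sibling_weights` helper from the earlier sketch. The assumption of a level-order node layout (children indexed after their parents) is ours, made so that a single reverse loop visits children before their parents.

```python
def update_vascular_weights(U, A, dE_leaf, leaf_idx, eta_v):
    """Eqs. (23)-(24): each edge weight is nudged by its child's energy
    gradient, and a parent's gradient is the mean of its children's.
    Assumes children have larger indices than their parents, so one
    reverse sweep processes the tree bottom-up."""
    dE = np.zeros(U.shape[0])
    dE[leaf_idx] = dE_leaf
    for x in range(U.shape[0] - 1, -1, -1):
        children = np.nonzero(A[:, x])[0]
        if children.size:
            U[children, x] += eta_v * dE[children]   # Eq. (23)
            dE[x] = dE[children].mean()              # Eq. (24)
    return normalize_sibling_weights(U, A), dE       # re-impose Eq. (3)
```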

In order to incorporate L2 regularization of weights (study described in supplementary material Sect. 4), the cost function needs to be changed, as shown below. This, in turn, changes the gradient of \({W}_{ij}\)

$$\boldsymbol{C}=\frac{1}{2}\sum_{\boldsymbol{i}}{\vert\vert{\boldsymbol{d}}_{\boldsymbol{i}}-{\boldsymbol{y}}_{\boldsymbol{i}}\vert\vert}^{2}+{\boldsymbol{\lambda }}_{\boldsymbol{w}}\frac{1}{2}\sum_{\boldsymbol{i}}\sum_{\boldsymbol{j}}{\vert\vert{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}\vert\vert}^{2}$$
(25)
$$\boldsymbol{\Delta }{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}}=\boldsymbol{\eta }{(\boldsymbol{\delta }}_{\boldsymbol{i}}^{\boldsymbol{s}}{\boldsymbol{V}}_{\boldsymbol{j}}-{\boldsymbol{\lambda }}_{\boldsymbol{w}}{\boldsymbol{W}}_{\boldsymbol{i}\boldsymbol{j}})$$
(26)

In order to incorporate L1 regularization of energy (study described in supplementary material Sect. 4), the cost function needs to be changed, as shown below. This, in turn, changes the gradient of \({E}_{j}\)

$$\boldsymbol{C}=\frac{1}{2}\sum _{\boldsymbol{i}}{\vert\vert{\boldsymbol{d}}_{\boldsymbol{i}}-{\boldsymbol{y}}_{\boldsymbol{i}}\vert\vert}^{2}+{\boldsymbol{\lambda }}_{\boldsymbol{E}}\sum _{\boldsymbol{j}}\left\vert{\boldsymbol{E}}_{\boldsymbol{j}}\right\vert$$
(27)
$$\Delta E_{j} = \eta\,(\delta_{j}^{f} - \lambda_{E}), \quad \text{since } E_{j} \ge 0$$
(28)

The performance of the ANVN is evaluated based on the accuracy (\(\alpha\)) and the total energy consumed (\(\xi\)). The accuracy of the network (\({\alpha}_{\xi}\)) at a given root energy \(\xi\) is defined as the fraction of correct class predictions out of the total number of predictions made on a given test data set.

$$\alpha_{\xi}=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
(29)

The total energy consumed (\(\xi\)) is calculated as the sum of the energies available at all the \(F\) leaf nodes, which in turn equals the energy taken in at the root node.

$$\boldsymbol{\xi }=\sum _{\boldsymbol{F}}{\boldsymbol{E}}_{\boldsymbol{F}}$$
(30)

The minimum accuracy (\({\alpha }_{{\xi }_{0}}\)) of the network is the accuracy at the minimum root energy (\(\xi =1\)). The efficiency (\({\psi }_{\xi }\)) of the network at any root energy \(\xi\) is defined as the ratio of the relative accuracy to the energy consumed.

$${\boldsymbol{\psi }}_{\boldsymbol{\xi }}=\frac{{\boldsymbol{\alpha }}_{\boldsymbol{\xi }}-{\boldsymbol{\alpha }}_{{\boldsymbol{\xi }}_{0}}}{\boldsymbol{\xi }}$$
(31)
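The two performance measures translate directly into code (illustrative, as before):

```python
def accuracy(predicted_classes, true_classes):
    """Eq. (29): fraction of correct class predictions."""
    return np.mean(predicted_classes == true_classes)

def efficiency(alpha_xi, alpha_min, xi):
    """Eq. (31): accuracy gained relative to the minimum-energy baseline,
    per unit of energy consumed (Eq. 30: xi is the sum of leaf energies)."""
    return (alpha_xi - alpha_min) / xi
```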

Training regimes in the vascular network

In order to investigate the necessity of training the vascular network, we propose three training regimes for the ANVN. In all three regimes, the networks are trained using the MNIST data set51.

Untrained vascular network

Under this regime, the neural network is trained a priori without any vascular network. The trained network is then connected to the leaf nodes of the vascular network. The weights of the vascular tree are non-trainable and predefined such that the energy at the root node is distributed equally among the leaf nodes. The bias of each neuron now depends on the energy available at its leaf node, calculated according to the relationship (Eq. 7) shown in Fig. 2. This untrained variant of the ANVN is then tested using the same data set used to evaluate the performance of the trained MLP. The performance in terms of accuracy and efficiency is evaluated by varying the energy provided at the root node.

Sequentially trained vascular network

The second regime aims to study whether training the vascular network would improve the performance of the neural network by achieving an optimal delivery of energy to the hidden neurons. To test this, the neural network is trained independently first. It is then incorporated into an ANVN by connecting the hidden neurons to the leaf nodes of the vascular tree, and the weights of the tree are trained subsequently. The bias of each hidden neuron depends on the energy at its leaf node. This energy-dependent bias (\({b}_{j}^{E}\)) will differ from the trained bias (\({b}_{j}^{T}\)), and the difference (\(\Delta b_{j}=b_{j}^{T}-b_{j}^{E}\)) is used to calculate the energy gradient (\(\Delta {E}_{j}\)) using the bias-energy relationship (Eq. 7). This gradient of energy is used as vascular feedback to update the vascular weights.
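A minimal sketch of this feedback, reusing `energy_to_bias` from the forward-pass sketch: since \(b=1-E\) on the linear branch of Eq. (7), a demand for a lower bias maps to a demand for more energy. The exact conversion used in the study is not spelled out beyond Eq. (7), so this is our reading of it.

```python
def sequential_energy_feedback(b_trained, E_leaf):
    """Vascular feedback for the sequential regime: the mismatch between
    the trained bias and the energy-derived bias becomes an energy
    gradient via Eq. (7)."""
    db = b_trained - energy_to_bias(E_leaf)    # b_T - b_E
    return np.where(E_leaf <= 2.0, -db, 0.0)   # dE = -db on the linear branch
```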

Simultaneously trained vascular network

In this case, the neural and vascular networks are trained simultaneously. The untrained vascular network is connected to the hidden neurons of an untrained MLP to form the ANVN as before (Fig. 1). During the forward pass of the MLP, the energy at the leaf nodes decides the bias of the hidden layer. The gradient in bias obtained during backpropagation is used to find the gradient in energy (Eq. 16). A neuronal update that seeks to reduce bias must demand more energy from the vascular tree, so the energy gradient is an estimate of the neuron’s energy deficit with respect to the required change in bias. This information is propagated along the vascular tree, upwards from the leaf nodes to the root node, to modify the weights so that the energy supply at the leaf nodes matches the neuronal demand.

Simultaneously trained ANVN with an energy reservoir (ANVN_R)

In all three regimes discussed above, a predetermined amount of energy is provided at the root node and distributed to the hidden neurons via the leaf nodes. The network uses up all the energy given to it, regardless of whether that energy exceeds its actual demand. Here we modify the ANVN such that the vascular network takes only the required energy from the root node and rejects the excess, which is saved in a reservoir. The ANVN with reservoir is denoted ANVN_R. The calculation of the energy gradient was modified such that any neuron receiving a per capita energy > 2 units returns the excess energy by updating the weights with a small negative slope (\(\gamma = 0.005\)). The energy gradient equation (Eq. 22) was hence updated as shown below.

$$\Delta E_{j} = \begin{cases} \eta\, \delta_{j}^{f}, & 0 \le E_{j} \le 2\\ -\gamma, & E_{j} > 2\end{cases}$$
(32)

This energy gradient modifies the vascular weights from the leaf nodes up to the root, as well as the weight connecting the root node to the source (Fig. 1). Updating the source weight naturally changes the weight of the reservoir, owing to the normalization carried out among vascular weights emerging from the same parent node (Eq. 3).
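In code, the only change relative to the earlier backward-pass sketch is the second branch of the energy gradient (the constant \(\gamma=0.005\) is the value quoted above):

```python
GAMMA = 0.005   # small negative slope for returning excess energy

def energy_gradient_with_reservoir(delta_f, E_leaf, eta):
    """Eq. (32): over-supplied neurons (E > 2) emit a small negative
    gradient, so the update pushes their excess back toward the source
    and, after re-normalization of the source weights, into the reservoir."""
    return np.where(E_leaf <= 2.0, eta * delta_f, -GAMMA)
```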

Results

Performance of ANVN under various vascular training schemes

The initial part of the work focuses on establishing the importance of vascular training for the neural network. The ANVN was trained under three different regimes, and the performance was evaluated based on accuracy and energy efficiency.

Regime 1: Pre-trained neural network connected to untrained vascular tree.

A neural network with 512 hidden neurons was trained independently using the MNIST dataset with 500 data points. The hidden layer of the neural network was then connected to an untrained vascular network with 512 leaf nodes. The branching factor of the tree was varied such that the tree had a minimum of two levels (\(k=512\)) and a maximum of 10 levels (\(k=2\)); the result shown here is for \(k=8\), which yields a tree with 4 levels. The bias of the hidden layer of the ANVN was now dependent on the energy available at the leaf nodes. The energy served at the root node was varied from 10 to 500 units. For each value of root energy, the performance of the ANVN was evaluated on the same 200-point test data set used to test the independently trained neural network. The performance in terms of test accuracy and efficiency was abysmal for all root energies (dotted line with star marker in Fig. 3).

Figure 3

Accuracy and energy efficiency vs. root energy when vascular network is (i) untrained (dotted line with star marker), (ii) sequentially trained (dashed), and (iii) simultaneously trained (solid) with MLP.

Regime 2: Vascular tree sequentially trained after neural network training.

The second training regime was to sequentially train the vascular network. The neural network was pretrained as in regime 1. The trained neural network was then connected to a trainable vascular tree with 512 leaf nodes and random weight initialization. The difference between the trained bias and the bias obtained from the energy at the leaf node was used to train the vascular network, using the same MNIST data set (500 training data points) used to train the neural network. Once the vascular network was trained sequentially, the performance of the ANVN was evaluated using 200 test data points while varying the energy supplied at the root node from 10 to 500 units; the accuracy (Eq. 29) and efficiency (Eq. 31) were calculated in each case. The maximum efficiency attained by the network was only around 0.2 units. The network showed improved accuracy at higher energies (> 300 units) (Fig. 3, blue dashed line), but the efficiency decreased steadily with increasing root energy.

Regime 3: Simultaneously trained MLP and vascular tree.

The third training regime was to train the MLP and the vascular network simultaneously. The network was trained using 500 data points of the MNIST data set. The gradient of bias obtained from the backpropagation algorithm was converted into a gradient of energy in order to update the vascular weights. The training was carried out for 20 k epochs while varying the root energy from 10 to 500 units. The network was then evaluated using the test data set of 200 data points. Simultaneous training of the vascular and neural networks improved the network performance even further. As shown by the solid line in Fig. 3, the network attained a peak efficiency of 0.62 units with test accuracy above 80% at much lower energy (100 units) compared to the networks trained using regimes 1 and 2. The network achieved better efficiency in the lower energy range; on further increase in root energy, even though accuracy was maintained, the efficiency kept dropping steadily. Thus, simultaneous training of the neural and vascular networks is desirable for energy-efficient data processing.

Network performance is invariant with respect to the branching factor

The vascular tree in this model is determined by the number of hidden neurons of the MLP (\(N\)), which equals the number of leaf nodes, and the branching factor (\(k\)). Changing the branching factor changes the topology of the tree. We now consider the effect of the branching factor \(k\) on MLP learning.

Let the total number of vascular nodes be \({t}_{n}\), and let vascular nodes \(x\) and \(y\) be connected by weights \({U}_{yx}^{E}\). The vascular root node is supplied by an energy source \({E}_{s}\). The parameters of the model comprise the weights and the input energy. In the described tree, every node except the root node is connected to a parent node; hence there exist \({t}_{n}-1\) weight connections. Including the input energy (root node energy), the total number of parameters adds up to \(({t}_{n}-1)+1={t}_{n}\).

The constraints of the model are the normalization of weights at each node, together with the energy consumed at each vascular leaf node. Since each of the \({t}_{n}-N\) parent (non-leaf) nodes contributes one normalization constraint, the normalization of weights accounts for \({t}_{n}-N\) constraints. The second set of constraints is the energy consumed at each vascular leaf node; since there are \(N\) leaf nodes, the total number of constraints adds up to \(({t}_{n}-N)+N={t}_{n}\).

Hence, regardless of the branching factor, the number of parameters equals the number of constraints, making the solution unique. This feature was reflected in the similar network characteristics exhibited by networks trained using all three training paradigms (untrained, sequentially trained, and simultaneously trained), regardless of the variation in the branching factor \(k\) (Figs. 4, 5 and 6).

Figure 4

Untrained ANVN: energy efficiency and accuracy across root node for various branching factors.

Figure 5

Sequentially trained ANVN: energy efficiency and accuracy across root node for various branching factors.

Figure 6

Simultaneously trained ANVN: energy efficiency and accuracy across root node for various branching factors.

Energy deficit vs. accuracy correlation study for various root energies

To study the effect of energy on the accuracy of the network, we need to check whether an energy deficit would result in a drop in accuracy. We define the ‘energy deficit’ (\(\tilde{E}\)) as the difference between the desired energy (\({E}_{D}\)) and the actually available energy (\({E}_{A}\)). The desired energy of a pretrained neural network is the total energy calculated from the trained biases (\({b}_{j}^{D}\)) using Eq. (7). The available energy of an ANVN is the sum of the energies at the leaf nodes.

$$\tilde{E} = E_{D} - E_{A}$$
(33)
$$E_{D}=\sum_{j}\left(1-b_{j}^{D}\right)$$
(34)

It is easier to quantify the energy deficit using the first and second training schemes (untrained and sequentially trained). For a sequentially trained ANVN, the vascular tree is trained after neural network training so that the supplied energy matches the desired energy, providing biases close to those of the trained neural network. Here there is a provision to cross-check whether there is an actual deficit in energy by calculating the deviation of the trained vascular leaf node energies from the desired energy. The variation of energy deficit (\(\tilde{E}\)) and accuracy (\(\alpha\)) as a function of root energy (\({E}_{s}\)) is plotted in Fig. 7a. The network has 512 neurons in the hidden layer, and the branching factor of the vascular tree is 32. The training and testing procedures are similar to the sequential training procedure described under ‘Regime 2’. At lower root energies, the energy deficit is high and the accuracy is low. As the source energy increases, the energy deficit reduces and accuracy improves. The correlation between energy deficit (\(\tilde{E}\)) and accuracy (\(\alpha\)) was strongly negative (Pearson correlation coefficient = −0.94), showing that energy deficit inversely affects the accuracy of the network.

Figure 7

Accuracy and energy deficit variation across root energy (a) for sequentially trained ANVN (b) for untrained ANVN.

In the untrained ANVN, the desired energy of the network is unknown to the vascular network owing to the lack of vascular training. Hence, the accuracy shows no improvement (Fig. 7b) even when the energy deficit is low, once again highlighting the point that a high energy input is of no advantage unless the vascular network is trained. The network size and branching factor are the same as for the sequentially trained ANVN.

Simultaneously trained ANVN with an energy reservoir

A comparison of vascular training regimes 1, 2, and 3 established that network performance improves tremendously at lower source energy when the vascular network is trained simultaneously with the neural network. The simultaneously trained ANVN attained high accuracy at relatively small root energy, but the accuracy did not improve on further increase in root energy, as observed in Fig. 3; this was reflected in the steady decrease in the efficiency of the network. An ideal network should be able to reject the unwanted excess energy provided to it. The ANVN was hence modified to draw only the energy demanded by the neural network from the source and reject the rest, which is saved in the reservoir.

The root node of the network was connected to a constant energy source of 5000 units through a weighted connection (Fig. 1). Another weighted connection linked the energy source to the reservoir. If the weight of the connection to the root node of the ANVN_R is defined as \({U}_{0}\), then the weight connecting the energy source and the reservoir is \(1-{U}_{0}\). The vascular weights, including \({U}_{0}\), were trained so that only the energy demanded by the network would be received, and any excess energy (i.e., energy not taken up by the neural network) would be pushed into the reservoir. The initial value of \({U}_{0}\) determines the initial energy available to the hidden neurons. Since the total energy consumed by the neural network depends on the demand of the neurons in the hidden layer, limiting the number of hidden neurons becomes critical. The ANVN_R was hence studied by varying the number of hidden neurons.

The performance of the ANVN_R was evaluated in terms of accuracy and efficiency. The efficiency (\({\psi }_{n}\)) of a network with \(n\) hidden neurons (\(1<n<N\)) was defined as the ratio of the relative accuracy to the energy consumed by the network. The relative accuracy was calculated as the difference between the accuracy obtained with a given number of hidden neurons (\({\alpha }_{n}\)) and the accuracy achieved with the minimum number of neurons (\({\alpha }_{{n}_{0}}\)); in the current model, the minimum number of hidden neurons was \({n}_{0}=16\). The total energy consumed (\({\xi }_{n}\)) was calculated as the sum of the energies available at the terminal leaf nodes of the vascular tree.

$${\boldsymbol{\psi }}_{\boldsymbol{n}}=\frac{{\boldsymbol{\alpha }}_{\boldsymbol{n}}-{\boldsymbol{\alpha }}_{{\boldsymbol{n}}_{0}}}{{\boldsymbol{\xi }}_{\boldsymbol{n}}}$$
(35)

The network was studied by varying the number of neurons in the hidden layer between 16 and 500. Each network was probed multiple times by varying the initial average per capita energy received by the leaf nodes between 0.2 and 1 unit, by adjusting the initial value of \({U}_{0}\). As with the ANVN, we used 500 data points for training and 200 data points for testing the network. The training was carried out for 20 k epochs.

Figures 8a and 9 show that the accuracy of the network increased sharply (slope ~ 1.4) with the number of hidden neurons until \(N=32\). Further increase in the number of hidden neurons resulted in relatively slower improvement in accuracy (slope ~ 0.36) until \(N=64\); the increase became slower still (slope ~ 0.04) until \(N=175\), and a further increase in the number of hidden neurons nearly saturated the gain in accuracy. Nevertheless, it is interesting to note that the highest accuracy (98%) attained by the ANVN_R (\(N=500\)) was much higher than that obtained by simultaneous training of the ANVN without a reservoir (\(N=512\), 85%). The efficiency of the network peaked when \(N\) was in the range 28 to 36 (Fig. 8a) and then fell systematically with further increase in the number of hidden neurons. Even though the accuracy increased from approximately 80% at \(N\sim 36\) to around 98% at \(N\sim 500\), the efficiency started to drop beyond \(N\sim 36\). This shows that for the network to be maximally efficient, it must compromise on the maximum attainable accuracy. Even though the average per capita energy consumption showed a slightly decreasing trend with an increase in the number of hidden neurons (Fig. 9a), the total energy consumption increased linearly (Fig. 9b), which in turn caused the fall in efficiency.

Figure 8

Study of ANVN_R: (a) The test accuracy and energy efficiency across the number of hidden layer neurons. (b) Box plot of the settling points of per capita energy consumption, for each initial per-neuron energy (varied from 0.2 to 1 unit) and each network size. The red mark shows the median value of the per capita energy consumed by the trained network; the maximum and minimum values determine the height of the box.

Figure 9

Study of variation in accuracy and energy consumption in ANVN_R with an increase in the number of hidden neurons: (a) test accuracy vs. per capita energy consumption (b) test accuracy vs. total energy consumption.

Each network with a given number of hidden neurons was observed while varying the initial per capita energy made available to the neurons. This variation in the initial per capita energy, delivered by the leaf nodes of the vascular tree to the hidden neurons, led to an interesting observation: the pattern of variation of accuracy against energy consumption as epochs progressed differed between networks with different numbers of hidden neurons.

Figure 10a–c shows the trajectory of the network’s evolution in the space of per capita energy consumption vs. accuracy for various initial conditions. As epochs progressed, the trajectories converged to a point (Fig. 10a) when the number of hidden neurons was small (\(N<64\)); this point appears to be a stable fixed point of the network dynamics in the per capita energy vs. accuracy space. With further increase in the number of hidden neurons, the trajectories no longer converged to a point but instead approached a line: the former fixed point appears to have given way to a line attractor, as shown in Fig. 10c. The transition can be observed clearly in the box plot shown in Fig. 8b. The box plots describe the variation in the per capita energy consumption of the networks at steady state, given a range of initial per capita energies (0.2 to 1 unit); by steady state, we mean that the network has been trained for a sufficiently long time (20 k epochs). The red mark in each box shows the median value of steady-state per capita energy consumption across the varied initial per capita energies for a network with a given number of hidden neurons (\(N\)); the maximum and minimum values of the per capita energy consumed by the trained network determine the height of the box. The height of the box was small for smaller networks (\(N<64\)), indicating low variation in energy consumption and hence a fixed-point attractor. As the number of hidden neurons increased, the height of the box increased, indicating a line attractor for steady-state per capita energy consumption. The transition (Fig. 10b) of the network’s stable state from a fixed point to a line attractor happened between 44 and 64 hidden neurons (Fig. 8b).

Figure 10

Study of variation in accuracy and energy consumption with an increase in the number of hidden neurons. Each color indicates the trajectory of an individual simulation with a different initial energy given to the network; the starting point on the x axis denotes the initial energy of each simulation. (a) An example of trajectories converging to a fixed-point attractor, (b) an example of a transition point from a fixed point to a line attractor, (c) an example where trajectories converge to a line attractor.

To summarize, for smaller networks the network seeks the same point in the per capita energy vs. accuracy space irrespective of the initial conditions, whereas for larger networks the final state is strongly dependent on the initial per capita energy. Furthermore, in the latter case, the final state varies primarily in the per capita energy and not in the final accuracy achieved. It is also interesting to note that this transition happens at roughly the same number of hidden neurons even when the network is trained with a different data set (EMNIST; results in supplementary material Figs. S1 and S2).

Transfer learning

The importance of vascular training in ensuring optimal performance of the neural network was demonstrated by the results discussed earlier in this study, and the plasticity of the vasculature has been observed experimentally in many recent studies52,53,54,55. This poses the following question: how effectively does a vascular tree trained on data set A meet the energy demands of the network when it is subsequently trained on data set B? To answer this question, the simultaneously trained ANVN_R was initially trained with MNIST data51, and the vascular weights were then frozen and used as the initial weights for training on a different data set (EMNIST56 in this case). The neural network of the ANVN_R had 100 neurons in the hidden layer, and the vascular network was assigned a branching factor of \(k=8\) in order to explore the changes in vascular weights across 4 levels of branching. Training on the MNIST data set was carried out for 20 k epochs with 500 training samples and tested using 200 data points; training on the EMNIST data set was likewise carried out for 20 k epochs using 500 training data points, and 200 EMNIST data points were used to test the trained network. The difference between the vascular weights (\({U}_{A}\)) of the ANVN_R trained using MNIST and the weights (\({U}_{B}\)) of the ANVN_R trained using EMNIST was quantified using the root mean squared error (RMSE) between them. Each vascular node was assigned a level number (\(1\le l\le L\)) based on its hierarchical position from the root node, the root node being at level \(l=1\). The RMSE between \({U}_{A}\) and \({U}_{B}\) was estimated for each level \(l\) by considering the weights emerging from all nodes (\(i\in l\)) in that level.

$$\mathrm{RMSE}(l)=\sum_{i\in l}\left(U_{A}^{i}-U_{B}^{i}\right)^{2}$$
(36)
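A sketch of this per-level aggregation is shown below. Eq. (36) does not fix how an edge is assigned to a level, so attributing each edge \(U_{yx}\) to the level of its child node \(y\) is our assumption.

```python
def level_weight_change(U_A, U_B, node_levels):
    """Eq. (36): squared change between the MNIST-trained (U_A) and the
    EMNIST-trained (U_B) vascular weights, aggregated per level.
    node_levels[i] is the level of node i, with the root at level 1;
    each edge U[y, x] is attributed to the level of its child node y
    (an assumption made for this sketch)."""
    sq = (U_A - U_B) ** 2
    return {int(l): sq[node_levels == l, :].sum()
            for l in np.unique(node_levels)}
```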

The ANVN_R that had already been trained on MNIST learned the EMNIST data set, which is similar to MNIST, much faster, as shown in Fig. 11: the red curve, indicating the training accuracy of the network trained on EMNIST, rose to approximately 80% very quickly. In Fig. 12a, the RMSE at each level, from the root node (level 1) to the leaf nodes (level 4), is plotted. The variation of RMSE across levels of the vascular tree showed that the most significant changes happened at level 4, the level of the leaf nodes and hence closest to the neurons. The RMSE between \({U}_{A}\) and \({U}_{B}\) systematically reduced from level 4 to level 1; that is, the maximum changes occurred at the leaf nodes, and the changes were minimal at the levels farthest from the leaf nodes and closest to the root node. As training progressed, this difference in the vascular weights at the various levels of the tree became more prominent (Fig. 12b). The change followed a similar pattern in the network without a reservoir (ANVN trained under regime 3, simultaneous training), simulated for a larger number of levels (\(L=7\)) (Figs. S3 and S4 in the supplementary material).

Figure 11

Training accuracy of the initially trained network (blue) takes more epochs to reach high accuracy compared to the network trained by transfer learning (red).

Figure 12

Variation of RMSE of the vascular weights at each level of the vascular tree: (a) the RMSE between vascular weights at each level, with colors representing the RMSE at specified epochs; (b) the variation of RMSE across epochs for each level.

Correlation between error contribution and energy consumption

The hidden neurons in the ANVN and ANVN_R consume, at the single-neuron level, energy ranging from a minimum of 0 units to a maximum of 2 units in the ANVN_R, and more than that in the ANVN. A few neurons appeared to consume more energy than the other neurons in the same network. This variation in energy consumption among the individual neurons of a network prompted the following question: is the energy consumed by an individual neuron related to its contribution to the network’s performance? To answer this question, the correlation between the error contributed by each neuron and the energy consumed by that neuron was calculated. To this end, during testing, for a network with \(N\) hidden neurons, neuron ‘\(j\)’ was switched off by setting its output to zero, and the test error (\({\varepsilon }_{N}(j)\)) was observed. The measured error is a readout of the contribution of the switched-off neuron to the network’s performance. Given a test sample of M data points, the prediction error was calculated in terms of the root mean squared error (RMSE) between the desired output \({d}_{i}\) and the predicted output \({g}_{2}({h}_{i}^{s})\), \(1\le i\le n_o\), where \(n_o\) is the number of neurons in the output layer.

$$\mathrm{RMSE}_{\mathrm{control}}=\frac{1}{M}\sum_{M}\sum_{i=1}^{n_{o}}\frac{\left(d_{i}-g_{2}(h_{i}^{s})\right)^{2}}{n_{o}}$$
(37)

Since the data used were from the MNIST data set, the labels range from 0 to 9. Thus, for a given data point \(p\) with class number \({\varphi }^{p}\), the desired output \(d\) was defined such that

$$d_{i}=\begin{cases} 1, & i=\varphi^{p}+1\\ 0, & \text{for all other } i\end{cases}$$
(38)

The output of the \(j\)th neuron in the hidden layer, \(V_{j}\), is obtained by passing the net input to the hidden layer, \({h}_{j}^{f}\) (Eq. 8), through the sigmoid function (\({g}_{1}\)), as explained earlier in Eq. (9). In order to calculate the error contribution, the RMSE is estimated while switching off the hidden neurons one at a time, for \(k\) ranging from 1 to \(N\). While switching off the \(k\)th hidden neuron, Eq. (9), describing the output of the hidden layer (\(V_{j}\)), was modified as

$$V_{j}^{k}=\begin{cases} 0, & j=k\\ g_{1}(h_{j}^{f}), & \text{for all other } j\end{cases}$$
(39)

The root mean squared error obtained by shutting off the \(k\)th neuron (\(\mathrm{RMSE}_{k}\)) was estimated as below, using \(V_j^k\) to compute the new \(h_i^s\) (see Eq. 10).

$$\mathrm{RMSE}_{k}=\frac{1}{M}\sum_{M}\sum_{i=1}^{n_{o}}\frac{\left(d_{i}-g_{2}(h_{i}^{s})\right)^{2}}{n_{o}}$$
(40)

The change in error, \(\Delta \mathrm{RMSE}_{k}\), attributable to each hidden neuron was calculated as

$$\Delta \mathrm{RMSE}_{k}=\mathrm{RMSE}_{k}-\mathrm{RMSE}_{\mathrm{control}}$$
(41)
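
To make the ablation procedure of Eqs. (37)–(41) concrete, a minimal NumPy sketch is given below. It assumes a trained two-layer perceptron with sigmoid nonlinearities for both \(g_1\) and \(g_2\) and omits bias terms for brevity; the variable names (W1, W2, X, labels) are illustrative and are not taken from the original implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prediction_error(W1, W2, X, labels, n_o, ablate=None):
    # Hidden-layer output (Eq. 9); 'ablate' implements the switch-off of Eq. (39).
    V = sigmoid(X @ W1)
    if ablate is not None:
        V[:, ablate] = 0.0           # silence the k-th hidden neuron
    y = sigmoid(V @ W2)              # predicted output g2(h^s)
    d = np.eye(n_o)[labels]          # one-hot desired output (Eq. 38, zero-indexed)
    # Eqs. (37)/(40) as printed: average over the M samples of the per-class squared error
    return np.mean(np.sum((d - y) ** 2, axis=1) / n_o)

def error_contributions(W1, W2, X, labels, n_o):
    # Delta-RMSE of Eq. (41), computed for every hidden neuron k
    base = prediction_error(W1, W2, X, labels, n_o)      # RMSE_control
    return np.array([prediction_error(W1, W2, X, labels, n_o, ablate=k) - base
                     for k in range(W1.shape[1])])

Note that Eq. (38) places the 1 at index \({\varphi}^{p}+1\) because the equations use one-based indexing; in zero-indexed code the class label itself selects the one-hot entry.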

Two types of correlations were studied (a sketch of the EECC computation follows the second case):

1. For a network (ANVN_R) with hidden neurons \((n=1:N)\), the Pearson correlation coefficient between \({\Delta \mathrm{RMSE}}_{j}\) and \({E}_{j}\), termed the Energy Error Correlation Coefficient (EECC), was calculated for each number of hidden neurons (\(n\)), where \({E}_{j}\) denotes the energy consumed by the jth neuron, with \(j\) ranging from 1 to \(N\). Figure 13 shows the EECC between \({\Delta \mathrm{RMSE}}\) and \({E}_{j}\) as the number of neurons is increased. For this network, the vascular network can return excess energy to the reservoir. The network was simulated for different initial weights connecting the source node to the root node and to the reservoir, and the results of all the simulations were averaged to obtain the plot shown in Fig. 13.

Figure 13

The variation of the correlation coefficient with the number of hidden neurons. The network has the ability to return excess energy to a reservoir. The correlation is high for a small number of hidden neurons and decreases as the number of neurons increases. (a) Comparison with the change in accuracy. (b) Comparison with the change in energy efficiency.

The EECC was higher for smaller numbers of hidden neurons and decreased slightly as the number of hidden neurons increased. This means that in larger networks the neurons are not used as efficiently as in smaller networks, which appears to explain the reduced efficiency observed for larger networks (Fig. 13b) even though the maximum accuracy increased with the number of hidden neurons (Fig. 13a).

2. For a network (ANVN) with a fixed number of hidden neurons \(N\) and variable input energy, where the network is unable to reject excess energy, the EECC between \({\Delta \mathrm{RMSE}}_{j}\) and \({E}_{j}\) was calculated for each value of the total input source energy \({E}_{s}\), where \({E}_{j}\) denotes the energy consumed by the jth neuron, with \(j\) ranging from 1 to \(N\). Figure 14 shows the correlation of \({\Delta \mathrm{RMSE}}\) with \({E}_{j}\) as the input energy is increased. The number of leaf nodes was fixed at 512 hidden neurons. The network was simulated for different branching factors (\(K=2, 3, 4, 6, 8, 16, 32, 64, 256, 512\)), and the results of all the simulations were averaged to obtain the plot shown in Fig. 14.

Figure 14

The variation of the correlation coefficient with the energy provided at the root node. The number of hidden neurons is fixed at N = 512. There is no reservoir in the network, so the network cannot give back excess energy. The correlation is high at low input energy and decreases as the input energy increases. (a) Comparison with the change in accuracy. (b) Comparison with the change in energy efficiency.

The EECC was higher at lower input energies, and as the total input source energy increased, the EECC between the energy consumed by a neuron and its contribution to the network decreased significantly. This shows that when excess energy is available, the neurons are not utilized efficiently. In this network too, the weak correlation between a neuron's contribution to performance and its energy consumption may explain the reduced efficiency when the input energy is very high (Fig. 14b).
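
For completeness, the EECC itself is simply the Pearson correlation, taken across the hidden neurons, between the error-contribution vector and the per-neuron energy vector. A minimal sketch under the same assumptions as above, where delta_rmse and energy are length-N arrays and runs holds the results of independently trained networks for one configuration (names illustrative):

import numpy as np

def eecc(delta_rmse, energy):
    # Pearson correlation between each neuron's error contribution (Delta-RMSE_j)
    # and its energy consumption (E_j): off-diagonal entry of the 2x2 correlation matrix.
    return np.corrcoef(delta_rmse, energy)[0, 1]

def mean_eecc(runs):
    # Average the EECC over independently trained runs, as done for
    # each point plotted in Figs. 13 and 14.
    return float(np.mean([eecc(dr, e) for dr, e in runs]))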

Discussion

One of the fascinating facts about the brain is that even though it comprises only about 2% of the body's mass, it consumes about 20% of the body's overall energy budget. Evolution based on survival of the fittest placed a strict constraint on the available energy, and this is reflected in the way the brain evolved1,2. Artificial neural networks inspired by biological neural networks have made significant advances in many fields and led to a renaissance of sorts in artificial intelligence57,58. However, most artificial neural network models focus only on information processing and ignore energy constraints. In this study, we explored how energy dependence affects the performance of an artificial neural network. Recent studies, both experimental27,28,37,59 and computational44,45,60,61,62, have brought forth the importance of vascular and glial networks in the function of neural networks.

In this model, we simplified the complex energy delivery system of the biological neural network into an energy flow tree with trainable weighted branches. By comparing three training regimes in which the vascular network was (i) untrained, (ii) trained sequentially after the neural network, and (iii) trained simultaneously with the neural network, we showed that simultaneous training of the neural and vascular networks is the most energy-efficient. Training the vascular weights implies a rearrangement of the structure of the vasculature; such adaptation of the microvasculature following changes in neural activity is well established experimentally52,53.

The ANVN_R showed that beyond a certain neural network size, accuracy improvement came at the cost of energy efficiency. As the network size increased, the total energy consumption increased linearly while the gain in accuracy was negligible. Hence there exists a range of network sizes that gives energy-efficient performance at reasonable accuracy. In an environment where energy resources are limited, an ideal choice would be to settle for a network size at which each neuron is maximally utilized, thereby permitting the neural network to be energy efficient. In a biological network, where energy availability is limited, it makes sense for the network to settle for a lower accuracy to achieve energy-efficient computing3,26,63,64.

We explored the robustness of ANVN_R to changes in the initially available energy. Up to a certain network size, the network always converged to a stable settling point in the accuracy versus per capita energy consumption space. Beyond that size, the trajectories of the neural network dynamics in this space converged to a line attractor, showing a dependence on the initial availability of energy. This indicates that the network size has to be small to ensure the robustness of the system. Adding an energy limitation constraint (\({L}^{1}\) regularization) made the network more robust to variation in the initial energy available during training, and the stable settling point persisted for larger network sizes without compromising accuracy (study described in Supplementary Material, Sect. 4).

Retraining an ANVN_R with a different data set showed how transfer learning would manifest in the vascular network. The vascular weights at the level closest to the neurons (representing the microvasculature) underwent the largest changes. Interpreting this result in biological terms, retraining the network on a new data set would change the microvasculature more than the larger vessels, such as the penetrating arterioles. This agrees with the microvascular plasticity observed in many in vivo models65,66,67,68,69.

Above all, the correlation between a neuron's energy consumption and its contribution to the network performance (quantified using the EECC) was evident only when the network size was small. When the number of hidden neurons was small (blue plots in Figs. 13 and 14), there was a positive correlation between error contribution and energy consumption, meaning that the neurons that contribute the most towards the accuracy of the network also consume more energy. Surprisingly, this correlation decreased with an increase in the number of neurons (Fig. 13) as well as when the network received higher energy (Fig. 14). Even though the test accuracy achieved a higher value on increasing the number of hidden neurons (red plot in Fig. 13a) or the input energy (red plot in Fig. 14a), this was at the cost of the efficiency of the network (red plots in Figs. 13b and 14b). At larger network sizes, the neurons tend to be less efficient than in smaller networks: the network ends up using all the available neurons instead of just a sufficient subset, ensuring good performance but at the cost of very high energy consumption. We may describe this phenomenon as a kind of "metabolic obesity", wherein the neural network consumes too much energy for the performance it delivers. This shows that limiting the network size at an optimal point is necessary to ensure both the energy efficiency and the metabolic stability of the network.

In this paper, we examined the relationship between energy availability and input/output performance in an MLP. It was shown earlier, using an electronic implementation of a Hopfield memory neural network, that improved retrieval performance is correlated with higher energy consumption in the form of increased dissipation through resistive elements70. Similar results were also reported for an oscillatory associative neural network model71.

Taking a broader view, one must note that results pointing to a link between energy consumption and informational performance are not limited to computational neuroscience models or even artificial neural network models. This link has been the central question of the physics of computation and has a long history. Since irreversible processes in a thermodynamic system are associated with an irretrievable loss of energy as heat, more than half a century ago Rolf Landauer asked whether irreversible operations (e.g., addition: \(x+y=z\)) in a physical computing device are necessarily accompanied by dissipation of energy as heat. This profound question led to the creation of the whole field of reversible computation72,73,74. More recently, Karl Friston proposed a free-energy theory of the brain that seeks to describe neuroenergetics and neural information processing in a unified thermodynamic framework75. Theories of this type must be adequately extended and applied to a wide variety of computational neural architectures, both in neuroscience and in artificial intelligence, so as to simultaneously achieve optimal informational and metabolic efficiency.

Our model presented the preliminary idea that the brain's choice of energy-efficient performance over maximal performance is a characteristic of a robust neural network with limited energy availability. Extending this idea to a deep network would give insight into how energy availability influences the feature extraction and learning exhibited by neural networks. The correlation between higher energy consumption levels and cognitive performance has become increasingly evident in recent years59,76,77,78,79,80. Such a model would also help in understanding how energy availability impacts cognitive performance.

The current model raises the following question: do real neural networks minimize free energy? Recent studies on neuronal cell cultures suggest that they do81. In that study, the learning trajectories on the information plane show that, as accuracy increases, there is a concomitant increase in complexity, which is linked to energy cost. The learning trajectories in our simulations, as depicted in Fig. 10, are strongly reminiscent of the learning trajectories in the information plane of Isomura & Friston (2018)81. Similar results were also reported in deep learning theory82.

Our simulation results suggest, and explain, why the neurons that contribute most to accuracy also consume more energy: the increase in accuracy is accomplished at the cost of energy expenditure. However, when the network is overtrained, efficiency in terms of energy consumption is lost, leading to high accuracy but abnormally high energy consumption. On this basis, one would predict that networks with too many neurons will not generalize well or perform well on transfer learning tasks83.