Artificial neurovascular network (ANVN) to study the accuracy vs. efficiency trade-off in an energy-dependent neural network

Artificial feedforward neural networks perform a wide variety of classification and function approximation tasks with high accuracy. Unlike their artificial counterparts, biological neural networks require a supply of adequate energy delivered to single neurons by a network of cerebral microvessels. Since energy is a limited resource, a natural question is whether the cerebrovascular network is capable of ensuring maximum performance of the neural network while consuming minimum energy. Should the cerebrovascular network also be trained, along with the neural network, to achieve such an optimum? To answer these questions in a simplified modeling setting, we constructed an Artificial Neurovascular Network (ANVN) comprising a multilayered perceptron (MLP) connected to a vascular tree structure. The root node of the vascular tree is connected to an energy source, and the terminal nodes of the vascular tree supply energy to the hidden neurons of the MLP. The energy delivered by the terminal vascular nodes to the hidden neurons determines the biases of the hidden neurons. The “weights” on the branches of the vascular tree represent the energy distribution from the parent node to the child nodes. The vascular weights are updated by a kind of “backpropagation” of the energy demand error generated by the hidden neurons. We observed that higher performance was achieved at lower energy levels when the vascular network was trained along with the neural network, indicating that the vascular network must be trained to ensure efficient neural performance. We also observed that below a certain network size, the energetic dynamics of the network in the per capita energy consumption vs. classification accuracy space approach a fixed-point attractor for various initial conditions. Once the number of hidden neurons increases beyond a threshold, the fixed point appears to vanish, giving way to a line of attractors. The model also showed that when the resource is limited, the energy consumption of neurons is strongly correlated with their individual contribution to the network’s performance.
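To make the energy-routing idea concrete, the sketch below shows how energy might flow from the root of a vascular tree down to the hidden neurons when each branch splits its energy in proportion to normalized branch weights. This is a minimal illustration under our own assumptions (the function name `distribute_energy`, the per-level weight matrices, and the per-parent normalization scheme are not taken from the paper's implementation).

```python
import numpy as np

def distribute_energy(root_energy, level_weights):
    """Push energy from the root of a vascular tree down to its terminal nodes.

    level_weights[l] is a (parents, k) array: row p holds the raw weights with
    which parent p at level l splits its energy among its k children. Weights
    are normalized per parent so that each split conserves energy.
    """
    energies = np.array([root_energy])
    for w in level_weights:
        fractions = w / w.sum(axis=1, keepdims=True)  # normalize each split
        energies = (energies[:, None] * fractions).reshape(-1)
    return energies  # one energy value per terminal node / hidden neuron

# Hypothetical usage: a two-level binary tree feeding 4 hidden neurons
rng = np.random.default_rng(0)
levels = [rng.random((1, 2)), rng.random((2, 2))]
leaf_energy = distribute_energy(1.0, levels)
print(leaf_energy, leaf_energy.sum())  # leaf energies sum to the root's 1.0
```

Because every split is normalized, the total energy reaching the leaves always equals the energy injected at the root; training then amounts to redistributing, not creating, energy.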


Results of the ANVN with reservoir (ANVN_R) on the EMNIST data set
The ANVN_R was also simulated using the EMNIST data set. EMNIST is a dataset similar to MNIST, but with letters instead of digits. We took 500 training data points and 200 test data points for all simulations, such that the data points were distributed equally across 10 predefined classes (capital letters A to J). The data was limited to 10 classes so that the same ANVN_R network used for MNIST classification could be reused. Figure S1.a,b shows the variation of per capita energy and total energy, respectively, as the number of neurons in the hidden layer increases. Similar to the MNIST results, EMNIST also gave a peak efficiency when the number of hidden neurons was around N=28 to N=36 (fig. S2.a). Beyond that, the efficiency dropped and the accuracy saturated. Surprisingly, the robustness of the network to initial energy was also lost (fig. S2.b) at nearly the same network size (N~64) as obtained with MNIST.

Figure S1: Study of the variation in accuracy and energy consumption in ANVN_R with an increase in the number of hidden neurons (EMNIST data set): (a) Test accuracy vs. per capita energy consumption. (b) Test accuracy vs. total energy consumption.

Figure S2: (a) Test accuracy and energy efficiency across the number of hidden-layer neurons. (b) Box plot visualizing the settling points of per capita energy consumption for each initial energy (varied from 0.2 units to 1 unit) given to single neurons, for each network size. The red mark shows the median value of the per capita energy consumed by the trained network, and the maximum and minimum values determine the height of the box (EMNIST data).
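For reference, the following is a minimal sketch of the balanced 500/200 split described above (50 training and 20 test samples per class for the 10 classes A to J). Loading of the raw EMNIST arrays is left abstract, and the helper `balanced_subset` and the class label ids are illustrative assumptions, not code from the paper.

```python
import numpy as np

def balanced_subset(images, labels, class_ids, per_class, seed=0):
    """Draw an equal number of samples from each class.

    images/labels: full EMNIST arrays (loading is left to any EMNIST reader);
    class_ids: the 10 label ids for capital letters A-J (dataset-dependent).
    """
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in class_ids
    ])
    return images[idx], labels[idx]

# 10 classes x 50 = 500 training points; 10 x 20 = 200 test points, e.g.:
# train_x, train_y = balanced_subset(x_train, y_train, range(10), per_class=50)
# test_x,  test_y  = balanced_subset(x_test,  y_test,  range(10), per_class=20)
```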

Transfer learning in ANVN trained under regime 3
Transfer learning was also carried out in the ANVN network, and the results were similar to those of the ANVN_R. As in the ANVN_R, the network learned faster when the second data set was introduced (fig. S3), and the difference in vascular weights was highest at the level closest to the neurons (fig. S4.a,b). (Similar to the results obtained with MNIST (fig. 3 and fig. 8.a), in the case of EMNIST the ANVN trained under regime 3 also gave a lower accuracy than the ANVN with reservoir.) The hidden layer had 512 neurons, the input energy was 500 units, and the branching factor was set to k=3 in order to simulate a vascular tree with 7 levels.

Figure S3: Training accuracy of the initially trained ANVN network (blue) takes more epochs to reach high accuracy compared to the network trained by transfer learning (red).

Figure S4: Variation of the RMSE of the vascular weights at each level in the vascular tree (ANVN_R). (a) The RMSE between vascular weights at each level; the colors represent the RMSE at specified epochs. (b) The variation of RMSE across epochs for each level.
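The quoted configuration (512 hidden neurons, branching factor k=3, 7 levels) is consistent with the tree depth being the root plus ceil(log_k(N)) branching steps; the small check below encodes that relation. The formula is our inference from the stated numbers, not a definition taken from the paper.

```python
import math

def vascular_levels(n_hidden, k):
    """Levels in a k-ary vascular tree whose terminal nodes can cover
    n_hidden neurons: the root plus ceil(log_k(n_hidden)) branching steps."""
    return math.ceil(math.log(n_hidden, k)) + 1

print(vascular_levels(512, 3))  # -> 7, matching the configuration above
```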

Definition of the sigmoid function
In the design of the ANVN discussed in the main manuscript, the net input at each neuron in the hidden layer is passed through a sigmoid function as described in eqn. 9 and eqn. 11. A sigmoid function $\sigma(x)$ is defined as

$$\sigma(x) = \frac{1}{1 + e^{-m(x - c)}}$$

where $m$ controls the slope and $c$ the offset of the curve. The sigmoid function applied at the hidden layer of the ANVN ($\sigma_1$) is defined with $m = 5$ and $c = 0$.
The sigmoid function applied at the output layer of the ANVN ($\sigma_2$) is defined with $m = 1$ and $c = 0$.
Such a difference in the definition of the sigmoid between the two layers accommodates the fact that the afferent weights from the input layer to the hidden layer are normalized, and hence the net input (eqn. 8) received by each neuron lies between a minimum of −1 (when the bias term equals 1) and a maximum of 2 (when it equals −1). Hence the sigmoid function needs to have its entire transition curve within this range, which the steeper slope ($m = 5$) of $\sigma_1$ ensures.
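A direct implementation of the two sigmoids as reconstructed above; the parameter names `m` and `c` follow the definition given here.

```python
import numpy as np

def sigmoid(x, m, c=0.0):
    """Parameterized logistic: sigma(x) = 1 / (1 + exp(-m * (x - c)))."""
    return 1.0 / (1.0 + np.exp(-m * (x - c)))

sigma1 = lambda x: sigmoid(x, m=5.0)  # hidden layer: steep slope
sigma2 = lambda x: sigmoid(x, m=1.0)  # output layer: standard logistic

# With m = 5 the curve is essentially saturated at both ends of [-1, 2]:
print(sigma1(-1.0), sigma1(2.0))  # ~0.0067 and ~0.99995
```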

Regularization of Energy
The simultaneous training of the neural and vascular networks in ANVN_R showed that increasing the number of hidden neurons beyond a point increases the total energy consumption without much improvement in accuracy. Such behavior might be a consequence of the cost function not explicitly demanding minimal energy consumption. Hence, we decided to study the effect of constraining the cost function by introducing regularization. Regularization of weights is a technique widely used to improve the generalization of a neural network [53]; it prevents overfitting of the data. We explored whether regularization of energy can bring about any change in the performance of the ANVN. Two methods of regularization were explored: the first penalizes the weights by implementing L2 regularization of the weights, and the second directly constrains the energy by imposing L1 regularization of the energy. For directly including regularization of energy in the cost function, L1 regularization was preferred over L2 since the biological implication of L1 minimization of energy is more meaningful. The range of initial values of average per capita energy was varied between 0.4 units and 1 unit. Any neuron receiving a per capita energy > 2 units (due to random initialization of the vascular weights) returns the excess energy by updating the weights using a small negative slope. Training and testing were done using 500 and 200 data points, respectively, from the MNIST data set. Each network was trained for 20k epochs.
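As a sketch, the two penalties described above can be written as additive terms in the cost. The helper `regularized_cost` and its `mode` switch are illustrative assumptions; the paper's exact regularized cost functions are given in the main text.

```python
import numpy as np

def regularized_cost(task_loss, weights, energies, lam, mode):
    """Add a regularization penalty to the task loss.

    mode="l2_weights": task_loss + lam * (sum of squared afferent weights)
    mode="l1_energy" : task_loss + lam * (sum of absolute energies consumed)
    """
    if mode == "l2_weights":
        return task_loss + lam * sum(np.sum(w ** 2) for w in weights)
    if mode == "l1_energy":
        return task_loss + lam * np.sum(np.abs(energies))
    raise ValueError(f"unknown mode: {mode}")
```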
The L2 regularization of weights did not show any notable drop in energy consumption with an increase in the number of hidden neurons (fig. S6.b and S7.b); instead, the accuracy dropped on introducing L2 regularization. However, on imposing the constraint directly on the magnitude of the energy consumed by the hidden neurons (L1 regularization of energy), a significant drop in energy consumption was observed with regularization (fig. S6.a and S7.a). The drop was larger for a higher regularization factor (λ). Moreover, the accuracy was maintained at a level similar to that without regularization. Also, the transition of the network from a fixed-point attractor to a line of attractors appeared to occur much later in the case of a network regularized using the L1 norm of energy (fig. S8.a) when compared to the non-regularized network (fig. 8) and the network regularized using the L2 norm of weights (fig. S8.b). On imposing regularization of energy, the network converges to a fixed-point attractor in the per capita energy consumption vs. accuracy space for a range of larger networks, making it more robust to variation in initial energy. Due to the large variation in the accuracy attained by the smallest network across the regularization factors, the relative accuracy, and hence by the current definition the efficiency, cannot be compared across λ values.