Training deep neural networks for binary communication with the Whetstone method

Abstract

The computational cost of deep neural networks presents challenges to broadly deploying these algorithms. Low-power and embedded neuromorphic processors offer potentially dramatic performance-per-watt improvements over traditional processors. However, programming these brain-inspired platforms generally requires platform-specific expertise. It is therefore difficult to achieve state-of-the-art performance on these platforms, limiting their applicability. Here we present Whetstone, a method to bridge this gap by converting deep neural networks to have discrete, binary communication. During the training process, the activation function at each layer is progressively sharpened towards a threshold activation, with limited loss in performance. Whetstone-sharpened networks do not require a rate code or other spike-based coding scheme, thus producing networks comparable in timing and size to conventional artificial neural networks. We demonstrate Whetstone on a number of architectures and tasks such as image classification, autoencoders and semantic segmentation. Whetstone is currently implemented within the Keras wrapper for TensorFlow and is widely extendable.
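The progressive sharpening described in the abstract can be illustrated with a minimal sketch. It is not the Whetstone implementation: the function names, the `sharpness` parameter, the threshold at 0.5 and the linear schedule are all assumptions made for illustration. The sketch starts from a bounded ramp and narrows it step by step until it becomes a hard threshold that emits only 0 or 1.

```python
from typing import List


def sharpened_activation(x: float, sharpness: float) -> float:
    """Bounded ramp centred on 0.5 whose slope grows with `sharpness`.

    Hypothetical parameterisation (an assumption, not the paper's API):
    sharpness = 0.0 gives a bounded ReLU, clip(x, 0, 1);
    sharpness = 1.0 gives a hard threshold at 0.5 (binary output).
    """
    if not 0.0 <= sharpness <= 1.0:
        raise ValueError("sharpness must lie in [0, 1]")
    if sharpness == 1.0:
        # Fully sharpened: the unit communicates a single bit.
        return 1.0 if x >= 0.5 else 0.0
    width = 1.0 - sharpness              # width of the linear region
    y = (x - 0.5) / width + 0.5          # steeper ramp as width shrinks
    return min(1.0, max(0.0, y))         # keep the output in [0, 1]


def sharpening_schedule(n_steps: int) -> List[float]:
    """Evenly spaced sharpness values from 0 to 1 (assumed linear schedule)."""
    if n_steps < 2:
        raise ValueError("need at least two steps")
    return [i / (n_steps - 1) for i in range(n_steps)]


if __name__ == "__main__":
    # In training, each sharpness step would be followed by further
    # weight updates; here we only show how the activation changes.
    for s in sharpening_schedule(5):
        outputs = [sharpened_activation(x, s) for x in (0.2, 0.4, 0.6, 0.8)]
        print(f"sharpness={s:.2f} -> {outputs}")
```

In an actual training loop the sharpness would be raised between rounds of weight updates, so that the network can adapt to each steeper activation before the next step.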

A preprint version of the article is available at arXiv.

Data availability

All data used come from publicly available datasets: MNIST (ref. 34), Fashion-MNIST (ref. 35), CIFAR (ref. 36) and COCO (ref. 19). Whetstone is available at https://github.com/SNL-NERL/Whetstone, licensed under the GPL.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
  2. Pinheiro, P. O., Collobert, R. & Dollár, P. Learning to segment object candidates. Proc. 28th International Conference on Neural Information Processing Systems 2, 1990–1998 (2015).
  3. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
  4. Yang, T.-J., Chen, Y.-H. & Sze, V. Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 6071–6079 (IEEE, 2017).
  5. Coppola, G. & Dey, E. Driverless cars are giving engineers a fuel economy headache. Bloomberg.com https://www.bloomberg.com/news/articles/2017-10-11/driverless-cars-are-giving-engineers-a-fuel-economy-headache (2017).
  6. Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) 10–14 (IEEE, 2014).
  7. Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) 1–12 (IEEE, 2017).
  8. Rao, N. Intel® Nervana™ neural network processors (NNP) redefine AI silicon. Intel https://ai.intel.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/ (2018).
  9. Hemsoth, N. Intel, Nervana shed light on deep learning chip architecture. The Next Platform https://www.nextplatform.com/2018/01/11/intel-nervana-shed-light-deep-learning-chip-architecture/ (2018).
  10. Markidis, S. et al. NVIDIA tensor core programmability, performance & precision. Preprint at https://arxiv.org/abs/1803.04014 (2018).
  11. Merolla, P. A. et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673 (2014).
  12. Khan, M. M. et al. SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor. In IEEE International Joint Conference on Neural Networks, 2008, IJCNN 2008 (IEEE World Congress on Computational Intelligence) 2849–2856 (IEEE, 2008).
  13. Schuman, C. D. et al. A survey of neuromorphic computing and neural networks in hardware. Preprint at https://arxiv.org/abs/1705.06963 (2017).
  14. James, C. D. et al. A historical survey of algorithms and hardware architectures for neural-inspired and neuromorphic computing applications. Biolog. Inspired Cogn. Architec. 19, 49–64 (2017).
  15. Knight, J. C., Tully, P. J., Kaplan, B. A., Lansner, A. & Furber, S. B. Large-scale simulations of plastic neural networks on neuromorphic hardware. Front. Neuroanat. 10, 37 (2016).
  16. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105, 2295–2329 (2017).
  17. Bergstra, J., Yamins, D. & Cox, D. D. Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference 13–20 (Citeseer, 2013).
  18. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A. & Talwalkar, A. Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18, 6765–6816 (2017).
  19. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In European Conference on Computer Vision 740–755 (Springer, 2014).
  20. Hunsberger, E. & Eliasmith, C. Training spiking deep networks for neuromorphic hardware. Preprint at https://arxiv.org/abs/1611.05141 (2016).
  21. Esser, S. K., Appuswamy, R., Merolla, P., Arthur, J. V. & Modha, D. S. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems 28 (eds Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R.) 1117–1125 (Curran Associates, Red Hook, 2015).
  22. Esser, S. et al. Convolutional networks for fast, energy-efficient neuromorphic computing. Preprint at http://arxiv.org/abs/1603.08270 (2016).
  23. Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M. & Liu, S.-C. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 11, 682 (2017).
  24. Bohte, S. M., Kok, J. N. & La Poutré, J. A. SpikeProp: backpropagation for networks of spiking neurons. In European Symposium on Artificial Neural Networks 419–424 (ELEN, London, 2000).
  25. Huh, D. & Sejnowski, T. J. Gradient descent for spiking neural networks. Preprint at https://arxiv.org/abs/1706.04698 (2017).
  26. Cao, Y., Chen, Y. & Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vis. 113, 54–66 (2015).
  27. Hunsberger, E. & Eliasmith, C. Spiking deep networks with LIF neurons. Preprint at https://arxiv.org/abs/1510.08829 (2015).
  28. Liew, S. S., Khalil-Hani, M. & Bakhteri, R. Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems. Neurocomputing 216, 718–734 (2016).
  29. Nise, N. S. Control Systems Engineering, 5th edn (Wiley, New York, NY, 2008).
  30. Chollet, F. et al. Keras https://github.com/fchollet/keras (2015).
  31. Rothganger, F., Warrender, C. E., Trumbo, D. & Aimone, J. B. N2A: a computational tool for modeling from neurons to algorithms. Front. Neural Circuits 8, 1 (2014).
  32. Davison, A. P. et al. PyNN: a common interface for neuronal network simulators. Front. Neuroinform. 2, 11 (2009).
  33. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Binarized neural networks. In Proceedings of Advances in Neural Information Processing Systems 4107–4115 (Curran Associates, Red Hook, 2016).
  34. LeCun, Y., Cortes, C. & Burges, C. MNIST handwritten digit database. AT&T Labs http://yann.lecun.com/exdb/mnist 2 (2010).
  35. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).
  36. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical Report, Univ. Toronto (2009).

Acknowledgements

This work was supported by Sandia National Laboratories’ Laboratory Directed Research and Development (LDRD) Program under the Hardware Acceleration of Adaptive Neural Algorithms Grand Challenge project and the DOE Advanced Simulation and Computing program. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, a wholly owned subsidiary of Honeywell International, for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

This Article describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the US Department of Energy or the US Government.

Author information

All authors contributed to Whetstone algorithm theory and design. W.S. and R.D. implemented code and performed experiments. W.S., C.M.V., R.D. and J.B.A. analysed results. W.S., C.M.V. and J.B.A. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to William Severa or James B. Aimone.

Figures

Fig. 1: Overview of the Whetstone process.
Fig. 2: Training a single network through the Whetstone process.
Fig. 3: How Whetstone training influences the performance of different network topologies and tasks.
Fig. 4: Whetstone training requires N-hot output encodings.
Fig. 5: Whetstone has the ability to sharpen diverse networks.