Primer

Piecewise linear neural networks and deep learning

Abstract

As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven successful in various fields, most recently in deep learning. To apply PWLNN methods, both the representation and the learning have long been studied. In 1977, the canonical representation pioneered work on shallow PWLNNs learned through incremental designs, but applications to large-scale data were prohibitive. In 2010, the rectified linear unit (ReLU) propelled PWLNNs to prevalence in deep learning. Since then, PWLNNs have been successfully applied to many tasks and have achieved excellent performance. In this Primer, we systematically introduce the methodology of PWLNNs by grouping the works into shallow and deep networks. First, different PWLNN representation models are constructed, with elaborated examples. Then, the evolution of learning algorithms for data is presented, and fundamental theoretical analysis follows for an in-depth understanding. Finally, representative applications are introduced together with discussions and outlooks.

Fig. 1: General workflow of applying the PWLNN method.
Fig. 2: Illustration of the topology of PWLNN representations, where the outputs of the square nodes denote the PWL mappings.
Fig. 3: Simple illustrations to visualize the resulting PWLNNs in Eqs (9) and (12).
Fig. 4: Illustration of the hinge function and its hinging hyperplanes.
Fig. 5: Simple illustration of the geometrical description and tree searching of learning an adaptive hinging hyperplanes (AHH) model.
Fig. 6: Illustration of the basis functions of SBF representation.

References

  1. Leenaerts, D. & Van Bokhoven, W. M. Piecewise Linear Modeling and Analysis (Springer Science & Business Media, 2013).

  2. Folland, G. B. Real Analysis: Modern Techniques and Their Applications (Wiley Interscience, 1999).

  3. Chien, M.-J. & Kuh, E. Solving nonlinear resistive networks using piecewise-linear analysis and simplicial subdivision. IEEE Trans. Circuits Syst. 24, 305–317 (1977).

  4. Chua, L. O. & Deng, A. Canonical piecewise-linear representation. IEEE Trans. Circuits Syst. 35, 101–111 (1988). This paper presents a systematic analysis of CPLR, including some crucial properties of PWLNNs.

  5. Chua, L. O. & Kang, S. Section-wise piecewise-linear functions: canonical representation, properties, and applications. Proc. IEEE 65, 915–929 (1977). This paper proposes the pioneering compact expression for PWL functions and formally introduces it for circuit systems, and analytical analysis for PWL functions has since become viable.

  6. Nair, V. & Hinton, G. in Proc. Int. Conf. on Machine Learning (eds Fürnkranz, J. & Joachims, T.) 807–814 (Omnipress, 2010). This paper initiates the prevalence and state-of-the-art performance of PWL-DNNs, and establishes the most popular ReLU.

  7. Kang, S. & Chua, L. O. A global representation of multidimensional piecewise-linear functions with linear partitions. IEEE Trans. Circuits Syst. 25, 938–940 (1978).

  8. Lin, J. N. & Unbehauen, R. Canonical representation: from piecewise-linear function to piecewise-smooth functions. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 40, 461–468 (1993).

  9. Breiman, L. Hinging hyperplanes for regression, classification, and function approximation. IEEE Trans. Inf. Theory 39, 999–1013 (1993). This paper introduces the hinging hyperplanes representation model and its hinge-finding learning algorithm. The connection with ReLU in PWL-DNNs can be referred to.

  10. Lin, J. N. & Unbehauen, R. Explicit piecewise-linear models. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 41, 931–933 (1995).

  11. Tarela, J. & Martínez, M. Region configurations for realizability of lattice piecewise-linear models. Math. Computer Model. 30, 17–27 (1999). This work presents formal proofs on the universal representation ability of the lattice representation and summarizes different locally linear subregion realizations.

  12. Julián, P. The complete canonical piecewise-linear representation: functional form for minimal degenerate intersections. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 50, 387–396 (2003).

  13. Wen, C., Wang, S., Li, F. & Khan, M. J. A compact f–f model of high-dimensional piecewise-linear function over a degenerate intersection. IEEE Trans. Circuits Syst. I Regul. Pap. 52, 815–821 (2005).

  14. Wang, S. & Sun, X. Generalization of hinging hyperplanes. IEEE Trans. Inf. Theory 51, 4425–4431 (2005). This paper presents the idea of inserting multiple linear functions in the hinge, and formal proofs are given for the universal representation ability for continuous PWL functions. The connection with maxout in PWL-DNNs can be referred to.

  15. Sun, X. & Wang, S. A special kind of neural networks: continuous piecewise linear functions. Lecture Notes Computer Sci. 3496, 375–379 (2005).

  16. Xu, J., Huang, X. & Wang, S. Adaptive hinging hyperplanes and its applications in dynamic system identification. Automatica 45, 2325–2332 (2009).

  17. Yu, J., Wang, S. & Li, L. Incremental design of simplex basis function model for dynamic system identification. IEEE Trans. Neural Netw. Learn. Syst. 29, 4758–4768 (2017).

  18. Chua, L. O. & Deng, A. C. Canonical piecewise-linear analysis — part II: tracing driving-point and transfer characteristics. IEEE Trans. Circuits Syst. 32, 417–444 (1985).

  19. Wang, S. General constructive representations for continuous piecewise-linear functions. IEEE Trans. Circuits Syst. I Regul. Pap. 51, 1889–1896 (2004). This paper considers a general constructive method for representing an arbitrary PWL function, in which significant differences and connections between different representation models are vigorously discussed. Many theoretical analyses on PWL-DNNs adopt the theorems and lemmas proposed.

  20. Wang, S., Huang, X. & Yam, Y. A neural network of smooth hinge functions. IEEE Trans. Neural Netw. 21, 1381–1395 (2010).

  21. Xu, J., Huang, X. & Wang, S. in Proc. American Control Conf. 2505–2510 (IEEE, 2010).

  22. Mu, X., Huang, X. & Wang, S. Dynamic behavior of piecewise-linear approximations. J. Tsinghua Univ. 51, 879–883 (2011).

  23. Huang, X., Xu, J. & Wang, S. Exact penalty and optimality condition for nonseparable continuous piecewise linear programming. J. Optim. Theory Appl. 155, 145–164 (2012).

  24. Xu, J., Boom, T., Schutter, B. & Wang, S. Irredundant lattice representations of continuous piecewise affine functions. Automatica 70, 109–120 (2016).

  25. Xu, J., Boom, T., Schutter, B. & Luo, X. Minimal conjunctive normal expression of continuous piecewise affine functions. IEEE Trans. Autom. Control. 61, 1340–1345 (2016).

  26. Pucar, P. & Millnert, M. in Proc. 3rd European Control Conf. 1173–1178 (Linköping Univ., 1995).

  27. Hush, D. & Horne, B. Efficient algorithms for function approximation with piecewise linear sigmoidal networks. IEEE Trans. Neural Netw. 9, 1129–1141 (1998).

  28. Wang, S. & Narendra, K. S. in Proc. American Control Conf. 388–393 (IEEE, 2002).

  29. Wen, C., Wang, S., Jin, X. & Ma, X. Identification of dynamic systems using piecewise-affine basis function models. Automatica 43, 1824–1831 (2007).

  30. Wang, S., Huang, X. & Khan Junaid, K. M. Configuration of continuous piecewise-linear neural networks. IEEE Trans. Neural Netw. 19, 1431–1445 (2008).

  31. Huang, X., Xu, J. & Wang, S. in Proc. American Control Conf. 4431–4436 (IEEE, 2010). This paper proposes a gradient descent learning algorithm for PWLNNs, where domain partitions and parameter optimizations are both elucidated.

  32. Krizhevsky, A., Sutskever, I. & Hinton, G. E. in Adv. Neural Inf. Process. Syst. (eds Bartlett, P. L., Pereira, F. C. N., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1097–1105 (NIPS, 2012).

  33. He, K., Zhang, X., Ren, S. & Sun, J. in Proc. IEEE Conf. Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  34. Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. in Proc. IEEE Conf. Computer Vision and Pattern Recognition 2261–2269 (IEEE, 2017).

  35. Arora, R., Basu, A., Mianjy, P. & Mukherjee, A. in Proc. Int. Conf. Learning Representations (ICLR, 2018).

  36. Paszke, A. et al. in Adv. Neural Inf. Process. Syst. Vol. 32, 8024–8035 (NIPS, 2019).

  37. Julián, P. A High Level Canonical Piecewise Linear Representation: Theory and Applications. Ph.D. thesis, Universidad Nacional del Sur (Argentina) (1999). This dissertation gives a very good view on the PWL functions and their applications mainly in circuit systems developed before the 2000s.

  38. Ohnishi, M. & Inaba, N. A singular bifurcation into instant chaos in a piecewise-linear circuit. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 41, 433–442 (1994).

  39. Itoh, M. & Chua, L. O. Memristor oscillators. Int. J. Bifurc. Chaos 18, 3183–3206 (2008).

  40. Bradley, P. S., Mangasarian, O. L. & Street, W. N. in Adv. Neural Inf. Process. Syst. (eds Mozer, M., Jordan, M. I. & Petsche, T.) 368–374 (NIPS, 1996).

  41. Kim, D. & Pardalos, P. M. A dynamic domain contraction algorithm for nonconvex piecewise linear network flow problems. J. Glob. Optim. 17, 225–234 (2000).

  42. Balakrishnan, A. & Graves, S. C. A composite algorithm for a concave-cost network flow problem. Networks 19, 175–202 (1989).

  43. Liu, K., Xu, Z., Xi, X. & Wang, S. Sparse signal reconstruction via concave continuous piecewise linear programming. Dig. Signal. Process. 54, 12–26 (2016).

  44. Liu, K., Xi, X., Xu, Z. & Wang, S. A piecewise linear programming algorithm for sparse signal reconstruction. Tsinghua Sci. Technol. 22, 29–41 (2017).

  45. Zhang, H. & Wang, S. Global optimization of separable objective functions on convex polyhedra via piecewise-linear approximation. J. Comput. Appl. Math. 197, 212–217 (2006).

  46. Zhang, H. & Wang, S. Linearly constrained global optimization via piecewise-linear approximation. J. Comput. Appl. Math. 214, 111–120 (2008).

  47. Guisewite, G. M. & Pardalos, P. M. Minimum concave-cost network flow problems: applications, complexity, and algorithms. Ann. Oper. Res. 25, 75–99 (1991).

  48. Burkard, R. E., Dollani, H. & Thach, P. T. Linear approximations in a dynamic programming approach for the uncapacitated single-source minimum concave cost network flow problem in acyclic networks. J. Glob. Optim. 19, 121–139 (2001).

  49. Xi, X., Huang, X., Suykens, J. A. K. & Wang, S. Coordinate descent algorithm for ramp loss linear programming support vector machines. Neural Process. Lett. 43, 887–903 (2016).

  50. Xu, Z., Liu, K., Xi, X. & Wang, S. in Proc. IEEE Conf. Decision and Control 6609–6616 (IEEE, 2015).

  51. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A. & Bengio, Y. in Proc. Int. Conf. Machine Learning Vol. 28 (eds Dasgupta, S. & McAllester, D.) 1319–1327 (PMLR, 2013). This paper proposes a flexible PWL activation function for PWL-DNNs, and ReLU can be regarded as its special case, and analysis on the universal approximation ability and the relations to the shallow-architectured PWLNNs are given.

  52. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).

  53. Kahlert, C. & Chua, L. O. A generalized canonical piecewise-linear representation. IEEE Trans. Circuits Syst. 37, 373–383 (1990).

  54. Lin, J., Xu, H.-Q. & Unbehauen, R. A generalization of canonical piecewise-linear functions. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 41, 345–347 (1994).

  55. Ernst, S. in Proc. IEEE Conf. Decision and Control Vol. 2, 1266–1271 (IEEE, 1998).

  56. Pucar, P. & Sjöberg, J. On the hinge-finding algorithm for hinging hyperplanes. IEEE Trans. Inf. Theory 44, 3310–3319 (1998).

  57. Ramirez, D. R., Camacho, E. F. & Arahal, M. R. Implementation of min–max MPC using hinging hyperplanes: application to a heat exchanger. Control. Eng. Pract. 12, 1197–1205 (2004).

  58. Huang, X., Matijaš, M. & Suykens, J. A. Hinging hyperplanes for time-series segmentation. IEEE Trans. Neural Netw. Learn. Syst. 24, 1279–1291 (2013).

  59. Huang, X., Xu, J. & Wang, S. in Proc. IEEE Int. Conf. Systems, Man and Cybernetics 1121–1126 (IEEE, 2010).

  60. Julián, P., Desages, A. & Agamennoni, O. High-level canonical piecewise linear representation using a simplicial partition. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 46, 463–480 (1999).

  61. Padberg, M. Approximating separable nonlinear functions via mixed zero–one programs. Oper. Res. Lett. 27, 1–5 (2000).

  62. Croxton, K. L., Gendron, B. & Magnanti, T. L. A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Manag. Sci. 49, 1268–1273 (2003).

  63. Keha, A. B., de Farias, I. R. & Nemhauser, G. L. A branch-and-cut algorithm without binary variables for nonconvex piecewise linear optimization. Oper. Res. 54, 847–858 (2006).

  64. Vielma, J. P., Ahmed, S. & Nemhauser, G. Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Oper. Res. 58, 303–315 (2010).

  65. Wilkinson, R. A method of generating functions of several variables using analog diode logic. IEEE Trans. Electron. Computers 12, 112–129 (1963).

  66. Birkhoff, G. Lattice theory. Bull. Am. Math. Soc. 64, 50–57 (1958).

  67. Streubel, T., Griewank, A., Radons, M. & Bernt, J.-U. in Proc. IFIP Conf. System Modeling and Optimization 327–336 (Springer, 2013).

  68. Griewank, A. On stable piecewise linearization and generalized algorithmic differentiation. Optim. Methods Softw. 28, 1139–1178 (2013).

  69. Fiege, S., Walther, A. & Griewank, A. An algorithm for nonsmooth optimization by successive piecewise linearization. Math. Program. 177, 343–370 (2019).

  70. Griewank, A. & Walther, A. Polyhedral DC decomposition and DCA optimization of piecewise linear functions. Algorithms 13, 166 (2020).

  71. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. PMLR 15, 315–323 (2011).

  72. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).

  73. Batruni, R. A multilayer neural network with piecewise-linear structure and back-propagation learning. IEEE Trans. Neural Netw. 2, 395–403 (1991).

  74. Lin, J. N. & Unbehauen, R. Canonical piecewise-linear networks. IEEE Trans. Neural Netw. 6, 43–50 (1995). This work depicts network topology for G-CPLR, and also discusses the idea of introducing general PWL activation functions for PWL-DNNs, yet without numerical evaluations.

  75. Rawat, W. & Wang, Z. Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput. 29, 2352–2449 (2017).

  76. Xu, J. et al. Efficient hinging hyperplanes neural network and its application in nonlinear system identification. Automatica 116, 108906 (2020).

  77. Jin, X. et al. in Proc. AAAI Conf. Artificial Intelligence (eds Schuurmans, D. & Wellman, M. P.) 1737–1743 (AAAI, 2016).

  78. Agostinelli, F., Hoffman, M. D., Sadowski, P. J. & Baldi, P. in Workshop Track of International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (ICLR, 2015).

  79. Suykens, J. A., Huang, A. & Chua, L. O. A family of n-scroll attractors from a generalized Chua’s circuit. Arch. für Elektronik und Übertragungstechnik 51, 131–137 (1997).

  80. Friedman, J. H. Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991).

  81. Wang, Y. & Witten, I. H. in Poster Papers of the 9th Eur. Conf. Machine Learning (ECML, 1997).

  82. Tao, Q. et al. Learning with continuous piecewise linear decision trees. Expert. Syst. Appl. 168, 114214 (2020).

  83. Ferrari-Trecate, G., Muselli, M., Liberati, D. & Morari, M. A clustering technique for the identification of piecewise affine systems. Automatica 39, 205–217 (2003).

  84. Nakada, H., Takaba, K. & Katayama, T. Identification of piecewise affine systems based on statistical clustering technique. Automatica 41, 905–913 (2005).

  85. Bottou, L. Stochastic gradient learning in neural networks. Proc. Neuro-Nimes 91, 12 (1991).

  86. Jin, C., Netrapalli, P., Ge, R., Kakade, S. M. & Jordan, M. I. On nonconvex optimization for machine learning: gradients, stochasticity, and saddle points. J. ACM 68, 1–29 (2021).

  87. Duchi, J., Hazan, E. & Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011).

  88. Kingma, D. P. & Ba, J. in Posters of the International Conference on Learning Representations (ICLR, 2015).

  89. Gupta, V., Koren, T. & Singer, Y. in Proc. Int. Conf. Machine Learning Vol. 80 (eds Dy, J. G. & Krause, A.) 1845–1850 (ICML, 2018).

  90. Anil, R., Gupta, V., Koren, T., Regan, K. & Singer, Y. Scalable second order optimization for deep learning. Preprint at https://arxiv.org/abs/2002.09018 (2020).

  91. He, K., Zhang, X., Ren, S. & Sun, J. in Proc. IEEE Int. Conf. Computer Vision 1026–1034 (IEEE, 2015). This paper presents modifications of optimization strategies on the PWL-DNNs and a novel PWL activation function, where PWL-DNNs can be delved into fairly deep.

  92. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  93. Ioffe, S. & Szegedy, C. in Proc. Int. Conf. Machine Learning Vol. 37 (eds Bach, F. R. & Blei, D. M.) 448–456 (2015).

  94. Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 1–48 (2019).

  95. Erhan, D., Courville, A., Bengio, Y. & Vincent, P. in Proc. Int. Conf. Artificial Intelligence and Statistics (eds Teh, Y. W. & Titterington, D. M.) 201–208 (PMLR, 2010).

  96. Neyshabur, B., Wu, Y., Salakhutdinov, R. & Srebro, N. in Adv. Neural Inf. Process. Syst. Vol. 29 (eds Lee, D. D. et al.) 3477–3485 (2016).

  97. Meng, Q. et al. in Proc. Int. Conf. Learning Representations (ICLR, 2019).

  98. Wang, G., Giannakis, G. B. & Chen, J. Learning relu networks on linearly separable data: algorithm, optimality, and generalization. IEEE Trans. Signal. Process. 67, 2357–2370 (2019).

  99. Tsay, C., Kronqvist, J., Thebelt, A. & Misener, R. Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Adv. Neural Inf. Process. Syst. 34, 2993–3003 (2021).

  100. Ergen, T. & Pilanci, M. in Proc. Int. Conf. Mach. Learn. Vol. 139 (eds Meila, M. & Zhang, T.) 2993–3003 (PMLR, 2021).

  101. Wen, W., Wu, C., Wang, Y., Chen, Y. & Li, H. Learning structured sparsity in deep neural networks. Adv. Neural Inf. Process. Syst. 29, 2074–2082 (2016).

  102. Han, S., Pool, J., Tran, J. & Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 28, 1135–1143 (2015).

  103. Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y. & Fergus, R. in Adv. Neural Inf. Process. Syst. Vol 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. & Weinberger, K. Q.) 1269–1277 (2014).

  104. Frankle, J. & Carbin, M. in Proc. Int. Conf. Learning Representations 6336–6347 (ICLR, 2019).

  105. Zoph, B. & Le, Q. V. in Proc. Int. Conf. Learning Representations (ICLR, 2017).

  106. Tao, Q., Xu, J., Suykens, J. A. K. & Wang, S. in Proc. IEEE Conf. Decision and Control 1482–1487 (IEEE, 2018).

  107. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst. 2, 303–314 (1989).

  108. Kurková, V. Kolmogorov’s theorem and multilayer neural networks. Neural Netw. 5, 501–506 (1992).

  109. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).

  110. Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017).

  111. Lu, Z., Pu, H., Wang, F., Hu, Z. & Wang, L. in Adv. Neural Inf. Process. Syst. Vol. 30 (eds Guyon, I. et al.) 6231–6239 (NIPS, 2017).

  112. Lin, H. & Jegelka, S. in Proc. Adv. Neural Inf. Process. Syst. Vol. 31 (eds Bengio, S. et al.) 1–10 (NIPS, 2018).

  113. Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993).

  114. Cohen, N. & Shashua, A. in Proc. Int. Conf. Machine Learning Vol. 48 (eds Balcan, M.-F. & Weinberger, K. Q.) 955–963 (2016).

  115. Kumar, A., Serra, T. & Ramalingam, S. Equivalent and approximate transformations of deep neural networks. Preprint at http://arxiv.org/abs/1905.11428 (2019).

  116. DeVore, R., Hanin, B. & Petrova, G. Neural network approximation. Acta Numerica 30, 327–444 (2021). This work describes approximation properties of neural networks as they are presently understood and also discusses their performance with other methods of approximation, where ReLU are centred in the analysis involving univariate and multivariate forms with both shallow and deep architectures.

  117. Huang, S.-C. & Huang, Y.-F. Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Trans. Neural Netw. 2, 47–55 (1991).

  118. Mirchandani, G. & Cao, W. On hidden nodes for neural nets. IEEE Trans. Circuits Syst. 36, 661–664 (1989).

  119. Huang, G.-B. Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Trans. Neural Netw. 14, 274–281 (2003).

  120. Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. in Proc. Int. Conf. Learning Representations (ICLR, 2017).

  121. Hardt, M. & Ma, T. in Proc. Int. Conf. Learning Representations (ICLR, 2017).

  122. Nguyen, Q. & Hein, M. Optimization landscape and expressivity of deep CNNs. PMLR 80, 3730–3739 (2018).

  123. Yun, C., Sra, S. & Jadbabaie, A. in Adv. Neural Inf. Process. Syst. (eds Wallach, H. M. et al.) 15532–15543 (NIPS, 2019).

  124. Pascanu, R., Montufar, G. & Bengio, Y. in Adv. Neural Inf. Process. Syst. 2924–2932 (NIPS, 2014). This paper presents the novel perspective of measuring the capacity of PWL-DNNs, namely the number of linear sub-regions, where how to utilize the locally linear property is introduced with mathematical proofs and intuitive visualizations.

  125. Zaslavsky, T. Facing Up To Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes Vol. 154 (American Mathematical Society, 1975).

  126. Raghu, M., Poole, B., Kleinberg, J., Ganguli, S. & Sohl-Dickstein, J. On the expressive power of deep neural networks. PMLR 70, 2847–2854 (2017).

  127. Serra, T., Tjandraatmadja, C. & Ramalingam, S. Bounding and counting linear regions of deep neural networks. PMLR 80, 4558–4566 (2018).

  128. Hanin, B. & Rolnick, D. Complexity of linear regions in deep networks. PMLR 97, 2596–2604 (2019).

  129. Xiong, H. et al. On the number of linear regions of convolutional neural networks. PMLR 119, 10514–10523 (2020).

  130. Goodfellow, I. J., Shlens, J. & Szegedy, C. in Proc. Int. Conf. Learning Representations (ICLR, 2015).

  131. Katz, G., Barrett, C., Dill, D. L., Julian, K. & Kochenderfer, M. J. in Proc. Int. Conf. Computer Aided Verification (eds Majumdar, R. & Kuncak, V.) 97–117 (Springer, 2017).

  132. Bunel, R., Turkaslan, I., Torr, P. H. S., Kohli, P. & Mudigonda, P. K. in Adv. Neural Inf. Process. Syst. Vol. 31 (eds Bengio, S. et al.) 4795–4804 (2018).

  133. Jia, J., Cao, X., Wang, B. & Gong, N. Z. in Proc. Int. Conf. Learning Representations (ICLR, 2020).

  134. Tjeng, V., Xiao, K. Y. & Tedrake, R. in Proc. Int. Conf. Learning Representations (ICLR, 2019).

  135. Cheng, C.-H., Nührenberg, G. & Ruess, H. in International Symposium on Automated Technology for Verification and Analysis Vol. 10482, 251–268 (Springer, 2017).

  136. Wong, E. & Kolter, Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. Proc. Int. Conf. Mach. Learn. 80, 5286–5295 (2018).

  137. Stern, T. E. Piecewise-linear Network Theory (MIT Tech. Rep., 1956).

  138. Katzenelson, J. An algorithm for solving nonlinear resistor networks. Bell Syst. Technical J. 44, 1605–1620 (1965).

  139. Ohtsuki, T. & Yoshida, N. DC analysis of nonlinear networks based on generalized piecewise-linear characterization. IEEE Trans. Circuit Theory CT-18, 146–152 (1971).

  140. Chua, L. O. & Ushida, A. A switching-parameter algorithm for finding multiple solutions of nonlinear resistive circuits. Int. J. Circuit Theory Appl. 4, 215–239 (1976).

  141. Chien, M.-J. Piecewise-linear theory and computation of solutions of homeomorphic resistive networks. IEEE Trans. Circuits Syst. 24, 118–127 (1977).

  142. Yamamura, K. & Ochiai, M. An efficient algorithm for finding all solutions of piecewise-linear resistive circuits. IEEE Trans. Circuits Syst. 39, 213–221 (1992).

  143. Pastore, S. & Premoli, A. Polyhedral elements: a new algorithm for capturing all the equilibrium points of piecewise-linear circuits. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 40, 124–132 (1993).

  144. Yamamura, K. & Ohshima, T. Finding all solutions of piecewise-linear resistive circuits using linear programming. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 45, 434–445 (1998).

  145. Chua, L. O. Modeling of three terminal devices: a black box approach. IEEE Trans. Circuit Theory 19, 555–562 (1972).

  146. Meijer, P. B. Fast and smooth highly nonlinear multidimensional table models for device modeling. IEEE Trans. Circuits Syst. 37, 335–346 (1990).

  147. Yamamura, K. On piecewise-linear approximation of nonlinear mappings containing Gummel–Poon models or Schichman–Hodges models. IEEE Trans. Circuits Syst. I Fundamental Theory Appl. 39, 694–697 (1992).

  148. Chua, L. O., Komuro, M. & Matsumoto, T. The double scroll family. IEEE Trans. Circuits Syst. 33, 1072–1118 (1986).

  149. Billings, S. & Voon, W. Piecewise linear identification of non-linear systems. Int. J. Control. 46, 215–235 (1987).

  150. Sontag, E. From linear to nonlinear: some complexity comparisons. Proc. IEEE Conf. Decis. Control. 3, 2916–2920 (1995).

  151. Mestl, T., Plahte, E. & Omholt, S. W. Periodic solutions in systems of piecewise-linear differential equations. Dyn. Stab. Syst. 10, 179–193 (1995).

  152. Yalcin, M., Suykens, J. A. & Vandewalle, J. Cellular Neural Networks, Multi-Scroll Chaos and Synchronization Vol. 50 (World Scientific, 2005).

  153. Yu, J., Mu, X., Xi, X. & Wang, S. A memristor model with piecewise window function. Radioengineering 22, 969–974 (2013).

  154. Mu, X., Yu, J. & Wang, S. Modeling the memristor with piecewise linear function. Int. J. Numer. Model. Electron. Netw. Devices Fields 28, 96–106 (2015).

  155. Yu, Y. et al. Modeling the AginSbTe memristor. Radioengineering 24, 808–813 (2015).

  156. Yu, J. Memristor Model with Window Function and its Applications. Ph.D. thesis, Tsinghua University (2016).

  157. Bemporad, A., Torrisi, F. D. & Morari, M. in Int. Workshop on Hybrid Systems: Computation and Control (eds Lynch, N. A. & Krogh, B. H.) 45–58 (Springer, 2000).

  158. Bemporad, A., Ferrari-Trecate, G. & Morari, M. Observability and controllability of piecewise affine and hybrid systems. IEEE Trans. Autom. Control. 45, 1864–1876 (2000).

  159. Heemels, W., De Schutter, B. & Bemporad, A. Equivalence of hybrid dynamical models. Automatica 37, 1085–1091 (2001).

  160. Bemporad, A. Piecewise linear regression and classification. Preprint at https://arxiv.org/abs/2103.06189 (2021).

  161. Huang, X., Xu, J. & Wang, S. Nonlinear system identification with continuous piecewise linear neural network. Neurocomputing 77, 167–177 (2012).

  162. Huang, X., Mu, X. & Wang, S. in 16th IFAC Symp. System Identification 535–540 (IFAC, 2012).

  163. Tao, Q. et al. Short-term traffic flow prediction based on the efficient hinging hyperplanes neural network. IEEE Trans. Intell. Transp. Syst. 1–13 (2022).

  164. Pistikopoulos, E. N., Dua, V., Bozinis, N. A., Bemporad, A. & Morari, M. On-line optimization via off-line parametric optimization tools. Comput. Chem. Eng. 26, 175–185 (2002).

  165. Bemporad, A., Borrelli, F. & Morari, M. Piecewise linear optimal controllers for hybrid systems. Proc. Am. Control. Conf. 2, 1190–1194 (2000). This work introduces the characteristics of PWL in control systems and the applications of PWL non-linearity.

  166. Bemporad, A., Borrelli, F. & Morari, M. Model predictive control based on linear programming — the explicit solution. IEEE Trans. Autom. Control. 47, 1974–1985 (2002).

  167. Bemporad, A., Morari, M., Dua, V. & Pistikopoulos, E. N. The explicit linear quadratic regulator for constrained systems. Automatica 38, 3–20 (2002).

  168. Chikkula, Y., Lee, J. & Okunnaike, B. Dynamically scheduled model predictive control using hinging hyperplane models. AIChE J. 44, 2658–2674 (1998).

  169. Wen, C., Ma, X. & Ydstie, B. E. Analytical expression of explicit mpc solution via lattice piecewise-affine function. Automatica 45, 910–917 (2009).

  170. Xu, J. & Wang, S. in Proc. IEEE Conf. Decision and Control 7240–7245 (IEEE, 2019).

  171. Maas, A., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. Proc. ICML 30, 3 (2013).

  172. Yue-Hei Ng, J. et al. in Proc. IEEE Conf. Computer Vision and Pattern Recognition 4694–4702 (IEEE, 2015).

  173. Purwins, H. et al. Deep learning for audio signal processing. IEEE J. Sel. Top. Signal. Process. 13, 206–219 (2019).

  174. Xie, Q., Luong, M.-T., Hovy, E. & Le, Q. V. in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 10687–10698 (IEEE, 2020).

  175. Qiao, Y. et al. FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency. Concurr. Comput. Pract. Exper. 29, e3850 (2017).

  176. Dua, D. & Graff, C. UCI machine learning repository. UCI http://archive.ics.uci.edu/ml (2017).

  177. LeCun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998). This work formally introduces the basic learning framework for generic DNNs including PWL-DNNs.

  178. Netzer, Y. et al. in NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011 (NIPS, 2011).

  179. LeCun, Y., Huang, F. J. & Bottou, L. in Proc. IEEE Computer Soc. Conf. Computer Vis. Pattern Recognit. Vol. 2, II97–II104 (IEEE, 2004).

  180. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images Technical report (Univ. of Toronto, 2009).

  181. Lin, T.-Y. et al. in Proc. Eur. Conf. Computer Vision (eds Fleet, D. J., Pajdla, T., Schiele, B. & Tuytelaars, T.) 740–755 (Springer, 2014).

  182. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  183. Krishna, R. et al. Visual Genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. 123, 32–73 (2017).

  184. Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. TensorFlow https://www.tensorflow.org/ (2015).

  185. Chollet, F. Keras. GitHub https://github.com/fchollet/keras (2015).

  186. Jia, Y. et al. in Proc. ACM Int. Conf. Multimedia (eds Hua, K. A. et al.) 675–678 (ACM, 2014).

  187. Chen, T. et al. MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1512.01274 (2015).

  188. Bergstra, J. et al. in Proc. Python for Scientific Computing Conf. (SCIPY, 2010).

  189. Tao, Q. et al. Toward deep adaptive hinging hyperplanes. IEEE Trans. Neural Netw. Learn. Syst. (2021).

  190. Tang, C. et al. Sparse MLP for image recognition: is self-attention really necessary? Preprint at https://arxiv.org/abs/2109.05422 (2021).

  191. Wang, Y., Li, Z., Xu, J. & Li, J. in Proc. Asian Control Conf. 1066–1071 (IEEE, 2019).

  192. Kawaguchi, K. in Adv. Neural Inf. Process. Syst. Vol. 29 (eds Lee, D. D., Sugiyama, M., von Luxburg, U., Guyon, I. & Garnett, R.) 586–594 (2016).

  193. Yun, C., Sra, S. & Jadbabaie, A. in Proc. Int. Conf. Learning Representations (ICLR, 2018).

  194. Nguyen, Q. & Hein, M. in Proc. Int. Conf. Mach. Learn. Vol. 70, 2603–2612 (PMLR, 2017).

  195. Yun, C., Sra, S. & Jadbabaie, A. in Proc. Int. Conf. Learning Representations (ICLR, 2019).

  196. Xu, B., Wang, N., Chen, T. & Li, M. in Workshop of the International Conference on Machine Learning (ICML, 2015).

  197. Liang, X. & Xu, J. Biased ReLU neural networks. Neurocomputing 423, 71–79 (2021).

  198. Shang, W., Sohn, K., Almeida, D. & Lee, H. in Proc. Int. Conf. Machine Learning Vol. 48 (eds Balcan, M.-F. & Weinberger, K. Q.) 2217–2225 (JMLR, 2016).

  199. Qiu, S., Xu, X. & Cai, B. in Proc. Int. Conf. Pattern Recognition, 1223–1228 (IEEE, 2018).

  200. Bodyanskiy, Y., Deineko, A., Pliss, I. & Slepanska, V. in Proc. Int. Workshop on Digital Content & Smart Multimedia Vol. 2533 (eds Kryvinska, N., Izonin, I., Gregus, M., Poniszewska-Maranda, A. & Dronyuk, I.) 14–22 (DCSMart Workshop, 2019).

Acknowledgements

This work is jointly supported by the European Research Council (ERC) Advanced Grant E-DUALITY (787960), KU Leuven Grant CoE PFV/10/002, Grant FWO GOA4917N, the EU H2020 ICT-48 Network TAILOR (Foundations of Trustworthy AI — Integrating Reasoning, Learning and Optimization), the Leuven.AI Institute, the National Key Research and Development Program under Grant 2021YFB2501200 and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102).

Author information

Contributions

Introduction (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Experimentation (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Results (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Applications (Q.T., L.L. and J.A.K.S.); Reproducibility and data deposition (Q.T. and X.X.); Limitations and optimizations (Q.T., X.H. and J.A.K.S.); Outlook (Q.T., L.L., X.H. and J.A.K.S.).

Corresponding authors

Correspondence to Qinghua Tao, Li Li or Xiaolin Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Methods Primers thanks Pedro Julian, Jun Wang, Andrea Walther and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Conclusion induced by the Stone–Weierstrass approximation theorem

Any continuous function can be approximated by a piecewise linear (PWL) function to arbitrary accuracy.
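
As a minimal numerical illustration of this conclusion (not taken from the article; the target function and grid are arbitrary choices), the sketch below approximates a smooth function with a PWL interpolant and shows the worst-case error shrinking as the grid of knots is refined.

```python
import numpy as np

f = np.sin  # any continuous function on a compact interval will do

x_test = np.linspace(0.0, np.pi, 10001)
for n_knots in (5, 9, 17, 33):
    knots = np.linspace(0.0, np.pi, n_knots)
    # np.interp evaluates the PWL interpolant through the points (knots, f(knots)).
    pwl = np.interp(x_test, knots, f(knots))
    print(n_knots, "knots -> max error", np.max(np.abs(pwl - f(x_test))))
```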

PWL functions

(Piecewise linear functions). Functions that are linear on subregions of the domain but, in essence, non-linear over the whole domain.

Canonical piecewise linear representation

(CPLR). The pioneering compact expression by which a piecewise linear (PWL) function is constructed through a linear combination of multiple absolute-value basis functions.
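
A minimal sketch of this form, assuming the common parameterization \(f(x) = a + b^{\top}x + \sum_i c_i |\alpha_i^{\top}x + \beta_i|\) (the function and parameter names here are illustrative, not from the article):

```python
import numpy as np

def cplr(x, a, b, c, alpha, beta):
    # Canonical PWL representation: an affine term plus a weighted sum of
    # absolute-value basis functions |alpha_i @ x + beta_i|.
    # Shapes: x (d,), b (d,), alpha (m, d), beta and c (m,).
    return a + b @ x + c @ np.abs(alpha @ x + beta)

# |x| itself is the simplest instance: a = 0, b = 0, c = 1, alpha = 1, beta = 0.
print(cplr(np.array([-2.0]), 0.0, np.zeros(1),
           np.array([1.0]), np.ones((1, 1)), np.zeros(1)))  # -> 2.0
```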

Rectified linear units

(ReLU). Among the most popular activation functions in neural networks, defined as the positive part of the argument: max{0, x}.
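
In code this is a one-liner; a NumPy version for concreteness:

```python
import numpy as np

def relu(x):
    # Elementwise positive part, max{0, x}.
    return np.maximum(0.0, x)

print(relu(np.array([-1.5, 0.0, 2.0])))  # [0. 0. 2.]
```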

Hinging hyperplanes

Two hyperplanes that constitute a hinge function, joining continuously at the so-called hinge; the hinging hyperplanes model has greatly contributed to constructing flexible representation models for continuous piecewise linear (PWL) functions.
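
A sketch of a hinge function as the maximum of two affine functions (a minimum works symmetrically; the names are illustrative). Note that ReLU applied to an affine input is the special case in which one side is fixed to zero.

```python
import numpy as np

def hinge(x, w1, b1, w2, b2):
    # Maximum of two affine functions; the two hyperplanes join continuously
    # along the hinge {x : (w1 - w2) @ x + (b1 - b2) = 0}.
    return np.maximum(w1 @ x + b1, w2 @ x + b2)

x = np.array([1.0, -1.0])
print(hinge(x, np.array([1.0, 0.0]), 0.0, np.zeros(2), 0.0))  # max{x_1, 0} = 1.0
```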

Backpropagation strategy

A strategy widely used to train feedforward neural networks; it computes the gradients of the weights of each layer and iterates backwards layer by layer for efficient calculation.
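
A minimal from-scratch sketch for a one-hidden-layer ReLU network on a toy regression task (the sizes, learning rate and data are arbitrary choices): the forward pass caches intermediate values, and the backward pass applies the chain rule from the output back to the input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))[:, None]

W1 = rng.normal(scale=0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

for step in range(500):
    # Forward pass (cache z and h for the backward pass).
    z = X @ W1 + b1
    h = np.maximum(0.0, z)
    pred = h @ W2 + b2
    # Backward pass: chain rule, layer by layer, output to input.
    g_pred = 2.0 * (pred - y) / len(X)          # d loss / d pred for the MSE loss
    g_W2, g_b2 = h.T @ g_pred, g_pred.sum(0)
    g_z = (g_pred @ W2.T) * (z > 0)             # ReLU derivative is 0 or 1
    g_W1, g_b1 = X.T @ g_z, g_z.sum(0)
    for p, g in ((W1, g_W1), (b1, g_b1), (W2, g_W2), (b2, g_b2)):
        p -= 0.1 * g                            # gradient descent update

final = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
print("final MSE:", float(np.mean((final - y) ** 2)))
```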

Stochastic gradient descent

(SGD). An iterative optimization algorithm in which the actual gradient is commonly estimated from a randomly selected subset of the data.
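
A sketch of one SGD epoch on a linear least squares loss (the function name and hyperparameters are illustrative); each update uses the gradient computed on a random minibatch rather than on the full data set.

```python
import numpy as np

def sgd_epoch(w, X, y, lr=0.01, batch=32, rng=np.random.default_rng(0)):
    # One pass over the data: shuffle, then take a gradient step per minibatch.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        sel = idx[start:start + batch]
        grad = 2.0 * X[sel].T @ (X[sel] @ w - y[sel]) / len(sel)
        w = w - lr * grad
    return w
```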

PWL memristors

Memristors are considered the fourth fundamental two-terminal circuit element (besides the resistor, the inductor and the capacitor), carrying a memory of past voltages or currents; PWL memristors are those whose dynamics are characterized by piecewise linear (PWL) functions.

Gradient vanishing problem

In training deep neural networks (DNNs) with gradient-based algorithms, backpropagation multiplies gradient factors layer by layer; when these factors are small, the gradients of the early layers approach zero, which stalls their training.
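
A back-of-the-envelope illustration (my own, not from the article): backpropagated gradients scale with the product of layerwise activation derivatives, so a bound such as sigmoid's maximum derivative of 0.25 decays geometrically with depth, whereas ReLU's derivative is exactly 1 on active units.

```python
# Product of per-layer derivative bounds for a chain of `depth` layers.
for depth in (5, 20, 50):
    sigmoid_path = 0.25 ** depth   # sigmoid'(x) <= 1/4 everywhere
    relu_path = 1.0 ** depth       # relu'(x) = 1 on active units
    print(depth, sigmoid_path, relu_path)
```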

Least squares method

An approach that approximates the solution of an unknown system from a set of input–output data points by minimizing the sum of squared residuals between the observed outputs and the model's outputs.
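
A minimal sketch with synthetic data, using NumPy's least squares solver (problem sizes and the true weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=100)

# Minimize ||X w - y||^2 over w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to [2, -1, 0.5]
```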

Gauss–Newton algorithm

A modification of Newton's method for solving non-linear least squares problems: the sum of squared residuals is minimized while the required second-order derivatives are approximated by products of first-order Jacobians.
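
A sketch fitting the toy model y = exp(a x) (the model and data are illustrative): each iteration solves the linearized least squares subproblem \((J^{\top}J)\,\delta = -J^{\top}r\) instead of forming true second derivatives.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.exp(1.3 * x) + 0.01 * rng.normal(size=50)

a = 0.0  # initial guess for the rate parameter
for _ in range(10):
    r = np.exp(a * x) - y        # residual vector
    J = x * np.exp(a * x)        # Jacobian of r with respect to a
    a -= (J @ r) / (J @ J)       # Gauss-Newton step: (J^T J) delta = -J^T r
print(a)  # close to 1.3
```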

Multivariate adaptive regression splines

A flexible regression model consisting of weighted basis functions, which are expressed in terms of products of truncated power splines \({[\pm ({x}_{i}-\beta )]}_{+}^{q}\); its training procedure can be interpreted as a generalized tree search based on recursive domain partitioning.
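
A sketch of the reflected pair of basis functions added at a knot \(\beta\) (the names are illustrative); with q = 1 these are exactly PWL hinges:

```python
import numpy as np

def mars_pair(x, beta, q=1):
    # Reflected pair of truncated power splines: [+(x - beta)]_+^q and
    # [-(x - beta)]_+^q. MARS builds its model from (products of) such terms.
    return np.maximum(x - beta, 0.0) ** q, np.maximum(beta - x, 0.0) ** q

x = np.linspace(-1.0, 1.0, 5)
print(mars_pair(x, beta=0.0))  # two PWL hinges meeting at the knot
```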

Consistent variation property

Given a continuous piecewise linear (PWL) function, the necessary and sufficient condition for it to be expressible by a canonical piecewise linear representation (CPLR) model; the condition concerns the domain partitions and the intersections between the partitioned subregions.

Zaslavsky’s theorem of hyperplane arrangement

The maximal number of regions into which \({{\mathbb{R}}}^{d}\) can be divided by an arrangement of m hyperplanes is \({\sum }_{j=0}^{d}\left(\begin{array}{l}m\\ j\end{array}\right)\).
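
The bound is easy to evaluate; a short check (my own worked example) for three lines in the plane:

```python
from math import comb

def max_regions(m, d):
    # Zaslavsky's bound: m hyperplanes cut R^d into at most
    # sum_{j=0}^{d} C(m, j) regions (attained in general position).
    return sum(comb(m, j) for j in range(d + 1))

print(max_regions(3, 2))  # 1 + 3 + 3 = 7 regions
```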

About this article

Cite this article

Tao, Q., Li, L., Huang, X. et al. Piecewise linear neural networks and deep learning. Nat Rev Methods Primers 2, 42 (2022). https://doi.org/10.1038/s43586-022-00125-7
