Abstract
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proved successful in many fields, most recently in deep learning. Applying PWLNN methods requires both a representation model and a learning algorithm, and both have long been studied. In 1977, the canonical representation pioneered shallow PWLNNs learned through incremental designs, but these methods did not scale to large data sets. In 2010, the rectified linear unit (ReLU) brought PWLNNs to prominence in deep learning, and they have since been applied to many tasks with excellent performance. In this Primer, we systematically introduce the methodology of PWLNNs, grouping the works into shallow and deep networks. First, the different PWLNN representation models are constructed with worked examples. The evolution of learning algorithms for data is then presented, followed by fundamental theoretical analyses for a deeper understanding. Finally, representative applications are introduced, together with discussions and outlooks.
Acknowledgements
This work is jointly supported by European Research Council (ERC) Advanced Grant E-DUALITY (787960), KU Leuven Grant CoE PFV/10/002, Grant FWO GOA4917N, EU H2020 ICT-48 Network TAILOR (Foundations of Trustworthy AI — Integrating Reasoning, Learning and Optimization), Leuven.AI Institute, National Key Research and Development Program under Grant 2021YFB2501200 and Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102).
Author information
Contributions
Introduction (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Experimentation (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Results (Q.T., L.L., X.H., X.X., S.W. and J.A.K.S.); Applications (Q.T., L.L. and J.A.K.S.); Reproducibility and data deposition (Q.T. and X.X.); Limitations and optimizations (Q.T., X.H. and J.A.K.S.); Outlook (Q.T., L.L., X.H. and J.A.K.S.).
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Methods Primers thanks Pedro Julian, Jun Wang, Andrea Walther and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Induced conclusion by the Stone–Weierstrass approximation theorem. Any continuous function can be approximated by a piecewise linear (PWL) function to arbitrary accuracy (the first sketch after this glossary illustrates this numerically).
- PWL functions (piecewise linear functions). Functions that are linear within each subregion of a partitioned domain but non-linear over the whole domain.
- Canonical piecewise linear representation (CPLR). The pioneering compact expression in which a piecewise linear (PWL) function is constructed as a linear combination of multiple absolute-value basis functions (sketched after this glossary).
- Rectified linear units (ReLU). Among the most popular activation functions in neural networks, defined as the positive part of the argument, max{0, x} (a ReLU network sketch follows the glossary).
- Hinging hyperplanes. Two hyperplanes that constitute a hinge function, joining continuously at the so-called hinge; the hinging hyperplanes model has greatly contributed to the construction of flexible representation models for continuous piecewise linear (PWL) functions (illustrated after the glossary).
- Backpropagation strategy. A strategy widely used to train feedforward neural networks; it computes the gradients of the weights of each layer and iterates backwards layer by layer for efficient calculation.
- Stochastic gradient descent (SGD). An iterative optimization algorithm in which the true gradient is estimated from a randomly selected subset (mini-batch) of the data (a minimal loop is sketched after the glossary).
- PWL memristors. Memristors are considered the fourth fundamental two-terminal circuit element (alongside the resistor, the inductor and the capacitor) and carry a memory of past voltages or currents; PWL memristors are those whose dynamics are characterized by piecewise linear (PWL) functions.
- Gradient vanishing problem. When deep neural networks (DNNs) are trained with gradient-based algorithms, the repeated multiplication of small gradient values during backpropagation can drive the gradients of the early layers towards zero, making training difficult to proceed.
- Least squares method. An approach that approximates an unknown system from a set of input–output data points by minimizing the sum of squared residuals between the observed outputs and the model outputs.
- Gauss–Newton algorithm. A modification of Newton's method for solving non-linear least squares problems; it minimizes a sum of squared residuals while avoiding second-order derivatives by approximating the Hessian from first-order (Jacobian) information (see the sketch after the glossary).
- Multivariate adaptive regression splines. A flexible regression model consisting of weighted basis functions expressed as products of truncated power splines \([\pm(x_i-\beta)]_{+}^{q}\); its training procedure can be interpreted as a generalized tree search based on recursive domain partitions (a basis-function sketch follows the glossary).
- Consistent variation property. For a continuous piecewise linear (PWL) function, the necessary and sufficient condition under which it can be expressed by a canonical piecewise linear representation (CPLR) model; it concerns the domain partitions and the intersections between partitioned subregions, and is described in detail in the main text.
- Zaslavsky’s theorem of hyperplane arrangement. The maximal number of regions into which \(\mathbb{R}^{d}\) can be partitioned by an arrangement of m hyperplanes is \(\sum_{j=0}^{d}\binom{m}{j}\) (computed in the final sketch after the glossary).
About this article
Cite this article
Tao, Q., Li, L., Huang, X. et al. Piecewise linear neural networks and deep learning. Nat Rev Methods Primers 2, 42 (2022). https://doi.org/10.1038/s43586-022-00125-7