Continual learning of context-dependent processing in neural networks

Zeng, Guanxiong; Chen, Yang; Cui, Bo; Yu, Shan

doi:10.1038/s42256-019-0080-x

Article
Published: 09 August 2019


ORCID (Shan Yu): orcid.org/0000-0002-9008-6658

Nature Machine Intelligence volume 1, pages 364–372 (2019)



A preprint version of the article is available at arXiv.

Abstract

Deep neural networks are powerful tools for learning sophisticated but fixed mapping rules between inputs and outputs. This fixedness limits their application in more complex and dynamic situations, in which the mapping rules are not kept the same but change according to context. To lift this limit, we developed an approach that combines a learning algorithm, called orthogonal weights modification (OWM), with a context-dependent processing (CDP) module. We demonstrated that, with OWM overcoming catastrophic forgetting and the CDP module learning how to reuse a feature representation and a classifier across different contexts, a single network could acquire numerous context-dependent mapping rules in an online and continual manner, with as few as approximately ten samples to learn each. Our approach should enable highly compact systems to gradually learn myriad regularities of the real world and eventually behave appropriately within it.
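The abstract names orthogonal weights modification but does not state its update rule. The following is a minimal sketch under an assumed projector-based reading: each layer keeps a matrix P, starting from the identity, that is updated recursively (in the style of recursive least squares) from inputs the layer has already learned, and every backpropagated weight change is multiplied by P so that, for a small regularizer alpha, it barely alters the layer's responses to those earlier inputs. The single linear layer, the squared-error loss, the two toy tasks and the names OWMLayer, protect, sgd_step, alpha and lr are hypothetical choices for illustration; this is not the authors' released code.

```python
"""Hypothetical sketch of an OWM-style projected update for one linear layer.

Assumption: OWM is read here as multiplying each backprop weight change by a
projector P that (approximately) maps inputs from earlier tasks to zero.
Pure Python, no external dependencies.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class OWMLayer:
    """Linear layer y = W x whose weight updates are projected by P."""

    def __init__(self, in_dim: int, out_dim: int, alpha: float = 1e-3):
        self.alpha = alpha  # small regularizer; smaller means stricter protection
        self.W: Matrix = [[0.0] * in_dim for _ in range(out_dim)]
        self.P: Matrix = identity(in_dim)  # nothing is protected yet

    def forward(self, x: Vector) -> Vector:
        return matvec(self.W, x)

    def protect(self, x: Vector) -> None:
        """RLS-style update after learning input x:
        P <- P - (P x)(P x)^T / (alpha + x^T P x), valid because P is symmetric."""
        px = matvec(self.P, x)
        denom = self.alpha + sum(a * b for a, b in zip(x, px))
        for i, pi in enumerate(px):
            for j, pj in enumerate(px):
                self.P[i][j] -= pi * pj / denom

    def sgd_step(self, x: Vector, target: Vector, lr: float = 0.5) -> None:
        """One gradient step on 0.5 * ||W x - target||^2, right-multiplied by P."""
        err = [y - t for y, t in zip(self.forward(x), target)]
        grad = [[e * xi for xi in x] for e in err]  # plain backprop gradient
        projected = matmul(grad, self.P)  # OWM-style: projected @ x_old is ~0
        for i, row in enumerate(projected):
            for j, g in enumerate(row):
                self.W[i][j] -= lr * g


# Usage: two toy "tasks" whose inputs overlap in the first coordinate.
x_a, y_a = [1.0, 0.0, 0.0], [1.0]
x_b, y_b = [1.0, 1.0, 0.0], [0.0]

layer = OWMLayer(in_dim=3, out_dim=1)
for _ in range(50):
    layer.sgd_step(x_a, y_a)
layer.protect(x_a)  # protect task A's input before moving on to task B
before = layer.forward(x_a)[0]
for _ in range(50):
    layer.sgd_step(x_b, y_b)
after = layer.forward(x_a)[0]
print(f"task A output before task B: {before:.4f}, after: {after:.4f}")
```

With these settings the task A output stays close to 1 (about 0.999) after training on task B, because the projected updates move almost only the second weight. Without the protect call, P remains the identity and the same run pulls the task A output down to about 0.5, a toy version of catastrophic forgetting. Raising alpha trades protection of old inputs for plasticity on new ones.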


**Fig. 2: Performance of OWM, CAB and SGD in the ten-task disjoint MNIST experiment.**

**Fig. 3: Continual learning with small sample size achieved by OWM in recognizing Chinese characters.**

**Fig. 4: Achieving context-dependent sequential learning via the OWM algorithm and the CDP module.**


Data availability

All data used in this paper are publicly available and can be accessed at http://yann.lecun.com/exdb/mnist/ for the MNIST dataset, https://www.cs.toronto.edu/~kriz/cifar.html for the CIFAR dataset, http://image-net.org/index for the ILSVRC2012 dataset, http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html for the CASIA-HWDB dataset and http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html for the CelebA dataset. For more details on the datasets, please refer to the references cited in the Supplementary Methods.

Code availability

The source code can be accessed at https://github.com/beijixiong3510/OWM (ref. 56).

References

Newell, A. Unified Theories of Cognition (Harvard Univ. Press, 1994).
Miller, G. A., Heise, G. A. & Lichten, W. The intelligibility of speech as a function of the context of the test materials. J. Exp. Psychol. 41, 329–335 (1951).
Desimone, R. & Duncan, J. Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222 (1995).
Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science 348, 1352–1355 (2015).
Miller, E. K. The prefrontal cortex: complex neural properties for complex behavior. Neuron 22, 15–17 (1999).
Wise, S. P., Murray, E. A. & Gerfen, C. R. The frontal cortex-basal ganglia system in primates. Crit. Rev. Neurobiol. 10, 317–356 (1996).
Passingham, R. The Frontal Lobes and Voluntary Action (Oxford Univ. Press, 1993).
Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
Miller, E. K. The prefrontal cortex and cognitive control. Nat. Rev. Neurosci. 1, 59–65 (2000).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
McCloskey, M. & Cohen, N. J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem Vol. 24 109–165 (Elsevier, 1989).
Ratcliff, R. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97, 285–308 (1990).
Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A. & Bengio, Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. Preprint at https://arxiv.org/abs/1312.6211 (2013).
Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).
Haykin, S. S. Adaptive Filter Theory (Pearson Education India, 2008).
Golub, G. H. & Van Loan, C. F. Matrix Computations Vol. 3 (JHU Press, 2012).
Singhal, S. & Wu, L. Training feed-forward networks with the extended Kalman algorithm. In International Conference on Acoustics, Speech, and Signal Processing 1187–1190 (IEEE, 1989).
Shah, S., Palmieri, F. & Datum, M. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Netw. 5, 779–787 (1992).
Sussillo, D. & Abbott, L. F. Generating coherent patterns of activity from chaotic neural networks. Neuron 63, 544–557 (2009).
Jaeger, H. Controlling recurrent neural networks by conceptors. Preprint at https://arxiv.org/abs/1403.3369 (2014).
He, X. & Jaeger, H. Overcoming catastrophic interference using conceptor-aided backpropagation. In International Conference on Learning Representations (ICLR, 2018).
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning 807–814 (PMLR, 2010).
Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).
Lee, S.-W., Kim, J.-H., Jun, J., Ha, J.-W. & Zhang, B.-T. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems 4652–4662 (Curran Associates, 2017).
Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. In International Conference on Machine Learning 6072–6082 (PMLR, 2017).
Liu, C.-L., Yin, F., Wang, D.-H. & Wang, Q.-F. Chinese handwriting recognition contest 2010. In Chinese Conference on Pattern Recognition (CCPR) 1–5 (IEEE, 2010).
Yin, F., Wang, Q.-F., Zhang, X.-Y. & Liu, C.-L. ICDAR 2013 Chinese handwriting recognition competition. In 12th International Conference on Document Analysis and Recognition (ICDAR) 1464–1470 (IEEE, 2013).
Fuster, J. The Prefrontal Cortex (Academic Press, 2015).
Liu, Z., Luo, P., Wang, X. & Tang, X. Deep learning face attributes in the wild. In IEEE International Conference on Computer Vision 3730–3738 (IEEE, 2015).
Řehůřek, R. & Sojka, P. Software framework for topic modelling with large corpora. Proc. LREC 2010 Workshop on New Challenges for NLP Frameworks 45–50 (ELRA, 2010).
Lehky, S. R., Kiani, R., Esteky, H. & Tanaka, K. Dimensionality of object representations in monkey inferotemporal cortex. Neural Comput. 26, 2135–2162 (2014).
Freedman, D. J., Riesenhuber, M., Poggio, T. & Miller, E. K. Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291, 312–316 (2001).
Hung, C. P., Kreiman, G., Poggio, T. & DiCarlo, J. J. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005).
Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G. & Mishkin, M. The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17, 26–49 (2013).
Gomez, J. et al. Microstructural proliferation in human cortex is coupled with the development of face processing. Science 355, 68–71 (2017).
Xu, F. & Tenenbaum, J. B. Word learning as Bayesian inference. Psychol. Rev. 114, 245–272 (2007).
Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
Cichon, J. & Gan, W.-B. Branch-specific dendritic Ca²⁺ spikes cause persistent synaptic plasticity. Nature 520, 180–185 (2015).
Rusu, A. A. et al. Progressive neural networks. Preprint at https://arxiv.org/abs/1606.04671 (2016).
Masse, N. Y., Grant, G. D. & Freedman, D. J. Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization. Proc. Natl Acad. Sci. USA 115, E10467–E10475 (2018).
McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457 (1995).
Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).
Shin, H., Lee, J. K., Kim, J. & Kim, J. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems 2990–2999 (Curran Associates, 2017).
Li, Z. & Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2017).
Rohrbach, M., Stark, M., Szarvas, G., Gurevych, I. & Schiele, B. What helps where—and why? Semantic relatedness for knowledge transfer. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition 910–917 (IEEE, 2010).
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 3320–3328 (Curran Associates, 2014).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531 (2015).
Schwarz, J. et al. Progress & compress: a scalable framework for continual learning. Preprint at https://arxiv.org/abs/1805.06370 (2018).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proc. Thirteenth International Conference on Artificial Intelligence and Statistics 249–256 (Microtome, 2010).
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th International Conference on Machine Learning (ICML-10) 807–814 (PMLR, 2010).
Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F. & Schmidhuber, J. Compete to compute. In Advances in Neural Information Processing Systems 2310–2318 (Curran Associates, 2013).
He, K. M., Zhang, X. Y., Ren, S. Q. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In IEEE International Conference on Computer Vision 1026–1034 (IEEE, 2015).
Ramirez-Cardenas, A. & Viswanathan, P. The role of prefrontal mixed selectivity in cognitive control. J. Neurosci. 36, 9013–9015 (2016).
Zeng, G., Chen, Y., Cui, B. & Yu, S. Codes for paper Continual learning of context-dependent processing in neural networks. Zenodo https://doi.org/10.5281/zenodo.3346080 (2019).
Hu, W. et al. Overcoming catastrophic forgetting via model adaptation. In International Conference on Learning Representations (ICLR, 2019).


Acknowledgements

The authors thank D. Nikolić for helpful discussions and R. Hadsell for comments on the manuscript. This work was supported by the National Key Research and Development Program of China (2017YFA0105203), the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDB32040200), the Key Research Program of the National Laboratory of Pattern Recognition (99S9011M2N), and the Hundred-Talent Program of CAS (for S.Y.).

Author information

These authors contributed equally: Guanxiong Zeng, Yang Chen.

Authors and Affiliations

Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Guanxiong Zeng, Yang Chen, Bo Cui & Shan Yu
University of Chinese Academy of Sciences, Beijing, China
Guanxiong Zeng, Bo Cui & Shan Yu
Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
Shan Yu


Contributions

S.Y., Y.C. and G.Z. conceived the study and designed the experiments. G.Z. and Y.C. conducted computational experiments and theoretical analyses. B.C. assisted with some experiments and analyses. S.Y., Y.C. and G.Z. wrote the paper.

Corresponding author

Correspondence to Shan Yu.

Ethics declarations

Competing interests

The Institute of Automation, Chinese Academy of Sciences has submitted patent applications on the OWM algorithm (application no. PCT/CN2019/083355; invented by Y.C., G.Z. and S.Y.; pending) and the CDP module (application no. PCT/CN2019/083356; invented by G.Z., Y.C. and S.Y.; pending).

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary discussion, methods, Figs. 1–7, Tables 1–7 and references.


About this article

Cite this article

Zeng, G., Chen, Y., Cui, B. et al. Continual learning of context-dependent processing in neural networks. Nat Mach Intell 1, 364–372 (2019). https://doi.org/10.1038/s42256-019-0080-x


Received: 23 December 2018
Accepted: 10 July 2019
Published: 09 August 2019
Issue Date: August 2019
DOI: https://doi.org/10.1038/s42256-019-0080-x
