The human brain can solve highly abstract reasoning problems using a neural network that is entirely physical. The underlying mechanisms are only partially understood, but an artificial network provides valuable insight. See Article p.471
A classic example of logical reasoning is the syllogism, “All men are mortal. Socrates is a man. Therefore, Socrates is mortal.” According to both ancient and modern views1, reasoning amounts to a rule-based mental manipulation of symbols — in this example, the words 'All', 'men', and so on. But human brains are made of neurons that operate by exchanging jittery electrical pulses, rather than word-like symbols. This difference encapsulates a notorious scientific and philosophical enigma, sometimes referred to as the neural–symbolic integration problem2, which remains unsolved. On page 471, Graves et al.3 use the machine-learning methods of 'deep learning' to impart some crucial symbolic-reasoning mechanisms to an artificial neural system. Their system can solve complex tasks by learning symbolic-reasoning rules from examples, an achievement that has potential implications for the neural–symbolic integration problem.
A key requirement for reasoning is a working memory. In digital computers, this role is served by the random-access memory (RAM). When a computer reasons (that is, when it executes a program), information is bundled together in working memory in ever-changing combinations. Comparing human reasoning to the running of computer programs is not a far-fetched metaphor. In fact, a long historical path leads from Aristotle's definition of syllogisms to the modern model of a programmable computer (the Turing machine). Alan Turing himself used 'mind' language in his groundbreaking work4: “The behaviour of the computer at any moment is determined by the symbols which he is observing and his 'state of mind' at that moment.”
Although there are clear parallels between human reasoning and the running of computer programs, we lack an understanding of how either of them could be implemented in biological or artificial neural networks. Graves and colleagues take a substantial step forward in this quest by presenting a neuro-computational system that shows striking similarities to a digital computer.
The authors' system consists of several modules, all of which are entirely non-symbolic and operate by exchanging streams of purely analog activation patterns, just like those recorded from biological brains. There are two main modules: a 'memory' composed of a large grid of memory cells, each of which can hold a numerical value that is akin to a voltage; and a 'controller', which is an artificial neural network. The controller can access selected locations on the memory grid, read what it finds there, combine that with input data and write numerical values back to selected memory locations. The two modules interact in many respects like the RAM and central processing unit of a digital computer.
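The read and write operations described above can be made concrete with a small sketch. It is not the authors' implementation: the function names, the cosine-similarity addressing and the erase-and-add write rule are assumptions chosen for illustration, and the real system includes further mechanisms that are omitted here. The point is only that 'selecting a location' can be a smooth weighting over all memory rows rather than a hard choice of one address.

```python
import math

def softmax(scores):
    # Turn arbitrary scores into positive weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    # Similarity between a memory row and a lookup key.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)

def address(memory, key, sharpness=10.0):
    # Hypothetical addressing rule: rows similar to the key get more weight.
    return softmax([sharpness * cosine(row, key) for row in memory])

def read(memory, weights):
    # A read is a weighted average of all rows, not a single-row lookup.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    # Each row is partly erased and partly overwritten, in proportion to its weight.
    return [[cell * (1 - w * e) + w * a for cell, e, a in zip(row, erase, add)]
            for w, row in zip(weights, memory)]

# A toy 3x3 memory grid (illustrative values only).
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = address(memory, key=[0.9, 0.1, 0.0])
read_vector = read(memory, weights)
new_memory = write(memory, weights, erase=[1.0, 1.0, 1.0], add=[0.5, 0.5, 0.5])
```

In this toy run the key resembles the first row most closely, so nearly all of the weight falls on that row: the read returns almost exactly its contents, and the write replaces it while leaving the other rows almost untouched.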
Graves and colleagues demonstrate the capabilities of their system by putting it through several tasks that require rational reasoning, such as planning a multi-stage journey using public transport. Such tasks are fairly easy to solve using the symbolic computer programs of artificial intelligence, but have so far been rather out of reach of artificial neural networks.
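For comparison, here is a minimal sketch of how a handcrafted symbolic program might plan such a journey. It uses breadth-first search over a made-up network; the station names and connections are hypothetical and are not taken from the authors' experiments.

```python
from collections import deque

def plan_route(connections, start, goal):
    # Breadth-first search: explore routes in order of increasing number of stops.
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        station = path[-1]
        if station == goal:
            return path
        for nxt in connections.get(station, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # no route exists

# Hypothetical transport network: each station lists its direct connections.
connections = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
route = plan_route(connections, "A", "E")
```

Every step of this program manipulates explicit symbols (station names and lists of them) according to rules that a programmer wrote down in advance, which is precisely what a neural network does not have.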
A digital computer solves a given task by executing a program that has been written for that purpose. By contrast, the authors' neural system cannot and need not be programmed — instead, it is trained. During training, the system is presented with a large number of solved examples of the task at hand. With each new presentation, the system slightly adapts its internal neural wiring so that its response moves gradually closer to the given task's solution.
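The idea of training by small adjustments can be illustrated with a deliberately tiny stand-in for a neural system: a model with just two adjustable numbers, `w` and `b`. The model, the examples and the learning rate are assumptions made for this sketch; the authors' system has vastly more parameters and a far more elaborate structure, but the principle of nudging parameters after each solved example is the same.

```python
def train(examples, steps=2000, learning_rate=0.05):
    # Start with no knowledge of the task.
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, target in examples:
            # How far is the current response from the given solution?
            error = (w * x + b) - target
            # Nudge each parameter slightly in the direction that reduces the error.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Solved examples of a hidden rule (here, target = 2 * x + 1).
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)
```

Nobody tells the model the rule; after many presentations its two parameters settle close to 2 and 1, the values that reproduce the examples.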
The analog, smoothly adaptable nature of the authors' neural system is the key to its ability to be trained. Mathematically speaking, the system is a 'differentiable function', which has led to the authors calling it a differentiable neural computer (DNC). A digital computer is not differentiable and could not be trained in any similar fashion.
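Why differentiability matters for training can be seen in a small numerical experiment. The sketch below compares a 'soft' memory read, which blends all stored values, with a 'hard' read, which picks exactly one value as a digital computer would. The function names and numbers are illustrative assumptions, not part of the authors' system.

```python
import math

def soft_read(values, scores):
    # Smooth lookup: a softmax-weighted blend of all stored values.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

def hard_read(values, scores):
    # Digital-style lookup: return only the value with the highest score.
    return values[scores.index(max(scores))]

def slope(read_fn, values, scores, index, eps=1e-4):
    # Finite-difference estimate of how the output responds to a tiny change in one score.
    bumped = list(scores)
    bumped[index] += eps
    return (read_fn(values, bumped) - read_fn(values, scores)) / eps

values = [1.0, 5.0, 9.0]
scores = [2.0, 1.0, 0.5]
soft_slope = slope(soft_read, values, scores, index=1)
hard_slope = slope(hard_read, values, scores, index=1)
```

The soft read responds to the tiny change, so a training procedure can tell which way to adjust the score. The hard read does not respond at all until the change is large enough to flip the selection, which gives gradient-based training nothing to work with.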
A DNC is a mathematical object that boasts tens of thousands of adjustable parameters. Training such a monster raises a plethora of mathematical, numerical and run-time issues. Only in the past few years has machine-learning research overcome these obstacles, through a compendium of techniques that have become branded as deep learning5. The authors' training of a DNC is a splendid demonstration of the power of deep learning.
Graves et al. steer clear of grand claims about their work's implications for the neural–symbolic integration problem and, with due caution, suggest possible mappings of DNC structures to those of biological brains. This is wise, because the debates fought out in this arena are fierce and without winners. Instead, the authors establish an undeniable technical anchor point that will help to ground the debates — they have shown that certain non-trivial, central aspects of symbolic reasoning can be learnt by artificial neural systems.
With regard to practical exploits, deep-learning methods have so far excelled in tasks that require limited or no working memory, such as image recognition6 and sentence-wise language translation7. Whether or not DNCs will bring about practical advances in big-data technologies remains to be seen. The authors' demonstrations are not particularly complex as demands on rational reasoning go, and could be solved by the algorithms of symbolic artificial intelligence of the 1970s. However, those programs were handcrafted by humans and do not learn from examples.
For the time being, the DNC by itself cannot compete with state-of-the-art methods in digital computing when it comes to logical data mining8. But a flexible, extensible DNC-style working memory might allow deep learning to expand into big-data applications that have a rational reasoning component, such as generating video commentaries or semantic text analysis. A precursor to the DNC, the neural Turing machine9, certainly sent thrills through the deep-learning community.
1. Newell, A. Cogn. Sci. 4, 135–183 (1980).
2. Hammer, B. & Hitzler, P. (eds) Perspectives of Neural–Symbolic Integration http://doi.org/fsrb8m (Springer, 2007).
3. Graves, A. et al. Nature 538, 471–476 (2016).
4. Turing, A. M. J. Math. 58, 345–363 (1936).
5. LeCun, Y., Bengio, Y. & Hinton, G. Nature 521, 436–444 (2015).
6. Szegedy, C. et al. Proc. IEEE Conf. Computer Vision Pattern Recognition http://dx.doi.org/10.1109/CVPR.2015.7298594 (2015).
7. Bahdanau, D., Cho, K. & Bengio, Y. Int. Conf. Learning Representations Preprint at http://arxiv.org/abs/1409.0473 (2014).
8. De Raedt, L. & Kimmig, A. Machine Learn. 100, 5–47 (2015).
9. Graves, A., Wayne, G. & Danihelka, I. Preprint at http://arxiv.org/abs/1410.5401 (2014).