Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read–write memory.
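The core mechanism the abstract describes, a network reading from and writing to an external memory matrix, can be illustrated with a minimal sketch of content-based addressing and the erase-then-add write rule used by the DNC. This is a simplified illustration, not the full architecture: the function names, the toy memory, and the fixed sharpness parameter `beta` are assumptions for the example, and the DNC additionally uses dynamic memory allocation and temporal links not shown here.

```python
import numpy as np

np.random.seed(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_addressing(M, key, beta):
    # Cosine similarity between the key and each memory row,
    # sharpened by beta and normalised into addressing weights.
    norms = np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8
    sim = (M @ key) / norms
    return softmax(beta * sim)

def read(M, w):
    # Read vector: weighted sum of memory rows.
    return w @ M

def write(M, w, erase, add):
    # Erase-then-add update of the memory matrix.
    return M * (1 - np.outer(w, erase)) + np.outer(w, add)

# Toy usage: a 4-slot memory with 3-dimensional rows.
M = np.random.randn(4, 3)
w = content_addressing(M, key=M[2], beta=10.0)  # key matches row 2
r = read(M, w)                                  # soft read of row 2
M2 = write(M, w, erase=np.ones(3), add=np.zeros(3))  # soft erase of row 2
```

Because every step is a differentiable function of the memory and the controller's outputs, gradients flow through the read and write operations, which is what lets the network learn memory use from data.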
We thank D. Silver, M. Botvinick and S. Legg for reviewing the paper prior to submission; P. Dayan, D. Wierstra, G. Hinton, J. Dean, N. Kalchbrenner, J. Veness, I. Sutskever, V. Mnih, A. Mnih, D. Kumaran, N. de Freitas, L. Sifre, R. Pascanu, T. Lillicrap, J. Rae, A. Senior, M. Denil, T. Kocisky, A. Fidjeland, K. Gregor, A. Lerchner, C. Fernando, D. Rezende, C. Blundell and N. Heess for discussions; J. Besley for legal assistance; the rest of the DeepMind team for support and encouragement; and Transport for London for allowing us to reproduce portions of the London Underground map.
This video shows a DNC successfully finding the shortest path between two nodes in a randomly generated graph. By decoding the memory usage of the DNC (as in Fig. 3) we were able to determine which edges were stored in the memory locations it was reading from and writing to at each timestep. The edges being read are shown in pink on the left, while the edges being written are shown in green on the right; the colour saturation indicates the relative strength of the operation. During the initial query phase, the DNC receives the labels for the start and end nodes ("390" and "040", respectively). During the ten-step planning phase it attempts to determine the shortest path. During this time it repeatedly reads edges close to or along the path, which are indicated by the grey shaded nodes. Beginning with edges attached to the start and end nodes, it appears to move further afield as the phase progresses. At the same time it writes to several of the edge locations, perhaps marking those edges as visited. Finally, during the answer phase, it successively reads the outgoing edges from the nodes along the shortest path, allowing it to correctly answer the query.
This video shows a DNC successfully performing a reasoning problem in a blocks world. A sequence of letter-labelled goals (S, K, R, Q, E) is presented to the network one step at a time. Each goal consists of a sequence of defining constraints, presented one constraint per timestep. For example, S is: 6 below 2 (6b2); 2 right of 5 (2r5); 6 right of 1 (6r1); 5 above 1 (5a1). On the right, the write head edits the memory, writing information about the goals down. Ultimately, the DNC is commanded to satisfy goal "Q", which it does subsequently by using the read heads to inspect the locations containing goal Q. The constraints constituting goal "Q" are shown below, and the final board position is correct.