Abstract

Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read–write memory.
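
As a concrete illustration of the mechanism described above, the sketch below shows differentiable access to an external memory matrix: content-based (cosine-similarity) read weighting and the erase/add write operation that a DNC controller emits. This is our own toy illustration, not the published implementation; in particular, the one-hot write weighting stands in for the paper's usage-based allocation and temporal-link addressing.

```python
# Minimal sketch (ours, not the authors' code) of differentiable memory access.
import numpy as np

def content_weights(memory, key, beta):
    """Softmax over cosine similarity between a lookup key and each memory row."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    scores = np.exp(beta * (memory @ key) / norms)
    return scores / scores.sum()

def read(memory, weights):
    """Weighted sum of memory rows: a fully differentiable 'fetch'."""
    return weights @ memory

def write(memory, weights, erase, add):
    """Erase then add into every row, in proportion to its write weight."""
    memory = memory * (1.0 - np.outer(weights, erase))
    return memory + np.outer(weights, add)

# Toy usage: store one word, then retrieve it from a noisy cue by content lookup.
rng = np.random.default_rng(0)
memory = np.zeros((16, 8))                    # 16 slots, 8-dimensional words
pattern = rng.normal(size=8)
w_write = np.zeros(16)
w_write[0] = 1.0                              # one-hot stand-in for the allocation weighting
memory = write(memory, w_write, erase=np.ones(8), add=pattern)
w_read = content_weights(memory, pattern + 0.05 * rng.normal(size=8), beta=10.0)
print(np.allclose(read(memory, w_read), pattern, atol=0.1))   # True
```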

Acknowledgements

We thank D. Silver, M. Botvinick and S. Legg for reviewing the paper prior to submission; P. Dayan, D. Wierstra, G. Hinton, J. Dean, N. Kalchbrenner, J. Veness, I. Sutskever, V. Mnih, A. Mnih, D. Kumaran, N. de Freitas, L. Sifre, R. Pascanu, T. Lillicrap, J. Rae, A. Senior, M. Denil, T. Kocisky, A. Fidjeland, K. Gregor, A. Lerchner, C. Fernando, D. Rezende, C. Blundell and N. Heess for discussions; J. Besley for legal assistance; the rest of the DeepMind team for support and encouragement; and Transport for London for allowing us to reproduce portions of the London Underground map.

Author information

Author notes

    • Alex Graves
    • Greg Wayne

    These authors contributed equally to this work.

Affiliations

  1. Google DeepMind, 5 New Street Square, London EC4A 3TW, UK

    • Alex Graves
    • Greg Wayne
    • Malcolm Reynolds
    • Tim Harley
    • Ivo Danihelka
    • Agnieszka Grabska-Barwińska
    • Sergio Gómez Colmenarejo
    • Edward Grefenstette
    • Tiago Ramalho
    • John Agapiou
    • Adrià Puigdomènech Badia
    • Karl Moritz Hermann
    • Yori Zwols
    • Georg Ostrovski
    • Adam Cain
    • Helen King
    • Christopher Summerfield
    • Phil Blunsom
    • Koray Kavukcuoglu
    • Demis Hassabis

Contributions

A.G. and G.W. conceived the project. A.G., G.W., M.R., T.H., I.D., S.G. and E.G. implemented networks and tasks. A.G., G.W., M.R., T.H., A.G.-B., T.R. and J.A. performed analysis. M.R., T.H., I.D., E.G., K.M.H., C.S., P.B., K.K. and D.H. contributed ideas. A.C. prepared graphics. A.G., G.W., M.R., T.H., S.G., A.P.B., Y.Z., G.O. and K.K. performed experiments. A.G., G.W., H.K., K.K. and D.H. managed the project. A.G., G.W., M.R., T.H., K.K. and D.H. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Alex Graves or Greg Wayne.

Reviewer Information: Nature thanks Y. Bengio, J. McClelland and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains a glossary of symbols and the complete equations.

Videos

  1. Shortest Path Visualisation

    This video shows a DNC successfully finding the shortest path between two nodes in a randomly generated graph. By decoding the memory usage of the DNC (as in Fig. 3), we were able to determine which edges were stored in the memory locations it was reading from and writing to at each time step. The edges being read are shown in pink on the left, while the edges being written are shown in green on the right; the colour saturation indicates the relative strength of the operation. During the initial query phase, the DNC receives the labels of the start and goal nodes ("390" and "040", respectively). During the ten-step planning phase it attempts to determine the shortest path. During this time it repeatedly reads edges close to or along the path, which are indicated by the grey shaded nodes. Beginning with edges attached to the start and goal nodes, it appears to move further afield as the phase progresses. At the same time it writes to several of the edge locations, perhaps marking those edges as visited. Finally, during the answer phase, it successively reads the outgoing edges from the nodes along the shortest path, allowing it to correctly answer the query. (A toy sketch of this kind of shortest-path task appears after this list.)

  2. Mini-SHRDLU Visualisation

    This video shows a DNC successfully performing a reasoning problem in a blocks world. A sequence of letter-labelled goals (S, K, R, Q, E) is presented to the network one step at a time. Each goal consists of a sequence of defining constraints, presented one constraint per time step. For example, S is: 6 below 2 (6b2); 2 right of 5 (2r5); 6 right of 1 (6r1); 5 above 1 (5a1). On the right, the write head edits the memory, writing down information about the goals. Ultimately, the DNC is commanded to satisfy goal "Q", which it does by using the read heads to inspect the locations containing goal Q. The constraints constituting goal "Q" are shown below, and the final board position is correct. (A toy sketch of checking such constraints against a board appears after this list.)
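
For the shortest-path demonstration, supervised training requires randomly generated graphs together with ground-truth shortest paths as targets. The sketch below is our own illustration of one way to produce such a graph and its target path; the node-label format, graph density and breadth-first-search target are our assumptions, and the published task generator and input encoding are described in the paper's Methods.

```python
# Illustrative only: a random labelled graph and a BFS shortest-path target.
import random
from collections import deque

def random_graph(n_nodes=12, n_edges=40, seed=0):
    rng = random.Random(seed)
    labels = [f"{i:03d}" for i in rng.sample(range(1000), n_nodes)]   # e.g. "390", "040"
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(labels, 2)
        edges.add((a, b))
        edges.add((b, a))            # undirected graph: store both directions
    return labels, edges

def shortest_path(edges, start, goal):
    """Breadth-first search; returns the node sequence from start to goal, or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
    frontier, parent = deque([start]), {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

labels, edges = random_graph()
print(shortest_path(edges, labels[0], labels[1]))   # node-label sequence, or None if disconnected
```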
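
For the Mini-SHRDLU demonstration, each goal is a conjunction of pairwise spatial constraints between numbered blocks. The checker below is a toy illustration of testing such constraints against a board state; we assume "above/below" means same column and higher/lower row, and "left/right of" means same row and larger/smaller column, whereas the exact constraint semantics and board dynamics used in the paper are specified in its Methods.

```python
# Illustrative only: a toy checker for Mini-SHRDLU-style goal constraints.
# Assumption (ours): "above/below" = same column, higher/lower row;
# "left/right of" = same row, smaller/larger column.

def positions(board):
    """Map each block label to its (row, column); row 0 is the bottom row."""
    return {block: (r, c)
            for r, row in enumerate(board)
            for c, block in enumerate(row) if block is not None}

def satisfies(board, constraints):
    """Return True if every (block, relation, block) constraint holds on the board."""
    pos = positions(board)
    relations = {
        "a": lambda p, q: p[1] == q[1] and p[0] > q[0],   # above
        "b": lambda p, q: p[1] == q[1] and p[0] < q[0],   # below
        "l": lambda p, q: p[0] == q[0] and p[1] < q[1],   # left of
        "r": lambda p, q: p[0] == q[0] and p[1] > q[1],   # right of
    }
    return all(relations[rel](pos[x], pos[y]) for x, rel, y in constraints)

# Goal S from the video: 6 below 2 (6b2); 2 right of 5 (2r5); 6 right of 1 (6r1); 5 above 1 (5a1).
goal_S = [("6", "b", "2"), ("2", "r", "5"), ("6", "r", "1"), ("5", "a", "1")]

# A 3x3 board written bottom row first: two stacks, 1-then-5 and 6-then-2.
board = [["1", "6", None],
         ["5", "2", None],
         [None, None, None]]
print(satisfies(board, goal_S))   # True under the assumed semantics
```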

About this article

Publication history

DOI

https://doi.org/10.1038/nature20101
