
  • Article

Task representations in neural networks trained to perform many cognitive tasks


The brain has the ability to flexibly perform many tasks, but the underlying mechanism cannot be elucidated in traditional experimental and modeling studies designed for one task at a time. Here, we trained single network models to perform 20 cognitive tasks that depend on working memory, decision making, categorization, and inhibitory control. We found that after training, recurrent units can develop into clusters that are functionally specialized for different cognitive processes, and we introduce a simple yet effective measure to quantify relationships between single-unit neural representations of tasks. Learning often gives rise to compositionality of task representations, a critical feature for cognitive flexibility, whereby one task can be performed by recombining instructions for other tasks. Finally, networks developed mixed task selectivity similar to recorded prefrontal neurons after learning multiple tasks sequentially with a continual-learning technique. This work provides a computational platform to investigate neural representations of many cognitive tasks.


Fig. 1: A recurrent neural network model is trained to perform a large number of cognitive tasks.
Fig. 2: The emergence of functionally specialized clusters for task representation.
Fig. 3: The activation function dictates whether clusters emerge in a network.
Fig. 4: A diversity of neural relationships between pairs of tasks.
Fig. 5: Dissecting a reference network for the context-dependent DM tasks.
Fig. 6: Compositional representation of tasks in state space.
Fig. 7: Performing tasks with algebraically composite rule inputs.
Fig. 8: Sequential training of cognitive tasks.


Code availability

All training and analysis code is available on GitHub.

Data availability

We provide data files in Python- and MATLAB-readable formats for all trained models for further analysis on GitHub.


  1. Fuster, J. The Prefrontal Cortex (Academic Press, Cambridge, 2015).

  2. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).

  3. Wang, X.-J. in Principles of Frontal Lobe Function (Stuss, D. T. & Knight, R. T. eds.) (Cambridge Univ. Press, New York, 2013).

  4. Wallis, J. D., Anderson, K. C. & Miller, E. K. Single neurons in prefrontal cortex encode abstract rules. Nature 411, 953–956 (2001).

  5. Sakai, K. Task set and prefrontal cortex. Annu. Rev. Neurosci. 31, 219–245 (2008).

  6. Cole, M. W., Etzel, J. A., Zacks, J. M., Schneider, W. & Braver, T. S. Rapid transfer of abstract rules to novel contexts in human lateral prefrontal cortex. Front. Hum. Neurosci. 5, 142 (2011).

  7. Tschentscher, N., Mitchell, D. & Duncan, J. Fluid intelligence predicts novel rule implementation in a distributed frontoparietal control network. J. Neurosci. 37, 4841–4847 (2017).

  8. Hanes, D. P., Patterson, W. F. II & Schall, J. D. Role of frontal eye fields in countermanding saccades: visual, movement, and fixation activity. J. Neurophysiol. 79, 817–834 (1998).

  9. Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).

  10. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).

  11. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).

  12. Cole, M. W., Laurent, P. & Stocco, A. Rapid instructed task learning: a new window into the human brain’s unique capacity for flexible cognitive control. Cogn. Affect. Behav. Neurosci. 13, 1–22 (2013).

  13. Reverberi, C., Görgen, K. & Haynes, J.-D. Compositionality of rule representations in human prefrontal cortex. Cereb. Cortex 22, 1237–1246 (2012).

  14. Zipser, D. & Andersen, R. A. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331, 679–684 (1988).

  15. Song, H. F., Yang, G. R. & Wang, X.-J. Training excitatory-inhibitory recurrent neural networks for cognitive tasks: a simple and flexible framework. PLoS Comput. Biol. 12, e1004792 (2016).

  16. Carnevale, F., de Lafuente, V., Romo, R., Barak, O. & Parga, N. Dynamic control of response criterion in premotor cortex during perceptual detection under temporal uncertainty. Neuron 86, 1067–1077 (2015).

  17. Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).

  18. Chaisangmongkon, W., Swaminathan, S. K., Freedman, D. J. & Wang, X.-J. Computing by robust transience: how the fronto-parietal network performs sequential, category-based decisions. Neuron 93, 1504–1517 (2017).

  19. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).

  20. Funahashi, S., Bruce, C. J. & Goldman-Rakic, P. S. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61, 331–349 (1989).

  21. Gold, J. I. & Shadlen, M. N. The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574 (2007).

  22. Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science 348, 1352–1355 (2015).

  23. Raposo, D., Kaufman, M. T. & Churchland, A. K. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci. 17, 1784–1792 (2014).

  24. Romo, R., Brody, C. D., Hernández, A. & Lemus, L. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399, 470–473 (1999).

  25. Munoz, D. P. & Everling, S. Look away: the anti-saccade task and the voluntary control of eye movement. Nat. Rev. Neurosci. 5, 218–228 (2004).

  26. Miller, E. K., Erickson, C. A. & Desimone, R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci. 16, 5154–5167 (1996).

  27. Freedman, D. J. & Assad, J. A. Neuronal mechanisms of visual categorization: an abstract view on decision making. Annu. Rev. Neurosci. 39, 129–147 (2016).

  28. Priebe, N. J. & Ferster, D. Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron 57, 482–497 (2008).

  29. Abbott, L. F. & Chance, F. S. Drivers and modulators from push-pull and balanced synaptic input. Prog. Brain. Res. 149, 147–155 (2005).

  30. Wang, X.-J. Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36, 955–968 (2002).

  31. Sussillo, D. & Barak, O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 25, 626–649 (2013).

  32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural. Inf. Process. Syst. 26, 3111–3119 (2013).

  33. Benna, M. K. & Fusi, S. Computational principles of synaptic memory consolidation. Nat. Neurosci. 19, 1697–1706 (2016).

  34. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).

  35. Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. ICML 70, 3987–3995 (2017).

  36. Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. USA 107, 11163–11170 (2010).

  37. Rigotti, M., Ben Dayan Rubin, D., Wang, X.-J. & Fusi, S. Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front. Comput. Neurosci. 4, 24 (2010).

  38. Cole, M. W. et al. Multi-task connectivity reveals flexible hubs for adaptive task control. Nat. Neurosci. 16, 1348–1355 (2013).

  39. Yang, G. R., Ganichev, I., Wang, X.-J., Shlens, J. & Sussillo, D. A dataset and architecture for visual reasoning with a working memory. ECCV 714–731 (2018).

  40. Lake, B. M. & Baroni, M. Generalization without systematicity: on the compositional skills of sequence-to-sequence recurrent networks. ICML 80, 2873–2882 (2018).

  41. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).

  42. Song, H. F., Yang, G. R. & Wang, X.-J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).

  43. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. ICLR (2015).

  44. Le, Q. V., Jaitly, N. & Hinton, G. E. A simple way to initialize recurrent networks of rectified linear units. Preprint at arXiv (2015).



Acknowledgements

We thank current and former members of the Wang lab, especially S.Y. Li, O. Marschall, and E. Ohran for fruitful discussions; J.A. Li, J.D. Murray, D. Ehrlich, and J. Jaramillo for critical comments on the manuscript; and S. Wang for assistance with the NYU HPC clusters. We are grateful to V. Mante for providing data and for discussion. This work was supported by an Office of Naval Research grant no. N00014-13-1-0297, a National Science Foundation grant no. 16-31586, a Google Computational Neuroscience Grant (X.J.W.), a Samuel J. and Joan B. Williamson Fellowship, a National Science Foundation grant no. 1707398, and the Gatsby Charitable Foundation (G.R.Y.).

Author information

Contributions

G.R.Y. and X.J.W. designed the study. G.R.Y., M.R.J., H.F.S., W.T.N., and X.J.W. had frequent discussions. G.R.Y. and M.R.J. performed the research. G.R.Y., H.F.S., W.T.N., and X.J.W. wrote the manuscript.

Corresponding author

Correspondence to Xiao-Jing Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Sample trials from the 20 tasks trained.

(a) Convention is the same as Fig. 1a. Output activities are obtained from a sample network after training. Green lines are the target activities for the fixation output unit.

Supplementary Figure 2 Psychometric tests for a range of tasks.

(a) Decision-making performance improves with longer stimulus presentation time and stronger stimulus coherence in the DM 1 task in a sample reference network. (b) Discrimination thresholds decrease with longer stimulus presentation time in the DM 1 task. The discrimination thresholds are estimated by fitting cumulative Weibull functions. (c-f) Same analyses as (a,b) for the Ctx DM 1 (c,d) and MultSen DM (e,f) tasks. In all n=20 independent networks studied, performance improves with longer stimulus presentation time. However, in many networks the improvement differs from that expected of perfect integration (red line). This variation has no impact on other results. (g) A sample network performs well above chance in the Dly DM 1 task for delay periods of up to five seconds.
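The threshold estimates in (b,d,f) come from fitting cumulative Weibull functions to psychometric data. A minimal sketch of such a fit is below; the exact parameterization, initial guess, and bounds are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull(coh, alpha, beta):
    """Cumulative Weibull psychometric function for a two-choice task.

    Maps stimulus coherence to P(correct), rising from chance (0.5) to 1.
    alpha is the discrimination threshold (coherence giving ~82% correct);
    beta controls the slope.
    """
    return 1.0 - 0.5 * np.exp(-(coh / alpha) ** beta)

def fit_threshold(coherences, p_correct):
    """Fit the Weibull curve and return the estimated threshold alpha."""
    (alpha, beta), _ = curve_fit(
        weibull, coherences, p_correct,
        p0=[0.1, 1.5],                       # assumed starting point
        bounds=([1e-6, 0.1], [1.0, 10.0]))   # assumed parameter ranges
    return alpha

# Synthetic example: generate data from a known threshold and recover it.
coh = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
p = weibull(coh, 0.05, 1.5)
print(fit_threshold(coh, p))  # close to 0.05
```

Repeating such a fit for each stimulus duration yields the threshold-versus-time curves in (b).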

Supplementary Figure 3 Task and epoch variances.

(a) Visualization of the task variance map using classical multi-dimensional scaling (MDS). MDS tends to preserve global structure, while t-SNE tends to emphasize local structure (for example, clustering). (b) Epoch variance is computed in a similar way to task variance, except that it is computed for individual task epochs instead of whole tasks. There are clusters of units that are selective in specific epochs. (c) Visualization of the epoch variance map in the same style as Fig. 2d.
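The global-versus-local trade-off between the two embeddings in (a) can be sketched with scikit-learn. The task-variance matrix below is random stand-in data, and scikit-learn's metric MDS is used here in place of classical MDS.

```python
import numpy as np
from sklearn.manifold import MDS, TSNE

# Hypothetical stand-in for a task-variance map:
# one row per recurrent unit, one column per task.
rng = np.random.default_rng(0)
task_var = rng.random((100, 20))

# MDS tries to preserve all pairwise distances between units,
# i.e. the global structure of the map.
mds_emb = MDS(n_components=2, random_state=0).fit_transform(task_var)

# t-SNE preserves local neighborhoods, so clusters of similar units
# separate more sharply at the cost of distorting global distances.
tsne_emb = TSNE(n_components=2, perplexity=30,
                random_state=0).fit_transform(task_var)

print(mds_emb.shape, tsne_emb.shape)  # (100, 2) (100, 2)
```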

Supplementary Figure 4 Determining number of clusters.

The silhouette score as a function of the number of clusters for an example network with the Softplus activation function (a) and one with the Tanh activation function (b). The silhouette score assesses the quality of a clustering scheme (see Methods). The ‘optimal’ or natural number of clusters is chosen to be the one with the highest silhouette score.
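The selection rule described above, taking the candidate cluster count with the highest silhouette score, can be sketched with scikit-learn. The k-means clustering and synthetic data here are illustrative assumptions, standing in for whichever clustering scheme is being scored.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_n_clusters(X, k_range=range(2, 10), seed=0):
    """Return the number of clusters that maximizes the silhouette score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(X)
        # Silhouette compares within-cluster cohesion to
        # nearest-cluster separation; higher is better.
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

# Synthetic example: three well-separated blobs of 20-dimensional
# "task variance" vectors, 30 units each.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.05, size=(30, 20))
                    for c in (0.0, 1.0, 2.0)])
print(best_n_clusters(X))  # 3 for these well-separated blobs
```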

Supplementary Figure 5 Connectivity matrix.

The full connectivity matrix for an example reference network. The network units are first sorted according to their cluster identity. Within each cluster, the units are sorted according to their preferred input directions, as defined by the input direction making the strongest connection weights to each unit (summed across modality 1 and 2). Color range is determined separately for each sub-matrix for better visualization. Red means more excitatory and blue means more inhibitory.

Supplementary Figure 6 Fractional variance distributions for all pairs of tasks.

(a) There is a total of 190 unique pairs of tasks from all 20 tasks trained. Each fractional variance distribution (black) shown here is averaged across 20 independently trained networks. As a control, we also computed fractional variance distributions (gray) from activities of surrogate units that are generated by randomly mixing activities of the original network units (see Methods). The y-axis range is shared across all plots.
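The fractional variance plotted in these distributions is, per unit, FTV(A, B) = (TV_A − TV_B) / (TV_A + TV_B), ranging from −1 (variance only in task B) to +1 (variance only in task A). A minimal sketch follows; the small eps guard against zero-variance units is our addition, not part of the paper's definition.

```python
import numpy as np

def fractional_task_variance(var_a, var_b, eps=1e-12):
    """Per-unit fractional task variance between two tasks.

    +1: unit is active only in task A; -1: only in task B;
    0: equally engaged in both.
    """
    var_a = np.asarray(var_a, dtype=float)
    var_b = np.asarray(var_b, dtype=float)
    return (var_a - var_b) / (var_a + var_b + eps)

# Example: three units with A-only, B-only, and mixed engagement.
ftv = fractional_task_variance([1.0, 0.0, 0.5], [0.0, 1.0, 0.5])
print(ftv)  # approximately [1, -1, 0]
```

Histogramming these per-unit values over all units gives one of the 190 pairwise distributions shown.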

Supplementary Figure 7 Detailed behavioral effect of lesioning on the Ctx DM 1 task.

(a-e) The network choice in the Ctx DM 1 task for different combinations of modality 1 and modality 2 coherence in various networks. (a) The intact network’s choice depends only on the coherence of modality 1. (b) Lesioning group 1 makes the network more dependent on the coherence of modality 2. (c) Lesioning group 2 has no impact on the Ctx DM 1 task. (d) Lesioning both groups 1 and 2 allows the network to weigh both modalities equally. (e) Lesioning group 12 leads to failure in making decisions. Although some preference towards modality 1 is preserved, the network is largely unable to choose decisively.

Supplementary Figure 8 Representation of all tasks in state space.

(a) The representation of each task is computed in the same way as in Fig. 6. Shown here are the representations of all tasks in the top two principal components. The RT Go and RT Anti tasks are not shown because there is no well-defined stimulus epoch in these tasks.

Supplementary Figure 9 Visualization of connection weights of rule inputs.

(a) Connection weights from rule input units representing Go, Dly Go, Anti, Dly Anti tasks visualized in the space spanned by the top two principal components (PCs) for a sample network. Similar to Fig. 6, the top two PCs are rotated and reflected (rPCs) to form the two axes. (b) The same analysis as in (a) is performed for 40 networks, and the results are overlaid. (c) Connection weights from rule input units representing Ctx DM 1, Ctx DM 2, Ctx Dly DM 1, and Ctx Dly DM 2 tasks visualized in the top two PCs for a sample network. (d) The same analysis as in (c) for 40 networks.

Supplementary Figure 10 Distributed rule representation.

(a) The same analysis and box-plot convention as Fig. 7b,c, except that the networks are trained using distributed, instead of one-hot, rule representations.

Supplementary Figure 11 Lack of compositionality for the family of matching tasks.

(a) Visualization of task-based network activity for the DMS, DNMS, DMC, and DNMC tasks, for an example network (left) and for 40 networks (right). These plots have the same style as Fig. 6. (b) Visualization of connection weights for the same set of tasks in an example network (left) and for 40 networks (right). The rule weights are not compositional. These plots have the same style as Supplementary Fig. 9. (c) The DMS task cannot be performed with a compositional rule input. The box plot convention is the same as in Fig. 7b.

Supplementary Figure 12 Partially plastic networks and experimental data.

(a) Networks where only 10% of connection weights are trained show a mixed FTV distribution for the Ctx DM 1 and Ctx DM 2 tasks. Solid lines are median over 60 networks. Shaded areas indicate the 95% confidence interval of the median estimated from bootstrapping. (b-e) FTV distributions derived from experimental data (reference 11). (b) Monkey A, single units. (c) Monkey A, all units. (d) Monkey F, single units. (e) Monkey F, all units.

Supplementary information

Supplementary Figures 1–12

Supplementary Figs. 1–12 and Supplementary Table 1

Reporting Summary


About this article


Cite this article

Yang, G.R., Joglekar, M.R., Song, H.F. et al. Task representations in neural networks trained to perform many cognitive tasks. Nat Neurosci 22, 297–306 (2019).



