
Task representations in neural networks trained to perform many cognitive tasks


The brain has the ability to flexibly perform many tasks, but the underlying mechanism cannot be elucidated in traditional experimental and modeling studies designed for one task at a time. Here, we trained single network models to perform 20 cognitive tasks that depend on working memory, decision making, categorization, and inhibitory control. We found that after training, recurrent units can develop into clusters that are functionally specialized for different cognitive processes, and we introduce a simple yet effective measure to quantify relationships between single-unit neural representations of tasks. Learning often gives rise to compositionality of task representations, a critical feature for cognitive flexibility, whereby one task can be performed by recombining instructions for other tasks. Finally, networks developed mixed task selectivity similar to recorded prefrontal neurons after learning multiple tasks sequentially with a continual-learning technique. This work provides a computational platform to investigate neural representations of many cognitive tasks.


Code availability

All training and analysis code is available on GitHub (

Data availability

We provide data files in Python- and MATLAB-readable formats for all trained models for further analyses on GitHub (



1. Fuster, J. The Prefrontal Cortex (Academic Press, Cambridge, 2015).
2. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
3. Wang, X.-J. in Principles of Frontal Lobe Function (Stuss, D. T. & Knight, R. T. eds.) (Cambridge Univ. Press, New York, 2013).
4. Wallis, J. D., Anderson, K. C. & Miller, E. K. Single neurons in prefrontal cortex encode abstract rules. Nature 411, 953–956 (2001).
5. Sakai, K. Task set and prefrontal cortex. Annu. Rev. Neurosci. 31, 219–245 (2008).
6. Cole, M. W., Etzel, J. A., Zacks, J. M., Schneider, W. & Braver, T. S. Rapid transfer of abstract rules to novel contexts in human lateral prefrontal cortex. Front. Hum. Neurosci. 5, 142 (2011).
7. Tschentscher, N., Mitchell, D. & Duncan, J. Fluid intelligence predicts novel rule implementation in a distributed frontoparietal control network. J. Neurosci. 37, 4841–4847 (2017).
8. Hanes, D. P., Patterson, W. F. II & Schall, J. D. Role of frontal eye fields in countermanding saccades: visual, movement, and fixation activity. J. Neurophysiol. 79, 817–834 (1998).
9. Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
10. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
11. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
12. Cole, M. W., Laurent, P. & Stocco, A. Rapid instructed task learning: a new window into the human brain’s unique capacity for flexible cognitive control. Cogn. Affect. Behav. Neurosci. 13, 1–22 (2013).
13. Reverberi, C., Görgen, K. & Haynes, J.-D. Compositionality of rule representations in human prefrontal cortex. Cereb. Cortex 22, 1237–1246 (2012).
14. Zipser, D. & Andersen, R. A. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331, 679–684 (1988).
15. Song, H. F., Yang, G. R. & Wang, X.-J. Training excitatory-inhibitory recurrent neural networks for cognitive tasks: a simple and flexible framework. PLoS Comput. Biol. 12, e1004792 (2016).
16. Carnevale, F., de Lafuente, V., Romo, R., Barak, O. & Parga, N. Dynamic control of response criterion in premotor cortex during perceptual detection under temporal uncertainty. Neuron 86, 1067–1077 (2015).
17. Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
18. Chaisangmongkon, W., Swaminathan, S. K., Freedman, D. J. & Wang, X.-J. Computing by robust transience: how the fronto-parietal network performs sequential, category-based decisions. Neuron 93, 1504–1517 (2017).
19. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).
20. Funahashi, S., Bruce, C. J. & Goldman-Rakic, P. S. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61, 331–349 (1989).
21. Gold, J. I. & Shadlen, M. N. The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574 (2007).
22. Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science 348, 1352–1355 (2015).
23. Raposo, D., Kaufman, M. T. & Churchland, A. K. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci. 17, 1784–1792 (2014).
24. Romo, R., Brody, C. D., Hernández, A. & Lemus, L. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399, 470–473 (1999).
25. Munoz, D. P. & Everling, S. Look away: the anti-saccade task and the voluntary control of eye movement. Nat. Rev. Neurosci. 5, 218–228 (2004).
26. Miller, E. K., Erickson, C. A. & Desimone, R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci. 16, 5154–5167 (1996).
27. Freedman, D. J. & Assad, J. A. Neuronal mechanisms of visual categorization: an abstract view on decision making. Annu. Rev. Neurosci. 39, 129–147 (2016).
28. Priebe, N. J. & Ferster, D. Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron 57, 482–497 (2008).
29. Abbott, L. F. & Chance, F. S. Drivers and modulators from push-pull and balanced synaptic input. Prog. Brain Res. 149, 147–155 (2005).
30. Wang, X.-J. Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36, 955–968 (2002).
31. Sussillo, D. & Barak, O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 25, 626–649 (2013).
32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 26, 3111–3119 (2013).
33. Benna, M. K. & Fusi, S. Computational principles of synaptic memory consolidation. Nat. Neurosci. 19, 1697–1706 (2016).
34. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).
35. Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. ICML 70, 3987–3995 (2017).
36. Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. USA 107, 11163–11170 (2010).
37. Rigotti, M., Ben Dayan Rubin, D., Wang, X.-J. & Fusi, S. Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front. Comput. Neurosci. 4, 24 (2010).
38. Cole, M. W. et al. Multi-task connectivity reveals flexible hubs for adaptive task control. Nat. Neurosci. 16, 1348–1355 (2013).
39. Yang, G. R., Ganichev, I., Wang, X.-J., Shlens, J. & Sussillo, D. A dataset and architecture for visual reasoning with a working memory. ECCV 714–731 (2018).
40. Lake, B. M. & Baroni, M. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. ICML 80, 2873–2882 (2017).
41. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
42. Song, H. F., Yang, G. R. & Wang, X.-J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).
43. Kingma, D. & Ba, J. Adam: A method for stochastic optimization. ICLR (2015).
44. Le, Q. V., Jaitly, N. & Hinton, G. E. A simple way to initialize recurrent networks of rectified linear units. Preprint at arXiv (2015).



We thank current and former members of the Wang lab, especially S.Y. Li, O. Marschall, and E. Ohran for fruitful discussions; J.A. Li, J.D. Murray, D. Ehrlich, and J. Jaramillo for critical comments on the manuscript; and S. Wang for assistance with the NYU HPC clusters. We are grateful to V. Mante for providing data and for discussion. This work was supported by an Office of Naval Research grant no. N00014-13-1-0297, a National Science Foundation grant no. 16-31586, a Google Computational Neuroscience Grant (X.J.W.), a Samuel J. and Joan B. Williamson Fellowship, a National Science Foundation Grant Number 1707398, and the Gatsby Charitable Foundation (G.R.Y.).

Author information

G.R.Y. and X.J.W. designed the study. G.R.Y., M.R.J., H.F.S., W.T.N., and X.J.W. had frequent discussions. G.R.Y. and M.R.J. performed the research. G.R.Y., H.F.S., W.T.N., and X.J.W. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Xiao-Jing Wang.

Integrated supplementary information

Supplementary Figure 1 Sample trials from the 20 tasks trained.

(a) Convention is the same as Fig. 1a. Output activities are obtained from a sample network after training. Green lines are the target activities for the fixation output unit.

Supplementary Figure 2 Psychometric tests for a range of tasks.

(a) Decision-making performance improves with longer stimulus presentation time and stronger stimulus coherence in the DM 1 task in a sample reference network. (b) Discrimination thresholds decrease with longer stimulus presentation time in the DM 1 task. The discrimination thresholds are estimated by fitting cumulative Weibull functions. (c-f) Same analyses as (a,b) for the Ctx DM 1 (c,d) and MultSen DM (e,f) tasks. In all n=20 independent networks studied, performance improves with longer stimulus presentation time. However, in many networks the improvement differs from that expected of perfect integration (red line). This variation has no impact on other results. (g) A sample network is able to perform well above chance in the Dly DM 1 task for a delay period of up to five seconds.
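The threshold estimate in (b) can be illustrated with a minimal sketch. The exact parameterization is not given in this section, so the two-choice form p(c) = 1 − 0.5·exp(−(c/α)^β), the synthetic data, and the names `weibull` and `fit_threshold` are assumptions made for illustration. Under this form α is the coherence at which accuracy reaches about 82%, and it plays the role of the discrimination threshold.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull(coh, alpha, beta):
    """Assumed cumulative Weibull for a two-choice task:
    0.5 (chance) at zero coherence, approaching 1 at high coherence."""
    return 1.0 - 0.5 * np.exp(-(coh / alpha) ** beta)

def fit_threshold(coh, p_correct):
    """Least-squares fit; returns (alpha, beta), where alpha is the threshold."""
    popt, _ = curve_fit(
        weibull, coh, p_correct,
        p0=[0.1, 1.0],                          # initial guess (hypothetical)
        bounds=([1e-6, 0.1], [10.0, 10.0]),     # keep both parameters positive
    )
    return popt

# Synthetic, noiseless accuracies generated from known parameters.
coh = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
p_correct = weibull(coh, 0.05, 1.5)
alpha_hat, beta_hat = fit_threshold(coh, p_correct)
```

Repeating the fit for each stimulus duration would give a threshold-versus-duration curve of the kind described in (b).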

Supplementary Figure 3 Task and epoch variances.

(a) Visualization of the task variance map using classical multi-dimensional scaling (MDS). MDS tends to preserve global structures, while tSNE tends to emphasize local structures (for example, clustering). (b) Epoch variance is computed in a similar way to task variance, except that it is computed for individual task epochs instead of tasks. There are clusters of units that are selective in specific epochs. (c) Visualization of the epoch variance map in the same style as Fig. 2d.
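Classical MDS, as mentioned in (a), can be sketched in a few lines of NumPy. The input here is a random stand-in for a task variance map; its shape (units × tasks), the use of Euclidean distance, and the function name `classical_mds` are assumptions for illustration, not details taken from the Methods.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Embed the rows of X so that pairwise Euclidean distances
    are preserved as well as possible in n_components dimensions."""
    # Squared pairwise Euclidean distances between rows.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale

rng = np.random.default_rng(0)
task_variance = rng.random((50, 20))           # hypothetical: 50 units x 20 tasks
embedding = classical_mds(task_variance)       # one 2-D point per unit
```

Because the embedding comes from the leading eigenvectors of the full distance structure, large distances dominate the solution, which is consistent with the remark that MDS tends to preserve global structure.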

Supplementary Figure 4 Determining number of clusters.

The silhouette score as a function of the number of clusters for an example network with the Softplus activation function (a) and one with the Tanh activation function (b). The silhouette score assesses the quality of a clustering scheme (see Methods). The ‘optimal’ or natural number of clusters is chosen to be the one with the highest silhouette score.
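As a rough illustration of how a silhouette score can select the number of clusters, the sketch below computes the mean silhouette coefficient for several candidate labelings and picks the one with the highest score. The clustering algorithm itself is not described in this section, so the candidate labelings are hand-made stand-ins, and the data, the Euclidean metric, and the variable names are assumptions.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a singleton cluster scores 0
        # Mean distance to the other members of the same cluster
        # (the self-distance is 0, so divide by n_same - 1).
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Hypothetical data: two well-separated groups of 20 points each.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])

# Hand-made candidate labelings keyed by the number of clusters.
candidates = {
    2: [0] * 20 + [1] * 20,
    3: [0] * 10 + [1] * 10 + [2] * 20,
    4: [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10,
}
best_k = max(candidates, key=lambda k: silhouette_score(X, candidates[k]))
```

In practice each candidate labeling would come from running a clustering algorithm with that number of clusters on the unit-by-task variance data.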

Supplementary Figure 5 Connectivity matrix.

The full connectivity matrix for an example reference network. The network units are first sorted according to their cluster identity. Within each cluster, the units are sorted according to their preferred input directions, as defined by the input direction making the strongest connection weights to each unit (summed across modality 1 and 2). Color range is determined separately for each sub-matrix for better visualization. Red means more excitatory and blue means more inhibitory.

Supplementary Figure 6 Fractional variance distributions for all pairs of tasks.

(a) There is a total of 190 unique pairs of tasks from all 20 tasks trained. Each fractional variance distribution (black) shown here is averaged across 20 independently trained networks. As a control, we also computed fractional variance distributions (gray) from activities of surrogate units that are generated by randomly mixing activities of the original network units (see Methods). The y-axis range is shared across all plots.
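The count of 190 follows from choosing 2 tasks out of 20, that is 20 × 19 / 2. The sketch below enumerates those pairs and computes a per-unit fractional variance for each. The definition used, (TV_A − TV_B) / (TV_A + TV_B), is an assumption here because the Methods are not part of this section, and the task variance matrix is random placeholder data.

```python
from itertools import combinations
import numpy as np

n_tasks = 20
pairs = list(combinations(range(n_tasks), 2))   # unordered task pairs

def fractional_variance(tv_a, tv_b):
    """Assumed definition: (TV_A - TV_B) / (TV_A + TV_B), bounded in [-1, 1].
    +1 means a unit is selective only for task A, -1 only for task B."""
    return (tv_a - tv_b) / (tv_a + tv_b)

rng = np.random.default_rng(0)
# Hypothetical task variances: 256 units x 20 tasks, strictly positive.
task_variance = rng.random((256, n_tasks)) + 1e-6

ftv = {
    (a, b): fractional_variance(task_variance[:, a], task_variance[:, b])
    for a, b in pairs
}
# Distribution across units for one pair, as would be plotted in one panel.
hist, edges = np.histogram(ftv[(0, 1)], bins=20, range=(-1, 1))
```

Averaging such histograms over independently trained networks would give one curve per task pair.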

Supplementary Figure 7 Detailed behavioral effect of lesioning on the Ctx DM 1 task.

(a-e) The network choice in the Ctx DM 1 task for different combinations of modality 1 and modality 2 coherence in various networks. (a) The intact network’s choice depends only on the coherence of modality 1. (b) Lesioning group 1 makes the network more dependent on the coherence of modality 2. (c) Lesioning group 2 has no impact on the Ctx DM 1 task. (d) Lesioning both groups 1 and 2 makes the network weigh both modalities equally. (e) Lesioning group 12 leads to failure in making decisions. Although some preference towards modality 1 is preserved, the network is largely unable to choose decisively.

Supplementary Figure 8 Representation of all tasks in state space.

(a) The representation of each task is computed the same way as in Fig. 6. Shown here is the representation of all tasks in the space of the top two principal components. The RT Go and RT Anti tasks are not shown because there is no well-defined stimulus epoch in these tasks.

Supplementary Figure 9 Visualization of connection weights of rule inputs.

(a) Connection weights from rule input units representing Go, Dly Go, Anti, Dly Anti tasks visualized in the space spanned by the top two principal components (PCs) for a sample network. Similar to Fig. 6, the top two PCs are rotated and reflected (rPCs) to form the two axes. (b) The same analysis as in (a) is performed for 40 networks, and the results are overlaid. (c) Connection weights from rule input units representing Ctx DM 1, Ctx DM 2, Ctx Dly DM 1, and Ctx Dly DM 2 tasks visualized in the top two PCs for a sample network. (d) The same analysis as in (c) for 40 networks.

Supplementary Figure 10 Distributed rule representation.

(a) The same analysis and box-plot convention as Fig. 7b,c, except that the networks are trained using distributed, instead of one-hot, rule representations.

Supplementary Figure 11 Lack of compositionality for the family of matching tasks.

(a) Visualization of task-based network activity for the DMS, DNMS, DMC, and DNMC tasks, for an example network (left) and for 40 networks (right). These plots have the same style as Fig. 6. (b) Visualization of connection weights for the same set of tasks in an example network (left) and for 40 networks (right). The rule weights are not compositional. These plots have the same style as Supplementary Fig. 9. (c) The DMS task cannot be performed with a compositional rule input. The box plot convention is the same as the one in Fig. 7b.

Supplementary Figure 12 Partially plastic networks and experimental data.

(a) Networks where only 10% of connection weights are trained show a mixed FTV distribution for the Ctx DM 1 and Ctx DM 2 tasks. Solid lines are median over 60 networks. Shaded areas indicate the 95% confidence interval of the median estimated from bootstrapping. (b-e) FTV distributions derived from experimental data (reference 11). (b) Monkey A, single units. (c) Monkey A, all units. (d) Monkey F, single units. (e) Monkey F, all units.
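A bootstrapped confidence interval of the median, as used for the shaded areas in (a), can be sketched as follows. The percentile method, the number of resamples, and the per-network values are assumptions for illustration; this section does not specify the bootstrap variant.

```python
import numpy as np

def bootstrap_median_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median.
    Returns (median, lower bound, upper bound)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    medians = np.median(values[idx], axis=1)
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(medians, [tail, 100.0 - tail])
    return np.median(values), lo, hi

# Hypothetical: one histogram-bin value from each of 60 networks.
rng = np.random.default_rng(1)
per_network = rng.normal(0.2, 0.1, size=60)
med, lo, hi = bootstrap_median_ci(per_network)
```

Applying the function separately to each histogram bin would give a median line with a shaded band around it.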

Supplementary information

Supplementary Figures 1–12

Supplementary Figs. 1–12 and Supplementary Table 1

Reporting Summary


Fig. 1: A recurrent neural network model is trained to perform a large number of cognitive tasks.
Fig. 2: The emergence of functionally specialized clusters for task representation.
Fig. 3: The activation function dictates whether clusters emerge in a network.
Fig. 4: A diversity of neural relationships between pairs of tasks.
Fig. 5: Dissecting a reference network for the context-dependent DM tasks.
Fig. 6: Compositional representation of tasks in state space.
Fig. 7: Performing tasks with algebraically composite rule inputs.
Fig. 8: Sequential training of cognitive tasks.