Multitask representations in the human cortex transform along a sensory-to-motor hierarchy

Abstract

Human cognition recruits distributed neural processes, yet the organizing computational and functional architectures remain unclear. Here, we characterized the geometry and topography of multitask representations across the human cortex using functional magnetic resonance imaging during 26 cognitive tasks in the same individuals. We measured the representational similarity across tasks within a region and the alignment of representations between regions. Representational alignment varied in a graded manner along the sensory–association–motor axis. Multitask dimensionality exhibited compression then expansion along this gradient. To investigate computational principles of multitask representations, we trained multilayer neural network models to transform empirical visual-to-motor representations. Compression-then-expansion organization in models emerged exclusively in a rich training regime, which is associated with learning optimized representations that are robust to noise. This regime produces hierarchically structured representations similar to empirical cortical patterns. Together, these results reveal computational principles that organize multitask representations across the human cortex to support multitask cognition.

Fig. 1: Overview of analytic approaches to study the geometry and topography of multitask representations in fMRI data.
Fig. 2: Leveraging the MDTB dataset to investigate multitask representations.
Fig. 3: Cortical organization of multitask representations.
Fig. 4: The representational dimensionality of task activations follows hierarchical organization.
Fig. 5: Principal component of the RA matrix reveals a sensory-to-motor gradient that compresses then expands task representations.
Fig. 6: Multitask representations in the human cortex were consistent with ANN representations trained in a rich regime.
Fig. 7: Analysis of the ANN revealed that richly trained ANNs learn diverse and structured representations consistent with empirical data.
Fig. 8: Trajectories of representational transformations from visual to motor content.

Data availability

All data in this study have been made publicly available on OpenNeuro by King and colleagues (accession number ds002105 (ref. 18)).

Code availability

All code related to this study is publicly available on GitHub (https://github.com/murraylab/multitaskhierarchy). Analyses and models were implemented in Python (version 3.8.5). Cortical visualizations were created with Connectome Workbench (version 1.5.0).

References

  1. Genon, S., Reid, A., Langner, R., Amunts, K. & Eickhoff, S. B. How to characterize the function of a brain region. Trends Cogn. Sci. 22, 350–364 (2018).

  2. Poldrack, R. A. Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron 72, 692–697 (2011).

  3. Gallant, J., Nishimoto, S., Naselaris, T. & Wu, M. C. K. In Visual Population Codes: Toward a Common Multivariate Framework for Cell Recording and Functional Imaging (eds Kriegeskorte, N. & Kreiman, G.) Ch. 6 (The MIT Press, 2011).

  4. Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).

  5. Khaligh-Razavi, S. M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).

  6. Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. USA 107, 11163–11170 (2010).

  7. Curtis, C. E. & D’Esposito, M. Persistent activity in the prefrontal cortex during working memory. Trends Cogn. Sci. 7, 415–423 (2003).

  8. Wandell, B. A. & Winawer, J. Computational neuroimaging and population receptive fields. Trends Cogn. Sci. 19, 349–357 (2015).

  9. Arbuckle, S. A. et al. Structure of population activity in primary motor cortex for single finger flexion and extension. J. Neurosci. 40, 9210–9223 (2020).

  10. Yeo, B. T. T. et al. Functional specialization and flexibility in human association cortex. Cereb. Cortex 25, 3654–3672 (2015).

  11. Smith, S. M. et al. Correspondence of the brain’s functional architecture during activation and rest. Proc. Natl Acad. Sci. USA 106, 13040–13045 (2009).

  12. Margulies, D. S. et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl Acad. Sci. USA 113, 12574–12579 (2016).

  13. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).

  14. Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).

  15. Naselaris, T., Allen, E. & Kay, K. Extensive sampling for complete models of individual brains. Curr. Opin. Behav. Sci. 40, 45–51 (2021).

  16. Yang, G. R., Cole, M. W. & Rajan, K. How to study the neural mechanisms of multiple tasks. Curr. Opin. Behav. Sci. 29, 134–143 (2019).

  17. Nakai, T. & Nishimoto, S. Quantitative models reveal the organization of diverse cognitive functions in the brain. Nat. Commun. 11, 1142 (2020).

  18. King, M., Hernandez-Castillo, C. R., Poldrack, R. A., Ivry, R. B. & Diedrichsen, J. Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nat. Neurosci. 22, 1371–1378 (2019).

  19. Bernhardt, B. C., Smallwood, J., Keilholz, S. & Margulies, D. S. Gradients in brain organization. NeuroImage 251, 118987 (2022).

  20. Ansuini, A., Laio, A., Macke, J. H. & Zoccolan, D. Intrinsic dimension of data representations in deep neural networks. In Advances in Neural Information Processing Systems Vol. 32 (Curran Associates, Inc., 2019).

  21. Recanatesi, S. et al. Dimensionality compression and expansion in deep neural networks. Preprint at https://doi.org/10.48550/arXiv.1906.00443 (2019).

  22. Flesch, T., Juechems, K., Dumbalska, T., Saxe, A. & Summerfield, C. Rich and lazy learning of task representations in brains and neural networks. Preprint at bioRxiv https://doi.org/10.1101/2021.04.23.441128 (2021).

  23. Woodworth, B. et al. Kernel and rich regimes in overparametrized models. In Conference on Learning Theory 3635–3673 (PMLR, 2020).

  24. Glasser, M. F. et al. A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178 (2016).

  25. Power, J. D. et al. Functional network organization of the human brain. Neuron 72, 665–678 (2011).

  26. Yeo, B. T. et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165 (2011).

  27. Cole, M. W., Bassett, D. S., Power, J. D., Braver, T. S. & Petersen, S. E. Intrinsic and task-evoked network architectures of the human brain. Neuron 83, 238–251 (2014).

  28. Ji, J. L. et al. Mapping the human brain’s cortical–subcortical functional network organization. NeuroImage 185, 35–57 (2019).

  29. Huntenburg, J. M., Bazin, P. -L. & Margulies, D. S. Large-scale gradients in human cortical organization. Trends Cogn. Sci. 22, 21–31 (2018).

  30. Chan, M. Y., Park, D. C., Savalia, N. K., Petersen, S. E. & Wig, G. S. Decreased segregation of brain systems across the healthy adult lifespan. Proc. Natl Acad. Sci. USA 111, E4997–E5006 (2014).

  31. Burt, J. B. et al. Hierarchy of transcriptomic specialization across human cortex captured by structural neuroimaging topography. Nat. Neurosci. 21, 1251–1259 (2018).

  32. Glasser, M. F. & Van Essen, D. C. Mapping human cortical areas in vivo based on myelin content as revealed by T1-and T2-weighted MRI. J. Neurosci. 31, 11597–11616 (2011).

  33. Badre, D., Bhandari, A., Keglovits, H. & Kikumoto, A. The dimensionality of neural representations for control. Curr. Opin. Behav. Sci. 38, 20–28 (2021).

  34. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).

  35. Abbott, L. F., Rajan, K. & Sompolinsky, H. In The Dynamic Brain: An Exploration of Neuronal Variability and Its Functional Significance (eds Ding M. & Glanzman D.) 1–16 (Oxford University Press, 2011).

  36. Gao, P. et al. A theory of multineuronal dimensionality, dynamics and measurement. Preprint at bioRxiv https://doi.org/10.1101/214262 (2017).

  37. Recanatesi, S., Ocker, G. K., Buice, M. A. & Shea-Brown, E. Dimensionality in recurrent spiking networks: global trends in activity and local origins in connectivity. PLoS Comput. Biol. 15, e1006446 (2019).

  38. Bhandari, A., Gagne, C. & Badre, D. Just above chance: is it harder to decode information from prefrontal cortex hemodynamic activity patterns? J. Cogn. Neurosci. 30, 1473–1498 (2018).

  39. Bassett, D. S. & Bullmore, E. Small-world brain networks. Neuroscientist 12, 512–523 (2006).

  40. Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).

  41. Honey, C. J. et al. Slow cortical dynamics and the accumulation of information over long timescales. Neuron 76, 423–434 (2012).

  42. Ito, T., Hearne, L. J. & Cole, M. W. A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales. NeuroImage 221, 117141 (2020).

  43. Cole, M. W. et al. Multi-task connectivity reveals flexible hubs for adaptive task control. Nat. Neurosci. 16, 1348–1355 (2013).

  44. van den Heuvel, M. P. & Sporns, O. Network hubs in the human brain. Trends Cogn. Sci. 17, 683–696 (2013).

  45. Shine, J. M. et al. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat. Neurosci. 22, 289 (2019).

  46. Bernardi, S. et al. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell 183, 954–967 (2020).

  47. Flesch, T., Juechems, K., Dumbalska, T., Saxe, A. & Summerfield, C. Orthogonal representations for robust context-dependent task performance in brains and neural networks. Neuron 110, 1258–1270 (2022).

  48. Ito, T. et al. Compositional generalization through abstract representations in human and artificial neural networks. Preprint at https://doi.org/10.48550/arXiv.2209.07431 (2022).

  49. Cole, M. W., Laurent, P. & Stocco, A. Rapid instructed task learning: a new window into the human brain’s unique capacity for flexible cognitive control. Cogn. Affect. Behav. Neurosci. 13, 1–22 (2012).

  50. van Bergen, R. S. & Kriegeskorte, N. Going in circles is the way forward: the role of recurrence in visual inference. Curr. Opin. Neurobiol. 65, 176–193 (2020).

  51. Kar, K., Kubilius, J., Schmidt, K., Issa, E. B. & DiCarlo, J. J. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nat. Neurosci. 22, 974–983 (2019).

  52. Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T. & Wang, X.-J. Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297–306 (2019).

  53. Shahbazi, M., Shirali, A., Aghajan, H. & Nili, H. Using distance on the Riemannian manifold to compare representations in brain and in models. NeuroImage 239, 118271 (2021).

  54. Williams, A. H., Kunz, E., Kornblith, S. & Linderman, S. W. Generalized shape metrics on neural representations. Preprint at https://doi.org/10.48550/arXiv.2110.14739 (2021).

  55. Zhi, D., King, M., Hernandez-Castillo, C. R. & Diedrichsen, J. Evaluating brain parcellations using the distance-controlled boundary coefficient. Hum. Brain Mapp. 43, 3706–3720 (2022).

  56. Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105–124 (2013).

  57. Ji, J. L. et al. QuNex—a scalable platform for integrative multi-modal neuroimaging data processing and analysis. Preprint at bioRxiv https://doi.org/10.1101/2022.06.03.494750 (2022).

  58. Ito, T. et al. Task-evoked activity quenches neural correlations and variability across cortical areas. PLoS Comput. Biol. 16, e1007983 (2020).

  59. Ciric, R. et al. Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage 154, 174–187 (2017).

  60. Glasser, M. F. et al. The Human Connectome Project’s neuroimaging approach. Nat. Neurosci. 19, 1175–1187 (2016).

  61. Rissman, J., Gazzaley, A. & D’Esposito, M. Measuring functional connectivity during distinct stages of a cognitive task. NeuroImage 23, 752–763 (2004).

  62. Friston, K. J. et al. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210 (1994).

  63. Schaefer, A. et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018).

  64. Abdollahi, R. O. et al. Correspondences between retinotopic areas and myelin maps in human visual cortex. NeuroImage 99, 509–524 (2014).

  65. Bobadilla-Suarez, S., Ahlheim, C., Mehrotra, A., Panos, A. & Love, B. C. Measures of neural similarity. Comput. Brain Behav. 3, 369–383 (2020).

  66. Walther, A. et al. Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage 137, 188–200 (2016).

  67. Basti, A., Nili, H., Hauk, O., Marzetti, L. & Henson, R. N. Multi-dimensional connectivity: a conceptual and mathematical review. NeuroImage 221, 117179 (2020).

  68. Burt, J. B., Helmer, M., Shinn, M., Anticevic, A. & Murray, J. D. Generative modeling of brain maps with spatial autocorrelation. NeuroImage 220, 117038 (2020).

  69. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics 249–256 (JMLR Workshop and Conference Proceedings, 2010).

  70. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2015).

Acknowledgements

This project was supported by NIH grant R01MH112746 (J.D.M.), NSF NeuroNex grant 2015276 (J.D.M.) and a Swartz Foundation Fellowship (T.I.). We acknowledge the Yale Center for Research Computing at Yale University for providing access to the Grace cluster and associated research computing resources. We thank M. King, J. Diedrichsen and colleagues for providing public access to the dataset. We also thank W. Pettine, M. Helmer and J. Miller for comments on earlier drafts of the manuscript.

Author information

Contributions

T.I. and J.D.M. conceptualized the project and wrote the paper. T.I. performed the formal analysis and visualization, developed software and wrote the original draft. J.D.M. acquired funding and supervised the project.

Corresponding author

Correspondence to John D. Murray.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Neuroscience thanks Matthew Farrell, Lucina Uddin, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Whole-cortex group activation maps for all 26 cognitive tasks.

Activation maps reflect the GLM beta values and were averaged across conditions within each task.

Extended Data Fig. 2 Comparing segregation of whole-cortex RSFC and RA between unimodal-transmodal areas and functional networks.

a,b) Force-directed graphs comparing RSFC and RA community structure (color-coded by functional network). c) Segregation of RSFC and d) RA whole-cortex matrices (n = 144 unimodal, n = 246 transmodal). e) Direct comparison of the difference in segregation between RA and RSFC for unimodal and transmodal regions (same as Fig. 3h). (Panels c–e: two-sided t-tests.) f,g) Association of regional RA segregation with the cortical myelin map (T1w/T2w structural map). h) Segregation of RSFC by functional network. i) Segregation of RA by functional network. Note that for both RA and RSFC, sensorimotor networks have higher segregation than association networks. Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median. Network key: VIS1 = Visual 1 (n = 6); VIS2 = Visual 2 (n = 54); SMN = Somatomotor (n = 39); VMM = Ventral multimodal (n = 6); AUD = Auditory (n = 15); DAN = Dorsal attention (n = 23); DMN = Default mode (n = 77); CON = Cingulo-opercular (n = 56); PMM = Posterior multimodal (n = 7); FPN = Frontoparietal (n = 50); LAN = Language (n = 23); ORA = Orbital-affective (n = 4). Network colors correspond to those in Fig. 3e. (***p < 0.0001, two-sided t-test.)
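The segregation measure summarized in these panels (after ref. 30) can be sketched as the difference between mean within-network and mean between-network connectivity, normalized by the within-network mean. A minimal illustration; the connectivity values and network labels below are toy assumptions, not the study's data:

```python
import numpy as np

def segregation(conn, labels):
    """System segregation (Chan et al., 2014):
    (mean within-network - mean between-network) / mean within-network,
    computed over the off-diagonal entries of a connectivity matrix."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = conn[same & off_diag].mean()
    between = conn[~same].mean()
    return (within - between) / within

# Toy example: two networks of two regions each, with strong
# within-network (0.8) and weak between-network (0.2) connectivity.
conn = np.full((4, 4), 0.2)
conn[:2, :2] = 0.8
conn[2:, 2:] = 0.8
labels = [0, 0, 1, 1]
print(segregation(conn, labels))  # (0.8 - 0.2) / 0.8 = 0.75
```

Higher values indicate more segregated (modular) systems; the same function applies unchanged to RSFC or RA matrices.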

Extended Data Fig. 3 Representational dimensionality and multi-task decoding produce similar associations with intrinsic hierarchy, even after controlling for parcel size.

a) Correlation of multi-task decoding with the principal RSFC gradient and the myelin map across regions. b) Parcel size (the number of vertices within a brain region) and representational dimensionality were positively correlated (r = 0.45, non-parametric p < 0.001). However, after accounting for parcel size as a covariate (via linear regression), a strong association between decodability and intrinsic hierarchy was maintained. c) Same analysis as in panel b, but using representational dimensionality rather than decodability. All correlations in a, b and c yielded a non-parametric p < 0.001 using surrogate brain maps that accounted for spatial autocorrelation68. This suggests that the association between representational dimensionality and intrinsic hierarchy is independent of parcel size. Error bands reflect a 95% confidence interval.
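The covariate control described here can be sketched by residualizing dimensionality against parcel size via ordinary least squares before correlating with the hierarchy map. The data below are toy stand-ins, not the empirical maps:

```python
import numpy as np

def residualize(y, covariate):
    """Remove the linear contribution of a covariate from y via OLS."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n_parcels = 360
parcel_size = rng.uniform(50, 500, size=n_parcels)
hierarchy = rng.standard_normal(n_parcels)  # stand-in for the RSFC gradient
# toy dimensionality that depends on both hierarchy and parcel size
dimensionality = (0.5 * hierarchy + 0.002 * parcel_size
                  + 0.3 * rng.standard_normal(n_parcels))

resid = residualize(dimensionality, parcel_size)
r = np.corrcoef(resid, hierarchy)[0, 1]  # association after removing parcel size
```

By construction the residuals are orthogonal to the covariate, so any remaining correlation with the hierarchy map cannot be attributed to parcel size.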

Extended Data Fig. 4 Random subsamples of the task set show similar associations with both the unimodal-transmodal and the sensorimotor hierarchies.

a) The association between representational dimensionality and the principal RSFC gradient (unimodal-transmodal hierarchy) with the entire task set. b) We randomly subsampled tasks (without replacement) to downsize the RSMs of all parcels, and then measured the correlation between representational dimensionality and RSFC gradient 1. For each subsample size n, we repeated the random selection (that is, 45 choose n) 20 times to estimate the robustness of the association to arbitrary task selection. The association increased and stabilized as the number of tasks increased (n = 20). c) Same as in b, but using the myelin map. d) The compression-then-expansion fit of representational dimensionality across the sensorimotor (RSFC gradient 2) hierarchy. e) We estimated a 2nd-order polynomial fit for randomly subsampled tasks and assessed the 2nd-order coefficient of the fit. The higher (and more positive) this coefficient, the more convex the compression-then-expansion. The compression-then-expansion effect increased as more randomly sampled tasks were included (n = 20 random subsamples). f) Same procedure as e, but measuring the R-squared of the polynomial fit rather than the 2nd-order coefficient. Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median. Error bands reflect a 95% confidence interval in panels a and d.
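The convexity test above — a positive 2nd-order coefficient indicating compression-then-expansion — can be sketched with an ordinary polynomial fit. The data below are illustrative toy values standing in for dimensionality along the gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)  # stand-in for RSFC gradient 2 loadings
y = 0.8 * x**2 - 0.1 * x + 0.3 + 0.05 * rng.standard_normal(x.size)

# 2nd-order polynomial fit; coeffs = [beta2, beta1, beta0]
coeffs = np.polyfit(x, y, deg=2)
yhat = np.polyval(coeffs, x)
r_squared = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
convex = coeffs[0] > 0  # positive quadratic term => compression-then-expansion
```

Repeating this fit over random task subsamples and collecting `coeffs[0]` and `r_squared` mirrors the bootstrap-style procedure in panels e and f.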

Extended Data Fig. 5 Establishing compression-then-expansion of representational dimensionality across the sensory-motor hierarchy via model adjudication.

a) We fit the representational dimensionality of parcels across the sensory-motor RSFC gradient using three competing models: quadratic (2nd-order polynomial), linear, and an exponential decay model, where separate models were fit for loadings less than and greater than 0. b) The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for all models, which take into account the maximum likelihood of each model while penalizing models with more free parameters. Quadratic models had the smallest values for both AIC and BIC. c,d) Same as panels a and b, but using the RA principal gradient. Quadratic models were defined as \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon\), linear models as \(y = \beta_0 + \beta_1 x + \epsilon\), and exponential decay models as \(y(t) = N_0 e^{-\lambda t} + \epsilon\).
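The AIC/BIC adjudication can be sketched for the quadratic and linear candidates under a Gaussian-error assumption (the exponential model would need a nonlinear optimizer and is omitted here); the data are toy values, not the paper's:

```python
import numpy as np

def aic_bic(y, yhat, k):
    """AIC and BIC under Gaussian errors: n*ln(RSS/n) plus a complexity
    penalty, where k counts free parameters (including the intercept)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 150)
y = 0.7 * x**2 + 0.1 * x + 0.2 + 0.05 * rng.standard_normal(x.size)

quad = np.polyval(np.polyfit(x, y, 2), x)  # k = 3 parameters
lin = np.polyval(np.polyfit(x, y, 1), x)   # k = 2 parameters
aic_q, bic_q = aic_bic(y, quad, 3)
aic_l, bic_l = aic_bic(y, lin, 2)
```

For convex data, the quadratic model's lower residual sum of squares outweighs its extra parameter, so both criteria favor it — the same logic that adjudicates the models in panels b and d.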

Extended Data Fig. 6 Supplemental information on ANN modeling during rich and lazy training regimes.

The similarity between a) the RSMs for V1 and the gradient-identified input parcel used for model construction and b) the RSMs for M1 and the gradient-selected motor output parcel. Overall, the representational geometries were highly similar between V1 and the input RSM, and between M1 and the motor output RSM. d) The training cost (that is, number of training epochs required) for different weight initializations. Visualization of RSMs for example ANNs (one initialization each) for e) rich, f) intermediate (that is, initialization SD = 1.0), and g) lazy training regimes. h-j) Characterizing the structural network mechanisms that give rise to differences in representational structure across learning regimes in the ANN. h) Initialized and trained norm of ANN weights as a function of weight initialization. In line with previous work47, the Frobenius norm of the trained ANN weights, which reflects the variability of the hidden weight projections, was significantly smaller in the rich training regime. i) The kurtosis of the degree distribution during initialization and after training. The kurtosis of the weight distribution measures its tailedness and, in terms of connectivity weights, reflects the small-worldness of a network, a well-documented feature of empirical brain networks39. The kurtosis of richly trained networks was higher than that of lazily trained networks, producing a heavy-tailed weight distribution. j) We characterized the dimensionality of the ANN weights to gain insight into the successive representational transformations in the ANN across 20 initializations per weight distribution (n = 20). Weight dimensionality was computed by performing a singular value decomposition (SVD) on the weights and then calculating the participation ratio of the singular values. The dimensionality of the learned weights directly constrains the representations the ANNs produce. Low dimensionality of the connectivity weights likely aids cross-task generalization, since low-dimensional connections force the network to extract shared components across tasks. Weight dimensionality was lower in rich training regimes. These findings suggest that, across layers, richly trained ANNs with low-dimensional and low-variability weights collectively produced modular patterns of representations, consistent with empirical data. Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median.
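The weight-dimensionality measure described in panel j — the participation ratio of the singular values from an SVD of the weights — can be sketched as:

```python
import numpy as np

def weight_dimensionality(W):
    """Participation ratio of the singular values of a weight matrix:
    (sum s_i)^2 / sum s_i^2. Ranges from 1 (rank-1, maximally
    low-dimensional) up to min(W.shape) (all singular values equal)."""
    s = np.linalg.svd(W, compute_uv=False)
    return s.sum() ** 2 / np.sum(s ** 2)

# An identity-like matrix uses all dimensions equally ...
print(weight_dimensionality(np.eye(10)))      # 10.0
# ... while a rank-1 outer product collapses to a single dimension.
u = np.arange(1, 6, dtype=float)
print(weight_dimensionality(np.outer(u, u)))  # ~1.0
```

On this measure, the low-dimensional weights of richly trained networks sit toward the rank-1 end of the range.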

Extended Data Fig. 7 Training an ANN with untied weights results in qualitatively similar results.

We trained a 5-layer ANN with untied weights, which produced qualitatively similar results to the ANN in the main manuscript. We reduced the number of layers from 10 to 5 and the number of hidden units from 500 to 250 for computational efficiency. (An ANN with untied weights has significantly more parameters than one with tied weights.) a) Representational dimensionality of ANN layers for different weight initializations. b) ANN architecture. c) Richly trained ANNs had significantly higher similarity with representations found in empirical data relative to lazily trained ANNs (n = 20). d) Similarity to fMRI data by layer (rich minus lazy ANNs) (n = 20). e) Representational alignment of each ANN layer (cosine similarity between RSMs). f) Overall similarity of representations across ANN layers. Greater representational dissimilarity (across layers) is found in richly trained ANNs (n = 20). g) Variance explained by the first principal component for each of the RA matrices in panel e (n = 20). h) Frobenius norm of the weight distribution across initializations. i) Kurtosis (tailedness) of the weight distribution across layers under different weight initialization schemes. j) SVD of ANN weights. k) Dimensionality (participation ratio) of the weights for different initializations (n = 20). Richer training regimes produce low-dimensional weights. Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median. (***p < 0.0001, two-sided t-test.)

Extended Data Fig. 8 Using standard stochastic gradient descent (without momentum/weight decay) with tied weights also produces qualitatively similar results.

To explore the impact of model optimization and network size on the learned representations in ANNs, we trained a 5-layer ANN, which produced qualitatively similar results to the ANN in the main manuscript (Figs. 6 and 7). We reduced the number of layers from 10 to 5 for computational efficiency. Instead of the Adam optimizer (with a learning rate of 0.0001), we used standard stochastic gradient descent with a learning rate of 0.01. (Note that smaller learning rates made learning in the rich training regime computationally intractable.) We did not include model initializations with SD > 1.4 owing to exploding gradients. a) Representational dimensionality of ANN layers for different weight initializations. b) ANN architecture. c) Richly trained ANNs had significantly higher similarity with representations found in empirical data relative to lazily trained ANNs (rich > 1.0, lazy < 1.0) (n = 20). d) Similarity to fMRI data by layer (rich minus lazy ANNs) (n = 20). e) Representational alignment of each ANN layer (cosine similarity between RSMs). f) Overall similarity of representations across ANN layers (n = 20). Greater representational dissimilarity (across layers) is found in richly trained ANNs. g) Cumulative variance explained by the first three principal components for each of the RA matrices in panel e. h) Dimensionality (participation ratio) of the learned connectivity weights for different initializations (n = 20). i) Average training cost by weight initialization. Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median. (**p < 0.001, two-sided t-test.)
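The control analysis above can be sketched in miniature: a small deep linear network trained with plain full-batch SGD (no momentum or weight decay) from a scaled Gaussian initialization, where the initialization SD is the knob that separates rich (small) from lazy (large) regimes. The architecture, sizes, learning rate and toy task here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_net(init_sd, n_layers=3, dim=20, lr=0.05, epochs=200):
    """Train a deep linear network with plain SGD on a random linear
    input->target mapping; returns (initial, final) training loss."""
    X = rng.standard_normal((100, dim))
    T = X @ (0.1 * rng.standard_normal((dim, dim)))
    # Scaled Gaussian initialization: init_sd controls the regime
    # (small init_sd ~ rich, large init_sd ~ lazy).
    Ws = [init_sd / np.sqrt(dim) * rng.standard_normal((dim, dim))
          for _ in range(n_layers)]
    losses = []
    for _ in range(epochs):
        # Forward pass, caching each layer's activations.
        acts = [X]
        for W in Ws:
            acts.append(acts[-1] @ W)
        err = acts[-1] - T
        losses.append((err ** 2).sum(axis=1).mean())
        # Backward pass: per-sample MSE gradient through each linear layer.
        grad = 2.0 * err / len(X)
        for i in reversed(range(n_layers)):
            gW = acts[i].T @ grad
            grad = grad @ Ws[i].T
            Ws[i] -= lr * gW
    return losses[0], losses[-1]

first, last = train_linear_net(init_sd=0.5)
```

Sweeping `init_sd` and applying the participation-ratio measure to the trained `Ws` would reproduce, in toy form, the initialization-scale analyses reported across these extended-data figures.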

Extended Data Fig. 9 The importance of within-subject analyses to capture fine-grained representational patterns.

a) Representational dimensionality across the cortical surface when computed from the group-averaged RSM (rather than subject-specific RSMs). b) We computed the correlation of representational dimensionality with two proxies of the unimodal-transmodal hierarchy: the principal RSFC gradient and the myelin map (T1w/T2w contrast). When dimensionality is calculated from RSMs derived from group-level activation averages, the association with the unimodal-transmodal hierarchy is significantly reduced. c) We subsequently measured dimensionality across the sensory-association-motor systems, finding that, in contrast to within-subject estimates of representational dimensionality, we no longer observed the dimensionality compression from sensory to association systems in group-derived maps (sensory, n = 75; association, n = 246; motor, n = 39). d) Representational dimensionality measured using individual RSMs (same as in Fig. 4b, for visual comparison). e) Dimensionality across the sensory-association-motor hierarchy computed from individual RSMs (same as in Fig. 5g, for visual comparison). Boxplot bounds define the 1st and 3rd quartiles of the distribution, box whiskers the 95% confidence interval, and the center line indicates the median. (***p < 0.0001, *p < 0.05, two-sided t-test.)

Extended Data Fig. 10 Corroborating evidence of the sensory-association-motor axis of hierarchical organization extracted using non-negative matrix factorization (NMF).

NMF corroborated that the sensory-to-motor hierarchy is robust to the choice of matrix decomposition algorithm. a) The first component extracted using PCA and b) NMF. c) Correlation between the first components extracted with PCA and NMF. d) Correlation of RA gradient 1 (NMF) with the RSFC sensorimotor hierarchy (RSFC gradient 2).

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ito, T., Murray, J.D. Multitask representations in the human cortex transform along a sensory-to-motor hierarchy. Nat Neurosci 26, 306–315 (2023). https://doi.org/10.1038/s41593-022-01224-0
