
Optimal routing to cerebellum-like structures

Abstract

The vast expansion from mossy fibers to cerebellar granule cells (GrC) produces a neural representation that supports functions including associative and internal model learning. This motif is shared by other cerebellum-like structures and has inspired numerous theoretical models. Less attention has been paid to structures immediately presynaptic to GrC layers, whose architecture can be described as a ‘bottleneck’ and whose function is not understood. We therefore develop a theory of cerebellum-like structures in conjunction with their afferent pathways that predicts the role of the pontine relay to cerebellum and the glomerular organization of the insect antennal lobe. We highlight a new computational distinction between clustered and distributed neuronal representations that is reflected in the anatomy of these two brain structures. Our theory also reconciles recent observations of correlated GrC activity with theories of nonlinear mixing. More generally, it shows that structured compression followed by random expansion is an efficient architecture for flexible computation.


Fig. 1: Similar routing architecture to expanded representations.
Fig. 2: Selectivity to task-relevant dimensions determines learning performance.
Fig. 3: Optimal compression for clustered and distributed representations.
Fig. 4: Compression of clustered representations in the insect olfactory system.
Fig. 5: Compression of distributed representations in the corticocerebellar pathway.
Fig. 6: Biologically plausible learned compression.
Fig. 7: Bottleneck model can explain correlations and selectivity of recorded GrC.
Fig. 8: Comparison between bottleneck architecture and single-step network.

Data availability

The data analyzed in this study were previously published in Hallem and Carlson26 and Wagner et al.13 and are available upon request.

Code availability

All simulations and analyses were performed using custom code written in Python (https://www.python.org); the code can be downloaded at www.columbia.edu/spm2176/code/muscinelli_2023.zip.

References

  1. Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).

  2. Bell, C. C., Han, V. & Sawtell, N. B. Cerebellum-like structures and their implications for cerebellar function. Annu. Rev. Neurosci. 31, 1–24 (2008).

  3. Marr, D. A theory of cerebellar cortex. J. Physiol. 202, 437–470 (1969).

  4. Babadi, B. & Sompolinsky, H. Sparseness and expansion in sensory representations. Neuron 83, 1213–1226 (2014).

  5. Litwin-Kumar, A., Harris, K. D., Axel, R., Sompolinsky, H. & Abbott, L. F. Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.e7 (2017).

  6. Cayco-Gajic, N. A. & Silver, R. A. Re-evaluating circuit mechanisms underlying pattern separation. Neuron 101, 584–602 (2019).

  7. Brodal, P. & Bjaalie, J. G. Organization of the pontine nuclei. Neurosci. Res. 13, 83–118 (1992).

  8. Chen, W. R. & Shepherd, G. M. The olfactory glomerulus: a cortical module with specific functions. J. Neurocytol. 34, 353–360 (2005).

  9. Bhandawat, V., Olsen, S. R., Gouwens, N. W., Schlief, M. L. & Wilson, R. I. Sensory processing in the Drosophila antennal lobe increases reliability and separability of ensemble odor representations. Nat. Neurosci. 10, 1474–1482 (2007).

  10. Olsen, S. R. & Wilson, R. I. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960 (2008).

  11. Olsen, S. R., Bhandawat, V. & Wilson, R. I. Divisive normalization in olfactory population codes. Neuron 66, 287–299 (2010).

  12. Guo, J.-Z. et al. Disrupting cortico-cerebellar communication impairs dexterity. eLife 10, e65906 (2021).

  13. Wagner, M. J. et al. Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24 (2019).

  14. Vosshall, L. B., Wong, A. M. & Axel, R. An olfactory sensory map in the fly brain. Cell 102, 147–159 (2000).

  15. Marin, E. C., Jefferis, G. S. X. E., Komiyama, T., Zhu, H. & Luo, L. Representation of the glomerular olfactory map in the Drosophila brain. Cell 109, 243–255 (2002).

  16. Wong, A. M., Wang, J. W. & Axel, R. Spatial representation of the glomerular map in the Drosophila protocerebrum. Cell 109, 229–241 (2002).

  17. Berck, M. E. et al. The wiring diagram of a glomerular olfactory system. eLife 5, e14859 (2016).

  18. Bates, A. S. et al. Complete connectomic reconstruction of olfactory projection neurons in the fly brain. Curr. Biol. 30, 3183–3199.e6 (2020).

  19. Chadderton, P., Margrie, T. W. & Häusser, M. Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860 (2004).

  20. Ito, I., Ong, R. C.-Y., Raman, B. & Stopfer, M. Sparse odor representation and olfactory learning. Nat. Neurosci. 11, 1177–1184 (2008).

  21. Kolkman, K. E., McElvain, L. E. & du Lac, S. Diverse precerebellar neurons share similar intrinsic excitability. J. Neurosci. 31, 16665–16674 (2011).

  22. Shenoy, K. V., Sahani, M. & Churchland, M. M. Cortical control of arm movements: a dynamical systems perspective. Annu. Rev. Neurosci. 36, 337–359 (2013).

  23. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

  24. Caron, S. J. C., Ruta, V., Abbott, L. F. & Axel, R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature 497, 113–117 (2013).

  25. Gruntman, E. & Turner, G. C. Integration of the olfactory code across dendritic claws of single mushroom body neurons. Nat. Neurosci. 16, 1821–1829 (2013).

  26. Hallem, E. A. & Carlson, J. R. Coding of odors by a receptor repertoire. Cell 125, 143–160 (2006).

  27. Friedrich, R. W. & Wiechert, M. T. Neuronal circuits and computations: pattern decorrelation in the olfactory bulb. FEBS Lett. 588, 2504–2513 (2014).

  28. Schlegel, P. et al. Information flow, cell types and stereotypy in a full olfactory connectome. eLife 10, e66018 (2021).

  29. Peters, A. J., Lee, J., Hedrick, N. G., O’Neil, K. & Komiyama, T. Reorganization of corticospinal output during motor learning. Nat. Neurosci. 20, 1133–1141 (2017).

  30. Wolpert, D. M., Miall, R. C. & Kawato, M. Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347 (1998).

  31. Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966.e8 (2018).

  32. Saxena, S., Russo, A. A., Cunningham, J. & Churchland, M. M. Motor cortex activity across movement speeds is predicted by network-level strategies for generating muscle activity. eLife 11, e67620 (2022).

  33. Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nat. Neurosci. 23, 260–270 (2020).

  34. Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267–273 (1982).

  35. Pehlevan, C. & Chklovskii, D. B. Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening. In 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA 1458–1465 (Allerton, 2015).

  36. Schwarz, C. & Thier, P. Binding of signals relevant for action: towards a hypothesis of the functional role of the pontine nuclei. Trends Neurosci. 22, 443–451 (1999).

  37. Pehlevan, C., Hu, T. & Chklovskii, D. B. A Hebbian/anti-Hebbian neural network for linear subspace learning: a derivation from multidimensional scaling of streaming data. Neural Comput. 27, 1461–1495 (2015).

  38. Barak, O., Rigotti, M. & Fusi, S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J. Neurosci. 33, 3844–3856 (2013).

  39. Ganguli, S. & Sompolinsky, H. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annu. Rev. Neurosci. 35, 485–508 (2012).

  40. Barlow, H. B. in Sensory Communication (ed. Rosenblith, W. A.) 216–234 (MIT Press, 1961).

  41. Atick, J. J. Could information theory provide an ecological theory of sensory processing? Netw. Comput. Neural Syst. 3, 213–251 (1992).

  42. Simoncelli, E. P. Vision and the statistics of the visual environment. Curr. Opin. Neurobiol. 13, 144–149 (2003).

  43. Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37, 233–243 (1991).

  44. Benna, M. K. & Fusi, S. Place cells may simply be memory cells: memory compression leads to spatial tuning and history dependence. Proc. Natl Acad. Sci. USA 118, e2018422118 (2021).

  45. Baldi, P. & Hornik, K. Neural networks and principal component analysis: learning from examples without local minima. Neural Netw. 2, 53–58 (1989).

  46. Apps, R. & Garwicz, M. Anatomical and physiological foundations of cerebellar information processing. Nat. Rev. Neurosci. 6, 297–311 (2005).

  47. Oscarsson, O. Functional organization of the spino- and cuneocerebellar tracts. Physiol. Rev. 45, 495–522 (1965).

  48. Kennedy, A. et al. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nat. Neurosci. 17, 416–422 (2014).

  49. Bratton, B. & Bastian, J. Descending control of electroreception. II. Properties of nucleus praeeminentialis neurons projecting directly to the electrosensory lateral line lobe. J. Neurosci. 10, 1241–1253 (1990).

  50. Kazama, H. & Wilson, R. I. Origins of correlated activity in an olfactory circuit. Nat. Neurosci. 12, 1136–1144 (2009).

  51. Chapochnikov, N. M., Pehlevan, C. & Chklovskii, D. B. Normative and mechanistic model of an adaptive circuit for efficient encoding and feature extraction. Proc. Natl Acad. Sci. USA 120, e21174841 (2023).

  52. Kebschull, J. M. et al. Cerebellar nuclei evolved by repeatedly duplicating a conserved cell-type set. Science 370, eabd5059 (2020).

  53. Barbosa, J., Proville, R., Rodgers, C. C., Ostojic, S. & Boubenec, Y. Flexible selection of task-relevant features through across-area population gating. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500962 (2022).

  54. Leergaard, T. B. & Bjaalie, J. G. Topography of the complete corticopontine projection: from experiments to principal maps. Front. Neurosci. 1, 211–223 (2007).

  55. Kratochwil, C. F., Maheshwari, U. & Rijli, F. M. The long journey of pontine nuclei neurons: from rhombic lip to cortico-ponto-cerebellar circuitry. Front. Neural Circuits https://doi.org/10.3389/fncir.2017.00033 (2017).

  56. Mihailoff, G. A., Lee, H., Watt, C. B. & Yates, R. Projections to the basilar pontine nuclei from face sensory and motor regions of the cerebral cortex in the rat. J. Comp. Neurol. 237, 251–263 (1985).

  57. Lanore, F., Cayco-Gajic, N. A., Gurnani, H., Coyle, D. & Silver, R. A. Cerebellar granule cell axons support high-dimensional representations. Nat. Neurosci. 24, 1142–1150 (2021).

  58. Xie, M., Muscinelli, S., Harris, K. D. & Litwin-Kumar, A. Task-dependent optimal representations for cerebellar learning. Preprint at bioRxiv https://doi.org/10.1101/2022.08.15.504040 (2022).

  59. Stewart, G. W. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal. 17, 403–409 (1980).

  60. Abbott, L. F., Rajan, K. & Sompolinsky, H. The Dynamic Brain: An Exploration of Neuronal Variability and its Functional Significance (eds Ding, M. & Glanzman, D.) 65–82 (Oxford Academic, 2011).

  61. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).

  62. Fagg, A., Sitkoff, N., Barto, A. & Houk, J. Cerebellar learning for control of a two-link arm in muscle space. In Proc. of International Conference on Robotics and Automation, Albuquerque, NM, USA, Vol. 3, 2638–2644 (IEEE, 1997).

Acknowledgements

We would like to thank M. Xie, A. Hantman, B. Sauerbrei, J. Kadmon and R. Warren for helpful discussions and comments. We would also like to thank L.F. Abbott, N. Sawtell, M. Beiran, K. Lakshminarasimhan and N.A. Cayco-Gajic for their comments on the manuscript. The Wagner laboratory is supported by the NINDS Intramural Research Program. A.L.-K. and S.P.M. were supported by the Gatsby Charitable Foundation, National Science Foundation award DBI-1707398, and the Simons Collaboration on the Global Brain. S.P.M. was also supported by the Swartz Foundation. A.L.-K. was also supported by the Burroughs Wellcome Foundation, the McKnight Endowment Fund and NIH award R01EB029858. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01.

Author information

Contributions

S.P.M. and A.L.-K. conceived the study. S.P.M. performed simulations and analyses. M.J.W. performed the experiments and provided the data. S.P.M., M.J.W. and A.L.-K. wrote the paper.

Corresponding authors

Correspondence to Samuel P. Muscinelli or Ashok Litwin-Kumar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Learned compression is not beneficial when the input representation is unstructured.

a: Performance over learning when the compression weights are trained using error backpropagation. Parameters are the same as in Fig. 2a. The solid line and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations. b: Left: Fraction of errors for different network architectures when the input representation consists of random and uncorrelated Gaussian patterns, as in previous work4,5. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 4.82, p = 2.4 × 10−4), presumably due to incomplete convergence of gradient descent, and comparably to whitening compression. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. Right: same as the left panel, but with Nc = N/2 instead of Nc = N. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 26.8, p = 1.3 × 10−15). The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. In both the left and right panels, the task-relevant input PC eigenvalues were set to not decay (p = 0), in contrast to previous figures, to consider a fully unstructured input representation.
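
The architecture comparisons above rely on a two-sided Welch’s t-test across network realizations. As a minimal illustration (not the released analysis code), the test can be run as follows; the arrays of per-realization error fractions are placeholders.

```python
# Illustrative two-sided Welch's t-test (unequal variances) between the
# fraction of errors of two architectures across network realizations.
# The sample values below are placeholders, not data from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
err_single_step = rng.normal(0.10, 0.01, size=10)  # hypothetical per-realization errors
err_learned = rng.normal(0.12, 0.01, size=10)

t_stat, p_value = stats.ttest_ind(err_single_step, err_learned, equal_var=False)
print(f"Welch's t-test: t = {t_stat:.2f}, p = {p_value:.2g}")
```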

Extended Data Fig. 2 Sign-constrained compression for clustered and distributed representations.

a: Distribution of the excitatory compression weights that maximize \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)\,(1-\Delta_{\mathrm{c}})^{2}\) in the presence of a distributed input representation. b: Standard deviation of the out-degree of the input for the same compression matrix as in a, averaged across 10 realizations (red dashed line). The gray histogram represents the distribution of the same quantity for a compression matrix with the same sparsity but shuffled entries. c, d: Performance of a network with purely excitatory compression in the presence of a distributed input representation. Solid lines and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations, respectively. Parameters are the same as in Fig. 3e. c: Fraction of errors on a random classification task as a function of the redundancy in the input representation N/D. d: For fixed N/D = 10, network performance for different network architectures, as in Fig. 2a. ‘Excitatory’ indicates a network whose compression weights are trained to maximize the Hebbian SNR at the compression layer, that is \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)\,(1-\Delta_{\mathrm{c}})^{2}\), while ‘unconstrained’ indicates a network trained on the same objective but without sign constraints on the weights. Excitatory and optimal compression are not statistically different (n = 10). The training procedure is the same as that used in Fig. 2a. The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. e, f: Increasing input redundancy yields a smaller benefit when considering clustered input representations. All parameters are the same as in c, d, except for the type of input representation. e: Same as c, but for a clustered input representation. f: Same as d, but for a clustered input representation. Purely excitatory compression does not achieve the performance of whitening (two-sided Welch’s t-test, t = 10.615, p = 2.54 × 10−11, n = 10) nor of unconstrained compression trained with the same objective (two-sided Welch’s t-test, t = 8.563, p = 9.19 × 10−8, n = 10). In panels c and e, the shaded regions indicate the standard deviation across 10 network realizations.
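
For readers wanting a concrete handle on the objective \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)(1-\Delta_{\mathrm{c}})^{2}\), the sketch below evaluates it under the assumption that dim(·) is the participation ratio of the compression-layer covariance eigenvalues and that the noise strength Δc is supplied as a scalar; the exact definitions are given in the paper’s Methods.

```python
# Sketch of the compression-layer objective SNR_c ∝ dim(c) (1 - Δ_c)^2.
# Assumptions of this sketch: dim(·) is the participation ratio of the
# covariance eigenvalues, and Δ_c is passed in as a precomputed noise strength.
import numpy as np

def participation_ratio(cov):
    """Participation-ratio dimension of a covariance matrix."""
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()

def snr_c(signal_cov, delta_c):
    return participation_ratio(signal_cov) * (1.0 - delta_c) ** 2

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))          # placeholder low-rank structure
print(snr_c(A @ A.T / 10, delta_c=0.2))
```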

Extended Data Fig. 3 Realistic properties of odor receptor responses.

a: Covariance of single odor receptor responses, computed from the Hallem-Carlson dataset26, sorted according to the response variances. b: Histogram of off-diagonal terms in the covariance matrix in a (in red), compared to a shuffle distribution (blue) obtained by shuffling the responses to different odorants for a given odor receptor. c: Mean of off-diagonal elements of the data covariance matrix (red dashed line), compared to the histogram of the same mean for the shuffled responses as in b (blue). The mean of the original data is significantly larger than the mean of the shuffle distribution (permutation test, p < 10−4). d: Geometrical representation of tuning vectors that are aligned (yellow) versus not aligned (black) with principal components (gray), corresponding to clustered and distributed compression layer representations, respectively. e: Dimension expansion dim(m)/dim(x) at the expansion layer plotted against the in-degree of expansion layer neurons K. f: Same as e, but showing the fraction of errors on a random classification task instead of the dimension. g: Same as e, right, but showing the noise at the expansion layer instead of the dimension. In panels e-g, the solid lines and shaded areas indicate the mean and standard error of the mean across network realizations, respectively. Network parameters: N = 1000, M = 2000, Nc = D = P = 50, p = 1, f = 0.1, and σ = 0.1.
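
The shuffle control in panels b and c can be summarized in a few lines; this is a schematic reimplementation with a random stand-in for the odorant-by-receptor response matrix, not the analysis code released with the paper.

```python
# Schematic shuffle/permutation test for the mean off-diagonal covariance of
# receptor responses (panels b, c). `responses` is a random placeholder with
# roughly the shape of an odorant-by-receptor dataset.
import numpy as np

def offdiag_mean(X):
    C = np.cov(X, rowvar=False)                      # receptor-by-receptor covariance
    return C[~np.eye(C.shape[0], dtype=bool)].mean()

rng = np.random.default_rng(2)
responses = rng.lognormal(size=(110, 24))            # hypothetical stand-in data

observed = offdiag_mean(responses)
null = np.array([
    offdiag_mean(np.column_stack([rng.permutation(col) for col in responses.T]))
    for _ in range(1000)                             # shuffle odorants within each receptor
])
p_value = (np.sum(null >= observed) + 1) / (null.size + 1)
print(f"observed = {observed:.3f}, permutation p = {p_value:.4f}")
```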

Extended Data Fig. 4 Effect of architectural parameters on the effectiveness of Hebbian plasticity.

a: Dependence of the network performance on Nc. Notice that performance saturates for relatively large values of Nc. b-d: The non-monotonic behavior of the network performance with L is robust to changes in Nc (b), N (c) and M (d). The optimal L increases moderately with N and appears to saturate for N > 500. e: Left: schematic of the setup in which compression weights are learned with Hebbian plasticity. Right: resulting mean squared overlaps between the rows of the compression matrix and the principal components, as a function of PC index. f: Same as e, but when compression weights are learned using Hebbian and anti-Hebbian learning rules in the presence of recurrent inhibition. We used the learning rule proposed in ref. 35 (see their Eq. (18)) to learn the compression weights. This learning scheme updates both the feedforward (excitatory/inhibitory) and the recurrent (inhibitory only) weights to introduce competition among compression layer units, enabling the extraction of sub-leading PCs. Notice that the decay is slower than without recurrent inhibition, indicating that several PCs are estimated considerably better, especially for large L. Unless otherwise stated, parameters were N = 500, Nc = 250, M = 5000, f = 0.1, D = P = 50, σ = 0.5, p = 0.1.
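
Panel e uses Hebbian plasticity of the compression weights. A minimal single-unit version of such a rule is Oja’s rule (ref. 34), sketched below with placeholder input statistics; the full Hebbian/anti-Hebbian network with recurrent inhibition used in panel f (ref. 35) is not reproduced here.

```python
# Minimal sketch of Oja's rule for one compression unit: the weight vector
# converges to the leading principal component of its inputs.
import numpy as np

rng = np.random.default_rng(3)
N, T, eta = 500, 20000, 1e-3
pc1 = rng.standard_normal(N)
pc1 /= np.linalg.norm(pc1)                        # dominant input direction (placeholder)
X = 3.0 * rng.standard_normal((T, 1)) * pc1 + 0.5 * rng.standard_normal((T, N))

w = rng.standard_normal(N) / np.sqrt(N)
for x in X:
    c = w @ x                                     # compression-unit activity
    w += eta * c * (x - c * w)                    # Hebbian term with Oja's weight decay

print("overlap with leading PC:", abs(w @ pc1) / np.linalg.norm(w))
```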

Extended Data Fig. 5 Learning a forward model of a two-joint arm.

a: Performance on the forward model task is non-monotonic with the pontine in-degree L. We plot the MSE on the forward model task as a function of L for the network with and without feedback from DCN. The best L is of the same order as we found for the classification task in Fig. 6a. We set σ = 1, while all the other parameters are the same as in Fig. 6e. The solid lines and shaded areas indicate the mean and standard deviation of the MSE across network realizations, respectively. b: DCN feedback leads to higher overlap of compression weights with signal principal components. We define the overlap of the weights onto unit i of the compression layer with the jth PC as \(\mathrm{overlap}_{ij}=\sum_{k=1}^{N} G_{ik} A_{kj}\), where G is the compression matrix learned without (left) or with (right) the feedback from DCN, while A is the embedding matrix of the task-relevant components (blue) or task-irrelevant components (red). The violin plot shows the mean and distribution of the overlaps across compression layer units. We set σ = 1.8 and L = 50, while all the other parameters are the same as in Fig. 6e. In the violin plots, the whiskers indicate the entire data range, and the horizontal line indicates the median of the distribution. c: Performance on the forward model task while the compression weights are adjusted using our modified version of Oja’s rule in the presence of feedback from DCN, for two different levels of input noise and two target dimensions. All the other parameters are the same as in Fig. 6e.
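
The overlap defined in panel b is simply a matrix product between the learned compression matrix and the embedding matrix of the task-relevant (or task-irrelevant) components. A short sketch with placeholder shapes and random matrices:

```python
# overlap_ij = sum_k G_ik A_kj, computed as a matrix product.
# G (Nc x N) and A (N x D) are random placeholders here.
import numpy as np

rng = np.random.default_rng(4)
N, Nc, D = 500, 250, 50
G = rng.standard_normal((Nc, N)) / np.sqrt(N)       # learned compression matrix (placeholder)
A = np.linalg.qr(rng.standard_normal((N, D)))[0]    # orthonormal embedding of D components

overlap = G @ A                                     # (Nc x D): unit i versus component j
print((overlap ** 2).mean(axis=0))                  # mean squared overlap per component
```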

Extended Data Fig. 6 Dimension and noise contributions to local decorrelation performance.

a, b: Dimension (a) and noise (b) contributions to the performance shown in Fig. 8b, using the same parameters. c, d: Dimension (c) and noise (d) contributions to the performance shown in Fig. 8c, using the same parameters. e-g: Dimension (e) and noise (f) contributions to the performance (g), for the antennal lobe architecture, as a function of the in-degree of Kenyon cells K. Input was generated using a clustered representation. The green dashed line indicates the value obtained with optimal compression. The parameters were chosen to be consistent with the insect olfactory system anatomy, that is D = Nc = 50, N = 1000, M = 2000, p = 1, f = 0.1, σ = 1, P = 100. Note that when K ≥ 8, the local decorrelation strategy requires more synapses than the optimal compression one, for which K = 7 and L = 20. h, i: Dimension (h) and noise (i) contributions to the performance shown in Fig. 8d, using the same parameters. For all panels, the shaded areas indicate the standard deviation across network realizations.

Extended Data Fig. 7 Effect of nonlinearities at the compression layer.

To achieve a performance with nonlinear compression layer units comparable to that of linear units, we set Nc = 250. To maximize the dimension of the compression layer after the nonlinearity, we also introduced a random rotation of the optimal compression matrix (see Methods 5). a: Dimension of the compression layer representation for linear versus nonlinear (ReLU) compression. For ReLU compression, the nonlinearity is applied after random (left), PC-aligned (center), and whitening compression (right). b: Same as a, but showing the noise strength at the compression layer Δc. c: Same as a, but showing the fraction of errors in the random classification task. In panels a-c, the box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. d: Fraction of errors over training when the compression weights are trained using gradient descent and the compression layer units are nonlinear (ReLU). For comparison, the horizontal dashed lines indicate the performance of networks with linear compression layer units. The solid lines indicate the mean over 10 network realizations and the shading indicates the standard deviation across network realizations. e: Performance at convergence for the same networks as in d. For all panels, parameters were N = D = P = 500, Nc = 250, M = 2000, f = 0.1, fc = 0.3, and σ = 0.1.
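
The random rotation mentioned above can be generated with standard routines for random orthogonal matrices (ref. 59). The sketch below, with placeholder shapes and a generic compression matrix, shows the rotation being applied before the ReLU; it illustrates the operation rather than the exact procedure described in Methods.

```python
# Applying a random orthogonal rotation to a compression matrix before a ReLU.
# The rotation preserves the span of the compressed representation while
# redistributing variance across compression-layer units.
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(5)
N, Nc = 500, 250
W = rng.standard_normal((Nc, N)) / np.sqrt(N)   # placeholder compression matrix
R = ortho_group.rvs(Nc, random_state=5)         # random Nc x Nc orthogonal matrix

x = rng.standard_normal(N)                      # one input pattern
c = np.maximum(R @ (W @ x), 0.0)                # rotated, then rectified (ReLU) compression
print(np.mean(c > 0))                           # fraction of active compression units
```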

Extended Data Fig. 8 Expansion layer dimension and noise strength depend on compression layer dimension and noise strength.

a: Dimension of the expansion layer representation as a function of the dimension of the compression layer representation. The compression layer representation was distributed, and its dimension was varied by changing p between 0 and 1. b: Noise strength Δm at the expansion layer as a function of the noise strength at the compression layer. Noise was additive, Gaussian, and isotropic at the compression layer, with standard deviation varying from 0 to 0.1. In both panels, solid lines show the theoretical result and dots are simulation results, averaged over 10 network realizations. The standard deviation of the numerical simulations is not visible because it is smaller than the marker size. Parameters: Nc = 100, M = 1000, f = 0.1.
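
To make the dependence in panel a concrete, the sketch below builds a random expansion layer with coding level f and evaluates the participation-ratio dimension of its responses. The per-unit threshold rule (each unit active on a fraction f of patterns) and the placeholder compression-layer statistics are simplifications for illustration.

```python
# Random expansion with coding level f, followed by a participation-ratio
# estimate of the expansion-layer dimension. Inputs are placeholder patterns.
import numpy as np

rng = np.random.default_rng(6)
Nc, M, f, P = 100, 1000, 0.1, 500
J = rng.standard_normal((M, Nc)) / np.sqrt(Nc)       # random expansion weights

C = rng.standard_normal((P, Nc))                     # placeholder compression-layer patterns
H = C @ J.T                                          # pre-activations (P x M)
theta = np.quantile(H, 1 - f, axis=0)                # per-unit threshold giving coding level f
resp = np.maximum(H - theta, 0.0)                    # ReLU expansion-layer responses

eig = np.linalg.eigvalsh(np.cov(resp, rowvar=False))
dim_m = eig.sum() ** 2 / (eig ** 2).sum()
print(f"dim(m) ≈ {dim_m:.1f}")
```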

Supplementary information

Supplementary Information

Supplementary Modeling Note.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Muscinelli, S.P., Wagner, M.J. & Litwin-Kumar, A. Optimal routing to cerebellum-like structures. Nat Neurosci 26, 1630–1641 (2023). https://doi.org/10.1038/s41593-023-01403-7

