Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalization

Ciceri, Simone; Cassani, Lorenzo; Osella, Matteo; Rotondo, Pietro; Valle, Filippo; Gherardi, Marco

doi:10.1038/s42256-023-00772-9

Article
Published: 08 January 2024

Nature Machine Intelligence volume 6, pages 40–47 (2024)

Abstract

To achieve near-zero training error in a classification problem, the layers of a feed-forward network have to disentangle the manifolds of data points with different labels to facilitate the discrimination. However, excessive class separation can lead to overfitting because good generalization requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimization dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across datasets and architectures) increases the class entanglement. The training error at the inversion is stable under subsampling and across network initializations and optimizers, which characterizes it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set called ‘stragglers’, which are particularly influential for generalization.
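
As an illustration of what class entanglement can mean operationally, the sketch below computes the soft nearest neighbor loss of Frosst et al. (listed in the references) on toy two-dimensional points. The abstract does not say which measure the authors use, so this choice of measure, the names `soft_nearest_neighbor_loss` and `two_blobs`, the temperature and the Gaussian toy data are all assumptions made for illustration only.

```python
import math
import random


def soft_nearest_neighbor_loss(points, labels, temperature=1.0):
    """Average soft nearest neighbor loss (Frosst et al.).

    Larger values mean that points have many close neighbors with a
    different label, i.e. the classes are more entangled.
    """
    n = len(points)
    total = 0.0
    counted = 0
    for i in range(n):
        same = 0.0      # weight of neighbors sharing the label of point i
        everyone = 0.0  # weight of all neighbors of point i
        for j in range(n):
            if i == j:
                continue
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weight = math.exp(-sq_dist / temperature)
            everyone += weight
            if labels[i] == labels[j]:
                same += weight
        # Skip points whose weights underflow to zero.
        if same > 0.0 and everyone > 0.0:
            total += -math.log(same / everyone)
            counted += 1
    return total / counted if counted else float("nan")


def two_blobs(separation, n_per_class=30, seed=0):
    """Hypothetical toy data: two unit-variance Gaussian classes in 2D."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in enumerate((-separation / 2, separation / 2)):
        for _ in range(n_per_class):
            points.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


entangled = soft_nearest_neighbor_loss(*two_blobs(separation=0.0))
segregated = soft_nearest_neighbor_loss(*two_blobs(separation=6.0))
print(f"overlapping classes: {entangled:.3f}")
print(f"separated classes:   {segregated:.3f}")
```

Under these assumptions the loss shrinks towards zero as same-label points become each other's nearest neighbors and grows as the classes overlap, so evaluating it on a hidden-layer representation at successive training epochs is one possible way to look for a non-monotonic trend of the kind described above.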

**Fig. 1: Non-monotonic learning dynamics.**

**Fig. 2: Stragglers shape the dynamics and influence generalization.**

**Fig. 3: Stragglers across datasets and architectures.**

Data availability

The datasets analysed during the current study are available in public repositories; links are in the corresponding publications⁵⁴⁻⁵⁷.

Code availability

The code produced and used in the current study⁵⁹ is available on GitHub under GNU General Public License v.3 (GPL-3.0) at https://github.com/marco-gherardi/stragglers.

References

Pacelli, R. et al. A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit. Nat. Mach. Intell. https://doi.org/10.1038/s42256-023-00767-6 (2023).
Wakhloo, A. J., Sussman, T. J. & Chung, S. Linear classification of neural manifolds with correlated variability. Phys. Rev. Lett. 131, 027301 (2023).
Cagnetta, F., Petrini, L., Tomasini, U. M., Favero, A. & Wyart, M. How deep neural networks learn compositional data: the random hierarchy model. Preprint at arXiv https://doi.org/10.48550/arXiv.2307.02129 (2023).
Feng, Y., Zhang, W. & Tu, Y. Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization. Nat. Mach. Intell. 5, 908–918 (2023).
Baldassi, C. et al. Learning through atypical phase transitions in overparameterized neural networks. Phys. Rev. E 106, 014116 (2022).
Ingrosso, A. & Goldt, S. Data-driven emergence of convolutional structure in neural networks. Proc. Natl Acad. Sci. USA 119, e2201854119 (2022).
Advani, M. S., Saxe, A. M. & Sompolinsky, H. High-dimensional dynamics of generalization error in neural networks. Neural Netw. 132, 428–446 (2020).
Goldt, S., Mézard, M., Krzakala, F. & Zdeborová, L. Modeling the influence of data structure on learning in neural networks: the hidden manifold model. Phys. Rev. X 10, 041044 (2020).
Mézard, M. Mean-field message-passing equations in the Hopfield model and its generalizations. Phys. Rev. E 95, 022117 (2017).
Neyshabur, B., Bhojanapalli, S., McAllester, D. & Srebro, N. Exploring generalization in deep learning. In Proc. 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 5949–5958 (Curran Associates, 2017).
Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.03530 (2016).
Martin, C. H. & Mahoney, M. W. Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior. Preprint at arXiv https://doi.org/10.48550/arXiv.1710.09553 (2017).
Khosla, P. et al. Supervised contrastive learning. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2020 (eds Larochelle, H. et al.) 18661–18673 (Curran Associates, 2020); https://proceedings.neurips.cc/paper/2020/file/d89a66c7c80a29b1bdbab0f2a1a94af8-Paper.pdf
Kamnitsas, K. et al. Semi-supervised learning via compact latent space clustering. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2459–2468 (PMLR, 2018); https://proceedings.mlr.press/v80/kamnitsas18a.html
Hoffer, E. & Ailon, N. in Similarity-Based Pattern Recognition (eds Feragen, A. et al.) 84–92 (Springer, 2015).
Salakhutdinov, R. & Hinton, G. Learning a nonlinear embedding by preserving class neighbourhood structure. In Proc. 11th International Conference on Artificial Intelligence and Statistics (eds Meila, M. & Shen, X.) 412–419 (PMLR, 2007); https://proceedings.mlr.press/v2/salakhutdinov07a.html
Chopra, S., Hadsell, R. & LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proc. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (eds Schmid, C. et al.) 539–546 (IEEE, 2005).
Schilling, A., Maier, A., Gerum, R., Metzner, C. & Krauss, P. Quantifying the separability of data classes in neural networks. Neural Netw. 139, 278–293 (2021).
Chung, S., Lee, D. D. & Sompolinsky, H. Classification and geometry of general perceptual manifolds. Phys. Rev. X 8, 031003 (2018).
Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966 (2018).
Kadmon, J. & Sompolinsky, H. Optimal architectures in a solvable model of deep networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016 (eds Lee, D. et al.) 4781–4789 (Curran Associates, 2016); https://proceedings.neurips.cc/paper/2016/file/0fe473396242072e84af286632d3f0ff-Paper.pdf
Pagan, M., Urban, L. S., Wohl, M. P. & Rust, N. C. Signals in inferotemporal and perirhinal cortex suggest an untangling of visual target information. Nat. Neurosci. 16, 1132–1139 (2013).
DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends Cogn. Sci. 11, 333–341 (2007).
Farrell, M., Recanatesi, S., Moore, T., Lajoie, G. & Shea-Brown, E. Gradient-based learning drives robust representations in recurrent neural networks by balancing compression and expansion. Nat. Mach. Intell. 4, 564–573 (2022).
Cohen, U., Chung, S., Lee, D. D. & Sompolinsky, H. Separability and geometry of object manifolds in deep neural networks. Nat. Commun. 11, 746 (2020).
Ansuini, A., Laio, A., Macke, J. & Zoccolan, D. Intrinsic dimension of data representations in deep neural networks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (eds Wallach, H. et al.) 6109–6119 (Curran Associates, 2020); https://proceedings.neurips.cc/paper_files/paper/2019/file/cfcce0621b49c983991ead4c3d4d3b6b-Paper.pdf
Farrell, M., Recanatesi, S., Lajoie, G. & Shea-Brown, E. Recurrent neural networks learn robust representations by dynamically balancing compression and expansion. Poster presented at Real Neurons & Hidden Units: Future Directions at the Intersection of Neuroscience and Artificial Intelligence @ NeurIPS 2019 (2019); https://openreview.net/forum?id=BylmV7tI8S
Recanatesi, S. et al. Dimensionality compression and expansion in deep neural networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1906.00443 (2019).
Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J. & Ganguli, S. Exponential expressivity in deep neural networks through transient chaos. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016 (eds Lee, D. et al.) 3360–3368 (Curran Associates, 2016); https://proceedings.neurips.cc/paper/2016/file/148510031349642de5ca0c544f31b2ef-Paper.pdf
Frosst, N., Papernot, N. & Hinton, G. Analyzing and improving representations with the soft nearest neighbor loss. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 2012–2020 (PMLR, 2019); https://proceedings.mlr.press/v97/frosst19a.html
Achille, A., Paolini, G. & Soatto, S. Where is the information in a deep neural network? Preprint at arXiv https://doi.org/10.48550/arXiv.1905.12213 (2019).
Achille, A. & Soatto, S. Emergence of invariance and disentanglement in deep representations. J. Mach. Learn. Res. 19, 1947–1980 (2018).
Shwartz-Ziv, R. & Tishby, N. Opening the black box of deep neural networks via information. Preprint at arXiv https://doi.org/10.48550/arXiv.1703.00810 (2017).
Bengio, Y. in Statistical Language and Speech Processing (eds Dediu, A.-H. et al.) 1–37 (Springer, 2013).
Zdeborová, L. Understanding deep learning is also a job for physicists. Nat. Phys. 16, 602–604 (2020).
Gherardi, M. Solvable model for the linear separability of structured data. Entropy 23, 305 (2021).
Mézard, M. Spin glass theory and its new challenge: structured disorder. Indian J. Phys. https://doi.org/10.1007/s12648-023-03029-8 (2023).
Rotondo, P., Lagomarsino, M. C. & Gherardi, M. Counting the learnable functions of geometrically structured data. Phys. Rev. Res. 2, 023169 (2020).
Belkin, M., Hsu, D., Ma, S. & Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl Acad. Sci. USA 116, 15849–15854 (2019).
Nakkiran, P. et al. Deep double descent: where bigger models and more data hurt. J. Stat. Mech. 2021, 124003 (2021).
Arpit, D. et al. A closer look at memorization in deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 233–242 (PMLR, 2017).
Saxe, A. M., McClelland, J. L. & Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6120 (2014).
Erba, V., Gherardi, M. & Rotondo, P. Intrinsic dimension estimation for locally undersampled data. Sci. Rep. 9, 17133 (2019).
Facco, E., d’Errico, M., Rodriguez, A. & Laio, A. Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Sci. Rep. 7, 12140 (2017).
Li, C., Farkhoor, H., Liu, R. & Yosinski, J. Measuring the intrinsic dimension of objective landscapes. Preprint at arXiv https://doi.org/10.48550/arXiv.1804.08838 (2018).
Rotondo, P., Pastore, M. & Gherardi, M. Beyond the storage capacity: data-driven satisfiability transition. Phys. Rev. Lett. 125, 120601 (2020).
Pastore, M., Rotondo, P., Erba, V. & Gherardi, M. Statistical learning theory of structured data. Phys. Rev. E 102, 032119 (2020).
Gherardi, M. & Rotondo, P. Measuring logic complexity can guide pattern discovery in empirical systems. Complexity 21, 397–408 (2016).
Geiger, M., Spigler, S., Jacot, A. & Wyart, M. Disentangling feature and lazy training in deep neural networks. J. Stat. Mech. 2020, 113301 (2020).
Mazzolini, A., Gherardi, M., Caselle, M., Cosentino Lagomarsino, M. & Osella, M. Statistics of shared components in complex component systems. Phys. Rev. X 8, 021023 (2018).
Mazzolini, A. et al. Zipf and Heaps laws from dependency structures in component systems. Phys. Rev. E 98, 012315 (2018).
Sorscher, B., Geirhos, R., Shekhar, S., Ganguli, S. & Morcos, A. Beyond neural scaling laws: beating power law scaling via data pruning. Adv. Neural Inf. Process. Syst. 35, 19523–19536 (2022).
Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proc. 26th Annual International Conference on Machine Learning (eds Bottou, L. & Littman, M.) 41–48 (ACM, 2009).
LeCun, Y. & Cortes, C. MNIST handwritten digit database (2010); http://yann.lecun.com/exdb/mnist/
Clanuwat, T. et al. Deep learning for classical Japanese literature. Preprint at arXiv https://doi.org/10.20676/00000341 (2018).
Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at arXiv https://doi.org/10.48550/arXiv.1708.07747 (2017).
Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. Tech. Rep. 0 (Univ. of Toronto, 2009).
Cardy, J. Finite-Size Scaling (North-Holland, 1988).
Gherardi, M. Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalisation. Zenodo https://doi.org/10.5281/zenodo.8355859 (2023).

Acknowledgements

P.R. acknowledges funding from the Fellini programme under the H2020-MSCA-COFUND action, grant no. 754496, INFN (IT).

Author information

Authors and Affiliations

Università degli Studi di Milano, Milan, Italy
Simone Ciceri, Lorenzo Cassani & Marco Gherardi
Università degli Studi di Torino and INFN, Sezione di Torino, Turin, Italy
Matteo Osella & Filippo Valle
Istituto Nazionale di Fisica Nucleare — Sezione di Milano, Milan, Italy
Pietro Rotondo & Marco Gherardi

Contributions

S.C. and M.G. discovered the stragglers. M.G., M.O. and P.R. conceived and designed the experiments. L.C., S.C., M.G. and F.V. performed the experiments. All authors analysed the results and wrote the paper. M.G. and M.O. supervised the analysis. M.G. coordinated the project.

Corresponding author

Correspondence to Marco Gherardi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Christopher Kanan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ciceri, S., Cassani, L., Osella, M. et al. Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalization. Nat Mach Intell 6, 40–47 (2024). https://doi.org/10.1038/s42256-023-00772-9

Received: 29 October 2022
Accepted: 16 November 2023
Published: 08 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1038/s42256-023-00772-9