Abstract
Developing theoretical foundations for learning is a key step towards understanding intelligence. ‘Learning from examples’ is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1–5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
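The stability property in the abstract can be probed numerically. The following sketch (not the paper's formal definition, which involves distribution-free probabilistic bounds) uses ridge regression — Tikhonov-regularized least squares, a classic example of a stable algorithm — and measures how much the learned hypothesis's predictions change when each training example is deleted in turn. All variable names and the toy data are illustrative assumptions.

```python
# Minimal sketch of leave-one-out stability: delete one training example,
# retrain, and measure the change in the learned hypothesis's predictions.
import numpy as np

rng = np.random.default_rng(0)

def train_ridge(X, y, lam=1.0):
    """Regularized least squares (Tikhonov regularization).

    A stable learning map: small perturbations of the training set
    produce small changes in the returned weight vector.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic training set drawn from a noisy linear relationship.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

w_full = train_ridge(X, y)

# Leave-one-out perturbations: retrain with each example removed and
# record the largest change in predictions over a set of test points.
X_test = rng.normal(size=(200, d))
max_change = 0.0
for i in range(n):
    mask = np.arange(n) != i
    w_i = train_ridge(X[mask], y[mask])
    max_change = max(max_change, np.max(np.abs(X_test @ (w_full - w_i))))

print(f"max prediction change after deleting one example: {max_change:.4f}")
```

For a stable algorithm such as this one, the measured change shrinks as the training set grows (roughly as 1/n here), which is the behaviour the paper connects to generalization.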
References
Vapnik, V. & Chervonenkis, A. Y. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognit. Image Anal. 1, 283–305 (1991)
Vapnik, V. N. Statistical Learning Theory (Wiley, New York, 1998)
Alon, N., Ben-David, S., Cesa-Bianchi, N. & Haussler, D. Scale-sensitive dimensions, uniform convergence, and learnability. J. Assoc. Comp. Mach. 44, 615–631 (1997)
Dudley, R. M. Uniform Central Limit Theorems (Cambridge studies in advanced mathematics, Cambridge Univ. Press, 1999)
Dudley, R., Gine, E. & Zinn, J. Uniform and universal Glivenko-Cantelli classes. J. Theor. Prob. 4, 485–510 (1991)
Poggio, T. & Smale, S. The mathematics of learning: Dealing with data. Not. Am. Math. Soc. 50, 537–544 (2003)
Cucker, F. & Smale, S. On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2001)
Wahba, G. Spline Models for Observational Data (Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990)
Breiman, L. Bagging predictors. Machine Learn. 24, 123–140 (1996)
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer series in statistics, Springer, Basel, 2001)
Freund, Y. & Schapire, R. A decision-theoretic generalization of online learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)
Fix, E. & Hodges, J. Discriminatory analysis, nonparametric discrimination: consistency properties. (Techn. rep. 4, Project no. 21-49-004, USAF School of Aviation Medicine, Randolph Field, TX, 1951).
Bottou, L. & Vapnik, V. Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)
Devroye, L. & Wagner, T. Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25, 601–604 (1979)
Bousquet, O. & Elisseeff, A. Stability and generalization. J. Machine Learn. Res. 2, 499–526 (2002)
Mukherjee, S., Niyogi, P., Poggio, T. & Rifkin, R. Statistical learning: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization (CBCL Paper 223, Massachusetts Institute of Technology, 2002, revised 2003).
Kutin, S. & Niyogi, P. in Proceedings of Uncertainty in AI (eds Darwiche, A. & Friedman, N.) (Morgan Kaufmann, Univ. Alberta, Edmonton, 2002)
Stone, C. The dimensionality reduction principle for generalized additive models. Ann. Stat. 14, 590–606 (1986)
Donoho, D. & Johnstone, I. Projection-based approximation and a duality with kernel methods. Ann. Stat. 17, 58–106 (1989)
Engl, H., Hanke, M. & Neubauer, A. Regularization of Inverse Problems (Kluwer Academic, Dordrecht, 1996)
Evgeniou, T., Pontil, M. & Elisseeff, A. Leave one out error, stability, and generalization of voting combinations of classifiers. Machine Learn. (in the press)
Pouget, A. & Sejnowski, T. J. Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci. 9, 222–237 (1997)
Poggio, T. A theory of how the brain might work. Cold Spring Harbor Symp. Quant. Biol. 55, 899–910 (1990)
Chomsky, N. Lectures on Government and Binding (Foris, Dordrecht, 1995)
Zhou, D. The covering number in learning theory. J. Complex. 18, 739–767 (2002)
Tikhonov, A. N. & Arsenin, V. Y. Solutions of Ill-posed Problems (Winston, Washington DC, 1977)
Kearns, M. & Ron, D. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11, 1427–1453 (1999)
Evgeniou, T., Pontil, M. & Poggio, T. Regularization networks and support vector machines. Adv. Comput. Math. 13, 1–50 (2000)
Valiant, L. A theory of the learnable. Commun. Assoc. Comp. Mach. 27, 1134–1142 (1984)
Acknowledgements
We thank D. Panchenko, R. Dudley, S. Mendelson, A. Rakhlin, F. Cucker, D. Zhou, A. Verri, T. Evgeniou, M. Pontil, P. Tamayo, M. Poggio, M. Calder, C. Koch, N. Cesa-Bianchi, A. Elisseeff, G. Lugosi and especially S. Smale for several insightful and helpful comments. This research was sponsored by grants from the Office of Naval Research, DARPA and National Science Foundation. Additional support was provided by the Eastman Kodak Company, Daimler-Chrysler, Honda Research Institute, NEC Fund, NTT, Siemens Corporate Research, Toyota, Sony and the McDermott chair (T.P.).
Ethics declarations
Competing interests
The authors declare that they have no competing financial interests.
Cite this article
Poggio, T., Rifkin, R., Mukherjee, S. et al. General conditions for predictivity in learning theory. Nature 428, 419–422 (2004). https://doi.org/10.1038/nature02341