
  • Letter

General conditions for predictivity in learning theory

Abstract

Developing theoretical foundations for learning is a key step towards understanding intelligence. ‘Learning from examples’ is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1–5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
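
To make the stability notion concrete, here is a minimal formalization consistent with the abstract; the notation is ours, not quoted from the paper. Write S = (z_1, …, z_n) for the training set, f_S for the hypothesis the learning map returns on S, S^i for S with the i-th example deleted, and V(f, z) for the loss of hypothesis f on example z. Leave-one-out cross-validation (CVloo) stability then requires, roughly, that deleting an example barely changes the loss at that example:

\[
\max_{1 \le i \le n} \left| V(f_S, z_i) - V(f_{S^i}, z_i) \right| \xrightarrow{\;\mathbb{P}\;} 0 \quad \text{as } n \to \infty .
\]

The exact definition (including rates of convergence and further conditions) is given in ref. 16, whose title states the headline result: stability in this sense is sufficient for generalization, and necessary and sufficient for consistency of ERM.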


Figure 1: Example of an empirical minimizer with large expected error.
Figure 2: Measuring CVloo stability in a simple case.
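
As an illustration of the kind of quantity Figure 2 measures, the following is a minimal sketch of how CVloo stability can be estimated empirically for a concrete learner. The learner choice (ridge regression, i.e. Tikhonov regularization, cf. refs 26, 29), the toy data and all function names are our assumptions, not the authors' experimental code.

# Sketch (our construction): estimate CVloo stability by retraining with
# each training example deleted in turn, then comparing losses at that example.
import numpy as np
from sklearn.linear_model import Ridge  # Tikhonov-regularized least squares

def cvloo_stability(X, y, make_learner, loss):
    """Return max_i |V(f_S, z_i) - V(f_{S^i}, z_i)| for one training set S."""
    n = len(y)
    f_S = make_learner().fit(X, y)                   # hypothesis from the full set S
    worst = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        f_Si = make_learner().fit(X[keep], y[keep])  # retrain with example i deleted
        x_i = X[i:i + 1]
        diff = abs(loss(f_S.predict(x_i)[0], y[i]) - loss(f_Si.predict(x_i)[0], y[i]))
        worst = max(worst, diff)
    return worst

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))             # toy one-dimensional inputs
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(50)
square_loss = lambda y_hat, y_true: (y_hat - y_true) ** 2
print(cvloo_stability(X, y, lambda: Ridge(alpha=1.0), square_loss))

For a stable learner such as regularized least squares, the printed quantity should shrink as the training-set size grows; a rule like the empirical minimizer of Figure 1, which fits the training data exactly, need not show this decay.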

References

  1. Vapnik, V. & Chervonenkis, A. Y. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognit. Image Anal. 1, 283–305 (1991)

  2. Vapnik, V. N. Statistical Learning Theory (Wiley, New York, 1998)

  3. Alon, N., Ben-David, S., Cesa-Bianchi, N. & Haussler, D. Scale-sensitive dimensions, uniform convergence, and learnability. J. Assoc. Comp. Mach. 44, 615–631 (1997)

  4. Dudley, R. M. Uniform Central Limit Theorems (Cambridge studies in advanced mathematics, Cambridge Univ. Press, 1999)

  5. Dudley, R., Gine, E. & Zinn, J. Uniform and universal Glivenko-Cantelli classes. J. Theor. Prob. 4, 485–510 (1991)

  6. Poggio, T. & Smale, S. The mathematics of learning: Dealing with data. Not. Am. Math. Soc. 50, 537–544 (2003)

  7. Cucker, F. & Smale, S. On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2001)

  8. Wahba, G. Spline Models for Observational Data (Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990)

  9. Breiman, L. Bagging predictors. Machine Learn. 24, 123–140 (1996)

  10. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer series in statistics, Springer, Basel, 2001)

  11. Freund, Y. & Schapire, R. A decision-theoretic generalization of online learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)

  12. Fix, E. & Hodges, J. Discriminatory analysis, nonparametric discrimination: consistency properties (Techn. rep. 4, Project no. 21-49-004, USAF School of Aviation Medicine, Randolph Field, TX, 1951).

  13. Bottou, L. & Vapnik, V. Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)

  14. Devroye, L. & Wagner, T. Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25, 601–604 (1979)

  15. Bousquet, O. & Elisseeff, A. Stability and generalization. J. Machine Learn. Res. 2, 499–526 (2001)

  16. Mukherjee, S., Niyogi, P., Poggio, T. & Rifkin, R. Statistical learning: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization (CBCL Paper 223, Massachusetts Institute of Technology, 2002, revised 2003).

  17. Kutin, S. & Niyogi, P. in Proceedings of Uncertainty in AI (eds Daruich, A. & Friedman, N.) (Morgan Kaufmann, Univ. Alberta, Edmonton, 2002)

  18. Stone, C. The dimensionality reduction principle for generalized additive models. Ann. Stat. 14, 590–606 (1986)

  19. Donoho, D. & Johnstone, I. Projection-based approximation and a duality with kernel methods. Ann. Stat. 17, 58–106 (1989)

  20. Engl, H., Hanke, M. & Neubauer, A. Regularization of Inverse Problems (Kluwer Academic, Dordrecht, 1996)

  21. Evgeniou, T., Pontil, M. & Elisseeff, A. Leave one out error, stability, and generalization of voting combinations of classifiers. Machine Learn. (in the press)

  22. Pouget, A. & Sejnowski, T. J. Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci. 9, 222–237 (1997)

  23. Poggio, T. A theory of how the brain might work. Cold Spring Harbor Symp. Quant. Biol. 55, 899–910 (1990)

  24. Chomsky, N. Lectures on Government and Binding (Foris, Dordrecht, 1995)

  25. Zhou, D. The covering number in learning theory. J. Complex. 18, 739–767 (2002)

  26. Tikhonov, A. N. & Arsenin, V. Y. Solutions of Ill-posed Problems (Winston, Washington DC, 1977)

  27. Devroye, L. & Wagner, T. Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25, 601–604 (1979)

  28. Kearns, M. & Ron, D. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11, 1427–1453 (1999)

  29. Evgeniou, T., Pontil, M. & Poggio, T. Regularization networks and support vector machines. Adv. Comput. Math. 13, 1–50 (2000)

  30. Valiant, L. A theory of the learnable. Commun. Assoc. Comp. Mach. 27, 1134–1142 (1984)

Acknowledgements

We thank D. Panchenko, R. Dudley, S. Mendelson, A. Rakhlin, F. Cucker, D. Zhou, A. Verri, T. Evgeniou, M. Pontil, P. Tamayo, M. Poggio, M. Calder, C. Koch, N. Cesa-Bianchi, A. Elisseeff, G. Lugosi and especially S. Smale for several insightful and helpful comments. This research was sponsored by grants from the Office of Naval Research, DARPA and National Science Foundation. Additional support was provided by the Eastman Kodak Company, Daimler-Chrysler, Honda Research Institute, NEC Fund, NTT, Siemens Corporate Research, Toyota, Sony and the McDermott chair (T.P.).

Author information

Corresponding author

Correspondence to Tomaso Poggio.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

About this article

Cite this article

Poggio, T., Rifkin, R., Mukherjee, S. et al. General conditions for predictivity in learning theory. Nature 428, 419–422 (2004). https://doi.org/10.1038/nature02341

