Abstract
Developing theoretical foundations for learning is a key step towards understanding intelligence. ‘Learning from examples’ is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1–5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
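The stability property in the abstract can be probed numerically. The following sketch (not the paper's formal definition, which involves distribution-free probabilistic bounds) uses ridge regression — Tikhonov-regularized least squares, a classic example of a stable algorithm — and measures how much the learned hypothesis's predictions change when each training example is deleted in turn. All variable names and the toy data are illustrative assumptions.

```python
# Minimal sketch of leave-one-out stability: delete one training example,
# retrain, and measure the change in the learned hypothesis's predictions.
import numpy as np

rng = np.random.default_rng(0)

def train_ridge(X, y, lam=1.0):
    """Regularized least squares (Tikhonov regularization).

    A stable learning map: small perturbations of the training set
    produce small changes in the returned weight vector.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic training set drawn from a noisy linear relationship.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

w_full = train_ridge(X, y)

# Leave-one-out perturbations: retrain with each example removed and
# record the largest change in predictions over a set of test points.
X_test = rng.normal(size=(200, d))
max_change = 0.0
for i in range(n):
    mask = np.arange(n) != i
    w_i = train_ridge(X[mask], y[mask])
    max_change = max(max_change, np.max(np.abs(X_test @ (w_full - w_i))))

print(f"max prediction change after deleting one example: {max_change:.4f}")
```

For a stable algorithm such as this one, the measured change shrinks as the training set grows (roughly as 1/n here), which is the behaviour the paper connects to generalization.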
References
Vapnik, V. & Chervonenkis, A. Y. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognit. Image Anal. 1, 283–305 (1991)
Vapnik, V. N. Statistical Learning Theory (Wiley, New York, 1998)
Alon, N., Ben-David, S., Cesa-Bianchi, N. & Haussler, D. Scale-sensitive dimensions, uniform convergence, and learnability. J. Assoc. Comp. Mach. 44, 615–631 (1997)
Dudley, R. M. Uniform Central Limit Theorems (Cambridge studies in advanced mathematics, Cambridge Univ. Press, 1999)
Dudley, R., Gine, E. & Zinn, J. Uniform and universal Glivenko-Cantelli classes. J. Theor. Prob. 4, 485–510 (1991)
Poggio, T. & Smale, S. The mathematics of learning: Dealing with data. Not. Am. Math. Soc. 50, 537–544 (2003)
Cucker, F. & Smale, S. On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2001)
Wahba, G. Spline Models for Observational Data (Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990)
Breiman, L. Bagging predictors. Machine Learn. 24, 123–140 (1996)
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer series in statistics, Springer, Basel, 2001)
Freund, Y. & Schapire, R. A decision-theoretic generalization of online learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)
Fix, E. & Hodges, J. Discriminatory analysis, nonparametric discrimination: consistency properties. (Techn. rep. 4, Project no. 21-49-004, USAF School of Aviation Medicine, Randolph Field, TX, 1951).
Bottou, L. & Vapnik, V. Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)
Devroye, L. & Wagner, T. Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25, 601–604 (1979)
Bousquet, O. & Elisseeff, A. Stability and generalization. J. Machine Learn. Res. 2, 499–526 (2002)
Mukherjee, S., Niyogi, P., Poggio, T. & Rifkin, R. Statistical learning: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization (CBCL Paper 223, Massachusetts Institute of Technology, 2002, revised 2003).
Kutin, S. & Niyogi, P. in Proceedings of Uncertainty in AI (eds Darwiche, A. & Friedman, N.) (Morgan Kaufmann, Univ. Alberta, Edmonton, 2002)
Stone, C. The dimensionality reduction principle for generalized additive models. Ann. Stat. 14, 590–606 (1986)
Donoho, D. & Johnstone, I. Projection-based approximation and a duality with kernel methods. Ann. Stat. 17, 58–106 (1989)
Engl, H., Hanke, M. & Neubauer, A. Regularization of Inverse Problems (Kluwer Academic, Dordrecht, 1996)
Evgeniou, T., Pontil, M. & Elisseeff, A. Leave one out error, stability, and generalization of voting combinations of classifiers. Machine Learn. (in the press)
Pouget, A. & Sejnowski, T. J. Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci. 9, 222–237 (1997)
Poggio, T. A theory of how the brain might work. Cold Spring Harbor Symp. Quant. Biol. 55, 899–910 (1990)
Chomsky, N. Lectures on Government and Binding (Foris, Dordrecht, 1995)
Zhou, D. The covering number in learning theory. J. Complex. 18, 739–767 (2002)
Tikhonov, A. N. & Arsenin, V. Y. Solutions of Ill-posed Problems (Winston, Washington DC, 1977)
Kearns, M. & Ron, D. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11, 1427–1453 (1999)
Evgeniou, T., Pontil, M. & Poggio, T. Regularization networks and support vector machines. Adv. Comput. Math. 13, 1–50 (2000)
Valiant, L. A theory of the learnable. Commun. Assoc. Comp. Mach. 27, 1134–1142 (1984)
Acknowledgements
We thank D. Panchenko, R. Dudley, S. Mendelson, A. Rakhlin, F. Cucker, D. Zhou, A. Verri, T. Evgeniou, M. Pontil, P. Tamayo, M. Poggio, M. Calder, C. Koch, N. Cesa-Bianchi, A. Elisseeff, G. Lugosi and especially S. Smale for several insightful and helpful comments. This research was sponsored by grants from the Office of Naval Research, DARPA and National Science Foundation. Additional support was provided by the Eastman Kodak Company, Daimler-Chrysler, Honda Research Institute, NEC Fund, NTT, Siemens Corporate Research, Toyota, Sony and the McDermott chair (T.P.).
Ethics declarations
Competing interests
The authors declare that they have no competing financial interests.
Cite this article
Poggio, T., Rifkin, R., Mukherjee, S. et al. General conditions for predictivity in learning theory. Nature 428, 419–422 (2004). https://doi.org/10.1038/nature02341