Abstract
The mathematical foundations of machine learning play a key role in the development of the field. They improve our understanding and provide tools for designing new learning paradigms. The advantages of mathematics, however, sometimes come at a cost. Gödel and Cohen showed, in a nutshell, that not everything is provable. Here we show that machine learning shares this fate. We describe simple scenarios where learnability can neither be proved nor refuted using the standard axioms of mathematics. Our proof is based on the fact that the continuum hypothesis can neither be proved nor refuted. We show that, in some cases, a solution to the ‘estimating the maximum’ problem is equivalent to the continuum hypothesis. The main idea is to prove an equivalence between learnability and compression.
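To make the ‘estimating the maximum’ (EMX) problem concrete, the following display is a hedged reconstruction of its success criterion from the paper’s framework; the symbols X, 𝓕, P, G, m, ε and δ (for the domain, the family of subsets, the data distribution, the learner, the sample size and the accuracy parameters) are our notation, not a verbatim quotation of the article’s definitions.

% Sketch of the EMX criterion (our notation, not the article's verbatim
% definitions). A learner G maps an i.i.d. sample S ~ P^m to a set G(S)
% in the family \mathcal{F}; it succeeds if the chosen set captures
% almost as much probability mass as the best set in the family, with
% high probability over the draw of the sample:
\[
\Pr_{S \sim P^{m}}\Big[\, P\big(G(S)\big) \,\ge\, \sup_{F \in \mathcal{F}} P(F) - \varepsilon \,\Big] \;\ge\; 1 - \delta .
\]

In the undecidable instance studied in the paper, the domain is the unit interval, 𝓕 consists of its finite subsets, and the admissible distributions are finitely supported; the equivalence mentioned above then ties the existence of such a learner to the existence of a finite compression scheme for 𝓕, which in turn depends on the cardinality of the continuum and so cannot be settled within the standard axioms.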
Data availability
The data that support the findings of this study are available from the corresponding author upon request.
Change history
23 January 2019
In the version of this Article originally published, the following text was missing from the Acknowledgements: ‘Part of the research was done while S.M. was at the Institute for Advanced Study in Princeton and was supported by NSF grant CCF-1412958.’ This has now been corrected.
Acknowledgements
The authors thank D. Chodounský, S. Hanneke, R. Honzík and R. Livni for useful discussions. The authors also acknowledge the Simons Institute for the Theory of Computing for support. A.S.’s research has received funding from the Israel Science Foundation (ISF grant no. 552/16) and from the Len Blavatnik and the Blavatnik Family foundation. A.Y.’s research is supported by ISF grant 1162/15. Part of the research was done while S.M. was at the Institute for Advanced Study in Princeton and was supported by NSF grant CCF-1412958.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Ben-David, S., Hrubeš, P., Moran, S. et al. Learnability can be undecidable. Nat Mach Intell 1, 44–48 (2019). https://doi.org/10.1038/s42256-018-0002-3