
Review Article

Probabilistic machine learning and artificial intelligence

Abstract

How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.


Figure 1: Bayesian inference.
Figure 2: Probabilistic programming.
Figure 3: Bayesian optimization.
Figure 4: The Automatic Statistician.



Acknowledgements

I acknowledge an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research. I am also grateful for valuable input from D. Duvenaud, H. Ge, M. W. Hoffman, J. R. Lloyd, A. Ścibior, M. Welling and D. Wolpert.

Author information

Correspondence to Zoubin Ghahramani.

Ethics declarations

Competing interests

I serve as an advisor to a number of companies working on machine learning and artificial intelligence.


Cite this article

Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015). https://doi.org/10.1038/nature14541
