Ten years ago, scientists trained an algorithm (M. Schmidt and H. Lipson, Science 324, 81–85; 2009) to seek patterns in the erratic behaviour of a chaotic double pendulum. The algorithm, using a learning strategy inspired by evolution, was soon able to discover the laws of motion on its own, despite being given no scientific knowledge by the researchers. “Without any prior knowledge about physics, kinematics, or geometry,” they noted, “the algorithm discovered Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation.” A machine had, apparently, deduced laws of physics.

This kind of success fuels an increasingly widely held belief that the explosive rise in technology for data gathering and analysis may soon make the scientific method unnecessary. After several centuries of science driven by the profitable interplay of observation and theory, the idea goes, we’re now moving into a new era in which theory and conceptual understanding will play a smaller role, if any role at all. Forget models and hypotheses: all we will need are smart machine-learning algorithms devouring huge datasets. Through fully automated learning, we’ll come to make vastly more accurate predictions, and face fewer real-world surprises. Machines will do science for us.

Some computer scientists even believe we may one day find what they call the master algorithm — an ultimate data-processing device that, running quite independently from human oversight, will organize human politics to “speed poverty’s decline” and make everyone’s lives “longer, happier and more productive” (P. Domingos, The Master Algorithm; 2015).

Of course, we’re all rapidly growing accustomed to machine-learning tools impinging on all aspects of our lives. They make decisions on bank loans and credit risks, determine the advertisements and prices we see when shopping online, and recommend medical diagnoses. Artificial intelligence is already changing scientific practice too — from materials scientists using algorithms to explore the space of possible substances to physicists using them to design quantum networks. But is the bigger vision at all plausible? Probably not, and not only because of the unreasonable optimism of the technology’s enthusiasts. For all its likely power and value, big data will only ever be a part of the scientific enterprise. On its own, it probably won’t even lead to better predictions. In many cases, we can expect that more data may actually lead to worse predictions.

That point was made in a recent essay by physicists Hykel Hosni and Angelo Vulpiani (preprint at https://arxiv.org/abs/1705.11186; 2017). The conclusion follows from some of the lessons to be learned from simple examples, including weather forecasting. The main lesson is that the best predictions generally come from a judicious trade-off between modelling and quantitative analysis. Conceptual insight counts as much as quantity of data. ‘Big data need big theory too’, as the title of another paper making a similar point in the context of biology goes (P. V. Coveney, E. R. Dougherty and R. R. Highfield, Phil. Trans. R. Soc. A https://doi.org/10.1098/rsta.2016.0153; 2016).

One expectation about limits on the predictive accuracy of big data comes from dynamical systems theory in the context of high-dimensional systems — the norm for most real-world applications. Some degree of predictability is guaranteed by the notion of Poincaré recurrence, which applies to any Hamiltonian system, as well as to the dynamics on the attractor of any dissipative system. Poincaré’s insight was that, given any current state of a system, one can be sure that the system will return to a neighbourhood of this state some time in the future. How long we should expect this time to be is the crucial question.
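
To make Poincaré's insight concrete, here is a minimal numerical sketch (a purely illustrative toy, not taken from the Hosni–Vulpiani essay): an irrational rotation of the circle preserves length, so the recurrence theorem applies, and one can simply iterate the map and record the orbit's first return to a small neighbourhood of its starting point.

```python
from math import sqrt

# A toy demonstration of Poincaré recurrence, assuming the simplest
# measure-preserving system imaginable: an irrational rotation of the circle,
# x -> (x + alpha) mod 1. Because the map preserves length, the recurrence
# theorem guarantees the orbit returns arbitrarily close to where it started.

def first_return_time(alpha=sqrt(2) - 1, eps=1e-3, max_iters=10_000_000):
    """Return the first iteration at which the orbit re-enters the
    eps-neighbourhood of its initial condition (None if not seen in budget)."""
    x0 = 0.2                # arbitrary initial state on the unit circle
    x = x0
    for t in range(1, max_iters + 1):
        x = (x + alpha) % 1.0
        dist = abs(x - x0)
        if min(dist, 1.0 - dist) < eps:   # distance with wrap-around
            return t
    return None

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:g}: first return after {first_return_time(eps=eps)} steps")
```

For this one-dimensional toy the wait is short; the question addressed next is how badly that changes as the dimension of the system grows.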

As Hosni and Vulpiani note, it's easy to estimate this time: it is proportional to ε^(-D), with ε being some small number reflecting the precision of the prediction and D being the system dimension. This means that, with sufficient data specifying the current system state, useful predictions can be made, at least for low-dimensional systems. But if D is large (10 or larger, for example), then the time of recurrence is astronomically large even if the aim is for predictions of fairly low precision.
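
A rough back-of-the-envelope calculation, assuming only the ε^(-D) estimate quoted above, makes the point starkly: at a modest precision of ε = 0.1, the expected waiting time jumps from ten characteristic times in one dimension to ten billion in ten dimensions.

```python
# Back-of-the-envelope illustration of the recurrence-time estimate quoted
# above, assuming tau ~ eps**(-D): eps is the prediction precision, D the
# system dimension, and tau is measured in the system's characteristic times.

eps = 0.1   # fairly low precision: a neighbourhood 10% the size of the state range
for D in (1, 2, 3, 5, 10, 20, 50):
    tau = eps ** (-D)
    print(f"D = {D:2d}: recurrence time ~ {tau:.0e} characteristic times")
```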

Hence, the possibility of prediction from data alone may be guaranteed in theory, but it is often impossible in practice — even considering the clean situation in which one has perfectly precise information about the system. It’s the system dimension at fault. Making useful predictions in high-dimensional systems therefore requires something more than data — theory or conceptual insight. “Nothing is more practical,” as the Austrian physicist Ludwig Boltzmann once wrote, “than a good theory.”

Indeed, this is precisely how weather forecasters make useful predictions — by giving up on some data, and working with less detailed models. As Hosni and Vulpiani review the history, it was a few decades after Lewis Fry Richardson pioneered numerical weather forecasting that von Neumann and Charney came to the conclusion that the equations Richardson had proposed, though correct from the standpoint of fluid dynamics and atmospheric physics, were not practically useful. They were, in fact, too accurate, as they included details on high-frequency wave motions having no importance for weather forecasting. Accuracy was achieved in numerical weather forecasting by scientists realizing that too much data can be a bad thing — and it took insight to build simpler and more capable models.

In 2008, the editor of the technology magazine Wired made the famous claim that “the data deluge makes the scientific method obsolete.” Since then, this view has gained significant backing, no doubt in large part because of the commercial benefits accruing to companies rolling out their algorithms into every corner of modern society. Among many others, Hosni and Vulpiani suggest we should think again, and rein in our expectations.

We should also work hard to spread understanding about the likely limits to machine prediction, and the many ways false belief may lead to misuse and painful mistakes.

Clearly, if the day comes that computers can do all the things human brains can do, including generating creative insights and conceptual revolutions, then the human era of science may be over. We’re still a long way from that.