There was a time when horses were a major source of physical power. When the steam engine started to rival them, manufacturers wanted to know how many horses a particular engine would replace. James Watt soon realized how important these comparisons were, and conceived a new measure: the horsepower. From discussions with millwrights, who used horses to turn their wheels, one mechanical horsepower was estimated at 33,000 foot-pounds per minute, and the measure was a great success. And now, as artificial intelligence (AI) emerges as an alternative source of mental power, scientists are reconsidering whether and how the mental capabilities of humans and machines can be measured.
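For a modern point of reference, Watt's figure converts directly to SI units:

\[
1\,\mathrm{hp} \;=\; 33{,}000\ \tfrac{\mathrm{ft\cdot lbf}}{\mathrm{min}} \times \frac{0.3048\ \mathrm{m/ft} \times 4.448\ \mathrm{N/lbf}}{60\ \mathrm{s/min}} \;\approx\; 745.7\ \mathrm{W}.
\]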

For the moment, humans still claim superiority in mental power, but it’s clear that AI is becoming a rival. As in Watt’s approach, one valuable comparison seems to be whether a particular AI system is more powerful than a ‘standard’ human, which is naively referred to as ‘human-level’ machine intelligence. Perhaps project management could lend us a measure: the ‘person-month’.

Stimulating analogies aside, there are many differences between physical and mental work. The early psychometricians pushed the analogy as far as they could, measuring intelligence as the capacity to produce a particular kind of information-processing work. However, psychometric measurement derives from human populations. In many of its forms, it captures only a deviation from the population mean, not an actual magnitude. There is no such thing as an imperial foot for intelligence.
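The familiar IQ score makes this concrete: it locates a person within a population distribution rather than measuring an absolute quantity,

\[
\mathrm{IQ} \;=\; 100 + 15\,\frac{x - \mu}{\sigma},
\]

where x is the raw test score and μ and σ are the population mean and standard deviation. On such a scale, saying that one system is ‘twice as intelligent’ as another has no meaning, in contrast to saying that one engine delivers twice the horsepower of another.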

So back in the late eighteenth century, what Watt did was Copernican: he measured horses in terms of universal physical measures (feet, pounds and minutes). In Watt’s time, the understanding of the physical world was mature enough to recognize that the power needed in a mill could be compared to the power needed to boil a pot of water.

In contrast, even today, there is nothing like a unit of mental power that is independent of both the human and the task. In fact, the main problem in adapting psychometrics to AI is not only the lack of a ratio-scale measure, but also the fact that it derives from human populations. For obvious reasons, the notion of a machine population in AI is thorny. Still, a bevy of AI competitions, benchmarks and platforms has recently been introduced1. Progress is measured in terms of performance on particular tasks, usually compared with some average human estimate. Cross-task comparison remains elusive, though, as many AI systems are specialized for a single task.

Some would say that cognitive tasks cannot be reduced to a limited number of capabilities, much less a single one, because different tasks can never be compared. But others would reply that intelligence, like many quantities in physics, may have different manifestations and a complex structure while still admitting a common measure. By looking at the performance of a system on one set of tasks, we might then be able to predict its performance on a different set of tasks, as the sketch below illustrates.
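A minimal sketch of this idea with simulated data (the setup and numbers are illustrative assumptions, not drawn from any real benchmark): if task scores share a latent factor, an ability estimate fitted on one set of tasks predicts scores on held-out tasks.

import numpy as np

# Simulate systems whose task scores are driven by one latent ability factor.
rng = np.random.default_rng(0)
n_systems, n_tasks = 200, 12
g = rng.normal(size=(n_systems, 1))               # latent 'general' ability
loadings = rng.uniform(0.5, 1.0, (1, n_tasks))    # how strongly each task reflects it
scores = g @ loadings + 0.3 * rng.normal(size=(n_systems, n_tasks))

seen, unseen = scores[:, :6], scores[:, 6:]       # fit on six tasks, predict the rest

# Use the first principal direction of the seen tasks as an ability estimate.
# (The sign of a singular vector is arbitrary, hence the absolute value below.)
g_hat = seen @ np.linalg.svd(seen, full_matrices=False)[2][0]

for j in range(unseen.shape[1]):
    r = abs(np.corrcoef(g_hat, unseen[:, j])[0, 1])
    print(f"held-out task {j}: |correlation| with ability estimate = {r:.2f}")

With a single shared factor the held-out correlations come out high; with independent task abilities they collapse, which is exactly the empirical question at stake.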

The range between these two extremes, which ultimately comes down to the importance of inductive bias, is deeply ingrained in machine learning.

In this context, Ray Solomonoff’s prediction theory2 and Leonid Levin’s universal heuristics3 see Occam’s razor as a bias that emerges from algorithmic information theory, a possible foundation for computational measures of intelligence. These theories are still insufficient for nailing down such a measure, and they are far removed from deep learning, the dominant paradigm in AI today. Still, they have more potential than any other current computational theory of intelligence when it comes to quantifying mental power. For example, Levin’s universal search makes it possible to define the difficulty of any inversion task. From there, the capability of a system can be defined as an integral of performance over a range of difficulties. In this way, both difficulty and capability are measured on a ratio scale with the same unit: the logarithm of the number of computational steps4. This unit is ultimately commensurate with bits, as both terms of Levin’s universal search are expressed in bits.
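To make the construction slightly more concrete (a simplified rendering, not the full theory): Levin’s Kt measure charges a program both for its length and for the logarithm of its running time, so the two terms share the same unit, bits,

\[
\mathit{Kt}_U(x) \;=\; \min_{p \,:\, U(p)=x} \bigl( |p| + \log_2 t(p) \bigr),
\]

where U is a reference machine and t(p) is the number of steps taken by program p. If the difficulty h of a task instance is identified with such a value, the capability of a system can be summarized as the area under its performance curve,

\[
C \;=\; \int_0^{\infty} \Psi(h)\,\mathrm{d}h,
\]

with Ψ(h) the expected performance of the system on instances of difficulty h.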

This conceptually appealing formulation has some technical limitations. For example, both the Kolmogorov complexity and the logarithm of the number of computational steps depend on the choice of a reference machine, up to constants that need to be evaluated independently. One way to overcome these limitations might be to link computation and information further to physics. Indeed, there must be bounds relating mental power to physical energy, and discovering them may shed light on questions such as AI progress, intelligence growth, environmental footprint and the effect of quantum computing on AI.
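Both points can be made precise. The machine dependence is the classical invariance theorem of algorithmic information theory: for any two universal machines U and V there is a constant c_{U,V}, independent of x, such that

\[
\bigl|\, K_U(x) - K_V(x) \,\bigr| \;\le\; c_{U,V} \quad \text{for all } x.
\]

And at least one bound between information and physical energy is already known, Landauer’s principle, which gives the minimum energy dissipated when one bit is erased at temperature T:

\[
E_{\min} = k_B T \ln 2,
\]

about 3 × 10⁻²¹ J at room temperature.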

By trying to identify the units of mental power and then linking them to physical units, we may well look eccentric from the perspective of a thriving — and seemingly unbridled — AI field. But like Watt two centuries ago, sometimes we have to put the cart before the horse.
