Two decades ago, computer scientist Kemal Ebcioğlu at IBM described a computer program that wrote music like Johann Sebastian Bach. Now I know what you're thinking: no one has ever written music like Bach. And indeed, Ebcioğlu's algorithm had a somewhat more modest goal: given the bare melody of a Bach chorale, it could fill in the rest (that is, the harmonies) in the style of the maestro [1].

Ebcioğlu's aim was not to rival Bach, but to explore whether the 'laws' governing his composition could be abstracted from the 'data'. The goal was really no different from what scientists attempt all the time: to deduce underlying principles from a mass of observations. Writing Bach-like music, however, highlights the constant dilemma in this approach. Even if the computerized chorales could have fooled experts (they were never actually put to the test), there would be no guarantee that the algorithm's rules bore any relation to the mental processes behind Bach's compositions.

That issue is becoming increasingly acute, especially in the hazily defined area of science called complexity. Computer models can now supply convincing mimics of all manner of complex behaviours, from the flocking of birds to traffic jams to the dynamics of economic markets. But do the rules of these models bear any relation to the physical processes that generate the real-world behaviour, or are the resemblances coincidental?

This matter is raised by a paper in today's Science that reports a technique to automate the identification of natural laws from experimental data [2]. As its authors, Michael Schmidt and Hod Lipson of Cornell University in Ithaca, New York, point out, this is much more than a question of data-fitting: it examines what it means to think like a physicist, and perhaps even asks what a natural law is.

Something out of nothing?

A mathematical equation can always be found to fit any data set to arbitrary precision. But that risks capturing incidental noise along with any significant relationships between variables. What is needed is a law that obeys Einstein's famous dictum, being as simple as possible but not simpler.
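As a toy illustration of that dictum (the data, model degrees and numbers below are invented for the example), a high-degree polynomial can thread every noisy sample of a simple linear law, yet describe the underlying process far worse than the straight line hiding in the data:

```python
# Toy illustration (hypothetical data): a 14th-degree polynomial fits 15 noisy
# samples of y = 2x + 1 almost exactly, but the noise it has memorised makes it
# a far worse description of the underlying process than the straight line.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=x.size)   # noisy "measurements"

line = Polynomial.fit(x, y, deg=1)      # the parsimonious candidate law
wiggle = Polynomial.fit(x, y, deg=14)   # enough parameters to hit every point

x_new = np.linspace(0.0, 1.0, 200)      # inputs not seen during fitting
truth = 2.0 * x_new + 1.0
print(f"worst-case error of the line : {np.max(np.abs(line(x_new) - truth)):.3f}")
print(f"worst-case error, degree 14  : {np.max(np.abs(wiggle(x_new) - truth)):.3f}")
```

The flexible fit matches the 15 samples almost perfectly; it is on the unseen inputs that the memorised noise shows itself.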

[Image: Robots are our friends. Credit: Punchstock]

Avoiding 'simpler' means not reducing the data to a trivial level. In complex systems, it has become common, even fashionable, to find power laws (y proportional to x^n) that link two variables [3]. But the ubiquity of such laws in systems ranging from economics to linguistics is now leading some to suspect that power laws might in themselves lack much fundamental significance.
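One reason such exponents are so easy to produce (the data below are synthetic and purely illustrative): on logarithmic axes a power law is a straight line, so an ordinary least-squares slope through almost any scattered, roughly monotonic data hands back an exponent, whether or not anything fundamental lies behind it.

```python
# Illustrative only: log-transforming turns y ~ c * x**n into a straight line,
# log y = n*log x + log c, so an exponent drops out of an ordinary linear fit.
import numpy as np

rng = np.random.default_rng(1)
x = np.logspace(0, 3, 50)                                  # synthetic observations
y = 3.0 * x**1.5 * rng.lognormal(sigma=0.4, size=x.size)   # scatter about y = 3 x^1.5

n, log_c = np.polyfit(np.log(x), np.log(y), deg=1)         # slope is the exponent
print(f"fitted power law: y ~ {np.exp(log_c):.2f} * x^{n:.2f}")
```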

Ideally, the mathematical laws governing a process should reflect the physically meaningful invariants of that process. They might, for example, stem from conservation of energy or of momentum. But it can be hard to distinguish true invariants from trivial patterns. A study published in 2005 [4], for instance, showed that the constancy of various dimensionless parameters from the life histories of different species, such as the ratio of average life span to age at maturity, is not, as previously thought, evidence of underlying laws but follows inevitably from the way the parameters are chosen.

It's not always easy to distinguish the trivial from the profound. Isaac Newton showed that Kepler's laws, which identify mathematical regularities in the parameters of planetary orbits, have a deep origin in the inverse-square law of gravity. But the notorious Titius–Bode 'law', which alleges a mathematical relationship between a planet's semi-major axis and its ordinal position in the Solar System, remains contentious and is dismissed by many astronomers as numerology.
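For reference, the rule is usually stated (in astronomical units, quoted here from standard textbook accounts rather than from the paper) as

$$ a \approx 0.4 + 0.3 \times 2^{m}, \qquad m = -\infty, 0, 1, 2, \dots $$

It tracks Mercury through Uranus tolerably well but fails badly for Neptune, and no accepted dynamical derivation of it exists.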

As Schmidt and Lipson point out, some of the invariants embedded in natural laws aren't at all intuitive, because they don't relate directly to observable quantities. Newtonian mechanics deals with quantities such as mass, velocity and acceleration, whereas its more fundamental formulation by Joseph Louis Lagrange invokes the principle of least action; yet 'action' is an abstract mathematical quantity that can be calculated but not really 'measured'.
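To make 'action' concrete, here is the textbook statement for a single coordinate x with kinetic energy T and potential energy V (an illustration, not something drawn from the paper):

$$ S[x(t)] = \int_{t_1}^{t_2} L\,\mathrm{d}t, \qquad L = T - V, \qquad \delta S = 0 \;\Rightarrow\; \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0. $$

The number S belongs to a whole trajectory rather than to any instant; no instrument reads it off, yet demanding that it be stationary reproduces Newton's equations of motion.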

And many of the seemingly fundamental constructs of natural law, such as the concept of force or the Schrödinger equation in quantum theory, turn out to be mathematical conveniences or arbitrary (if well-motivated) guesses that simply work well. Whether any physical reality should be ascribed to such constructs, or whether they are merely useful theoretical devices, remains unresolved.

Deep questions

Schmidt and Lipson present a clever genetic algorithm that narrows down the list of candidate laws describing a data set by applying additional criteria, such as whether the partial derivatives of a candidate equation also match derivative relationships estimated from the data. The best candidate is finally selected by parsimony.
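The sketch below is a minimal, hypothetical rendering of that scoring idea, not the authors' code: the evolutionary search is replaced by a hand-picked pool of candidate invariants, the 'experiment' is a synthetic harmonic oscillator, and each candidate is ranked by how well the ratio of its partial derivatives matches derivative ratios estimated numerically from the data, with a crude complexity count standing in for parsimony.

```python
# Hypothetical sketch of the scoring step (not Schmidt and Lipson's code).
# If f(x, v) is an invariant of the motion, then along the data
#   df/dt = (df/dx)*dx/dt + (df/dv)*dv/dt = 0,
# so the predicted ratio -(df/dx)/(df/dv) should match dv/dx estimated from data.
import numpy as np

# Synthetic "experiment": a unit harmonic oscillator, x(t) = cos t, v(t) = -sin t
t = np.linspace(0.0, 10.0, 2000)
x, v = np.cos(t), -np.sin(t)
dxdt, dvdt = np.gradient(x, t), np.gradient(v, t)

ok = np.abs(dxdt) > 0.3                  # avoid near-zero denominators
data_ratio = dvdt[ok] / dxdt[ok]

# Hand-picked candidate invariants: (expression, df/dx, df/dv, complexity)
candidates = [
    ("x + v",             lambda x, v: np.ones_like(x), lambda x, v: np.ones_like(v), 2),
    ("x**2 + v**2",       lambda x, v: 2 * x,           lambda x, v: 2 * v,           4),
    ("x**2 + v**2 + x*v", lambda x, v: 2 * x + v,       lambda x, v: 2 * v + x,       6),
]

for name, dfdx, dfdv, complexity in candidates:
    predicted = -dfdx(x[ok], v[ok]) / dfdv(x[ok], v[ok])
    error = np.mean(np.abs(predicted - data_ratio))
    print(f"{name:20s} error = {error:10.4f}   complexity = {complexity}")

# The energy-like candidate x**2 + v**2 scores near-zero error; among comparably
# accurate candidates, the one with the lowest complexity would be kept.
```

In the full genetic algorithm, candidate expressions that score well on such tests are varied and recombined, and the surviving accurate expressions are finally whittled down by parsimony, as described above.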

When used to deduce mathematical laws that describe the data from two experiments in mechanics, the algorithm came up with precisely the equations of motion that physicists would construct from first principles using Newton's laws of motion and Lagrangian mechanics. In other words, the solutions encode not just the observed data but also the underlying physics.
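To see what 'from first principles' means in the simplest such setting, take a mass m on a spring of stiffness k (a textbook case, used here only for illustration); Newton's route and Lagrange's route lead to the same law:

$$ m\ddot{x} = -kx \qquad \text{or} \qquad L = \tfrac{1}{2}m\dot{x}^{2} - \tfrac{1}{2}kx^{2}, \quad \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0 \;\Rightarrow\; m\ddot{x} + kx = 0. $$

The paper's claim is that equations equivalent to these emerged from the measured trajectories alone, with the physics never supplied by hand.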

Perhaps the arena most in need of a tool such as this is not physics but biology. Another paper in today's Science, by UK researchers, reports a 'robot scientist' named Adam that can frame and test hypotheses about the genomics of yeast [5] (see 'Introducing robo-scientist'). By taking over the donkey-work of identifying connections between genes and enzymes, Adam could free post-docs for more creative endeavours.

But the really deep questions, about which we remain largely ignorant, concern what one might call the physics of genomics: whether there are equivalents of the Newtonian and Lagrangian principles and, if so, what they are. Despite the current fad for banking vast swathes of biological data, theories of this sort are not simply going to fall out of the numbers. So we need all the help we can get, even if it comes from robots.