It's time we paid closer attention to the real history of how science works.

An economist at a recent interdisciplinary meeting suggested that physics is more precise than social science mostly because it's easier to do experiments. In social science, the variability of people, among other factors, makes it almost impossible to do truly replicable experiments, whereas in physics and chemistry we can settle basic theoretical differences with experimental tests — think Eddington's famous expedition of 1919, which confirmed the predictions of Einstein's general theory of relativity.

This seems to be a fairly widespread view, and it persists, I suspect, because most non-physicists don't appreciate just how rare definitive experiments really are, especially when they probe physical phenomena at the boundary of what can be controlled and measured, which is where the greatest interest always lies. The hardest things to measure tend to be the most open to debate.

This truth is well illustrated by the ongoing effort to settle basic questions about a classic and conceptually simple problem in heat transfer. Put a liquid in a closed container and heat it from below. When the heating is weak, heat will flow upwards by conduction through the stationary liquid. Under stronger heating — controlled by the applied temperature difference between the lower and upper boundaries — the dynamics change to convection, with heat now being carried by warmer fluid rising and cooler fluid falling.

This is, of course, the classic problem of Rayleigh–Bénard convection, initially studied by Henri Bénard with the abrupt transition explained somewhat later by Lord Rayleigh. But this initial transition to convection only hints at a rich story of what happens to the fluid under stronger heating, as the fluid flow becomes progressively more complex and eventually turbulent. Physicists have been exploring this problem experimentally for half a century, and in recent years have made great progress in understanding, at least roughly, how heat flow scales with increasing temperature differences.

As in many fluid problems, several dimensionless parameters characterize the issue. The first is the Rayleigh number, which reflects the strength of the buoyancy created by the temperature difference that drives the heat transport, relative to the viscosity and thermal diffusion that tend to damp out motion. Another is the Prandtl number, which gives the ratio of the fluid's viscous (momentum) diffusivity to its thermal diffusivity. For a flow determined by these two parameters, the key measured quantity is the Nusselt number, which gives the total heat transfer through the fluid relative to that in the initial quiescent (no convection) state.
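For readers who want the symbols, these three quantities are usually written as below. This is a sketch using the standard textbook definitions, not notation taken from the studies discussed here. The symbols are my own labels: g is the gravitational acceleration, \alpha the fluid's thermal expansion coefficient, \Delta T the temperature difference between the lower and upper boundaries, L the height of the fluid layer, \nu the kinematic viscosity, \kappa the thermal diffusivity, \lambda the thermal conductivity and q the measured heat flux per unit area.

```latex
\begin{align}
  \mathrm{Ra} &= \frac{g\,\alpha\,\Delta T\,L^{3}}{\nu\,\kappa}
    && \text{(buoyant driving relative to viscous and thermal damping)} \\
  \mathrm{Pr} &= \frac{\nu}{\kappa}
    && \text{(momentum diffusivity relative to thermal diffusivity)} \\
  \mathrm{Nu} &= \frac{q\,L}{\lambda\,\Delta T}
    && \text{(total heat flux relative to purely conductive flux)}
\end{align}
```

With these definitions the Nusselt number equals 1 while the fluid is at rest and rises above 1 once convection sets in. The cubic dependence on L is one reason why a tall cell helps in reaching very large Rayleigh numbers.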

Heat a fluid from below and measure what happens: it sounds easy. But experiments aiming to test the extremes show important and interesting inconsistencies. For example, experiments carried out a decade ago by a group in Grenoble seemed to confirm a transition at sufficiently high Rayleigh number (around 10¹¹) into a so-called ultimate regime, a transition predicted theoretically by Robert Kraichnan nearly 50 years ago. This qualitative transition reflects the final destruction of thin thermal boundary layers at the top and bottom of the fluid, and implies a certain stability of the flow, in a statistical sense, showing a uniform asymptotic scaling of the heat flow (Nusselt number) with increasing Rayleigh number, apparently to arbitrarily large values.
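The scaling at stake can be stated compactly. What follows is a rough sketch based on values commonly quoted in the convection literature, not on figures from the experiments described here. The exponent \gamma is my own label, and the Prandtl-number dependence and all prefactors are left out.

```latex
\begin{align}
  \mathrm{Nu} &\propto \mathrm{Ra}^{\gamma},
    \qquad \gamma \approx 0.3
    && \text{(boundary-layer-limited turbulence; values between } 2/7 \text{ and } 1/3 \text{ are often quoted)} \\
  \mathrm{Nu} &\propto \mathrm{Ra}^{1/2}
    && \text{(Kraichnan's ultimate regime, up to logarithmic corrections)}
\end{align}
```

On a log-log plot of Nusselt number against Rayleigh number, the predicted transition would therefore appear as a steepening of the slope from roughly 0.3 towards 1/2. It is this change in slope that some experiments see and others do not.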

These results, obtained by exploiting the behaviour of liquid helium near its critical point, seemed to achieve an important and satisfying closure on understanding the global character of heat transfer in this problem at high Rayleigh number. But not so fast: as a recent study on this topic notes (D. Funfschilling, E. Bodenschatz and G. Ahlers, http://arxiv.org/abs/0904.2526; 2009), earlier similar experiments by a group in Chicago, using similar methods, failed to find any transition into the predicted ultimate regime, even though they reached a Rayleigh number of 10¹². Later experiments by another group in Oregon reached a Rayleigh number as high as 10¹⁷ with still no sign of the ultimate regime.

Which of these results is correct, and which is in error? This is by no means clear and illustrates the difficulty of interpreting experimental probes of extreme conditions. All of these efforts may, of course, report legitimate results but, for reasons unknown, may be exploring different fluid regimes, even though the Rayleigh numbers seem to cover the same territory.

A good idea when facing such discrepancies is to test the relevant physics with a totally different experimental set-up. This is precisely what Funfschilling and colleagues have now done, probing the same regime using helium made liquid at high pressure (rather than using cryogenic liquid helium) as well as other liquids, including molecular nitrogen and sulphur hexafluoride. They've used an oddly shaped apparatus made of a large cylinder, 5.5 m in length and 2.5 m in diameter, out of which rises a vertical turret some 4 m high and 1.5 m in diameter. Because of its shape, they call it the 'U-boat of Göttingen'. Placing a Rayleigh–Bénard cell within the turret under pressures up to 15 bar, they've been able to probe flows having Rayleigh numbers up to 10¹⁷, again finding no evidence whatsoever for the elusive ultimate regime.

Of course, these results also cannot yet be taken as definitive. It will take further experiments under the same and different conditions to pin down the origin of the experimental discrepancy, and to determine whether Rayleigh–Bénard convection really does enter an ultimate regime as predicted. Yet the matter clearly illustrates the inevitable difficulty — contrary to the prevailing myth — of settling matters of physics through simple experiment. It almost always takes time for diverse results to accrue and form into some kind of consistent picture.

Indeed, it's time we paid closer attention to the real history, rather than folklore history, of how science works. I mentioned above the classic example of Eddington, yet even this famous case wasn't nearly as simple as commonly reported. As some historians of science have argued in recent years, Eddington's apparatus actually wasn't accurate enough at the time to truly establish the correctness of Einstein's prediction for the deflection of light by the Sun's gravitational field; legitimate confirmation only arrived in separate experiments many years later. Truth doesn't emerge from experiment fully grown, in one startling leap, but does so much less gracefully, trailing irritating but interesting contradictions along the way.