Nature | Comment

Metrology is key to reproducing results

Scientists of all stripes must work with measurement experts so that studies can be compared, urge Martyn Sené, Ian Gilmore and Jan-Theodoor Janssen


Courtesy of UK National Physical Laboratory

Ground-based measurements of light bouncing off deserts can be used to calibrate satellite observations of reflectivity and improve climate modelling.

Imagine you are a policymaker who needs to know how much carbon is stored in the South American forest. On-the-ground data in this area are slim. So when you come across two recently published maps of surface biomass, both made using the exact same satellite data, you think it's your lucky day. Unfortunately, these maps differ in their estimates of biomass by about 20% across the continent, and by even more on a local level. Which map, if either, can you trust1?

Many column inches have been dedicated to discussing this 'reproducibility crisis' in scientific research. Researchers are rarely incentivized to try to replicate results, and when they do, those results often don't match2.

Little attention has been paid in these discussions to how metrology can help. Metrology is the science of measurement: practitioners develop internationally agreed reference points so that measures — of anything from length or mass to radiation doses or gene activity — can be compared to standards with a known uncertainty. Metrologists (like us) also work with scientists making measurements to develop and disseminate best practice. Greater attention to those standards and best practices, and the development of new ones, is needed to help researchers reproduce results.

Today's scholarship is increasingly multidisciplinary and fast-moving, bringing together scientists with widely differing expertise, using different technical languages and techniques. This can lead to measurements being made without the ability or opportunity to validate them properly.

Measurement technology is becoming more powerful and complex. Software often stands between the raw data and the user: numbers are processed and data sets are combined automatically. Tracking and quantifying the uncertainty of the final result can get lost amid all this data crunching. Researchers often treat such tools as a 'black box' that spits out answers they take on trust, and find it harder to have an intuitive feel for when the answers are wrong.

A renewed focus on how data are collected, annotated and analysed could help to fill in this piece of the reproducibility puzzle3. In the South American forest example, differences in instrument calibration, uncertainty in ground-based reference data and differences in modelling methods created the mismatch between the maps1. A serious investigation into exactly how and why results differ can turn up systematic errors, or at least quantify the measurement uncertainty. Without work like this, there is no way that maps such as these can ever match.
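To make the idea of tracking uncertainty concrete, here is a minimal sketch, with invented numbers, of the standard inverse-variance method for combining two independent measurements of the same quantity; the real biomass analyses are far more involved, and this is an illustration of the principle rather than their method:

```python
import math

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean of independent measurements,
    returning the combined value and its standard uncertainty."""
    weights = [1.0 / u**2 for u in uncertainties]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    u_combined = math.sqrt(1.0 / sum(weights))
    return mean, u_combined

# Two hypothetical biomass estimates (Mg per hectare) with stated uncertainties
mean, u = weighted_mean([250.0, 300.0], [20.0, 30.0])
print(f"{mean:.1f} +/- {u:.1f}")
```

Note that the combination is only meaningful if each input carries a credible uncertainty; without that, as with the two discrepant maps, there is no principled way to merge or even compare the results.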

International effort

Our institution, the National Physical Laboratory (NPL) in Teddington, UK, is one of dozens of National Metrology Institutes around the world that sit at the heart of the international measurement system. This system provides the framework, facilities and expertise to enable measurements to be reproduced with confidence, and with quantified uncertainty, across the globe.

The benefits of good metrology have been reaped for centuries. In the 1800s, a coherent, agreed system for measuring length and mass helped countries to have confidence in how much they were buying and selling in global trade and in the accuracy of their maps. Prototypes of the metre and the kilogram were made and locked in a vault in France so that no one could dispute their true values. The Industrial Revolution took off because people agreed on common manufacturing standards such as the type of screw thread used. Some two centuries later, the Global Positioning System relies on satellites that carry highly synchronized atomic clocks providing precise measures of time. Although it was Albert Einstein who said that the speed of light was constant, it was the metrology community that measured that speed and set the agreed number.

Jennifer Lauren Lee/NIST

This Kibble balance precisely measures Planck's constant, which is helping to redefine the kilogram.

Today, the International Bureau of Weights and Measures (BIPM), based in Paris, coordinates a robust metrological framework for all seven base units of measurement, from the metre to the kelvin. Advances continue to be made. In November 2018, for example, the definitions of the kilogram and some other units are set to change, completing a long-term project to link all units to fundamental, unchanging properties of the Universe that researchers have measured to great precision (in the case of the kilogram, relating it to Planck's constant). If researchers are properly trained to use best metrological practice, following clear procedures and calibrating their measurements against standards that are directly linked to the agreed base units, we can all have confidence in the results.
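The logic of the Kibble balance can be sketched in a few lines of standard physics; this is a simplified outline, and real balances involve many corrections. In weighing mode an electromagnetic force balances the weight of the test mass, and in moving mode the same coil generates a voltage as it moves, giving a virtual power balance:

```latex
% Virtual power balance of an idealized Kibble balance:
m g v = U I
% The voltage U is realized via the Josephson effect, and the current I
% via Ohm's law with a quantum Hall resistance:
U = \frac{n f h}{2e}, \qquad R = \frac{h}{i e^{2}}
% Substituting, the electron charge e cancels, and the mass is expressed
% in terms of the Planck constant h, two measured Josephson frequencies
% f and f', the coil velocity v and the local gravitational acceleration g:
m = C \, \frac{h \, f f'}{g v}
% where C collects the dimensionless quantum numbers n, n' and i.
```

Once the value of Planck's constant is fixed by definition, the same balance runs in reverse: it realizes the kilogram from electrical and frequency measurements, rather than measuring the constant from a standard mass.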

The system can work extremely well, even for highly complex projects producing vast amounts of data from a range of instrumentation. The detections of the Higgs boson in 2012 and of gravitational waves in 2016, for example, were made with such attention to detail that they produced quantitative results that the world can have confidence in.

Problem areas

An increasing number of research areas lack a metrological framework, however. This is particularly the case in fields such as biology and environmental science, which do not share the long history of metrological practice found in physics and engineering. Defining measurement units in the life sciences is an intrinsically tricky task. Every electron can be counted on to have the same mass and charge, whereas living things have a wide range of natural variability, making it hard to develop and define standards. Before we start to tackle such variability, we need to ensure the measurements we do know how to make, and the tools we can characterize, are on a firm foundation.

One problem area is radiotherapy — the practice of using ionizing radiation to kill cancers or to otherwise affect cells. Although there is strict regulation about how to measure dose delivery for patients in clinical settings, similar regulation does not exist for research labs studying the cellular impacts of radiation. A 2013 report by the US National Institute of Standards and Technology (NIST) found that, in a year's worth of studies in the journal Radiation Research, only 7% cited written dosimetry standards or guides. The NIST survey concluded that radiobiological measurement is “frequently inadequate, thus undermining the reliability and reproducibility of the findings”4. This creates a barrier to the translation of preclinical studies into clinical practice, and unnecessarily increases the number of animals used in studies.

In response to this, work is under way in the United States at a number of centres to standardize dosimetry. In the United Kingdom, NPL is leading the way on services specifically aimed at preclinical studies.

Earth observation has problems too. Satellite measurements of light bouncing off the planet's surface, for instance, can be reasonably well calibrated against reflections from polar regions and deserts. Although this effect has been studied sufficiently to allow for consistent, reproducible results from different satellites, the information is not reliable enough to use in climate-change studies or measures such as forest coverage.

Take, for example, four different satellites monitoring leaf area index — a measure proportional to the percentage of ground covered by green, photosynthetically active leaves. All four satellites (CYCLOPES, GEOLAND, GLOBCARBON and MODIS) have a resolution of 1 kilometre. Yet their monthly data over two years vary wildly from each other, sometimes by more than a factor of seven. The reasons for this are likely to be complex: the satellites pass at slightly different times, for instance, so the property being measured might be changing. But there are also differences in how the satellites are calibrated and the data analysed. Researchers are working to build a rigorous system of long-term validation and inter-comparison studies, to tease out systematic uncertainties and create more reliable data.
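One simple starting point for such an inter-comparison is to quantify the spread between instruments at each time step. The sketch below uses invented monthly leaf-area-index values for four hypothetical satellites; the satellite names and numbers are illustrative only, not the real CYCLOPES, GEOLAND, GLOBCARBON or MODIS data:

```python
# Hypothetical monthly leaf-area-index values from four satellites over one
# site; all numbers are invented for illustration.
series = {
    "sat_a": [1.2, 2.5, 4.0],
    "sat_b": [0.9, 2.0, 3.1],
    "sat_c": [1.0, 6.5, 3.8],
    "sat_d": [1.1, 2.2, 3.5],
}

# A crude inter-comparison metric: the max/min ratio across satellites
# for each month, flagging months where instruments strongly disagree.
months = len(next(iter(series.values())))
for m in range(months):
    values = [s[m] for s in series.values()]
    ratio = max(values) / min(values)
    print(f"month {m}: max/min ratio = {ratio:.2f}")
```

A rigorous validation programme goes well beyond this, attributing each discrepancy to overpass timing, calibration or retrieval algorithm, but even a simple spread metric makes the scale of the disagreement visible.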

Another example of such work is the Versailles Project on Advanced Materials and Standards (VAMAS). Established in 1982, it was designed to develop international best practices and standards for making and measuring new materials. It is serving the community well for the measurement of ultra-flat 2D materials such as graphene — metrologists have refined techniques for gauging purity and thickness down to the atomic level.

Today, the National Metrology Institutes are leading efforts to standardize many biological measures, such as quantifying small amounts of protein in complex serums.

Such community efforts are incredibly important, yet they remain much less glamorous than discovery research.

Ways forward

So what can be done? One simple step would be for funding bodies to involve more metrologists in project selection and assessment. This would encourage the funding of replication studies, help to ensure that financed studies use good metrological practice and set studies up to allow for future attempts at replication. Grants should assess 'pathways to reproducibility' along with 'pathways to impact'.

Funding bodies often require that the raw data behind research are captured and made available. This requirement should be extended to include information on the quality of the data. It must be clear how, and how thoroughly, researchers worked to ensure their measures were linked to an internationally recognized standard and to quantify uncertainty. If this information is consistently stored alongside data, it will make it much easier to track uncertainties as data sets are processed and combined.
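A minimal sketch of what such quality metadata might look like attached to a single measurement follows; the field names are hypothetical and do not represent any existing standard schema:

```python
import json

# Illustrative metadata record accompanying one measured value.
# Field names are invented for this sketch, not a standard schema.
record = {
    "quantity": "surface reflectance",
    "value": 0.42,
    "standard_uncertainty": 0.01,   # same units as the value
    "coverage_factor": 2,           # k = 2 gives roughly 95% confidence
    "calibration_reference": "ground-based desert calibration site",
    "traceability": "SI, via a national metrology institute transfer standard",
    "method": "documented processing procedure, version recorded separately",
}
print(json.dumps(record, indent=2))
```

The point is not the particular fields but that the uncertainty and the traceability chain travel with the number, so that anyone combining data sets later can propagate uncertainty rather than guess at it.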

Some organizations are taking steps in this direction. The Australian Terrestrial Ecosystem Research Network (TERN), for example, has a framework and best-practice guide for collecting this sort of metadata. NPL is taking a leading role in the development of quality-control systems for Earth observation data sets being submitted to the European Copernicus Climate Change Service (C3S). This will ensure that all of the data in the C3S data store are fully traceable and well documented.

Quantifying uncertainty in complex problems is almost becoming a field in itself. The metrology community needs to step up to this challenge, in particular by engaging more statisticians, data experts and researchers from problematic areas such as cell biology. Metrology should be woven into scientific training, at all levels, to forge a dedication to precision measurement throughout science.

In the meantime, researchers should take full advantage of their National Metrology Institutes. It's surprising how many scientists have never heard of us. Labs having trouble reproducing their measurements can simply give us a call: we work in collaboration to provide advice, and to improve or develop new techniques. We measure almost every physical and chemical parameter imaginable, from time with an accuracy better than one second in the lifetime of the Universe, to the amount and localization of drug uptake in single cells. Speaking to us can often save time and improve the precision of results.

The task ahead is a challenging one that cannot be tackled by the metrology community alone. But it does require the mindset of metrologists: an attention to detail and a dedication to global comparability.


  1. Mitchard, E. T. A. et al. Carbon Balance Manag. 8, 10 (2013).

  2. Baker, M. Nature 533, 452–454 (2016).

  3. Plant, A. L., Locascio, L. E., May, W. E. & Gallagher, P. D. Nature Methods 11, 895–898 (2014).

  4. Desrosiers, M. et al. J. Res. Natl Inst. Stand. Technol. 118, 403–418 (2013).

Author information


  1. Martyn Sené is deputy chief executive at NPL with a background in nuclear physics.

  2. Ian Gilmore is a senior NPL fellow leading the National Centre of Excellence in Mass Spectrometry Imaging and is head of science.

  3. Jan-Theodoor Janssen is research director at NPL with a background in solid-state quantum technology.

