Scientists like to grumble about the peer-review system for judging research quality, but there is one sure way to make most of them defend it: suggest that peer review should be replaced with numerical measures of academic output.

A major UK report on the use of such research metrics this week reinforces this defence of the status quo. Metrics, it concludes, are not yet ready to replace peer review as the preferred way to judge research papers, proposals and individuals.

Even if such metrics do not replace peer review in all situations, will they ever be ready to make a serious and trusted contribution to the assessment of science and scientists? As James Wilsdon, lead author of the UK report, writes in a World View on page 129, the one certainty in this debate is that the lure of metrics will only increase. Scientists should not stick their heads in the sand and pretend that the issue will go away. Rather, they should engage with metrics and work to improve the evidence base for them.

British universities now track the output of their academic staff using systems to gather details about their funding and types of output — patents, papers, citations and research grants — and to analyse institutional strengths for comparison with rival universities.

A sophisticated infrastructure has sprung up to support this activity. But it is patchy and inconsistent, with university managers often hopping between various approaches. Some, for example, have built their own internal research-information systems, and others rely on online databases of researcher outputs collected by funding agencies. There are non-profit systems that use public information, and commercially owned databases of bibliometric citations. A host of commercial benchmarking services can analyse the information. These analytical services are becoming increasingly sophisticated. They feature many different ways to group citation metrics, to cover collections of papers by individual, department, institution or journal, and to benchmark them against similar groups.

The problem is that most of these metrics tools lack transparency. At the heart of the system, databases of academic outputs and citations are not publicly accessible or auditable. And the indicators built on top of these databases can also be black boxes: the UK report notes that there are no fewer than ten major global rankings of universities, for example. Some use poorly explained scores and arbitrary weightings to underpin their league tables, and as the report says, they “assume degrees of objectivity, authority and precision that are not yet possible to achieve in practice”. To some extent, metrics are used and quoted simply because other universities use them — the supply of league tables creates its own demand.

Such opacity can lead to distrust, negating the very advantage of metrics over qualitative assessment as objective, open measures of research performance. It is essential, therefore, that universities are open about the metrics that they build and use.

Transparency is one of the hallmarks of ‘responsible metrics’, a term introduced by the report that covers principles such as using robust data and applying diverse indicators that account for variation by field and for multiple research types. Other principles include being humble about the limits of quantitative evaluation — which the report notes should support, rather than replace, expert assessment — and recognizing that indicators must change over time.

Although it seems legitimate to use a range of metrics to analyse research performance, their use as managerial targets can leave academics feeling ‘painted by numbers’ — requiring them to change their behaviour to meet often-arbitrary goals. Institutions should therefore publicly state their principles to research managers and explain why they are using particular indicators as a management tool, as the report recommends. Perhaps the most important aspect to recognize about metrics is that they can make judgements more objective — but they can also objectify those being judged.