Many countries are keen to measure their universities' research performance with minimal burdens on the participants. Not least of these is the United Kingdom, which last month announced the results of its sixth and final Research Assessment Exercise (RAE, see page 13).

The RAE relied heavily on expert peer review of research publications, and attention in Britain and beyond is now focused on what form the replacement system will take. The proposed successor, the Research Excellence Framework (REF), is opaque. Little is known about how it will work other than a central principle: it will assess research quality using metrics, including publication citations. It may also take into account the number of postgraduates completing their studies and the amount of research income won by universities. There will be a smattering of 'light-touch expert review', although the exact form that this will take is not yet clear — it might simply be used to interpret the metrics results.

But taken alone, publication citations have repeatedly been shown to be a poor measure of research quality. An example from this journal illustrates the point. Our third most highly cited paper in 2007, with 272 citations at the time of inspection, described a pilot study in screening for functional elements of the human genome; its importance lay primarily in the technique. In contrast, a paper from the same year revealing key biological insights into the workings of a proton pump, which moves protons across cell membranes, had received just 10 citations. There are plenty more examples of such large disparities between papers that may be important for a variety of reasons: technological breakthroughs of immediate use to many, more rarefied achievements of textbook status, critical insights of relevance to small or large communities, 'slow burners' whose impact grows gradually or suddenly after a delay, and so on.

Such isolated statistics serve to illustrate a point that has been documented more systematically in the bibliometrics literature. Take, for example, an analysis of condensed-matter physics that compared judgements of scientific value based on metrics, including citations, with those based on peer review (E. J. Rinia et al. Res. Policy 27, 95–107; 1998). The study found disagreements between the two methods of evaluation for 25% of the 5,000 papers examined. In roughly half of these cases the experts judged a paper to be of interest when the metrics did not, and in the other half the reverse was true. The reasons for these differences are not fully understood.

Nor does the use of metrics as an evaluation method have widespread support within the scientific community. Some members of the expert panels judging work in this year's RAE warn of the dangers to the quality of research assessment under a metrics-based model. Such fears were expressed even by experts in subject areas thought to be most appropriate for metrics-based assessment, such as biology and chemistry. Metrics are not well established for the applications of science, or for disciplines less dependent on journal publication.

Britain is not alone in encountering problems in developing robust metric indicators of research quality. The Australian government is also dealing with a backlash from some universities and leading researchers against its current attempts to do the same thing.

The signs are that, after several false starts and delays, the final proposals for the REF, due in autumn 2009, are unlikely to be the radical departure from the RAE that the government first envisaged in 2006. Expert review is far from a problem-free method of assessment, but policy-makers have no option but to recognize its indispensable and central role.