Opinion

Nature 464, 488-489 (25 March 2010) | doi:10.1038/464488a; Published online 24 March 2010

Let's make science metrics more scientific

See associated Correspondence: Andersen, Nature 464, 1267 (April 2010)

Julia Lane1

To capture the essence of good science, stakeholders must combine forces to create an open, sound and consistent system for measuring all the activities that make up academic productivity, says Julia Lane.

ILLUSTRATION BY DAVID PARKINS

Summary

  • Existing metrics have known flaws
  • A reliable, open, joined-up data infrastructure is needed
  • Data should be collected on the full range of scientists' work
  • Social scientists and economists should be involved

Measuring and assessing academic performance is now a fact of scientific life. Decisions ranging from tenure to the ranking and funding of universities depend on metrics. Yet current systems of measurement are inadequate. Widely used metrics, from the newly fashionable Hirsch index to the 50-year-old citation index, are of limited use1. Their well-known flaws include favouring older researchers, capturing few aspects of scientists' jobs and lumping together verified and discredited science. Many funding agencies use these metrics to evaluate institutional performance, compounding the problems2. Existing metrics do not capture the full range of activities that support and transmit scientific ideas, which can be as varied as mentoring, blogging or creating industrial prototypes.
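
To make the Hirsch index concrete: it is the largest number h such that h of a researcher's papers have at least h citations each. A minimal sketch, with invented citation counts:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one researcher's papers.
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # prints 3
```

Because the index can only grow as citations accumulate, even this toy example hints at the bias towards older researchers noted above.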

The dangers of poor metrics are well known — and science should learn lessons from the experiences of other fields, such as business. The management literature is rich in sad examples of rewards tied to ill-conceived measures, resulting in perverse outcomes. When the Heinz food company rewarded employees for divisional earnings increases, for instance, managers played the system by manipulating the timing of shipments and pre-payments3. Similarly, narrow or biased measures of scientific achievement can lead to narrow and biased science.

There is enormous potential to do better: to build a science of science measurement. Global demand for, and interest in, metrics should galvanize stakeholders — national funding agencies, scientific research organizations and publishing houses — to combine forces. They can set an agenda and foster research that establishes sound scientific metrics: grounded in theory, built with high-quality data and developed by a community with strong incentives to use them.

Scientists are often reluctant to see themselves or their institutions labelled, categorized or ranked. Although happy to tag specimens as one species or another, many researchers do not like to see themselves as specimens under a microscope; they feel that their work is too complex to be evaluated in such simplistic terms. Some argue that science is unpredictable, and that any metric used to prioritize research money risks missing out on an important discovery from left field. It is true that good metrics are difficult to develop, but this is not a reason to abandon them. Rather, it should be a spur to grounding their development in sound science. If we do not press harder for better metrics, we risk making poor funding decisions or sidelining good scientists.

Clean data

Metrics are data driven, so developing a reliable, joined-up infrastructure is a necessary first step. Today, important, but fragmented, efforts such as the Thomson Reuters Web of Knowledge and the US National Bureau of Economic Research Patent Database have been created to track scientific outcomes such as publications, citations and patents. These efforts are all useful, but they are labour intensive and rely on transient funding, some are proprietary and non-transparent, and many cannot talk to each other through compatible software. We need a concerted international effort to combine, augment and institutionalize these databases within a cohesive infrastructure.

The Brazilian experience with the Lattes Database (http://lattes.cnpq.br/english) is a powerful example of good practice. This provides high-quality data on about 1.6 million researchers and about 4,000 institutions. Brazil's national funding agency recognized in the late 1990s that it needed a new approach to assessing the credentials of researchers. First, it developed a 'virtual community' of federal agencies and researchers to design and develop the Lattes infrastructure. Second, it created appropriate incentives for researchers and academic institutions to use the database: the data are referred to by the federal agency when making funding decisions, and by universities in deciding tenure and promotion. Third, it established a unique researcher identification system to ensure that people with similar names are credited correctly. The result is one of the cleanest researcher databases in existence.

On an international level, the issue of a unique researcher identification system is one that needs urgent attention. There are various efforts under way in the open-source and publishing communities to create unique researcher identifiers using the same principles as the Digital Object Identifier (DOI) protocol, which has become the international standard for identifying unique documents. The ORCID (Open Researcher and Contributor ID) project, for example, was launched in December 2009 by parties including Thomson Reuters and Nature Publishing Group. The engagement of international funding agencies would help to push this movement towards an international standard.
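
Identifier schemes of this kind typically embed a check character to catch transcription errors. As a minimal sketch, the following computes the ISO 7064 MOD 11-2 check digit (the check-digit scheme that ORCID identifiers use) over a hypothetical 15-digit base:

```python
def mod11_2_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for a string of decimal digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)

# Hypothetical 15-digit base; a real ORCID iD is displayed with hyphens.
base = "000000021825009"
print(base + mod11_2_check_digit(base))  # 0000000218250097
```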

Similarly, if all funding agencies used a universal template for reporting scientific achievements, it could improve data quality and reduce the burden on investigators. In January 2010, the Research Business Models Subcommittee of the US National Science and Technology Council recommended the Research Performance Progress Report (RPPR) to standardize the reporting of research progress. Before this, each US science agency required different reports, which burdened principal investigators and rendered a national overview of science investments impossible. The RPPR guidance helps by clearly defining what agencies see as research achievements, asking researchers to list everything from publications produced to websites created and workshops delivered. The standardized approach greatly simplifies such data collection in the United States. An international template may be the logical next step.
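
To illustrate what a machine-readable, template-based report might look like, here is a purely hypothetical sketch; the field names are invented for illustration and are not the actual RPPR categories:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProgressReport:
    """Hypothetical schema for a standardized progress report (illustrative only)."""
    award_id: str
    principal_investigator: str
    reporting_period: str
    publications: List[str] = field(default_factory=list)
    websites_created: List[str] = field(default_factory=list)
    workshops_delivered: List[str] = field(default_factory=list)

report = ProgressReport(
    award_id="ABC-1234567",                      # invented award number
    principal_investigator="J. Doe",
    reporting_period="2009-10-01/2010-09-30",
    publications=["doi:10.1000/example.2010.001"],  # placeholder DOI
    workshops_delivered=["Science metrics workshop, March 2010"],
)
print(report)
```

A shared record of this kind is what would let a single submission serve every agency, and ultimately support an international overview.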

Importantly, data collected for use in metrics must be open to the scientific community, so that metric calculations can be reproduced. This also allows the data to be efficiently repurposed. One example is the STAR METRICS (Science and Technology in America's Reinvestment — Measuring the Effects of Research on Innovation, Competitiveness and Science) project, led by the National Institutes of Health and the National Science Foundation under the auspices of the White House Office of Science and Technology Policy. This project aims to match data from institutional administrative records with those on outcomes such as patents, publications and citations, to compile accomplishments achieved by federally funded investigators. A pilot project completed at six universities last year showed that this automation could substantially cut investigators' time on such tasks.
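
At its core this is a record-linkage exercise: joining administrative award records to outcome records through a shared researcher identifier. A minimal sketch with invented data and field names (not the actual STAR METRICS schema):

```python
from collections import defaultdict

# Invented administrative records: who worked on which award.
awards = [
    {"researcher_id": "R-001", "award": "AGENCY-0001"},
    {"researcher_id": "R-002", "award": "AGENCY-0042"},
]

# Invented outcome records keyed by the same researcher identifier.
outcomes = [
    {"researcher_id": "R-001", "type": "publication", "ref": "doi:10.1000/xyz"},
    {"researcher_id": "R-001", "type": "patent", "ref": "US-0000000"},
    {"researcher_id": "R-002", "type": "publication", "ref": "doi:10.1000/abc"},
]

# Group outcomes by researcher, then attach them to each award record.
by_researcher = defaultdict(list)
for outcome in outcomes:
    by_researcher[outcome["researcher_id"]].append(outcome)

for record in awards:
    linked = by_researcher[record["researcher_id"]]
    print(record["award"], "->", [(o["type"], o["ref"]) for o in linked])
```

Reliable linkage, of course, depends on the unique researcher identifiers discussed above.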

Funding agencies currently invest in fragmented bibliometrics projects that often duplicate the work of proprietary data sets. A concerted international strategy is needed to develop business models that both facilitate broader researcher access to the data produced by publishing houses, and compensate those publishers for the costs associated with collecting and documenting citation data.

Getting creative

As well as building an open and consistent data infrastructure, there is the added challenge of deciding what data to collect and how to use them. This is not trivial. Knowledge creation is a complex process, so perhaps alternative measures of creativity and productivity should be included in scientific metrics, such as the filing of patents, the creation of prototypes4 and even the production of YouTube videos. Many of these are more up-to-date measures of activity than citations. Knowledge transmission differs from field to field: physicists more commonly use preprint servers; computer scientists rely on working papers; others favour conference talks or books. Perhaps publications in these different media should be weighted differently in different fields.
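
One way to express field-specific weighting is a simple lookup of weights by field and output type; the weights below are placeholders chosen purely for illustration, not recommendations:

```python
# Placeholder weights: how much each output type might count in each field.
WEIGHTS = {
    "physics":          {"journal_article": 1.0, "preprint": 0.8, "conference_talk": 0.3},
    "computer_science": {"journal_article": 0.8, "working_paper": 0.6, "conference_paper": 1.0},
    "history":          {"journal_article": 0.7, "book": 1.5, "conference_talk": 0.2},
}

def weighted_output_score(field_name, outputs):
    """Sum the field-specific weights of a researcher's outputs (unknown types count zero)."""
    weights = WEIGHTS.get(field_name, {})
    return sum(weights.get(kind, 0.0) for kind in outputs)

print(weighted_output_score("physics", ["preprint", "journal_article", "conference_talk"]))  # 2.1
```

The hard, and open, question is of course who sets the weights and on what evidence.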

People are starting to think about collecting alternative kinds of data. Systems such as MESUR (Metrics from Scholarly Usage of Resources, http://www.mesur.org), a project funded by the Andrew W. Mellon Foundation and the National Science Foundation, record details such as how often articles are being searched and queried, and how long readers spend on them. New tools are available to capture and analyse 'messy' data on human interactions — for example, visual analytics intended to discover patterns, trends, and relationships between terrorist groups are now being applied to scientific groups (http://nvac.pnl.gov/agenda.stm).

There needs to be a greater focus on what these data mean, and how they can be best interpreted. This requires the input of social scientists, rather than just those more traditionally involved in data capture, such as computer scientists. Basic research is also needed into how measurement can change behaviour, to avoid the problems that Heinz and others have experienced with well-intended metrics that lead to undesirable outcomes. If metrics are to be used to best effect in funding and promotion decisions, economic theory is needed to examine how changes to incentives alter the way research is performed5.

How can we best bring all this theory and practice together? An international data platform supported by funding agencies could include a virtual 'collaboratory', in which ideas and potential solutions can be posited and discussed. This would bring social scientists together with working natural scientists to develop metrics and test their validity through wikis, blogs and discussion groups, thus building a community of practice. Such a discussion should be open to all ideas and theories and not restricted to traditional bibliometric approaches.

Some fifty years after the first quantitative attempts at citation indexing, it should be feasible to create more reliable, more transparent and more flexible metrics of scientific performance. The foundations have been laid. Most national funding agencies are supporting research in science measurement, vast amounts of new data on scientific interactions are available thanks to the Internet, and a community of people invested in the scientific development of metrics is emerging. Far-sighted action can ensure that metrics go beyond identifying 'star' researchers, nations or ideas, to capturing the essence of what it means to be a good scientist.

Further reading

Zucker, L. G. & Darby, M. R. Linking Government R&D Investment, Science, Technology, Firms and Employment: Science & Technology Agents of Revolution (Star) Database NSF award 0830983 (2008).

Leydesdorff, L. in Beyond Universal Pragmatics: Studies in the Philosophy of Communication (ed. Grant, C. B.) 149–174 (Peter Lang, 2010).

Borner, K. Towards a Macroscope for Science Policy Decision Making NSF award 0738111 (2007).

Gero, J. in The Science of Science Policy: The Handbook (eds Husbands-Fealing, K. et al.) (Stanford University Press, 2010).

Kremer, M. & Williams, H. 'Incentivizing Innovation: Adding to the Tool Kit' in Innovation Policy and the Economy (eds J. Lerner & S. Stern) 10, 1-17 (University of Chicago Press, 2010).

The opinions expressed are those of the author and may not reflect the policies of the National Science Foundation.

References

  1. Campbell, P. Ethics Sci. Environ. Polit. 8, 5–7 (2008).
  2. Curtis, B. Globalis. Soc. Edu. 6, 179–194 (2008).
  3. Kerr, S. Acad. Manage. J. 18, 769–783 (1975).
  4. Thrash, T. M., Maruskin, L. A., Cassidy, S. E., Fryer, J. W. & Ryan, R. M. J. Pers. Soc. Psychol. (in the press).
  5. Gibbons, R. J. Econ. Perspect. 12, 115–132 (1998).
  1. Julia Lane is the director of the Science of Science & Innovation Policy programme, National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA.
    Email: jlane@nsf.gov

Readers' Comments

  1. #9806 | 2010-03-24 02:52 PM | Nicola Jones said:

    Hello. I am one of the editors with Nature's opinion section.

    Nature would like to hear your views on this subject. What aspects of your own scientific pursuits do you feel are not being properly captured by the metrics used to assess your career? Which aspects of your work are being over-emphasized? If there could be one vital change to how your productivity and effectiveness is measured, what would that be?

    More broadly, what do you feel are the best practices in current metrics work? What more would you like to see done?

    Your comments could help to make a difference; we look forward to reading them.

  2. #9816 | 2010-03-24 07:27 PM | Jim Woodgett said:

    Metrics can always be and always will be "gamed" and, as the article notes, the various means of measuring scientific productivity all have faults. That does not detract from their value or the need for them, however, as long as they are taken only as guides for following trends. A bigger problem is a lack of standardization, such that fair comparisons are often difficult to achieve without enormous effort. In an admittedly limited attempt to compare my institution's performance with its peers over time, I list the top 30 journals in biomedical research by impact factor in a given year (usually the prior year), excluding review journals, and then tabulate the number of primary papers published in these journals by various institutions. This yields manageable and verifiable numbers. Problems arise from author identification (including authors who identify with multiple institutions, resulting in double-dipping) and from the question of whom to "count" within a given institution.

    An ORCID identifier that includes the affiliations of the investigator would greatly increase accuracy. Of course, journal impact factors have their own limitations, but at least it's a place to start.

    Regarding which activities to measure: as long as like is being compared with like (within given fields), adding in various activities just requires some form of weighting (much as citations help to weight impact to a degree). Whether these additional activities are taken into account will depend on the relative importance an institution places on them.
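
    A minimal sketch of the tabulation described above, with invented journal and paper records (the top-30 cut-off is reduced to 2 for brevity):

```python
from collections import Counter

# Invented journal list with impact factors; review journals are excluded.
journals = [
    {"name": "Journal A", "impact_factor": 45.0, "review_journal": False},
    {"name": "Journal B", "impact_factor": 32.0, "review_journal": True},
    {"name": "Journal C", "impact_factor": 28.0, "review_journal": False},
]

TOP_N = 2  # 30 in the comment above; 2 here to keep the example small
non_review = [j for j in journals if not j["review_journal"]]
top_journals = {
    j["name"]
    for j in sorted(non_review, key=lambda j: j["impact_factor"], reverse=True)[:TOP_N]
}

# Invented primary papers: (institution, journal).
papers = [
    ("Institution X", "Journal A"),
    ("Institution X", "Journal C"),
    ("Institution Y", "Journal A"),
    ("Institution Y", "Journal B"),  # review journal, so not counted
]

counts = Counter(inst for inst, journal in papers if journal in top_journals)
print(counts)  # Counter({'Institution X': 2, 'Institution Y': 1})
```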

  3. #9837 | 2010-03-25 05:13 AM | Konrad Hinsen said:

    Two fundamental problems with metrics in science are that quantity does not imply quality, and that short-term impact does not imply long-term significance. The real value of many scientific discoveries often becomes apparent only many years later. It would be interesting to evaluate metrics by applying them to research that is a few decades old. Would they have identified ideas and discoveries that we now recognize as breakthroughs?

    As for over- and under-emphasis in currently popular metrics, two problems come to mind:

    1) Research in collaboration is over-emphasized, as impact factors and the h-index do not take into account the number of authors of a publication. Three scientists each working and publishing alone could improve their bibliometric evaluation simply by agreeing to put all three names on all of their publications, although that would clearly not add any scientific value (see the sketch after this list).

    2) Long-term services to the scientific community are undervalued by current metrics, which simply count visible signs of activity. One example is the development of scientific software: a new piece of software can become the subject of a publication, but the years of maintenance and technical support that usually follow remain invisible.
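
    One common correction for the co-authorship problem in point 1 is fractional counting, in which each paper's citations are divided by its number of authors. A minimal sketch with invented numbers, showing that the pooling strategy then brings no gain:

```python
def fractional_citations(papers):
    """Sum of citations per paper divided by its number of authors."""
    return sum(p["citations"] / p["n_authors"] for p in papers)

# Invented numbers, from the perspective of one of the three scientists:
solo   = [{"citations": 30, "n_authors": 1}]       # publishing alone
pooled = [{"citations": 30, "n_authors": 3}] * 3   # names pooled on all three papers

print(fractional_citations(solo))    # 30.0
print(fractional_citations(pooled))  # 30.0 -- no inflation from sharing author lists
```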

  4. #9838 | 2010-03-25 05:48 AM | Martin Knapmeyer said:

    The article raises important points. We all know anecdotes and have ideas about how to abuse science metrics (nobody is doing these evil things, of course), but it is indeed important not only to quantify the number of papers written, but also to quantify the effect that tricking the system may have. It is also important to understand how the introduction, and even the change, of metrics will alter the behaviour of those who are measured.

    One effect is already known from school: receiving good marks for schoolwork is taken as a reward, but after a while children learn how to get the reward with the smallest possible effort and forget that school is about learning, not about being rewarded.

    And although science cannot be done without funding, science is about gaining knowledge, not about getting funded.

    I have followed the discussions about science metrics for a while. One point I have always found missing is a clear explanation of how and why the measurement of quantity can point to quality. It was said in the pages of Nature some time ago that the best way to judge the quality of a paper is to read it. I doubt that the work of a scientist can be judged properly without reading (and understanding) it. That reading (and understanding!) all this material is much more labour than simply counting it is no excuse. If one project (reading) is considered impossible and a replacement project (counting) is started instead, then it has to be shown scientifically that the replacement really yields the same results. I have not yet seen any such proof.

  5. #9847 | 2010-03-25 08:49 AM | Daniel Corcos said:

    Having now published papers for more than 25 years, I cannot find any correlation, for my own papers, between their citation counts and their quality (novelty, accuracy over time, generality). In contrast, a good correlation can be seen for late citations (after ten years). From my talks with colleagues, it seems that most of them agree with me.

  6. #9850 | 2010-03-25 09:24 AM | Nicolas Le Novere said:

    @Martin Knapmeyer. That evaluators have to read the papers is the most frequent criticism of bibliometrics, but it is not an appropriate one. First of all, in the vast majority of situations where the use of bibliometrics is suggested, it is simply not possible to read the entire output of all the people under scrutiny. Second, most panels do not have sufficient experience to judge the diversity of that output; gosh, most often one cannot even cover the diversity of a single scientist (over the past 18 years, I have been scientifically active in four completely disconnected communities). And even when an evaluator is competent, reading a scientific paper is not akin to reading the daily newspaper: it takes a lot of time, concentration and access to external resources. A real evaluation of the novelty and quality of a work demands the effort of a journal club. All this makes reading the research output the most inadequate and arbitrary method of evaluating science. Having participated in many grant and evaluation panels that did not use bibliometrics, in several countries, I can state with absolute certainty that the evaluation depended entirely on (1) the prior, flaky knowledge of the evaluators and (2) the charisma of the candidate if an interview or lecture was involved. The most competent evaluators are the peers ... who cite the works.

    Bibliometrics is a preliminary step. It is not the only tool in the evaluator's toolkit, and the choice of tools will depend on the situation. Bibliometrics, patents, grant success, reading past work, reading research proposals, lectures and so on can all inform the judgement of an individual or an institution, for recruitment, funding and the like. But a good combination of an experience-weighted citation index and the h-index (*) is an extremely useful and robust indicator for weeding out and building shortlists (I am not including the impact factor here, which is definitively nonsensical).

    (*) The h-index grows quasi-linearly with years of scientific production but, as with the impact factor, the average slope depends on the community. Building 'clouds' of domain-specific h-indices can therefore be extremely useful. Where more thought is required (hiring a head of department, for example), plotting an individual's h-index history is also very informative.
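
    A minimal sketch of the h-index history idea in the footnote above, computing the index at the end of each year from cumulative per-paper citation counts (all numbers invented):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return max([rank for rank, c in enumerate(ranked, start=1) if c >= rank], default=0)

# Invented data: cumulative citations per paper at the end of each year.
cumulative_by_year = {
    2007: [3, 1],
    2008: [8, 4, 2],
    2009: [15, 9, 5, 1],
}

history = {year: h_index(cites) for year, cites in sorted(cumulative_by_year.items())}
print(history)  # {2007: 1, 2008: 2, 2009: 3} -- roughly linear growth, as described
```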

    @Konrad Hinsen, it has actually been shown that the h-index of young scientists is a good predictor of future success.

  7. #9852 | 2010-03-25 10:37 AM | Martin Knapmeyer said:

    @Nicolas Le Novere: That is exactly my point: a proper judgement of a person's scientific achievements is hard work and cannot be done on the fly.

    Of course scientists feel that merely counting their papers is inappropriate, and of course they criticize those who do it. It is not only difficult, hard work to read and understand a paper; it is also hard work to do the underlying research and to get the paper written and published. Scientists would probably like to see that appreciated. But the message of bibliometry, to first order, is: your content does not count. Yet the content is why we do science, and society needs scientists because of the content.

    That the h-index of a young scientist is a good indicator of his or her future career sounds a bit like a self-fulfilling prophecy, because that scientist's career was itself influenced by judgements based on the h-index, or at least on citation or publication counts.
    I must admit that I am not familiar with the bibliometry literature: has the predictive power of the h-index been shown for people who were never "counted" during their careers?

  8. #9858 | 2010-03-25 12:18 PM | Luigi Foschini said:

    It is worth noting that, in the same issue of Nature, there is an interesting article about large collaborations in high-energy physics (see Merali, p. 482). Merali outlines how these huge collaborations are changing certain ways of doing physics; I would add that astrophysics is currently changing along the same lines.

    Obviously, this can severely affect any attempt to measure the productivity of individual scientists. The LHC collaborations, Merali writes, comprise about 10,000 researchers from all over the world: the size of a small town (my home town is smaller, with just over 4,000 souls). The result is an abnormally long author list, and it is very difficult to recognize who really did the work. It is also deeply unfair: extrapolating the practice, I could argue that because I use a Mac to analyse data and to write articles and books, I should add Steve Jobs to my author lists.

    Surely it is necessary to give proper credit to the people who build the instruments, but the present habit of granting authorship to anyone who merely tightens bolts or brings money is killing the concept of authorship, and with it any possibility of measuring productivity. You can evaluate the output of a collaboration, but no longer that of individuals. Yet collaborations are made of individuals, and scientific ideas are born from individuals; I cannot imagine general relativity being developed inside such a collaboration. Indeed, these large collaborations may also be weakening the advancement of science, as Pickering and Trower pointed out back in 1985, again for high-energy physics. They cited this emblematic sentence by Luis Alvarez: "our present scheduling procedures almost guarantee that nothing unexpected can be found".

    In addition, these large collaborations absorb almost all the scientists in a research field, so that, as Merali writes, it is not possible to peer review the resulting articles. I would add: why publish at all? All the potential readers have signed the paper, so presumably they have read it. If the aim is simply to archive the work, a report deposited in an online public database would suffice; there is no longer any need for published articles, since nobody outside the collaboration can read them.

    On the other hand, very few of the authors really understand what is written in these articles. Why, then, should there be authors who do not understand what they have signed? This also raises ethical issues, and it cannot be justified simply on the grounds that they built the instrument or brought money. Any attempt to measure the real contribution of these people is biased by the fact that they are authors of articles they have neither written nor understood.

    I do not want to be overly critical of people who build instruments: they do great work, and without these enormous instruments we could not measure important parameters. But, to use a metaphor, many people are employed to build a Ferrari, yet there is only one Schumacher able to drive it at its best and win races; that is why there is one championship for drivers and another for constructors. In my opinion, the same should apply in science: we need to restore a proper concept of authorship, otherwise we will soon lose much more than just the ability to measure productivity.

  9. #9880 | 2010-03-26 09:49 AM | Philippe AMELINE said:

    @Nicolas Le Novere wrote: "The most competent evaluators are the peers ... who cite the works."

    Don't you think that, if peers are the most competent evaluators, they should be given a direct means of evaluation instead of having to rely on an indirect indicator (here, citing the works)?

    As the article clearly describes, indirect evaluation opens the door to cheating, made worse by the fact that the process has a considerable impact on the way publications are produced. In short, research evaluation comes with a decrease in publication quality.

    Modern social-network tools provide proper qualitative instruments for "subjective" evaluation. The obsession with objective, data-driven evaluation may well prove not to apply to human beings, especially in domains where complexity is the rule.

  10. #9884 | 2010-03-26 11:22 AM | Daniel Corcos said:

    The problem with evaluation by citation indexes is circularity. We need another way of evaluating quality so that we can compare it with citation indexes.
    To illustrate the problem of cross-evaluation, compare the sales of books by Nobel laureates in literature (before they received the prize) with those of the numerous non-laureates. We would conclude that at least some copies of books written by Nobel laureates were sold, but that almost none of them became best-sellers. Conversely, a publisher who wants to create a best-seller knows some methods for doing so, but will not push forward a writer with the potential of a Nobel prizewinner.
    For a Nobel prize in the sciences, citation indexes might look more relevant, but the problem is again circularity. If funding goes only to those who publish a lot, the others will never get access to funding and will not publish, so there is no way out of the vicious circle. It becomes quite clear that in some countries this circularity is used to justify what is simply an unfair balance of power.

  11. #9888 | 2010-03-26 04:43 PM | Jevin West said:

    Prudence? For better or for worse, metrics are here to stay. No metric will ever replace reading papers as the best practice for evaluating a scholar's work. Unfortunately, time and resources are limited, and administrators will invariably seek out computable forms of evaluation. To mitigate the negative repercussions of metrics on academia, the community needs to make a concerted effort to understand exactly what these statistics are measuring and, most importantly, what they are not measuring.

    Opportunity? The digital revolution has dramatically changed the landscape of scholarly publishing. Researchers have at their fingertips access to millions of papers that are linked to millions of other papers through citations and hyperlinks. It is less an issue of getting the papers (I can download 10,000 papers a day to my desktop) and much more an issue of finding the papers one should be reading but is not. The science of science metrics has a lot more to offer than just better ways to evaluate scholarly work: there is great opportunity for this burgeoning field to improve the way scientists search and navigate the scholarly literature.

  12. #9890 | 2010-03-26 11:44 PM | Daniel Mietchen said:

    Vital change: Doing science in the open throughout the research cycle rather than publishing in one big step.

    Overemphasized: Peer review, as well as publications and putative future impact as criteria for funding decisions.

    Underemphasized: Actually reading and understanding, as pointed out before. I agree with previous commenters that this is tedious to the point of impracticality in the current system, but if our research output were reduced to its essentials and properly contextualized, reading up on a researcher's previous work would be much less of a burden than it is now.

    Best practice: Karma systems of the kind used for reputation management at Stack Overflow and similar platforms can serve as a model for managing scientific reputation on the web. What is lacking to make this effective is linking it to an author ID scheme such as ORCID and making it interoperable between the different sites to which researchers contribute in different ways (e.g. publishers, databases, blogs, wikis, funders, outreach).

    What more:

  13. #9900 | 2010-03-27 08:46 PM | Damir Ibrisimovic said:

    Dear all,

    I am particularly interested in innovation diffusion across disciplines. Measuring it would encourage interdisciplinary approaches and provide a more realistic criterion for the relevance of a piece of scientific work.

    Such a criterion could also follow a work over a longer period, and a statistical method could predict its possible future impact from immediate citations in other disciplines. Furthermore, the network of citations across disciplines could be followed to assess impact further.

    Personally, I am growing increasingly concerned about the relevance of research in quantum and theoretical physics. Despite the huge funds allocated to these two disciplines, the results seem to be entirely irrelevant in biology, for example.

    I would appreciate a discussion that takes such a criterion into account.

    Kind regards,

  14. #9914 | 2010-03-29 09:52 AM | Maxine Clarke said:

    There's a good analysis by Martin Fenner at Nature Network, in which he adds some more points for discussion.

  15. #9918 | 2010-03-30 07:57 AM | Bjoern Brembs said:

    Wouldn't it be nice if metrics weren't needed? However, despite all the justified objections to bibliometrics, unless we do something drastic to reduce research output to an amount manageable in the traditional way, we will have no choice but to use them.

    As earlier commenters have already noted, though, no matter how complex and sophisticated, any system is liable to gaming. Therefore, even in an ideal world in which we had the most comprehensive and advanced system for reputation building and automated assessment of the huge scientific enterprise in all its diversity, wouldn't the evolutionary dynamics set in motion by the selection pressures within such a system demand that we keep randomly shuffling the weights and rules of these future metrics faster than the population can adapt?

  16. #9919 | 2010-03-30 09:17 AM | Daniel Corcos said:

    Another problem, which has not been discussed here, is the way the flaws in science metrics are invoked.

    For instance, there is nothing worse than rules that do not apply to everyone. In France, where conflicts of interest are frequent, science metrics are used whenever they are not detrimental to the evaluator, but as soon as they become inconvenient, their flaws are suddenly remembered.

    Another critical issue is efficiency. It seems obvious that output should be divided by the number of people working on a project and by the amount of money spent, but this is usually not done: there is no real "sharing" of authorship, and the money spent does not appear alongside the publication list.

    In France, as I wrote above, science metrics, through this misuse, serve mainly to justify an unfair balance of power.

  17. #9935 | 2010-03-30 04:04 PM | Martin Fenner said:

    Sabine Hossenfelder also wrote a blog post about this opinion piece: Against Measure.

    In my blog post, I added two more reasons why we should become better at using science metrics:

    Another important motivation for improving science metrics, not mentioned in the article, is to reduce the burden on researchers and administrators in evaluating research. The proportion of time spent doing research versus time spent applying for funding, submitting manuscripts, filling out evaluation forms, doing peer review and so on has become ridiculous for many active scientists.

    Science metrics are not only important for evaluating scientific output, they are also great discovery tools, and this may indeed be their more important use. Traditional ways of discovering science (e.g. keyword searches in bibliographic databases) are increasingly superseded by non-traditional approaches that use social networking tools for awareness, evaluations and popularity measurements of research findings.

  18. #10002 | 2010-04-05 04:02 AM | David Roberts said:

    Bibliometrics measure only a portion of the output of scientific endeavour. Konrad Hinsen was absolutely spot on when he highlighted that scientific software gets credit only for the paper in which it is described, not for its continued maintenance and development. We need mechanisms that include the publication of data in open formats. Fundamentally, in addition to accurate author identification, we should be assessing the utility of what is published; that is, after all, the basis of the citation index. A metric that recorded the utility of openly available data would, one hopes, encourage more data to be made available in fields beyond molecular biology and physics.

  19. #10140 | 2010-04-16 01:11 PM | Franz Barjak said:

    Thanks, Julia, for triggering this discussion. I agree with most of the points you raise and hope that efforts to measure science are stepped up on all fronts (including, not least, the exploration and definition of what should be measured, the production of data, and standardization across institutional and national boundaries). To those who doubt the value of science metrics, such as Sabine Hossenfelder, I would say: the value for individual scientists may not be obvious at first sight (though some points have been added on this that I need not repeat), but metrics are certainly indispensable for universities, funding bodies and science policymakers in monitoring the quantity and quality of scientific work. And we want them to do as good a job as possible, weeding out bad science so as to reserve resources for good science and promising efforts.
    I'd like to give the ball back to you, Julia, and stress that organizations such as the NSF should dedicate more resources to research on measuring science. A virtual international collaboratory is a nice idea but, as you yourself stress, more basic research is needed, and this will not happen without funding. It could also be a good idea to pull in international organizations such as the OECD and ask them to contribute to standardization activities and to formulating guidelines on what should be measured and how.

  20. #10143 | 2010-04-16 08:59 PM | Stevan Harnad said:

    Harnad, S. (2008) Validating Research Performance Metrics Against Peer Rankings. Ethics in Science and Environmental Politics 8 (11) The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance http://eprints.ecs.soton.ac.uk/15619/

    A rich and diverse set of potential bibliometric and scientometric predictors of research performance quality and importance are emerging today, from the classic metrics (publication counts, journal impact factors and individual article/author citation counts) to promising new online metrics such as download counts, hub/authority scores and growth/decay chronometrics. In and of themselves, however, metrics are circular: They need to be jointly tested and validated against what it is that they purport to measure and predict, with each metric weighted according to its contribution to their joint predictive power. The natural criterion against which to validate metrics is expert evaluation by peers, and a unique opportunity to do this is offered by the 2008 UK Research Assessment Exercise, in which a full spectrum of metrics can be jointly tested, field by field, against peer rankings.
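
    A sketch of the validation logic described above: regress peer rankings on a set of candidate metrics and read off each metric's weight. The data and the choice of plain least squares are illustrative; a real exercise would use the panel rankings field by field:

```python
import numpy as np

# Invented, standardized metric values for five researchers
# (columns could be citation count, download count, h-index, ...).
metrics = np.array([
    [1.2, 0.3, 0.8],
    [0.4, 1.1, 0.2],
    [0.9, 0.7, 1.0],
    [0.1, 0.2, 0.1],
    [1.5, 0.9, 1.2],
])

# Invented peer-ranking scores for the same researchers (the validation criterion).
peer_scores = np.array([3.1, 1.8, 2.9, 0.5, 3.8])

# Least-squares fit: each weight estimates a metric's contribution
# to jointly predicting the peer judgement.
X = np.column_stack([metrics, np.ones(len(peer_scores))])  # add an intercept column
weights, *_ = np.linalg.lstsq(X, peer_scores, rcond=None)
print("metric weights:", weights[:-1], "intercept:", weights[-1])
```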

  21. #10226 | 2010-04-23 09:17 AM | Filippo Menczer said:

    Julia Lane is right: we need research on good metrics based on sound science (and as others have remarked, funding agencies should support this research).

    The objections based on 'quality versus quantity' arguments miss the point: we are well beyond simply counting; the point is to find effective ways to measure (that is, quantify) quality. Of course this is not easy, but we are making progress. For example, we already have metrics that adjust for the effects of co-authorship (mentioned in some comments). And the fact that any metric can be gamed is no reason to stop improving them (as long as they are useful to someone). We do not stop using money just because there is counterfeiting, and we do not stop using Google because some spammers can game PageRank; and of course search engines should not stop improving their ranking algorithms to make them more robust to abuse.

    Disambiguation, mentioned by Lane and others, is one of the challenges. An initiative like ORCID will be greatly helpful if it can reach wide adoption. In the meantime we need to continue working on machine-learning methods as well.

    One interesting development in scholarly metrics is the emergence of universal metrics such as those proposed by Radicchi et al. (doi:10.1073/pnas.0806977105). These require sound statistics by discipline and year to properly normalize indices, leading to measures that allow impact to be compared across disciplinary boundaries: apples to apples.
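
    At its core, the universal indicator in that paper is a relative citation count: an article's citations divided by the average citations of articles in the same field and year. A minimal sketch with invented numbers:

```python
from collections import defaultdict
from statistics import mean

# Invented articles: field, publication year and raw citation count.
articles = [
    {"id": "p1", "field": "mathematics", "year": 2005, "citations": 8},
    {"id": "p2", "field": "mathematics", "year": 2005, "citations": 2},
    {"id": "p3", "field": "cell biology", "year": 2005, "citations": 80},
    {"id": "p4", "field": "cell biology", "year": 2005, "citations": 20},
]

# Average citations per (field, year) group.
groups = defaultdict(list)
for a in articles:
    groups[(a["field"], a["year"])].append(a["citations"])
averages = {key: mean(vals) for key, vals in groups.items()}

# Relative indicator: citations divided by the field-year average.
for a in articles:
    relative = a["citations"] / averages[(a["field"], a["year"])]
    print(a["id"], round(relative, 2))
# p1 and p3 both score 1.6: equally far above their own field's average.
```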

    As part of the Scholarometer project we are developing an infrastructure based on crowdsourcing, leveraging authors' contributions to collect these statistics and to provide universal (and other) impact metrics. As Lane suggests, such an infrastructure should be open, and in fact we are working on APIs to make the data publicly available. We are also working with a consortium that includes BibKN.org at Berkeley, JSTOR, CiteSeerX at PSU, the MESUR project (mentioned above) here at Indiana University and other academic partners, seeking funding to develop a federation of web services to facilitate open sharing of scholarly work and open evaluation of its impact.

  22. #10285 | 2010-04-29 02:40 AM | Magnus Johnson said:

    One of the most irritating features of the REF is the tendency for universities to try to play the game, investing a lot of time planning a submission strategy and trying to second-guess the review process. This means that someone who would be submitted at one university may not be at another. Because of the drive for concentrations of expertise, an academic who does research (and teaches) in a non-core area may not be submitted. This is dangerous if we wish to retain the capacity to deliver good-quality teaching across the range of topics necessary for a general science-based degree.

    Currently all sorts of nonsense goes on: good young researchers are not submitted to the REF because it makes more sense for the university to submit more senior co-authors, or a member of one department is submitted in another because his income makes up for what a paper-rich but income-poor department lacks. This hardly makes for a national consensus on what constitutes quality research. Being in the REF is important for your career. If you are in it, you can make noises about needing more time, space or resources in order to remain in it next time around, for the benefit of your institution. If you are not in it, your research is deemed pointless ("hobby science") and you are likely to be demoted to the status of teacher-administrator, patronized with the idea that your teaching supports those who are research active.

    The REF should involve the submission of all academics in an institution, and it should be the research that comes out of the institution, rather than the individual, that is valued. This would encourage a more supportive environment and perhaps get rid of the expensive, premier-league-style movement of top researchers between universities. When a top academic leaves a small department in return for perks and prestige, it leaves a gaping hole in leadership, teaching and research that disadvantages colleagues and students for several years. Such disruption makes long-term planning difficult for administrators and is a waste of taxpayers' money.

    The assessment of institutions should be bibliometric and should steer away from vague assessments of prestige and from the irrelevance of finance (basic field ecology is cheap, but it can be just as interesting as expensive hydrothermal-vent ecology or particle physics). All of an institution's publications from, say, the past 20 years should be assessed, rather than a snapshot, and the value of publications should be judged by voluntary, broad-ranging peer assessment facilitated by IT, rather than by a select few.

  23. #10458 | 2010-05-08 05:07 PM | Paul Valckenaers said:

    We have to acknowledge that our metrics will never be perfect; indeed, they will always remain very far from perfect. In that respect, any differentiating operation on such measurements (e.g. ranking) truly qualifies as a beginner's mistake, as it amplifies the measurement errors and any shortcomings in the calibration, especially when it is a close call. The fact that it is the best we can do is not a valid excuse. It is therefore more important and urgent to upgrade the manner in which we use our measurements, whatever the (improved) metric.

    If we look at the problem this way, we have to conclude that metrics and measurements may enable us to reduce decision spaces and provide some indications, but there will rarely be sufficient information to make final choices on assigning funding, positions, promotions and so on. The only objective selection procedure is a lottery that reflects the available information (i.e. participants may have different chances of winning). Anything else is really well-camouflaged manipulation of which the participants themselves are probably unaware; in other words, it is politics backed by some plausible-sounding rationale.

    Much more important, from a taxpayer's perspective, is that the current way of trying to ensure a large positive impact from R&D is penny-wise, pound-foolish. The big opportunities that might be lost if we fail to manage R&D effectively require entire research communities to adapt. They require these communities as a whole to cover the search space well (hence lotteries are highly desirable), and they require us to be alert enough to support whoever is lucky enough to stumble over something exceptional. Our current ways of managing research counteract this. It took our society about 1,800 years to replace Aristotle's flawed views on physics with Newton's laws, and our current ways favour such lock-in situations. Our policies are designed to pick fruit that hangs higher and higher, not to discover new territories with low-hanging fruit.

    Fortunately, society has options for maximizing the positive impact of R&D beyond our poor attempts to measure its output. It can verify and ensure that the starting conditions are favourable and that the environment keeps researchers productive. Society does not need to worry about fairness among researchers or research organizations (i.e. rights to funding or positions); it only has an interest in attracting the right kind of talent and keeping it motivated and productive. It starts by noticing that almost every researcher wants to make a positive difference (and that the less confident need their peers to acknowledge this). R&D policies therefore do not need to enforce or verify that drive; it suffices to create an environment in which it is not lost to frustration.

    The unavoidable lotteries thus have to attract talent, which implies that we need well-designed lotteries: (1) the prize is right and allows a researcher or research organization to finish the job at hand, which implies that tenure and the like should be quite commonplace; (2) the tickets are cheap enough that the losers are not frustrated (e.g. if one fails to get tenure, the labour market will welcome him or her); and (3) if the tickets need to be more expensive, there must be adequate consolation prizes (e.g. substantial outplacement support). The current competitive mode of interaction needs to be transformed into a cooperative one.

    R&D policy implementations need to produce added value, not overhead that must be compensated for by better R&D performance before society sees a net benefit. This is only the beginning of rethinking the issues. What matters is to realize that more effort to measure output (in time for use in a policy) will not contribute much compared with what can be done at the start and in the value-creating phases. Society should realize that rivalry and feelings of unfair treatment among researchers or organizations are separate issues, and distractions when it comes to maximizing progress through R&D. There are better ways: more productive, with less overhead, and more humane than the narrow-minded feedback control on measurements of past output that is used today.
