Problems with anti-plagiarism database

Sophisticated tools have been developed to detect duplicate publication and plagiarism, as noted in M. Errami and H. Garner's Commentary 'A tale of two citations' (Nature 451, 397–399; 2008) and in your News story 'Entire-paper plagiarism caught by software' (Nature 455, 715; 2008). To my surprise, one of these tools, Déjà vu, classifies four of our publications as unverified duplicates. These report the analysis of Bruton's tyrosine kinase mutations associated with the rare disease X-linked agammaglobulinaemia (XLA) and of the database BTKbase.

Each of these is a genuinely different and independent report; they cover the development of the database and different analyses of the growing data set. They are probably branded as suspect cases because the journal Nucleic Acids Research, in which three of them were published, has a special format for articles in its annual database issue.

Between 1995 and 2006, we published eight articles on BTKbase. The number of XLA cases recorded in the database has grown from 118 to 1,111 during this period. Several colleagues who maintain databases are also listed in Déjà vu. It is worrying that such legitimate articles written by research infrastructure developers and providers are labelled as unethical, just because of some overlap with previous papers as a result of a journal's strict formatting requirement.

Detection of fraud, including duplications, is obviously crucial to the integrity of science. But it is unethical to list thousands of scientists in a public Internet service as suspects, without verifying the claims that are being made. Although the developers indicate that the data are provisional, there is still a risk that the listing will affect decisions on careers, promotions or research funding if individual cases are not investigated.

No professional scientist wants even the slightest suspicion of fraud to tarnish their scholarly reputation, so listed cases need to be closely investigated. To detect real duplicates, the full-length articles must be analysed. In the case of our publications, only the abstracts were compared.

Mauno Vihinen
Institute of Medical Technology, FI-33014 University of Tampere, Finland


