Scientific misconduct comes in many forms. Fabrication lies at one extreme, but plagiarism and 'citation amnesia' are more common. Some have come to question the peer review system, especially following the spectacular cases of Hendrik Schön and Scott Reuben. Schön was a Bell Labs researcher whose organic field-effect transistors were reported to exhibit the fractional quantum Hall effect, superconductivity, lasing, you name it. That he didn't keep a lab book or any raw data during his PhD would already constitute bad practice, but he then went on to fabricate data. In 2002, a committee found him guilty of scientific misconduct on 16 out of 24 allegations, and at least 21 of his published papers have since been retracted (a new book chronicling Schön's rise and fall is reviewed on p451 of this issue). Reuben's case came to light in March 2009, when 21 of his papers containing faked data were retracted from anaesthesiology journals. Millions of patients have been treated according to his studies of combinations of drugs for pain relief. In many cases, the patients in his clinical trials were made up.

Following each occurrence, the scientific community has been left wondering how fraudulent research on this scale escaped detection for so long. In his 1974 Commencement speech at Caltech, Richard Feynman said, “we've learned from experience that the truth will out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature's phenomena will agree or they'll disagree with your theory. And although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven't tried to be very careful in this kind of work. And it's this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in Cargo Cult Science” [his term for bad science or pseudoscience]. His point was that we must be 100% honest in science, even to the point of publishing the shortcomings of a particular study as well as its successes. Thirty-five years later, scientific integrity is still not taught explicitly; instead, it is hoped that students will absorb it along the way.

Mostly, scientists can be counted on to be honest. Peer review would not work otherwise, and it does work. But because the system is built on trust, scientific fraud can escape detection for a time. When other groups couldn't reproduce Schön's work, they assumed that they themselves were not good enough or were missing a key ingredient. Non-believers were written off as jealous. In the end, it took a whistleblower to start a chain reaction leading to the retractions.

Even when confronted with suspicious results, scientists tend not to want to be whistleblowers. In many cases, the allegations do not lead to a formal enquiry, the accused goes free and the whistleblower is censured. In any case, scientific fraud involving fabricated data is rare and will always be difficult to catch. Far more prevalent, however, is 'cut and paste' science.

This is where Déjà vu comes in (http://spore.swmed.edu/dejavu/). Déjà vu is based on the text-similarity software eTBLAST. When used on the Medline database, eTBLAST flagged up 74,790 pairs of papers similar in content or language. Following manual inspection, 2,125 have been labelled as duplicates, 1,697 as sanctioned, 1,498 as distinct, but the majority remain unverified. For two papers to be considered duplicates, they must share 85% of their text. Given the number of review articles and conference proceedings, the number of duplicates is not surprising. It is surprising, though, that most of the duplications by the same authors are published within five months of each other, which means that they were probably submitted to different journals at roughly the same time. And 228 of the duplicates are from different authors, which suggests plagiarism. These cases are reported to the authors and journal editors.
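
To make the 85% criterion concrete, here is a minimal sketch of how shared text between two abstracts might be scored. It is not eTBLAST's algorithm, which is not described here; the word-trigram comparison and the names `shingles`, `shared_fraction` and `is_candidate_duplicate` are assumptions made purely for illustration. Only the 0.85 threshold comes from the text above.

```python
import re

# From the text: two papers count as duplicates if they share 85% of their text.
DUPLICATE_THRESHOLD = 0.85


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in `text`.

    Hypothetical helper: lower-cases the text and ignores punctuation.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Text too short for a full shingle: treat it as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(text_a, text_b, n=3):
    """Fraction of the smaller shingle set that also occurs in the other text."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def is_candidate_duplicate(text_a, text_b, threshold=DUPLICATE_THRESHOLD):
    """Flag a pair for manual inspection; a flag is not a verdict."""
    return shared_fraction(text_a, text_b) >= threshold
```

A pair flagged in this way would still need the manual inspection described above, because review articles and conference proceedings can legitimately overlap with earlier work.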

Software has its limitations, so the Déjà vu team encourages authors to get in touch. It's also possible to report a duplicate citation to be added to the website. For publishers, CrossCheck (http://www.crossref.org/crosscheck.html) is available for checking submissions against 20 million publications, and is used by the Nature Publishing Group. Hopefully, this kind of publication policing will feed into improved scientific practice — because it is only a matter of time before fraudsters are caught.