Major chemical database investigates hundreds of suspicious crystal structures

Exterior of the Cambridge Crystallographic Data Centre building.

The Cambridge Crystallographic Data Centre says that 992 entries in its database are under investigation. Credit: Patrick McCabe/Alamy

The Cambridge Crystallographic Data Centre (CCDC), a go-to resource for chemists seeking information on crystal structures, is reviewing almost 1,000 database entries after a research-integrity sleuth flagged the underlying scientific papers as potentially coming from paper mills — businesses that sell fake scientific papers to researchers who need them for their CVs.

The CCDC’s database has never before seen such a large number of entries flagged as suspicious. Scientists who use it as part of their day-to-day research say they are shocked by the scale of the alleged fraud.

“It creates the possibility that people are wasting their time looking at materials that have never been made,” says Randall Snurr, a chemical engineer at Northwestern University in Evanston, Illinois. He is surprised that such a large number of papers had slipped through the system.

The CCDC says that 992 entries are potentially affected, but that these represent a “very small amount of the total”. It is unusual that multiple investigations into the underlying research are happening at the same time, says Sophie Bryant, marketing manager at the CCDC in Cambridge, UK.

Crystal collection

The CCDC has been collating data on the crystal structures of small organic and metal–organic molecules since 1965, and currently lists more than one million structures. Its subscription-based database is accessible online and through a desktop app, and is an important resource for chemists and biochemists, who use it to study the bonds and geometry of structures and molecular interactions. Many journals in the field of crystallography require researchers to deposit their structural data with the CCDC.

The CCDC's Cambridge Structural Database does retract entries from time to time, when individual papers are retracted from the literature. In 2010, it retracted 70 entries because of falsified data. But fewer than 300 structures have been retracted in the database's history.

The latest expressions of concern were prompted by a preprint on the Research Square repository that flagged more than 800 questionable papers published in crystallography and exotic-chemistry journals between 2015 and 2022 (ref. 1). Many of the papers propose medical applications for metal–organic frameworks, a class of sponge-like materials that comprise both metal ions and organic molecules. The author of the preprint, retired psychology researcher David Bimler, noted that, in these papers, images and spectra that claim to characterize organic or metal–organic structures have been repeated. The papers also bear the hallmarks of having been produced by a paper mill, including recycled and irrelevant references, suspicious e-mail addresses and strange turns of phrase that appear repeatedly in the methods section of apparently unrelated papers.

CCDC staff members do tests to scrutinize the submitted data and hand check each entry. Some were already suspicious of a handful of structures on Bimler’s list before the preprint was posted. When they saw his analysis, they launched an investigation. This involves re-checking all the flagged structures, including tests to identify unusual bond lengths and angles, and searching for evidence that the structures or underlying data could be based on existing database entries.
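As a rough illustration of what a test for unusual bond lengths might look like, the Python sketch below measures each bonded pair of atoms and flags any distance that falls outside an expected range. The reference ranges, the function name flag_unusual_bonds and the three-atom example are all assumptions made for illustration; they are not the CCDC's actual criteria or software.

```python
import math

# Hypothetical reference ranges in angstroms, keyed by an alphabetically
# sorted element pair. Illustrative values only, not the CCDC's criteria.
EXPECTED_RANGES = {
    ("C", "C"): (1.20, 1.60),
    ("C", "O"): (1.15, 1.50),
}


def flag_unusual_bonds(atoms, bonds, ranges=EXPECTED_RANGES):
    """Return (i, j, length) for each bond whose length is out of range.

    atoms: list of (element, (x, y, z)) with Cartesian coordinates in angstroms
    bonds: list of (i, j) index pairs into atoms
    """
    flagged = []
    for i, j in bonds:
        (el_i, xyz_i), (el_j, xyz_j) = atoms[i], atoms[j]
        key = tuple(sorted((el_i, el_j)))
        if key not in ranges:
            continue  # no reference range for this pair, so nothing to test
        low, high = ranges[key]
        length = math.dist(xyz_i, xyz_j)
        if not low <= length <= high:
            flagged.append((i, j, round(length, 3)))
    return flagged


# Made-up example: a plausible C-C bond and an implausibly long C-O bond.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.54, 0.0, 0.0)),
    ("O", (1.54, 2.10, 0.0)),
]
bonds = [(0, 1), (1, 2)]
print(flag_unusual_bonds(atoms, bonds))
```

In this made-up example only the second bond is flagged. A real check would draw its expected ranges from statistics on previously validated structures and would also examine bond angles.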

So far, the CCDC has issued expressions of concern for the 992 entries implicated in the preprint and has removed 12 structures that were described in 9 papers that have been retracted. Because the investigation is ongoing, 277 of the flagged structures were omitted from the latest desktop data update in mid-June. However, these structures are still available in the online database. If publishers decide to retract a paper, the data will also be retracted. “We mirror the literature,” says Bryant.

Ongoing investigations

Affected journals are also investigating the preprint’s allegations. Chris Graf, director of research integrity at Springer Nature, says that it is investigating the concerns in 157 papers published in at least 5 of its journals, but that it is too early to draw any conclusions. “Should these concerns turn out to be well founded, they would very much support the need for the publishing industry to work collaboratively to address the issue of paper mills,” Graf says. (Nature’s news team is editorially independent of Springer Nature, its publisher.)

Publisher Wiley says that it has already retracted two articles from the Journal of the Chinese Chemical Society, both of which were listed in Bimler’s preprint. It is investigating a further 50 articles published in at least 15 journals — more than the 25 papers that were flagged in the preprint. Elsevier, which published 88 of the papers in at least 4 journals, says that it is investigating and will report its findings in due course. A spokesperson for Taylor and Francis, which published 204 of the papers in at least 2 journals, says that it is actively investigating a large number of articles in these journals. “Our investigation originated with an internal audit we ran in 2021 and was expanded following concerns raised to us by researchers,” the spokesperson says.

“This is probably a wake-up call,” says Suzanna Ward, head of the CCDC’s database. “We’re lucky in crystallography that there is a standard file format that’s universally used to publish data. It’s not like the data is buried in PDFs.”

Chemist Filipe Almeida Paz at the University of Aveiro in Portugal is shocked by the situation. “It’s not in our DNA as scientists to try and deceive others,” he says. Researchers use the CCDC’s database to inform drug discovery, he adds, and incorrect data will ultimately waste time, so it is important that the database is not “contaminated with wrong information”, even if only a small proportion of structures are affected.

Jon Clardy, a biological chemist at Harvard Medical School in Boston, Massachusetts, says that the potentially problematic data make up a small proportion of the database. “I’m not too worried that it will undermine confidence in the CCDC.” He adds that the paper mill has been “extraordinarily clever” to combine metal–organic frameworks with medical applications such as cancer immunotherapy, because the chances that people have studied both topics in depth are slim.

The CCDC is now looking at whether its processes need to change. Discussions are continuing about developing more automated screening to help scientists on the CCDC’s integrity team to identify and prioritize what to look at more closely, says Ward.

Nature 608, 461 (2022)

doi: https://doi.org/10.1038/d41586-022-02100-4

References

  1. Bimler, D. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-1537438/v1 (2022).
