Deposition of raw data into publicly available databases, now a condition of publication in many journals (Nature 537, 138; 2016), needs to involve more than ticking another box for the senior author. Before accepting a manuscript, journals should verify that the data will be immediately usable after publication.
Our group frequently uses published next-generation sequencing data for cancer genomics studies. We are often forced to spend months going back and forth with the original authors, for example, tracking down corrupted files, mislabelled samples and missing data. We have yet to find any instance of malicious intent, and in every case the study authors devoted considerable time to helping us sort out the errors. However, these delays could have been avoided had the mix-ups been caught before the papers were published.
Such intervention would ensure that raw data are complete and accurate when deposited, and that sufficient detail is available in the paper to identify and link raw data back to individual samples or experiments. The data sets would then serve as useful, high-quality, interpretable resources for future researchers.
Greenwald, N., Bandopadhayay, P. & Beroukhim, R. Spot data glitches before publication. Nature 550, 333 (2017). https://doi.org/10.1038/550333c