Main

The lack of raw data sets associated with proteomics and molecular-interaction papers is a long-standing and pernicious problem. It not only stymies the exchange, comparison and reanalysis of experimental results, but also inhibits the development of new algorithms and statistics that could improve the confidence in data and conclusions. In addition, it undermines the ability of referees to fully evaluate the quality of data supporting a manuscript's conclusions, sometimes forcing them to assess results simply on 'good faith'. Contrast this with the situation in genome research and structural biology, where there is an abundance of public data sets from DNA microarrays, genome sequencing and X-ray crystallography studies, and it is not difficult to understand why progress in proteomics has lagged.

Part of the problem has been that high-throughput protein analysis technologies like mass spectrometry are still relatively young, and the raw data output from instruments is not represented in standardized formats. What's more, protein mass spectrometrists have been slow to distribute their data to the wider community, a puzzling phenomenon given the wide availability of mass spectra for chemicals and drugs. But perhaps the single most important roadblock has been the chronic lack of public repositories for proteomics and molecular-interaction data.

This has begun to change, however, with the advent of the International Molecular Exchange (IMEx) consortium (http://imex.sourceforge.net/) and databases such as the European Bioinformatics Institute's PRIDE (http://www.ebi.ac.uk/pride) and IntAct (http://www.ebi.ac.uk/intact/), the Seattle-based Institute for Systems Biology's PeptideAtlas (http://www.peptideatlas.org/), the University of Michigan's Tranche (http://www.proteomecommons.org/dev/dfs/users/index.html) and the Rockefeller/University of British Columbia's GPMDB (http://www.thegpm.org/GPMDB/index.html). For the moment we prefer PRIDE and IMEx databases (IntAct, DIP, MINT) because they not only are true databases with complex interfaces and accession numbers, but also offer a mechanism for referees to anonymously review submitted data sets.

Our goal in encouraging data submission to public repositories is to enhance the utility, reproducibility and dissemination of the research published in our pages. It is worth reiterating that publication of a paper includes an obligation on the part of authors to make sufficient data publicly available for an experiment to be reproduced. Public accessibility of results is also consistent with the missions of funding agencies.

Although our new policy on data deposition is a recommendation rather than a requirement, we strongly urge authors to comply for the reasons enumerated above. We intend to monitor the results of this initiative with a view to assessing the future feasibility of requiring data deposition as a condition of publication.