Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Health Sciences Center, Box 357730, 1705 NE Pacific Street, Seattle, Washington 98195, USA. noble@gs.washington.edu
To the editor:
A commentary in the April issue by Prince et al. (Nat. Biotechnol. 22, 471–472, 2004) cites the difficulty of sharing data in the field of protein mass spectrometry. I am a relative newcomer to the field, having worked primarily on the analysis of DNA and protein sequences and microarray gene expression data. I was therefore surprised to learn that protein mass spectrometry lacks a public repository or even an agreed-upon standard for representing data.
However, I suspect that this lack of standardization is a symptom, rather than a cause, of the corresponding lack of publicly available protein mass spectrometry data. Microarray researchers were sharing data on the web as tab-delimited text files and Excel spreadsheets long before standard data exchange formats and online data repositories were developed. In contrast, the protein mass spectrometry researchers to whom I have spoken almost uniformly agree to share data only in the context of a collaboration.
This policy extends even to published data. I recently contacted the authors of an article published in Nature Biotechnology, asking to receive a copy of the mass spectra used in their study. I was told that the data set described in their paper is not yet available because they are using it for further studies.
I will not speculate about why the field of protein mass spectrometry is so competitive. But I am certain that methods for analyzing biological sequences and microarray expression data would have matured much more slowly without a culture of scientific openness. Scientists should freely post their published mass spectrometry data sets on the web, and funding agencies and journals should require such publication.