Abstract
Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.
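To make the idea of a "virtual path" concrete, the following is a minimal sketch of how a USI string might be split into its components. It assumes the colon-delimited layout described in the PSI specification as we understand it (the prefix `mzspec`, a collection identifier, an MS run name, an index type, an index and an optional interpretation). The names `ParsedUSI` and `parse_usi` are hypothetical and are not part of any repository's code; the sketch also assumes that the MS run name itself contains no colon, which a production parser would need to handle.

```python
from dataclasses import dataclass
from typing import Optional

# Index types assumed from the PSI USI specification (an assumption of this sketch).
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class ParsedUSI:
    """Hypothetical container for the components of a USI."""
    collection: str                       # e.g. a ProteomeXchange dataset accession
    ms_run: str                           # name of the MS run within the collection
    index_type: str                       # how the spectrum is addressed within the run
    index: str                            # the spectrum locator itself
    interpretation: Optional[str] = None  # optional peptidoform and charge


def parse_usi(usi: str) -> ParsedUSI:
    """Split a USI into its components, raising ValueError if it is malformed."""
    # Split at most five times so that colons inside the interpretation
    # (for example in a modification name) are preserved.
    parts = usi.split(":", 5)
    if len(parts) < 5:
        raise ValueError(f"Too few components in USI: {usi!r}")
    prefix, collection, ms_run, index_type, index = parts[:5]
    if prefix != "mzspec":
        raise ValueError(f"USI must start with 'mzspec': {usi!r}")
    if index_type not in INDEX_TYPES:
        raise ValueError(f"Unknown index type {index_type!r} in USI: {usi!r}")
    if not (collection and ms_run and index):
        raise ValueError(f"Empty component in USI: {usi!r}")
    interpretation = parts[5] if len(parts) == 6 and parts[5] else None
    return ParsedUSI(collection, ms_run, index_type, index, interpretation)


if __name__ == "__main__":
    # Illustrative USI string; the run name and scan number are used only as an example.
    example = "mzspec:PXD000561:Adult_Frontal_Cortex_Gel_Elite_49_f01:scan:17555:VLHPLEGAVVIIFK/2"
    print(parse_usi(example))
```

Limiting the split to five separators is a deliberate choice: the interpretation is the only trailing field, so everything after the fifth colon can be kept intact even when it contains colons of its own.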
Data availability
No datasets were generated or analyzed during the current study.
Code availability
No custom code was used to analyze data, but the code used to fetch spectra using USIs is publicly available on GitHub (https://github.com/proteomexchange/proteomecentral).
References
Deutsch, E. W. et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Ezkurdia, I., Vázquez, J., Valencia, A. & Tress, M. Analyzing the first drafts of the human proteome. J. Proteome Res. 13, 3854–3855 (2014).
Mylonas, R. et al. Estimating the contribution of proteasomal spliced peptides to the HLA-I ligandome. Mol. Cell. Proteom. 17, 2347–2357 (2018).
Wohlgemuth, G. et al. SPLASH, a hashed identifier for mass spectra. Nat. Biotechnol. 34, 1099–1101 (2016).
Hanash, S. & Celis, J. E. The Human Proteome Organization: a mission to advance proteome knowledge. Mol. Cell. Proteom. 1, 413–414 (2002).
Deutsch, E. W. et al. Development of data representation standards by the human proteome organization proteomics standards initiative. J. Am. Med. Inform. Assoc. JAMIA 22, 495–506 (2015).
Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018).
Deutsch, E. W. et al. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 45, D1100–D1106 (2017).
Martens, L. et al. mzML–a community standard for mass spectrometry data. Mol. Cell. Proteom. 10, R110.000133 (2011).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
Desiere, F. et al. The PeptideAtlas project. Nucleic Acids Res. 34, D655–D658 (2006).
Wang, M. et al. Assembling the community-scale discoverable human proteome. Cell Syst. 7, 412–421.e5 (2018).
Moriya, Y. et al. The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res. 47, D1218–D1224 (2019).
Ma, J. et al. iProX: an integrated proteome resource. Nucleic Acids Res. 47, D1211–D1217 (2019).
Omenn, G. S. et al. Research on the human proteome reaches a major milestone: >90% of predicted human proteins now credibly detected, according to the HUPO Human Proteome Project. J. Proteome Res. 19, 4735–4746 (2020).
Deutsch, E. W. et al. Human proteome project mass spectrometry data interpretation guidelines 3.0. J. Proteome Res. 18, 4108–4116 (2019).
Huttlin, E. L. et al. The BioPlex Network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
Pullman, B. S., Wertz, J., Carver, J. & Bandeira, N. ProteinExplorer: a repository-scale resource for exploration of protein detection in public mass spectrometry data sets. J. Proteome Res. 17, 4227–4234 (2018).
Acknowledgements
This work was funded in part by the National Institutes of Health grants R01GM087221, R24GM127667, 1R01LM013115 and P41GM103484, and National Science Foundation grants 1933311, 1922871 and ABI 1759980. J.A.V. acknowledges Wellcome Trust grant 208391/Z/17/Z, BBSRC grants BB/S01781X/1 and BB/P024599/1, and the partnering grants BB/N022440/1 and BB/N022432/1. S.K. acknowledges the Database Integration Coordination Program from the National Bioscience Database Center, Japan Science and Technology Agency (grant 18063028) and the Japan Society for the Promotion of Science KAKENHI (grant JP20H03245). We also acknowledge the Research Foundation–Flanders (SB grant 1S90918N to T.V.D.B.; SB grant 1S50918N to R.G.; and postdoctoral grant 12W0418N to W.B.).
Author information
Contributions
All authors designed the standard and contributed to the writing of the PSI specification and to the writing of the article. Implementations of USI were created at ProteomeCentral by L.M. and E.W.D., at PRIDE by Y.P.-R. and J.A.V., at MassIVE by J.C., B.P. and N.B., at jPOST by S.K., at PeptideAtlas by E.W.D., Z.S. and L.M., and at iProX by Y.Z.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Peer review information Nature Methods thanks Peter Horvatovich, Matthias Trost and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Extended data
Extended Data Fig. 1
Example use cases for Universal Spectrum Identifiers (USIs), providing a set of 13 example USIs along with a brief comment on each. The same 13 USIs are available in the ‘example USIs’ select list at http://proteomecentral.proteomexchange.org/usi. (Ref. 16 in Case 1 is Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).) Case 2 shows examples of a single spectrum from a CPTAC CompRef dataset with various supported types of mass modification designations. (Ref. 17 in Case 2 is Zhou, J.-Y. et al. Quality assessments of long-term quantitative proteomic analysis of breast cancer xenograft tissues. J. Proteome Res. 16, 4523–4530 (2017).)
Example 4c in Box 1 provides the USI for the demonstrated correct PSM of an ordinary UniProtKB protein, Q9UQ35, from Fig. 2b of Mylonas et al. (ref. 4); example 4d is the corresponding synthetic peptide spectrum. Example 4a in Box 1 provides the USI for the same spectrum as example 4c, but annotated with the previously, incorrectly reported HLA (Human Leukocyte Antigen) peptide, as described in Fig. 2a of Mylonas et al. The non-matching synthetic peptide spectrum for the incorrect sequence is given in Box 1 as example 4b.
The Human Proteome Project (HPP; ref. 16) has set a high bar for data quality and evidence in support of its goal to provide high-stringency detections for all human proteins. The latest version of its MS data interpretation guidelines, version 3.0 (ref. 17), requires that key detection claims of proteins not previously seen via MS be accompanied by USIs referencing the key spectra for each claim, so that the peptide-spectrum matches can be transparently inspected and verified by the community. For example, the BioPlex dataset (ref. 18) was important for detecting novel proteins that had not been previously observed (ref. 19), but it was crucial to consider the provenance of every single identification in order to exclude all files from experiments in which the protein was intentionally overexpressed (as per the standard protocol for analysis of protein-protein interactions). Example 3a in Box 1 provides a PSM derived from a prey protein pulled down as a binding partner to bait protein C5orf38. Example 3b provides a PSM of the same peptide, but derived from a recombinant protein used as a bait. This PSM provides a synthetic peptide reference spectrum with a much higher signal-to-noise ratio, as required by HPP guidelines.
Illustrating this application of USIs at a community-wide scale, MassIVE further provides an extensive list of USIs for 1,296,916 MassIVE-KB entries in support of HPP Protein Existence (PE) classifications for 16,393 proteins (available at http://massive.ucsd.edu/hpp), including USIs for matching spectra of synthetic peptides (when available in public datasets); an abridged version of this table is also provided as Supplementary Table 1.
About this article
Cite this article
Deutsch, E.W., Perez-Riverol, Y., Carver, J. et al. Universal Spectrum Identifier for mass spectra. Nat Methods 18, 768–770 (2021). https://doi.org/10.1038/s41592-021-01184-6
DOI: https://doi.org/10.1038/s41592-021-01184-6