  • Brief Communication

Universal Spectrum Identifier for mass spectra

Abstract

Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.

Fig. 1: Design and example for the USI.
Fig. 2: Graphical depiction of USI application ecosystem.
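As a rough illustration of the "virtual path" idea, the sketch below splits a USI string into its colon-separated parts. It is a minimal sketch and not the reference implementation: the field names, the `parse_usi` helper and the set of accepted index types are assumptions made here for illustration, and the parser assumes that the MS run name contains no colons. The example string follows the pattern of the public example USIs at ProteomeCentral and should be treated as illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of index-type keywords; check the PSI specification before relying on it.
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    collection: str                       # dataset accession, e.g. a ProteomeXchange PXD id
    ms_run: str                           # name of the MS run within the dataset
    index_type: str                       # how the spectrum is addressed, e.g. "scan"
    index: str                            # the index value, kept as text
    interpretation: Optional[str] = None  # optional peptidoform and charge


def parse_usi(usi: str) -> USI:
    """Split a USI into its parts (hypothetical helper, simplified)."""
    # maxsplit=5 keeps any colons inside the interpretation (e.g. "[UNIMOD:21]")
    # together in the final field. Assumes the run name itself has no colons.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


example = parse_usi(
    "mzspec:PXD000561:Adult_Frontal_Cortex_Gel_Elite_49_f24:scan:17555:VLHPLEGAVVIIFK/2"
)
print(example.collection, example.index_type, example.index, example.interpretation)
```

Keeping the interpretation as the last, optional field means the same spectrum can be referenced with or without a proposed peptide identification.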

Data availability

No datasets were generated or analyzed during the current study.

Code availability

No custom code was used to analyze data, but the code used to fetch spectra using USIs is publicly available on GitHub (https://github.com/proteomexchange/proteomecentral).
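For readers who want to look up a spectrum by hand, the following sketch builds a link to the ProteomeCentral USI page whose address is given in the Extended Data caption. It is not taken from the repository above. The `usi` query parameter and both helper names are assumptions made for illustration, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Page address as given in the Extended Data caption of this article.
USI_PAGE = "http://proteomecentral.proteomexchange.org/usi/"


def usi_lookup_url(usi: str, base: str = USI_PAGE) -> str:
    """Build a lookup link for a USI (hypothetical helper).

    The query parameter name "usi" is an assumption, not something this
    article documents.
    """
    # urlencode percent-encodes ":", "/", "[", "]" and "+", so the USI
    # survives transport as a single query value.
    return f"{base}?{urlencode({'usi': usi})}"


def usi_from_url(url: str) -> str:
    """Recover the USI from a link built by usi_lookup_url."""
    return parse_qs(urlsplit(url).query)["usi"][0]


url = usi_lookup_url("mzspec:PXD000561:run:scan:1:PEPT[+79.966]IDE/2")
print(url)
print(usi_from_url(url))
```

Encoding matters most for interpretations that carry mass modifications: an unencoded "+" in a query string is decoded as a space, which would silently change the USI.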

References

  1. Deutsch, E. W. et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).

  2. Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

  3. Ezkurdia, I., Vázquez, J., Valencia, A. & Tress, M. Analyzing the first drafts of the human proteome. J. Proteome Res. 13, 3854–3855 (2014).

  4. Mylonas, R. et al. Estimating the contribution of proteasomal spliced peptides to the HLA-I ligandome. Mol. Cell. Proteom. 17, 2347–2357 (2018).

  5. Wohlgemuth, G. et al. SPLASH, a hashed identifier for mass spectra. Nat. Biotechnol. 34, 1099–1101 (2016).

  6. Hanash, S. & Celis, J. E. The Human Proteome Organization: a mission to advance proteome knowledge. Mol. Cell. Proteom. 1, 413–414 (2002).

  7. Deutsch, E. W. et al. Development of data representation standards by the Human Proteome Organization Proteomics Standards Initiative. J. Am. Med. Inform. Assoc. JAMIA 22, 495–506 (2015).

  8. Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018).

  9. Deutsch, E. W. et al. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 45, D1100–D1106 (2017).

  10. Martens, L. et al. mzML–a community standard for mass spectrometry data. Mol. Cell. Proteom. 10, R110.000133 (2011).

  11. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

  12. Desiere, F. et al. The PeptideAtlas project. Nucleic Acids Res. 34, D655–D658 (2006).

  13. Wang, M. et al. Assembling the community-scale discoverable human proteome. Cell Syst. 7, 412–421.e5 (2018).

  14. Moriya, Y. et al. The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res. 47, D1218–D1224 (2019).

  15. Ma, J. et al. iProX: an integrated proteome resource. Nucleic Acids Res. 47, D1211–D1217 (2019).

  16. Omenn, G. S. et al. Research on the human proteome reaches a major milestone: >90% of predicted human proteins now credibly detected, according to the HUPO Human Proteome Project. J. Proteome Res. 19, 4735–4746 (2020).

  17. Deutsch, E. W. et al. Human Proteome Project mass spectrometry data interpretation guidelines 3.0. J. Proteome Res. 18, 4108–4116 (2019).

  18. Huttlin, E. L. et al. The BioPlex Network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  19. Pullman, B. S., Wertz, J., Carver, J. & Bandeira, N. ProteinExplorer: a repository-scale resource for exploration of protein detection in public mass spectrometry data sets. J. Proteome Res. 17, 4227–4234 (2018).

Acknowledgements

This work was funded in part by the National Institutes of Health grants R01GM087221, R24GM127667, 1R01LM013115 and P41GM103484, and National Science Foundation grants 1933311, 1922871 and ABI 1759980. J.A.V. acknowledges Wellcome Trust grant 208391/Z/17/Z, BBSRC grants BB/S01781X/1 and BB/P024599/1, and the partnering grants BB/N022440/1 and BB/N022432/1. S.K. acknowledges the Database Integration Coordination Program from the National Bioscience Database Center, Japan Science and Technology Agency (grant 18063028) and the Japan Society for the Promotion of Science KAKENHI (grant JP20H03245). We also acknowledge the Research Foundation–Flanders (SB grant 1S90918N to T.V.D.B.; SB grant 1S50918N to R.G.; and postdoctoral grant 12W0418N to W.B.).

Author information

Contributions

All authors designed the standard and contributed to the writing of the PSI specification and of the article. Implementations of USI were created at ProteomeCentral by L.M. and E.W.D.; at PRIDE by Y.P.-R. and J.A.V.; at MassIVE by J.C., B.P. and N.B.; at jPOST by S.K.; at PeptideAtlas by E.W.D., Z.S. and L.M.; and at iProX by Y.Z.

Corresponding authors

Correspondence to Eric W. Deutsch or Nuno Bandeira.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Peer review information Nature Methods thanks Peter Horvatovich, Matthias Trost and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1

Example use cases for Universal Spectrum Identifiers (USIs), providing a set of 13 example USIs along with a brief comment on each. These same 13 USIs can be viewed in the ‘example USIs’ select list at http://proteomecentral.proteomexchange.org/usi. (Ref. 16 in Case 1 is Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).) Case 2 shows examples of a single spectrum from a CPTAC CompRef dataset with various supported types of mass modification designations. (Ref. 17 in Case 2 is Zhou, J.-Y. et al. Quality assessments of long-term quantitative proteomic analysis of breast cancer xenograft tissues. J. Proteome Res. 16, 4523–4530 (2017).)

Example 4c in Box 1 provides the USI for the demonstrated correct peptide-spectrum match (PSM) of an ordinary UniProtKB protein, Q9UQ35, from Fig. 2b of Mylonas et al. (ref. 4); example 4d is the corresponding synthetic peptide spectrum. Example 4a in Box 1 provides the USI for the same spectrum as example 4c, but annotated with the previously and incorrectly reported HLA (human leukocyte antigen) peptide, as described in Fig. 2a of Mylonas et al. The non-matching synthetic peptide spectrum for the incorrect sequence is given in Box 1 as example 4b.

The Human Proteome Project (HPP; ref. 16) has set a high bar for data quality and evidence in support of its goal to provide high-stringency detections for all human proteins. The latest version of its MS data interpretation guidelines, version 3.0 (ref. 17), requires that key detection claims of proteins not previously seen via MS be accompanied by USIs referencing the key spectra for each claim, so that the PSMs can be transparently inspected and verified by the community. For example, the BioPlex dataset (ref. 18) was important for detecting novel proteins that had not been previously observed (ref. 19), but it was crucial to consider the provenance of every single identification in order to exclude all files from experiments in which the protein was intentionally overexpressed (as per the standard protocol for analysis of protein-protein interactions). Example 3a in Box 1 provides a PSM derived from a prey protein pulled down as a binding partner to bait protein C5orf38. Example 3b provides a PSM of the same peptide, but derived from a recombinant protein used as a bait. This PSM provides a synthetic peptide reference spectrum with a much higher signal-to-noise ratio, as required by HPP guidelines. Illustrating this application of USIs at a community-wide scale, MassIVE further provides an extensive list of USIs for 1,296,916 MassIVE-KB entries in support of HPP Protein Existence (PE) classifications for 16,393 proteins (available at http://massive.ucsd.edu/hpp), including USIs for matching spectra of synthetic peptides (when available in public datasets); an abridged version of this table is also provided as Supplementary Table 1.

About this article

Cite this article

Deutsch, E.W., Perez-Riverol, Y., Carver, J. et al. Universal Spectrum Identifier for mass spectra. Nat Methods 18, 768–770 (2021). https://doi.org/10.1038/s41592-021-01184-6

  • DOI: https://doi.org/10.1038/s41592-021-01184-6
