
A neural network for large-scale clustering of peptide mass spectra

Repository-scale analysis of hundreds of millions to billions of mass spectra is challenging because of the volume and complexity of the associated data. A deep neural network embedding method is presented that enables large-scale investigation of mass spectra that are repeatedly observed yet consistently unidentified.
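Once spectra are embedded as vectors, similar spectra can be grouped by their distances in the embedding space. The sketch below shows only this downstream grouping step, and it rests on several assumptions:

- Each spectrum has already been mapped to a fixed-length embedding vector. The toy 2D coordinates are made up and are not GLEAMS output.
- The function name `cluster_embeddings` and the distance threshold are hypothetical.
- Spectra are linked when their Euclidean distance is at or below the threshold, and each connected component (found with a union-find structure) becomes one cluster. This is simple single-linkage clustering, chosen as an illustrative stand-in. It is not the clustering procedure used by GLEAMS.

```python
import math


def cluster_embeddings(embeddings, threshold):
    """Hypothetical single-linkage clustering of spectrum embeddings.

    Spectra whose embeddings lie within `threshold` (Euclidean distance)
    of each other are linked; each connected component is one cluster.
    Returns one integer cluster label per spectrum.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Brute-force O(n^2) pairwise comparison; at repository scale this loop
    # would need to be replaced by an (approximate) nearest-neighbor search.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(embeddings[i], embeddings[j]) <= threshold:
                union(i, j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    labels, mapping = [], {}
    for i in range(n):
        root = find(i)
        if root not in mapping:
            mapping[root] = len(mapping)
        labels.append(mapping[root])
    return labels


# Toy embeddings (made up): two tight pairs and one isolated spectrum.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 5.1), (10.0, 0.0)]
print(cluster_embeddings(toy, threshold=0.5))  # → [0, 0, 1, 1, 2]
```

The threshold controls the trade-off between merging too aggressively and leaving repeated observations of the same spectrum split across clusters. A real pipeline would also avoid the quadratic pairwise loop.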


Fig. 1: GLEAMS deep neural network for clustering hundreds of millions of mass spectra.



Additional information


This is a summary of: Bittremieux, W., May, D. H., Bilmes, J. & Noble, W. S. A learned embedding for efficient joint analysis of millions of mass spectra. Nat. Methods (2021).


About this article


Cite this article

A neural network for large-scale clustering of peptide mass spectra. Nat. Methods 19, 658–659 (2022).

