Extended Data Fig. 1 | Nature Methods

Extended Data Fig. 1

From: Universal Spectrum Identifier for mass spectra

Example use cases for Universal Spectrum Identifiers (USIs): a set of 13 example USIs, each with a brief comment. The same 13 USIs can be viewed in the ‘example USIs’ select list at http://proteomecentral.proteomexchange.org/usi. (Ref. 16 in Case 1 is Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).) Case 2 shows examples of a single spectrum from a CPTAC CompRef dataset, written with the various supported types of mass modification designation. (Ref. 17 in Case 2 is Zhou, J.-Y. et al. Quality assessments of long-term quantitative proteomic analysis of breast cancer xenograft tissues. J. Proteome Res. 16, 4523–4530 (2017).)

Example 4c in Box 1 provides the USI for the demonstrated correct peptide-spectrum match (PSM) of an ordinary UniProtKB protein, Q9UQ35, from Mylonas et al. (ref. 4), Fig. 2b; example 4d is the corresponding synthetic peptide spectrum. Example 4a in Box 1 provides the USI for the same spectrum as example 4c, but annotated with the previously and incorrectly reported HLA (human leukocyte antigen) peptide, as described in Mylonas et al., Fig. 2a. The non-matching synthetic peptide spectrum for the incorrect sequence is given in Box 1 as example 4b.

The Human Proteome Project (HPP; ref. 16) has set a high bar for data quality and evidence in support of its goal to provide high-stringency detections for all human proteins. The latest version of its MS data interpretation guidelines, version 3.0 (ref. 17), requires that key detection claims for proteins not previously seen via MS be accompanied by USIs referencing the key spectra for each claim, so that the community can transparently inspect and verify the PSMs. For example, the BioPlex dataset (ref. 18) was important for detecting proteins that had not been previously observed (ref. 19), but it was crucial to consider the provenance of every single identification in order to exclude all files from experiments where the protein was intentionally overexpressed (as per the standard protocol for analysis of protein-protein interactions). Example 3a in Box 1 provides a PSM derived from a prey protein pulled down as a binding partner to bait protein C5orf38. Example 3b provides a PSM of the same peptide, but derived from a recombinant protein used as a bait. This PSM provides a synthetic peptide reference spectrum with a much higher signal-to-noise ratio, as required by HPP guidelines. Illustrating this application of USIs at a community-wide scale, MassIVE further provides an extensive list of USIs for 1,296,916 MassIVE-KB entries in support of HPP Protein Existence (PE) classifications for 16,393 proteins (available at http://massive.ucsd.edu/hpp), including USIs for matching spectra of synthetic peptides (when available in public datasets); an abridged version of this table is also provided as Supplementary Table 1.
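As a rough illustration of how the USIs referenced above are structured, the following Python sketch splits a USI into its colon-separated parts. It assumes the general layout `mzspec:<collection>:<MS run>:<index type>:<index>[:<interpretation>]`; the function name `parse_usi`, the `ParsedUSI` type, and the example identifier (with its placeholder accession `PXD000000` and run name `example_run`) are hypothetical and are not taken from the figure. The sketch also assumes the MS run name contains no colons, which a full implementation would need to handle.

```python
from typing import NamedTuple, Optional


class ParsedUSI(NamedTuple):
    """Hypothetical container for the parts of a USI."""
    collection: str
    ms_run: str
    index_type: str
    index: str
    interpretation: Optional[str]


def parse_usi(usi: str) -> ParsedUSI:
    """Split a USI into its parts (simplified sketch, not a full validator)."""
    # Split on at most five colons so that an interpretation containing
    # colons (e.g. a modification written as [UNIMOD:21]) stays in one piece.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a recognizable USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    # Index types assumed here: scan number, spectrum index, or native ID.
    if index_type not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unsupported index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return ParsedUSI(collection, ms_run, index_type, index, interpretation)


# Hypothetical USI for illustration only; not one of the 13 examples.
example = parse_usi("mzspec:PXD000000:example_run:scan:1234:PEPT[UNIMOD:21]IDE/2")
```

Here `example.interpretation` keeps the peptide, its modification, and the charge together as a single string; parsing that part further is outside the scope of this sketch.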
