References
Noble, W.S. Nat. Methods 12, 605–608 (2015).
Kim, S. & Pevzner, P.A. Nat. Commun. 5, 5277 (2014).
Elias, J.E. & Gygi, S.P. Nat. Methods 4, 207–214 (2007).
Bourgon, R., Gentleman, R. & Huber, W. Proc. Natl. Acad. Sci. USA 107, 9546–9551 (2010).
Acknowledgements
This research was supported by the Ghent University Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to network,” VLAIO SBO grant “INSPECTOR” (120025) and the concerted Research Action BOF12/GOA/014, Ghent University.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Comparison of methods on the Pyrococcus furiosus example
Histograms of MS-GF+ scores (grey) with the estimated number of correct PSMs (#target - #decoys, red) and incorrect PSMs (#decoys, blue) and the 1% FDR cutoff (dashed line) for the search-all-assess-all (all-all, panel a), search-subset-assess-subset (sub-sub, panels b and d) and search-all-assess-subset (all-sub, panel c) strategies. The GO term “ATP binding” was used to generate the subset of interest. In panels (a) and (c), the spectra are searched against the complete Pyrococcus database; in panels (b) and (d), against a database of “ATP binding proteins” only. Panel (a) shows PSM scores for all PSMs; panels (b)–(d) show scores for the “ATP binding proteins” subset only. The fraction of incorrect PSMs (π0, first mode in the target distribution) is lower in the complete Pyrococcus set (all-all, panel a) than in the “ATP binding proteins” subset (all-sub, panel c), indicating that the all-all FDR is too liberal. The 1% FDR cutoff in the sub-sub strategy (panel b) shifts to higher values, which decreases the number of subset PSMs found compared with the all-all and all-sub strategies. Panel (d) also shows that sub-sub forces many spectra onto incorrect subset PSMs (orange bars). Indeed, 6110 (8617 - 2507) spectra that match other Pyrococcus targets or decoys in the complete Pyrococcus search switch to an “ATP binding” sequence in the subset search. Of the sub-sub PSMs above the 1% FDR cutoff, 1.4% have switched peptide sequences (panel d, orange) relative to the complete search (all-all and all-sub strategies). These PSMs have much lower scores than in the complete search and are questionable at best (black and orange boxplots below the histogram in panel d).
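The target-decoy bookkeeping behind these histograms can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PSM` record, its field names and the function names are hypothetical, higher scores are assumed to be better, and the FDR at a threshold is estimated as #decoys / #targets among PSMs scoring at or above it. For the all-sub strategy, the PSMs come from the complete search and are filtered to the subset of interest before the FDR is estimated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PSM:
    """Hypothetical PSM record; field names are assumptions for illustration."""
    spectrum_id: str
    peptide: str
    score: float      # assumed: higher is better
    is_decoy: bool
    in_subset: bool   # peptide maps to a protein of interest (or its decoy)


def fdr_cutoff(psms: List[PSM], alpha: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR
    (#decoys / #targets at or above the threshold) is <= alpha,
    or None if no threshold qualifies."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a threshold only once all PSMs tied at this score are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != psm.score
        if last_of_tie and targets > 0 and decoys / targets <= alpha:
            cutoff = psm.score
    return cutoff


def all_sub_cutoff(complete_search_psms: List[PSM],
                   alpha: float = 0.01) -> Optional[float]:
    """Search-all-assess-subset: keep only subset PSMs from the complete
    search, then estimate the FDR cutoff on that subset alone."""
    subset = [p for p in complete_search_psms if p.in_subset]
    return fdr_cutoff(subset, alpha)
```

Calling `fdr_cutoff` on every PSM of the complete search corresponds to all-all, and calling it on the PSMs of a subset-only search corresponds to sub-sub. In this sketch the strategies differ only in which search produced the PSMs and which of them are counted.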
Supplementary Figure 2 Comparison of methods on the Plasmodium falciparum example (human-subset).
Histograms of MS-GF+ scores (grey) with the estimated number of correct PSMs (#target - #decoys, red) and incorrect PSMs (#decoys, blue) and the 1% FDR cutoff (dashed line) for the search-all-assess-all (all-all, panel a), search-subset-assess-subset (sub-sub, panels b and d) and search-all-assess-subset (all-sub, panel c) strategies. In panels (a) and (c), the spectra are searched against a human + Plasmodium database (complete search); in panels (b) and (d), against a human-only database. Panel (a) shows PSM scores for both human and Plasmodium; panels (b)–(d) show scores for the human subset only. The fraction of incorrect PSMs (π0, first mode in the target distribution) is lower in the human + Plasmodium set (all-all, panel a) than in the human subset (all-sub, panel c), indicating that the all-all FDR is too liberal. The 1% FDR cutoff in the sub-sub strategy (panel b) shifts to higher values, which decreases the number of subset PSMs found compared with the all-all and all-sub strategies. The figure also shows that the sub-sub strategy forces many spectra onto incorrect subset PSMs (huge first mode of the distribution). Indeed, 13553 (30286 - 16733) spectra that match Plasmodium targets or decoys in the human + Plasmodium search switch to a human sequence in the subset search. Of the sub-sub PSMs above the 1% FDR cutoff, 1.3% switched peptide sequences (panel d, orange) relative to the complete search (all-all and all-sub strategies). These PSMs have lower scores than in the complete search and are questionable at best (black and orange boxplots below the histogram in panel d). Sub-sub puts an even higher burden on the target-decoy approach for the human subset than for the Plasmodium subset (Figure 1 in the main manuscript and Supplementary Figure 3) because more high-quality Plasmodium spectra occur in the sample, which aggravates the problem of forced PSMs.
Note that the results for the human subset are included only to illustrate that the poor FDR control of all-all and sub-sub is not due to a specific choice of subset. Also note that we do not advocate applying all-sub to all possible subsets and combining the results.
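The quantities plotted in these histograms can be written compactly as follows. This is a sketch in notation introduced here for illustration (the symbols are not taken from the article), and it assumes the standard target-decoy reasoning that an incorrect match is equally likely to hit a target or a decoy sequence. For a score threshold t and PSM scores s:

```latex
\begin{align*}
\widehat{N}_{\text{incorrect}}(t) &= \#\text{decoys}(s \ge t), \\
\widehat{N}_{\text{correct}}(t)   &= \#\text{targets}(s \ge t) - \#\text{decoys}(s \ge t), \\
\widehat{\text{FDR}}(t)           &= \frac{\#\text{decoys}(s \ge t)}{\#\text{targets}(s \ge t)}, \\
\hat{\pi}_0                       &= \frac{\#\text{decoys}}{\#\text{targets}} \quad \text{(counted over all scores)}.
\end{align*}
```

Under these definitions, a subset whose π0 is higher than that of the complete set needs a stricter threshold than the all-all cutoff to reach the same estimated FDR.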
Supplementary Figure 3 Histograms of MS-GF+ scores (grey) for the search-subset-assess-subset (sub-sub) method in the Plasmodium falciparum example (Plasmodium-subset).
Common PSMs (green) and PSMs that switched peptide sequences (orange) in the sub-sub search (Plasmodium database) relative to the complete search (human + Plasmodium database). The histogram shows that the sub-sub strategy forces many spectra onto incorrect subset PSMs (huge first mode of the distribution). Of the sub-sub PSMs above the 1% FDR cutoff, 0.6% switched peptide sequences (orange) relative to the complete search. Moreover, these PSMs have lower scores than in the complete search and are questionable at best (black and orange boxplots below the histogram).
Supplementary Figure 4 Boxplot of the fractions of subset PSMs that matched a different peptide sequence in the complete (all-all) and the subset search (sub-sub) for 36 different GO subsets of the Pyrococcus furiosus example.
PSM subset lists were constructed at 1% (panel a) or 5% FDR (panel b). The vertical grey line indicates the FDR cutoff. Because a switched spectrum always has a higher score in the complete search, we assume that its match in the subset search is likely a false positive. Most subsets return a higher fraction of switched PSMs than the given FDR cutoff. This suggests that the sub-sub strategy suffers from inaccurate FDR control for most GO subsets.
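The switched-PSM fraction summarized in these boxplots can be computed along the following lines. This is a hedged sketch: the dictionaries mapping spectrum identifiers to peptide sequences, and the function name `switched_fraction`, are hypothetical stand-ins for whatever PSM tables the two searches produce.

```python
from typing import Dict


def switched_fraction(subset_search: Dict[str, str],
                      complete_search: Dict[str, str]) -> float:
    """Fraction of accepted subset-search PSMs whose spectrum was assigned
    a different peptide in the complete search.

    Both arguments map spectrum id -> best-matching peptide sequence
    (a hypothetical layout). `subset_search` should contain only the PSMs
    accepted at the chosen FDR. A spectrum missing from `complete_search`
    is counted as switched.
    """
    if not subset_search:
        return 0.0
    switched = sum(
        1
        for spectrum_id, peptide in subset_search.items()
        if complete_search.get(spectrum_id) != peptide
    )
    return switched / len(subset_search)
```

A returned value above the nominal FDR level (0.01 or 0.05) for a given GO subset corresponds to a point to the right of the vertical grey line in the boxplots.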
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4 and Supplementary Methods
About this article
Cite this article
Sticker, A., Martens, L. & Clement, L. Mass spectrometrists should search for all peptides, but assess only the ones they care about. Nat Methods 14, 643–644 (2017). https://doi.org/10.1038/nmeth.4338
DOI: https://doi.org/10.1038/nmeth.4338
This article is cited by
- Activity-based metaproteomics driven discovery and enzymological characterization of potential α-galactosidases in the mouse gut microbiome. Communications Chemistry (2024)
- Evaluation of open search methods based on theoretical mass spectra comparison. BMC Bioinformatics (2021)
- The RNA landscape of the human placenta in health and disease. Nature Communications (2021)
- Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nature Communications (2021)
- Transfer posterior error probability estimation for peptide identification. BMC Bioinformatics (2020)