MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics

Kong, Andy T; Leprevost, Felipe V; Avtonomov, Dmitry M; Mellacheruvu, Dattatreya; Nesvizhskii, Alexey I

doi:10.1038/nmeth.4256

Article
Published: 10 April 2017


Nature Methods volume 14, pages 513–520 (2017)


Abstract

There is a need to better understand and handle the 'dark matter' of proteomics: the vast diversity of post-translational and chemical modifications that are unaccounted for in a typical mass spectrometry–based analysis and thus remain unidentified. We present a fragment-ion indexing method, and its implementation in the peptide identification tool MSFragger, that enables a more than 100-fold improvement in speed over most existing proteome database search tools. Using several large proteomic data sets, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in modification rates across experimental samples and conditions. We further illustrate its utility using protein–RNA cross-linked peptide data and using affinity purification experiments where we observe, on average, a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.


**Figure 1: Database-search strategies and the MSFragger algorithm.**

**Figure 2: HEK293 peptide identifications across traditional narrow-window and open searches demonstrate underestimation of FDR.**

**Figure 3: Modification profiles in large-scale HeLa, HEK293, and TNBC shotgun proteomics experiments.**

**Figure 4: Open searching detects modified peptides containing labile modifications.**

**Figure 5: Application of MSFragger to diverse proteomics experiments.**



Acknowledgements

We thank R. Beavis for helpful discussions, N. Bandeira and S. Na for help with MODa software, and E. Huttlin for assisting with the transfer of raw MS data from the AP–MS study. This work was funded in part by grants from the NIH (R01GM94231 and U24CA210967 to A.I.N.).

Author information

Authors and Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Andy T Kong & Alexey I Nesvizhskii
Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
Andy T Kong, Felipe V Leprevost, Dmitry M Avtonomov, Dattatreya Mellacheruvu & Alexey I Nesvizhskii


Contributions

A.T.K. and A.I.N. conceived the project. A.T.K. developed the algorithm, wrote the software, and analyzed the results. A.I.N. assisted with the algorithm development and software design, analyzed the results, and supervised the entire project. F.V.L., D.M.A., and D.M. contributed to software development and data analysis. A.T.K. and A.I.N. wrote the manuscript.

Corresponding author

Correspondence to Alexey I Nesvizhskii.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Fragment indexing allows efficient spectra similarity comparisons.

The cost and efficiency of spectral similarity calculations can be approximated by the number of fragment comparisons required for each candidate peptide. In conventional strategies, tens to hundreds of comparisons are needed to compare an experimental spectrum with a theoretical spectrum. However, the vast majority of such fragment-fragment comparisons do not result in matches, as the difference between the two m/z values is often far greater than the fragment mass tolerance. With MSFragger’s fragment index, these comparisons are omitted: the binning strategy retrieves only the experimental-theoretical fragment pairs that are close in m/z, the majority of which fall within the fragment mass tolerance and are deemed relevant when they contribute to the score of a PSM. As a result, MSFragger evaluates only a few fragments per candidate peptide across a variety of search scenarios. Reducing the fragment bin width allows fewer fragment comparisons to be performed, at the expense of the computational overhead of traversing a greater number of bins that overlap the fragment tolerance window. MSFragger dynamically selects a bin width appropriate for the search scenario, opting for smaller bins in open search, where the number of comparisons is large, and larger bins in narrow window search, where the number of comparisons is small relative to the overhead costs. Because of this optimization, narrow window searches evaluate a greater number of fragments per candidate, and a lower percentage of their comparisons are found relevant.
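The binning idea described above can be illustrated with a minimal Python sketch. This is not MSFragger's implementation: the function names, the dictionary-based index, and the simple match count used in place of a real PSM score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_fragment_index(peptide_fragments, bin_width):
    """Map each m/z bin to the (peptide_id, fragment_mz) pairs that fall in it.

    peptide_fragments: dict of hypothetical peptide IDs to lists of
    theoretical fragment m/z values.
    """
    index = defaultdict(list)
    for peptide_id, fragment_mzs in peptide_fragments.items():
        for mz in fragment_mzs:
            index[int(mz // bin_width)].append((peptide_id, mz))
    return index


def count_matched_fragments(index, experimental_peaks, bin_width, tolerance):
    """Count, per candidate peptide, the theoretical fragments within
    `tolerance` of an experimental peak.

    Only the bins overlapping the tolerance window of each peak are visited,
    so fragments far away in m/z are never compared at all.
    """
    matches = defaultdict(int)
    for peak_mz in experimental_peaks:
        low_bin = int((peak_mz - tolerance) // bin_width)
        high_bin = int((peak_mz + tolerance) // bin_width)
        for bin_id in range(low_bin, high_bin + 1):
            for peptide_id, fragment_mz in index.get(bin_id, ()):
                if abs(fragment_mz - peak_mz) <= tolerance:
                    matches[peptide_id] += 1
    return dict(matches)


# Hypothetical theoretical fragments for two made-up candidate peptides.
peptides = {
    "PEP_A": [175.119, 288.203, 401.287],
    "PEP_B": [147.113, 288.203, 516.314],
}
index = build_fragment_index(peptides, bin_width=0.1)
result = count_matched_fragments(
    index, [175.120, 288.200, 600.0], bin_width=0.1, tolerance=0.01
)
```

In this sketch a smaller `bin_width` means fewer irrelevant pairs are pulled from each bin but more bins are traversed per peak, which mirrors the trade-off described in the legend.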

Supplementary Figure 2 MSFragger scales efficiently across large numbers of CPU cores.

Indexing and searching operations in MSFragger are designed for modern multi-core computers and are optimized to reduce pressure on memory bandwidth. Results are generated from open search times of a single LC-MS/MS run on a dual-processor system with 14 cores in each processor. (a) MSFragger scales almost linearly in terms of overall search times on up to 8 cores. Reading mass spectrometry data files and compiling results are not highly parallelizable, which reduces scalability beyond 8 cores. The jump from 14 to 28 threads causes non-local memory to be accessed by each processor, impacting scalability. (b) Fragment index searching by itself is efficiently parallelizable in MSFragger and scales to effectively utilize all cores.

Supplementary Figure 3 Open searching identifies similar modifications as MODa.

MODa, run in single blind mode, generates a modification profile similar to that of an open search, with differences that are likely due to the characteristics of the individual modifications. Open searches (run in fully tryptic mode in both comparisons) are more likely to recover mass-shifted peptides that have few discernible alterations in their tandem mass spectra (such as the modification near 302 Da), because open searching does not attempt to localize the modified mass. MODa is likely more effective for modifications that are more commonly found near the C-terminus (and therefore disrupt the y-ions used in open search identification). MODa running in semi-tryptic mode (the mode of operation recommended by its authors) recovers a greater number of PSMs at the expense of additional run time.

Supplementary Figure 4 Preferential boosting of unmodified peptides fails to rescue missing peptides.

Boosting recovers a greater percentage of the peptides found in narrow window search prior to FDR filtering. Note that not all peptides identified in narrow window search are recovered in open search with the boosting option enabled, because PeptideProphet applies a default peptide probability filter of 0.05 (disabling this filter using the -p0 option results in near 100% recovery). However, after controlling for FDR, boosting does not improve the peptide overlap between open and narrow window search.

Supplementary Figure 5 Decreased sensitivity for common modifications in open searching can be overcome by specifying variable modifications.

Standard open searches tend to identify far fewer peptides carrying common modifications than narrow window searches that specify those modifications as variable modifications. This is due to decreased sensitivity when the shifted ions are no longer matched in open search. For the most abundant chemical modifications, this can result in a significant decrease in overall counts. The speed of MSFragger allows variable modifications to be specified in conjunction with open searching. Examining peptides with oxidized methionine reveals that standard open search recovers only 45.37% of the peptides originally identified with oxidized methionine in narrow window searching (with variably oxidized methionine). Specifying oxidized methionine as a variable modification in open search brings that percentage to 88.81%, close to the overall overlap in peptide identifications between narrow window and open searches.

Supplementary Figure 6 Complementary ions aid recovery of peptides with modifications near peptide C terminus.

(a) High-intensity fragment ions are selected from the experimental spectrum and are assumed to be modified y-ions. Complementary ions based on the experimental precursor mass are inserted to form a modified spectrum that is subjected to open searching. (b) Evaluation of complementary ions using peptides containing a single oxidized methionine. 10, 20, and 30 complementary ions were inserted into each experimental spectrum, and the counts of identified peptides were ordered by the distance of their oxidation site to the N- or C-terminus. The addition of complementary ions decreased the number of identifications for peptides with oxidation near the N-terminus but greatly increased identification rates for peptides with oxidation near the C-terminus. For peptides with an oxidized methionine upstream of the tryptic cleavage site, the number of identified peptides increased by 48% when 20 complementary ions were added. The addition of more than 20 complementary ions was not found to be beneficial.
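The complementary-ion step in (a) can be sketched as follows. This is a rough illustration, not the procedure as implemented in MSFragger: the function name is hypothetical, all peaks are assumed to be singly charged, and giving each inserted ion the intensity of its source peak is an assumption. It relies on the standard relation that a singly charged b/y pair sums to the precursor neutral mass plus two proton masses, so the complement of a mass-shifted y-ion falls at the m/z of the corresponding unshifted b-ion.

```python
PROTON_MASS = 1.007276  # Da


def add_complementary_ions(peaks, precursor_neutral_mass, top_n=20):
    """Return the spectrum with complementary ions inserted.

    peaks: list of (mz, intensity) tuples, assumed singly charged.
    precursor_neutral_mass: experimental (possibly mass-shifted) precursor mass.
    top_n: number of most intense peaks to complement (the legend reports
           no benefit beyond 20).
    """
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:top_n]
    complementary = []
    for mz, intensity in strongest:
        # Complement of a singly charged fragment: M + 2 * proton - fragment m/z.
        comp_mz = precursor_neutral_mass + 2 * PROTON_MASS - mz
        if comp_mz > 0:
            # Assumption: the inserted ion inherits the source peak intensity.
            complementary.append((comp_mz, intensity))
    return sorted(peaks + complementary)


# Toy spectrum: the two most intense peaks (300.0 and 200.0) get complements.
spectrum = [(200.0, 50.0), (300.0, 100.0), (400.0, 10.0)]
augmented = add_complementary_ions(spectrum, 1000.0, top_n=2)
```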

Supplementary Figure 7 Co-isolation of co-eluting precursors can result in mass differences that are not due to chemical modifications.

(a) An MS/MS event was triggered at m/z 685.84 (green arrow), resulting in the identification of the peptide LGPALATGNVVVMK with a mass difference of 0.878 Da. The parent survey scan reveals a co-eluting precursor with m/z 685.40 (cyan arrow). The difference in m/z at charge 2+ matches the observed mass difference, suggesting that the co-eluting precursor is identified instead of the target precursor in this chimeric spectrum. (b) BatMass visualization of the MS/MS event described in (a) with MS/MS isolations marked by the purple line segments. The cyan arrow indicates the monoisotopic peak of the target precursor while the red arrow indicates the monoisotopic peak of the identified precursor. (c) The peptide RESVELALK was identified with a mass difference of -349.185 Da at m/z 348.21 (green arrow). The parent survey scan reveals a co-eluting precursor with m/z 348.87 (cyan arrow). While the target precursor ion is of charge 2+, the co-eluting precursor is of charge 3+, which transforms the 0.66 difference in m/z between these co-eluting precursors into the observed mass difference of -349.185 Da. (d) Similar BatMass visualization of the MS/MS event described in (c). Note how the isolation window of the charge 2+ target precursor (cyan) crosses the monoisotopic peak of the charge 3+ co-eluting precursor (red).
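The arithmetic behind panels (a) and (c) can be checked with a short sketch. The helper names are hypothetical, and the sign convention (target precursor mass minus the mass of the identified co-eluting peptide) is an assumption chosen to match the reported values. Because the m/z values in the legend are rounded to two decimals, the results agree with the reported mass differences only to within a few hundredths of a dalton.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge):
    """Neutral mass of a precursor observed at `mz` with the given charge."""
    return charge * (mz - PROTON_MASS)


def coisolation_mass_difference(target_mz, target_charge,
                                coeluting_mz, coeluting_charge):
    """Mass difference expected when the co-eluting peptide is identified
    but the precursor mass is taken from the target precursor."""
    return (neutral_mass(target_mz, target_charge)
            - neutral_mass(coeluting_mz, coeluting_charge))


# Panel (a): both precursors are 2+, so the shift is twice the m/z gap.
shift_a = coisolation_mass_difference(685.84, 2, 685.40, 2)  # about 0.88 Da

# Panel (c): a 2+ target and a 3+ co-eluting precursor turn a 0.66 m/z gap
# into a shift of roughly -349.18 Da.
shift_c = coisolation_mass_difference(348.21, 2, 348.87, 3)
```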

Supplementary Figure 8 MS1-based correction of precursor masses and identification-based calibration helps delineate modifications in close mass proximity.

Number of identified PSMs with mass differences in the range of 0.98 Da to 1.01 Da from a single HEK293 LC-MS/MS run. Expected mass differences in this range are due to deamidation (with a delta mass of 0.984 Da) and C12/C13 error (with a delta mass of 1.003 Da). (a) Prior to correction, a broad peak with no coherent shape is observed with a center around 1.005 Da. Knowledge of expected mass differences may lead to the calling of a peak near 0.986 Da. (b) Two cleanly resolved peaks are observed after mass correction. Expected peaks corresponding to deamidation and C12/C13 error are resolved with mean mass accurate to 1/1000 Da. The ability to determine such peaks from a single LC-MS/MS run demonstrates the accuracy of modern instruments and the power of our mass correction procedure.
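To illustrate why accuracy at the level of 1/1000 Da matters here, the sketch below assigns an observed mass difference to the nearest of the two expected shifts, which lie only 0.019 Da apart. It is not the mass correction procedure itself; the function name and the 0.005 Da tolerance are assumptions, and the expected values are the rounded figures quoted in the legend.

```python
# Expected delta masses (Da) as quoted in the legend.
EXPECTED_SHIFTS = {
    "deamidation": 0.984,
    "C12/C13 error": 1.003,
}


def assign_mass_shift(delta_mass, tolerance=0.005):
    """Return the label of the nearest expected shift, or None if the
    observed delta mass is farther than `tolerance` from both."""
    label, expected = min(
        EXPECTED_SHIFTS.items(),
        key=lambda item: abs(item[1] - delta_mass),
    )
    return label if abs(expected - delta_mass) <= tolerance else None


labels = [assign_mass_shift(dm) for dm in (0.9845, 1.0031, 0.994)]
```

With a tolerance of 0.005 Da, a mass error of about 0.01 Da would be enough to leave a PSM unassigned or push it toward the wrong label, which is consistent with the unresolved peak seen before correction.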

Supplementary Figure 9 Localization profiles are consistent across experiments.

Common modifications were selected and amino acid localization enrichment was calculated separately for each dataset. Amino acid localizations were largely consistent across each dataset despite the differences in modification rates.

Supplementary Figure 10 Highly similar spectra pair for peptide LEAEIATYR with precursor mass difference of 284.126.

3214 PSMs (corresponding to 1087 unique peptides) were identified in the mass difference bin of 284.126 Da. These PSMs were predominantly observed in the HeLa dataset and were shown to have a spectral similarity score of 0.90 (indicating that the spectra of mass-shifted peptides are highly similar to those of the corresponding unmodified peptides). Here, we selected a pair of PSMs that were both identified as the peptide LEAEIATYR in the same LC-MS/MS run. Despite their highly similar fragmentation patterns and few unmatched fragments, they were observed with precursor masses that differ by 284.1251 Da. The full y-ion series was successfully matched, which, when overlapped with the matched b-2, b-3, and b-4 ions, rules out the possibility of a modified residue in the fragmentation spectrum.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 (PDF 1483 kb)

Supplementary Protocol

MSFragger Manual (PDF 611 kb)

Supplementary Table 1

Analysis times for a single file (b1906_293T_proteinID_01A_QE3_122212) in HEK293 dataset using different search engines. (XLSX 12 kb)

Supplementary Table 2

List of mass spectrometry data files analyzed from each dataset and their corresponding number of MS/MS spectra. (XLSX 146 kb)

Supplementary Table 3

List of top 500 detected features in mass shift histogram with potential explanations. (XLSX 116 kb)

Supplementary Table 4

Mass shift localization by data set. (XLSX 112 kb)

Supplementary Table 5

Number of peptide ions and PSMs identified in narrow-window and open searching by bait protein in AP-MS dataset. (XLSX 187 kb)

Supplementary Table 6

Peptide identifications in ETHE1 AP–MS experiments using narrow-window and mass-tolerant searches. (XLSX 571 kb)

Supplementary Table 7

List of genes associated with 'small molecule metabolic process' that have a large increase in identified bait peptide ions. (XLSX 9 kb)

Supplementary Table 8

List of identified peptides in RNA–protein cross-linking experiment. (XLSX 28 kb)


About this article

Cite this article

Kong, A., Leprevost, F., Avtonomov, D. et al. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics. Nat Methods 14, 513–520 (2017). https://doi.org/10.1038/nmeth.4256


Received: 30 September 2016
Accepted: 06 March 2017
Published: 10 April 2017
Issue Date: May 2017
DOI: https://doi.org/10.1038/nmeth.4256
