We present a peptide library and data resource of >100,000 synthetic, unmodified peptides and their phosphorylated counterparts with known sequences and phosphorylation sites. Analysis of the library by mass spectrometry yielded a data set that we used to evaluate the merits of different search engines (Mascot and Andromeda) and fragmentation methods (beam-type collision-induced dissociation (HCD) and electron transfer dissociation (ETD)) for peptide identification. We also compared the sensitivities and accuracies of phosphorylation-site localization tools (Mascot Delta Score, PTM score and phosphoRS), and we characterized the chromatographic behavior of peptides in the library. We found that HCD identified more peptides and phosphopeptides than did ETD, that phosphopeptides generally eluted later from reversed-phase columns and were easier to identify than unmodified peptides and that current computational tools for proteomics can still be substantially improved. These peptides and spectra will facilitate the development, evaluation and improvement of experimental and computational proteomic strategies, such as separation techniques and the prediction of retention times and fragmentation patterns.
- Supplementary Text and Figures (3 MB)
Supplementary Figures 1–23
- Supplementary Table 1 (49 KB)
Sequence, site of phosphorylation within the sequence, length and GRAVY score (hydrophobicity) of the 851 representative sample peptides derived from the consensus of three out of the five publicly available human phosphorylation data sets used in this study.
- Supplementary Table 2 (16 KB)
Peptide sequence, position of the phosphorylation site in the sequence and GRAVY score of the seed peptides used to synthesize the libraries in this study. For each seed peptide sequence, the final number of peptides in the library is given.
- Supplementary Table 3 (88 MB)
Search and classification results of HCD data acquired on an Orbitrap Velos.
- Supplementary Table 4 (60 MB)
Search and classification results of ETD-FT data acquired on an Orbitrap Velos.
- Supplementary Table 5 (541 KB)
Number of peptide identifications and phosphorylation site localizations at a given global or local false discovery rate (Mascot)
- Supplementary Table 6 (565 KB)
Coefficients for the computation of local and global FDRs and FLRs
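
Supplementary Tables 1 and 2 report a GRAVY score for each peptide. As a minimal sketch of how such a score is commonly computed, the Python function below averages the Kyte-Doolittle hydropathy index over all residues of a sequence. The function name `gravy` and the example sequence are hypothetical, and the sketch assumes the score is taken over the unmodified amino acid sequence (the phosphate group is ignored); the tables themselves may have been generated differently.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (Kyte & Doolittle, J. Mol. Biol. 157, 105-132, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy (GRAVY) of a peptide.

    Assumption: the score is the plain mean of per-residue
    Kyte-Doolittle values; modifications such as phosphorylation
    are not taken into account.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - KYTE_DOOLITTLE.keys())
    if unknown:
        raise ValueError(f"unknown residue(s): {''.join(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the library.
    print(f"{gravy('SAMPLER'):.3f}")
```

Positive values indicate a more hydrophobic peptide and negative values a more hydrophilic one, which is why the score is a natural companion to the retention-time analysis described in the abstract.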