Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2–3 d to complete, depending on the extent of the library and the computational resources available.
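As a rough illustration of the FDR control step named above, the sketch below estimates a PSM-level FDR from a plain target-decoy count at a given probability threshold and finds the lowest threshold that meets a requested FDR. It is a minimal sketch under stated assumptions, not the protocol's own procedure: the function names, the `(probability, is_decoy)` tuple layout and the example values are all hypothetical, and the decoy/target ratio used here is the simplest form of target-decoy estimation.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (identification probability, is_decoy)
Psm = Tuple[float, bool]


def target_decoy_fdr(psms: List[Psm], threshold: float) -> float:
    """Estimate FDR as decoys / targets among PSMs at or above the threshold."""
    targets = sum(1 for prob, is_decoy in psms if prob >= threshold and not is_decoy)
    decoys = sum(1 for prob, is_decoy in psms if prob >= threshold and is_decoy)
    if targets == 0:
        # No accepted targets: report 1.0 if only decoys passed, else 0.0.
        return 1.0 if decoys else 0.0
    return decoys / targets


def lowest_threshold_for_fdr(psms: List[Psm], max_fdr: float) -> Optional[float]:
    """Return the lowest observed probability whose estimated FDR is <= max_fdr."""
    for threshold in sorted({prob for prob, _ in psms}):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None  # no threshold satisfies the requested FDR


# Illustrative values only; these are not data from the protocol.
example_psms: List[Psm] = [
    (0.99, False), (0.98, False), (0.95, False), (0.95, True),
    (0.90, False), (0.80, True), (0.70, False), (0.60, True),
]

if __name__ == "__main__":
    for t in (0.5, 0.9, 0.97):
        print(t, target_decoy_fdr(example_psms, t))
    print(lowest_threshold_for_fdr(example_psms, 0.25))
```

In practice the threshold would be chosen separately at the PSM, peptide and protein levels, because the same probability cutoff generally yields a different FDR at each level.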
References
- Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 28, 710–721 (2010).
- Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Methods 9, 555–566 (2012).
- Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
- Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 1, 39–45 (2004).
- Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33, 452–470 (2014).
- OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).
- Spectronaut: a fast and efficient algorithm for MRM-like processing of data independent acquisition (SWATH-MS) data. F1000Posters Presented at the 60th American Society for Mass Spectrometry Conference, 20–24 May 2012 5, 1092 (2014).
- Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
- mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).
- Expansion of the ion library for mining SWATH-MS data through fractionation proteomics. Anal. Chem. 86, 7242–7246 (2014).
- Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 5, 873–875 (2008).
- Quantitative proteomic analysis of drug-induced changes in mycobacteria. J. Proteome Res. 5, 54–63 (2006).
- Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 78, 5678–5684 (2006).
- A database of mass spectrometric assays for the yeast proteome. Nat. Methods 5, 913–914 (2008).
- Expediting the development of targeted SRM assays: using data from shotgun proteomics to automate method development. J. Proteome Res. 8, 2733–2739 (2009).
- High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat. Methods 7, 43–46 (2010).
- Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 10, 1246–1253 (2013).
- A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 494, 266–270 (2013).
- The Mtb Proteome library: a resource of assays to quantify the complete proteome of Mycobacterium tuberculosis. Cell Host Microbe 13, 602–612 (2013).
- Proteome-wide selected reaction monitoring assays for the human pathogen Streptococcus pyogenes. Nat. Commun. 3, 1301 (2012).
- N-Glycoprotein SRMAtlas: a resource of mass-spectrometric assays for N-glycosites enabling consistent and multiplexed protein quantification for clinical applications. Mol. Cell. Proteomics 12, 1005–1016 (2013).
- Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci. Transl. Med. 4, 142ra94 (2012).
- A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).
- A guided tour of the trans-proteomic pipeline. Proteomics 10, 1150–1159 (2010).
- A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).
- OpenMS: an open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163 (2008).
- MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
- Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
- Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. Methods 54, 424–431 (2011).
- Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J. Proteome Res. 11, 1621–1632 (2012).
- Reproducible and consistent quantification of the Saccharomyces cerevisiae proteome by SWATH-MS. Mol. Cell. Proteomics, http://dx.doi.org/10.1074/mcp.M113.035550 (2015).
- Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 4, 2273–2282 (2005).
- Rapid empirical discovery of optimal peptides for targeted proteomics. Nat. Methods 8, 1041–1043 (2011).
- Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data. J. Proteomics 108, 269–283 (2014).
- Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25, 125–131 (2006).
- CONSeQuence: prediction of reference peptides for absolute quantitative proteomics using consensus machine learning approaches. Mol. Cell. Proteomics 10, M110.003384 (2011).
- Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 (2009).
- A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 22, e481–e488 (2006).
- A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics 24, 1503–1509 (2008).
- On the accuracy and limits of peptide fragmentation spectrum prediction. Anal. Chem. 83, 790–796 (2011).
- Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121 (2012).
- Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).
- Improving SRM assay development: a global comparison between triple quadrupole, ion trap, and higher energy CID peptide fragmentation spectra. J. Proteome Res. 10, 4334–4341 (2011).
- mzML: a single, unifying data format for mass spectrometer output. Proteomics 8, 2776–2777 (2008).
- A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 1, 2005.0017 (2005).
- Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
- iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 10, M111.007690 (2011).
- Combining results of multiple search engines in proteomics. Mol. Cell. Proteomics 12, 2383–2393 (2013).
- The implications of proteolytic background for shotgun proteomics. Mol. Cell. Proteomics 6, 1589–1598 (2007).
- Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 12, 5666–5680 (2013).
- In-source fragmentation and the sources of partially tryptic peptides in shotgun proteomics. J. Proteome Res. 12, 910–916 (2013).
- A face in the crowd: recognizing peptides through database search. Mol. Cell. Proteomics 10, R111.009522 (2011).
- Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
- Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol. Cell. Proteomics 8, 2405–2417 (2009).
- Methods for peptide identification by spectral comparison. Proteome Sci. 5, 3 (2007).
- A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell. Proteomics 11, 540–549 (2012).
- TraML--a standard format for exchange of selected reaction monitoring transition lists. Mol. Cell. Proteomics 11, R111.015040 (2012).
- ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 (2014).
- Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 138, 795–806 (2009).
- A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).
- Supplementary Figure 1: Comparison of converters. (217 KB)
(a) Violin plots show the intrinsic fragment ion spectrum variability of three yeast sample injections converted with three DDA centroiding algorithms.
(b) Violin plots show how well the relative intensities of the six most intense fragment ions obtained by a certain centroiding algorithm compare to the relative intensities of the same six fragments in SWATH MS data.
The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers (ref. 41). N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file, and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra).
(c) Depending on the centroiding algorithm implemented in the different peak pickers, the number of peptide and protein identifications from a database search varies slightly (after filtering for protein-level FDR = 1%). The size of the circles is proportional to the number of identifications.
- Supplementary Figure 2: MAYU-estimated FDRs with respect to iProphet probability thresholds. (272 KB)
The software MAYU estimates the FDRs at the PSM (mFDR), peptide (pepFDR) and protein (protFDR) levels with respect to the applied iProphet probability threshold. The data are based on the case study described in the protocol, which consists of three whole-cell lysates of yeast. For comparison, the protein FDR of a very large data set of 331 runs of different human cells and tissues is shown (Pan Human Library as described by Rosenberger and colleagues (ref. 23)).
- Supplementary Figure 3: Comparison of consensus and best replicate spectral libraries. (92 KB)
(a) Violin plots show the intrinsic fragment ion spectrum variability of three injections of a yeast sample converted with the qtofpeakpicker and summarised using either the best-replicate fragment ion spectrum or the consensus algorithm implemented in SpectraST (ref. 11).
(b) Violin plots show how well the relative intensities of the six most intense fragment ions compare to the relative intensities of the same six fragments in SWATH MS data.
The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers (ref. 41). N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file, and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra). Common neutral losses were included in the comparison.
- Supplementary Text and Figures (7,125 KB)
Supplementary Figures 1–3, Supplementary Notes 1–5, Supplementary Tables 1–6 and Supplementary Tutorial
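The similarity measure named in the captions of Supplementary Figures 1 and 3, the normalised spectral contrast angle, can be sketched as follows. This is a minimal sketch that assumes the commonly used definition, 1 − 2·arccos(cos θ)/π, where cos θ is the cosine similarity of two intensity vectors over the same fragment ions; the function name and the example intensities are hypothetical, and the dissimilarity estimate S from the captions is a separate quantity that is not computed here.

```python
import math
from typing import Sequence


def normalised_spectral_contrast_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two fragment intensity vectors; 1.0 means identical
    relative intensities, 0.0 means orthogonal (for non-negative intensities)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("intensity vectors must be non-empty and of equal length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("intensity vectors must not be all zero")
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return 1.0 - 2.0 * math.acos(cos_theta) / math.pi


if __name__ == "__main__":
    # Hypothetical relative intensities of the six most intense fragment ions.
    library = [100.0, 80.0, 55.0, 40.0, 25.0, 10.0]
    swath = [95.0, 85.0, 50.0, 42.0, 20.0, 12.0]
    print(normalised_spectral_contrast_angle(library, swath))
```

Because the measure depends only on relative intensities, scaling either spectrum by a constant factor leaves the score unchanged.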