Building high-quality assay libraries for targeted analysis of SWATH MS data

Journal name: Nature Protocols
Volume: 10
Pages: 426–441
Year published:
DOI: 10.1038/nprot.2015.015
Published online

Abstract

Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2–3 d to complete, depending on the extent of the library and the computational resources available.
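The retention time normalization mentioned in the abstract can be illustrated with a small sketch. It assumes that normalization is a simple linear fit between the measured retention times of a few reference peptides and their known normalized values; the function names, the anchor values and the use of ordinary least squares are illustrative assumptions, not the protocol's actual implementation.

```python
from statistics import mean


def fit_rt_line(measured_rt, reference_rt):
    """Least-squares fit: normalized_rt = slope * measured_rt + intercept."""
    if len(measured_rt) != len(reference_rt) or len(measured_rt) < 2:
        raise ValueError("need at least two paired anchor points")
    mx, my = mean(measured_rt), mean(reference_rt)
    sxx = sum((x - mx) ** 2 for x in measured_rt)
    if sxx == 0:
        raise ValueError("measured retention times must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured_rt, reference_rt))
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize_rt(rt, slope, intercept):
    """Map a measured retention time onto the normalized scale."""
    return slope * rt + intercept


# Hypothetical anchor peptides: measured RT (min) and made-up normalized values.
anchor_measured = [10.0, 20.0, 30.0, 40.0]
anchor_reference = [0.0, 25.0, 50.0, 75.0]

slope, intercept = fit_rt_line(anchor_measured, anchor_reference)
print(normalize_rt(24.0, slope, intercept))
```

With a fit of this kind, every peptide in a run can be placed on a common scale, so that libraries built from different chromatographic runs become comparable.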


Figures

  1. Figure 1: Workflow for SWATH assay library generation.

    The library building workflow starts with the selection of representative samples and fragment ion spectra acquisition (part 1), followed by centroiding and conversion of the raw files into an open format (part 2). The centroided fragment ion spectra are searched against a protein sequence database to establish PSMs (part 3). The confidently assigned spectra are then converted into a spectral library, and all retention times are normalized and converted into iRTs (part 4). To address complications when building libraries for post-translationally modified peptides, an optional subroutine has been developed to account for potential errors in site localization of modifications. After consensus library generation, the most intense fragment ions of each peptide precursor are selected (part 5). Optionally, the resulting assay library in table format can be converted into TraML format, and decoy transition groups can be added if required for downstream analysis. RT, retention time.

  2. Figure 2: Splitting peptide identifications with distant elution times.

    During a data-dependent acquisition (DDA) search, multiple fragment ion spectra may be assigned to the same peptide precursor even though they span a wide retention time segment and might not originate from the same molecular species. This is not rare, especially for post-translationally modified peptides in which the modification cannot be unambiguously assigned to a specific amino acid. This figure illustrates such an ambiguous peak assignment using the example of a phosphopeptide containing a phosphorylated serine (S*) in the presence of a second, unphosphorylated serine (S). Fragment ion spectra recorded at distant retention times can be clustered separately during SWATH assay library generation. The distinct SWATH assays might then be used to resolve the correct assignment at the level of the SWATH MS data. See Supplementary Note 5 for examples.

  3. Supplementary Fig. 1: Comparison of converters.

    (a) Violin plots show the intrinsic fragment ion spectrum variability of three yeast sample injections converted with three DDA centroiding algorithms.

    (b) Violin plots show how well the relative intensities of the six most intense fragment ions obtained by a certain centroiding algorithm compare to the relative intensities of the same six fragments in SWATH MS data.

    The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers (ref. 42). N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file, and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra).

    (c) Depending on the centroiding algorithm implemented in the different peak pickers, the number of peptide and protein identifications from a database search varies slightly (after filtering for a protein-level FDR of 1%). The size of the circles is proportional to the number of identifications.

  4. Supplementary Fig. 2: MAYU-estimated FDRs with respect to iProphet probability thresholds.

    The software MAYU estimates the FDRs at the PSM (mFDR), peptide (pepFDR) and protein (protFDR) levels with respect to the applied iProphet probability threshold. The data are based on the case study described in the protocol, which consists of three whole-cell lysates of yeast. For comparison, the protein FDR of a very large data set of 331 runs of different human cells and tissues is shown (Pan Human Library as described by Rosenberger and colleagues (ref. 23)).

  5. Supplementary Fig. 3: Comparison of consensus and best replicate spectral libraries.

    (a) Violin plots show the intrinsic fragment ion spectrum variability of three injections of a yeast sample converted with the qtofpeakpicker and summarised using either the best replicate fragment ion spectrum or the consensus algorithm implemented in SpectraST (ref. 11).

    (b) Violin plots show how well the relative intensities of the six most intense fragment ions compare to the relative intensities of the same six fragments in SWATH MS data.

    The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers (ref. 42). N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file, and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra). Common neutral losses were included in the comparison.
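The normalised spectral contrast angle used in the supplementary figure captions above can be sketched as follows. This is a minimal illustration that assumes the commonly used definition, 1 − 2θ/π, where θ is the angle between two intensity vectors over the same fragment ions; any preprocessing applied in the cited work (for example intensity transformation or fragment selection) is not specified here and is not reproduced. The function name and example intensities are hypothetical.

```python
import math


def normalized_spectral_contrast_angle(a, b):
    """Return 1 - 2*theta/pi, where theta is the angle between vectors a and b.

    Both vectors hold intensities for the same fragment ions in the same order.
    Identical relative intensities give 1.0; orthogonal vectors give 0.0.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must contain at least one non-zero intensity")
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Guard against rounding slightly outside the valid acos domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return 1.0 - 2.0 * math.acos(cos_theta) / math.pi


# Hypothetical relative intensities of six fragment ions in two spectra.
library_spectrum = [100.0, 80.0, 60.0, 40.0, 20.0, 10.0]
swath_spectrum = [95.0, 85.0, 55.0, 45.0, 25.0, 5.0]
print(normalized_spectral_contrast_angle(library_spectrum, swath_spectrum))
```

Because the score depends only on the angle between the vectors, it is insensitive to overall signal intensity and compares relative fragment intensities only.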

References

  1. Domon, B. & Aebersold, R. Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 28, 710–721 (2010).
  2. Picotti, P. & Aebersold, R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Methods 9, 555–566 (2012).
  3. Gillet, L.C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
  4. Venable, J.D., Dong, M.-Q., Wohlschlegel, J., Dillin, A. & Yates, J.R. III. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 1, 39–45 (2004).
  5. Chapman, J.D., Goodlett, D.R. & Masselon, C.D. Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33, 452–470 (2014).
  6. Röst, H.L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).
  7. Bernhardt, O.M. et al. Spectronaut: a fast and efficient algorithm for MRM-like processing of data independent acquisition (SWATH-MS) data. F1000Posters Presented at the 60th American Society for Mass Spectrometry Conference, 20–24 May 2012 5, 1092 (2014).
  8. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
  9. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).
  10. Zi, J. et al. Expansion of the ion library for mining SWATH-MS data through fractionation proteomics. Anal. Chem. 86, 7242–7246 (2014).
  11. Lam, H. et al. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 5, 873–875 (2008).
  12. Hughes, M.A., Silva, J.C., Geromanos, S.J. & Townsend, C.A. Quantitative proteomic analysis of drug-induced changes in mycobacteria. J. Proteome Res. 5, 54–63 (2006).
  13. Frewen, B.E., Merrihew, G.E., Wu, C.C., Noble, W.S. & MacCoss, M.J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 78, 5678–5684 (2006).
  14. Picotti, P. et al. A database of mass spectrometric assays for the yeast proteome. Nat. Methods 5, 913–914 (2008).
  15. Prakash, A. et al. Expediting the development of targeted SRM assays: using data from shotgun proteomics to automate method development. J. Proteome Res. 8, 2733–2739 (2009).
  16. Picotti, P. et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat. Methods 7, 43–46 (2010).
  17. Collins, B.C. et al. Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 10, 1246–1253 (2013).
  18. Picotti, P. et al. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 494, 266–270 (2013).
  19. Schubert, O.T. et al. The Mtb Proteome library: a resource of assays to quantify the complete proteome of Mycobacterium tuberculosis. Cell Host Microbe 13, 602–612 (2013).
  20. Karlsson, C., Malmström, L., Aebersold, R. & Malmström, J.A. Proteome-wide selected reaction monitoring assays for the human pathogen Streptococcus pyogenes. Nat. Commun. 3, 1301 (2012).
  21. Hüttenhain, R. et al. N-Glycoprotein SRMAtlas: a resource of mass-spectrometric assays for N-glycosites enabling consistent and multiplexed protein quantification for clinical applications. Mol. Cell. Proteomics 12, 1005–1016 (2013).
  22. Hüttenhain, R. et al. Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci. Transl. Med. 4, 142ra94 (2012).
  23. Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).
  24. Deutsch, E.W. et al. A guided tour of the trans-proteomic pipeline. Proteomics 10, 1150–1159 (2010).
  25. Chambers, M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).
  26. Sturm, M. et al. OpenMS: an open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163 (2008).
  27. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
  28. Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
  29. Lam, H. & Aebersold, R. Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. Methods 54, 424–431 (2011).
  30. Weisbrod, C.R., Eng, J.K., Hoopmann, M.R., Baker, T. & Bruce, J.E. Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J. Proteome Res. 11, 1621–1632 (2012).
  31. Selevsek, N. et al. Reproducible and consistent quantification of the Saccharomyces cerevisiae proteome by SWATH-MS. Mol. Cell. Proteomics, http://dx.doi.org/10.1074/mcp.M113.035550 (2015).
  32. Heller, M. et al. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 4, 2273–2282 (2005).
  33. Stergachis, A.B., MacLean, B., Lee, K., Stamatoyannopoulos, J.A. & MacCoss, M.J. Rapid empirical discovery of optimal peptides for targeted proteomics. Nat. Methods 8, 1041–1043 (2011).
  34. Qeli, E. et al. Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data. Proteomics 108, 269–283 (2014).
  35. Mallick, P. et al. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25, 125–131 (2006).
  36. Eyers, C.E. et al. CONSeQuence: prediction of reference peptides for absolute quantitative proteomics using consensus machine learning approaches. Mol. Cell. Proteomics 10, M110.003384 (2011).
  37. Fusaro, V.A., Mani, D.R., Mesirov, J.P. & Carr, S.A. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 (2009).
  38. Tang, H. et al. A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 22, e481–e488 (2006).
  39. Webb-Robertson, B.-J.M. et al. A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics 24, 1503–1509 (2008).
  40. Li, S., Arnold, R.J., Tang, H. & Radivojac, P. On the accuracy and limits of peptide fragmentation spectrum prediction. Anal. Chem. 83, 790–796 (2011).
  41. Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121 (2012).
  42. Toprak, U.H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).
  43. de Graaf, E.L., Altelaar, A.F.M., van Breukelen, B., Mohammed, S. & Heck, A.J.R. Improving SRM assay development: a global comparison between triple quadrupole, ion trap, and higher energy CID peptide fragmentation spectra. J. Proteome Res. 10, 4334–4341 (2011).
  44. Deutsch, E. mzML: a single, unifying data format for mass spectrometer output. Proteomics 8, 2776–2777 (2008).
  45. Keller, A., Eng, J., Zhang, N., Li, X.-J. & Aebersold, R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 1, 2005.0017 (2005).
  46. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
  47. Shteynberg, D. et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 10, M111.007690 (2011).
  48. Shteynberg, D., Nesvizhskii, A.I., Moritz, R.L. & Deutsch, E.W. Combining results of multiple search engines in proteomics. Mol. Cell. Proteomics 12, 2383–2393 (2013).
  49. Picotti, P., Aebersold, R. & Domon, B. The implications of proteolytic background for shotgun proteomics. Mol. Cell. Proteomics 6, 1589–1598 (2007).
  50. Walmsley, S.J. et al. Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 12, 5666–5680 (2013).
  51. Kim, J.-S., Monroe, M.E., Camp, D.G., Smith, R.D. & Qian, W.-J. In-source fragmentation and the sources of partially tryptic peptides in shotgun proteomics. J. Proteome Res. 12, 910–916 (2013).
  52. Eng, J.K., Searle, B.C., Clauser, K.R. & Tabb, D.L. A face in the crowd: recognizing peptides through database search. Mol. Cell. Proteomics 10, R111.009522 (2011).
  53. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
  54. Reiter, L. et al. Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol. Cell. Proteomics 8, 2405–2417 (2009).
  55. Liu, J. et al. Methods for peptide identification by spectral comparison. Proteome Sci. 5, 3 (2007).
  56. Röst, H.L., Malmström, L. & Aebersold, R. A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell. Proteomics 11, 540–549 (2012).
  57. Deutsch, E.W. et al. TraML--a standard format for exchange of selected reaction monitoring transition lists. Mol. Cell. Proteomics 11, R111.015040 (2012).
  58. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 (2014).
  59. Picotti, P., Bodenmiller, B., Mueller, L.N., Domon, B. & Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 138, 795–806 (2009).
  60. Nesvizhskii, A.I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).

Author information

  1. These authors contributed equally to this work.

    • Olga T Schubert,
    • Ludovic C Gillet &
    • Ben C Collins

Affiliations

  1. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.

    • Olga T Schubert,
    • Ludovic C Gillet,
    • Ben C Collins,
    • George Rosenberger,
    • Witold E Wolski &
    • Ruedi Aebersold
  2. PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland.

    • Olga T Schubert &
    • George Rosenberger
  3. Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz, Germany.

    • Pedro Navarro
  4. SystemsX.ch Biology IT (SyBIT), SystemsX.ch, Zurich, Switzerland.

    • Witold E Wolski
  5. Division of Biomedical Engineering and Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China.

    • Henry Lam
  6. Department of Radiology, Stanford University School of Medicine, Stanford, California, USA.

    • Dario Amodei &
    • Parag Mallick
  7. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Brendan MacLean
  8. Faculty of Science, University of Zurich, Zurich, Switzerland.

    • Ruedi Aebersold

Contributions

O.T.S., L.C.G. and B.C.C. developed the workflow and wrote the manuscript; P.N. developed the tools spectrast2tsv.py and spectrast_cluster.py; G.R. and H.L. developed and implemented the retention time normalization and iRT calibration in SpectraST; W.E.W. developed the qtofpeakpicker; D.A., P.M. and B.M. implemented automated SWATH library import into Skyline; and R.A. directed the project and contributed to writing the manuscript.

Competing financial interests

R.A. holds shares of Biognosys AG, which operates in the field covered by the article (products are Spectronaut software and iRT-kit).

Corresponding author

Correspondence to:

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Comparison of converters. (217 KB)

  2. Supplementary Figure 2: MAYU-estimated FDRs with respect to iProphet probability thresholds. (272 KB)

  3. Supplementary Figure 3: Comparison of consensus and best replicate spectral libraries. (92 KB)

PDF files

  1. Supplementary Text and Figures (7,125 KB)

    Supplementary Figures 1–3, Supplementary Notes 1–5, Supplementary Tables 1–6 and Supplementary Tutorial
