Abstract

Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful consideration of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
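
The global analyte constraint proposed above can be illustrated with a minimal sketch. It assumes a target-decoy setup in which decoy peptides carry names distinct from target peptides, keeps only the best-scoring peak group of each peptide across all runs, and estimates q-values on that reduced list. The `PeakGroup` record, the function names and the 1% default cutoff are hypothetical choices made for illustration; this is not the PyProphet extension described in the article.

```python
from collections import namedtuple

# Hypothetical record: one scored peak group (peptide query) in one run.
PeakGroup = namedtuple("PeakGroup", ["run", "peptide", "score", "is_decoy"])


def best_per_peptide(peak_groups):
    """Keep only the best-scoring peak group of each peptide across all runs."""
    best = {}
    for pg in peak_groups:
        current = best.get(pg.peptide)
        if current is None or pg.score > current.score:
            best[pg.peptide] = pg
    return list(best.values())


def target_decoy_qvalues(items):
    """Estimate q-values from the decoy/target ratio at each score threshold."""
    ranked = sorted(items, key=lambda pg: pg.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for pg in ranked:
        if pg.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest estimated FDR at which the item is still accepted.
    qvalues = {}
    running_min = float("inf")
    for pg, fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        qvalues[pg.peptide] = running_min
    return qvalues


def global_peptide_filter(peak_groups, cutoff=0.01):
    """Return target peptides that pass the experiment-wide q-value cutoff."""
    best = best_per_peptide(peak_groups)
    q = target_decoy_qvalues(best)
    return {pg.peptide for pg in best
            if not pg.is_decoy and q[pg.peptide] <= cutoff}
```

Because each peptide contributes a single score to the estimate no matter how many runs were acquired, the number of accepted false positives under this sketch does not grow with the number of runs, which is the intent of an experiment-wide constraint.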

Acknowledgements

Please note that M.H., C.L.H., Y.L., M.J.M., B.X.M., A.I.N., P.G.A.P., L.R., H.L.R., S.T. and Y.S.T. were added to the author list in alphabetical order. We thank the authors of the SWATH-MS interlaboratory study and of the human blood plasma data set for providing the data to conduct this study. We also thank the Scientific IT Support (ID SIS) and the high-performance computing (HPC) teams of ETH Zurich for support and maintenance of the computing infrastructure. M.H. was supported by a grant from the Institut Mérieux; A.I.N. was funded by the US National Institutes of Health (NIH; grant R01GM094231); H.L.R. was funded by the Swiss National Science Foundation (SNSF; grant P2EZP3 162268); B.C.C. was supported by a SNSF Ambizione grant (PZ00P3_161435); and R.A. was supported by the ERC (AdG-233226 Proteomics v3.0 and AdG-670821 Proteomics 4D), the PhosphonetX project of SystemsX.ch and the Swiss National Science Foundation (SNSF) grant 31003A_166435.

Author information

Author notes

    George Rosenberger & Isabell Bludau contributed equally to this work.

Affiliations

  1. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.

    George Rosenberger, Isabell Bludau, Moritz Heusel, Yansheng Liu, Patrick G A Pedrioli, Hannes L Röst, Ben C Collins & Ruedi Aebersold

  2. PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland.

    George Rosenberger & Isabell Bludau

  3. ID Scientific IT Services, ETH Zurich, Zurich, Switzerland.

    Uwe Schmitt

  4. PhD program in Molecular and Translational Biomedicine, Competence Center Personalized Medicine (CC-PM), ETH Zurich and University of Zurich, Zurich, Switzerland.

    Moritz Heusel

  5. SCIEX, Redwood City, California, USA.

    Christie L Hunter

  6. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    Michael J MacCoss, Brendan X MacLean & Ying S Ting

  7. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.

    Alexey I Nesvizhskii

  8. Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.

    Alexey I Nesvizhskii

  9. Biognosys, Schlieren, Switzerland.

    Lukas Reiter

  10. SCIEX, Concord, Ontario, Canada.

    Stephen Tate

  11. Faculty of Science, University of Zurich, Zurich, Switzerland.

    Ruedi Aebersold

Authors

  1. George Rosenberger
  2. Isabell Bludau
  3. Uwe Schmitt
  4. Moritz Heusel
  5. Christie L Hunter
  6. Yansheng Liu
  7. Michael J MacCoss
  8. Brendan X MacLean
  9. Alexey I Nesvizhskii
  10. Patrick G A Pedrioli
  11. Lukas Reiter
  12. Hannes L Röst
  13. Stephen Tate
  14. Ying S Ting
  15. Ben C Collins
  16. Ruedi Aebersold

Contributions

G.R., I.B. and R.A. wrote the paper with feedback from all authors; G.R. and B.C.C. developed the methods; I.B. analyzed the data set; U.S. and G.R. implemented the PyProphet extension; M.H., C.L.H., Y.L., M.J.M., B.X.M., A.I.N., P.G.A.P., L.R., H.L.R., S.T. and Y.S.T. provided critical input on the project; and B.C.C. and R.A. designed and supervised the study.

Competing interests

C.L.H. and S.T. are employees of SCIEX, which operates in the field of quantitative proteomics by data-independent acquisition covered by the article. M.J.M. is a paid consultant for Thermo Fisher Scientific, which operates in the field of quantitative proteomics by data-independent acquisition covered by the article. L.R. is an employee of Biognosys AG, which operates in the field of quantitative proteomics by data-independent acquisition covered by the article. R.A. holds shares of Biognosys AG.

Corresponding authors

Correspondence to Ben C Collins or Ruedi Aebersold.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Notes 1–6.

  2. Life Sciences Reporting Summary

About this article

DOI

https://doi.org/10.1038/nmeth.4398
