Abstract

Consistent detection and quantification of protein post-translational modifications (PTMs) across sample cohorts is a prerequisite for functional analysis of biological processes. Data-independent acquisition (DIA) is a bottom-up mass spectrometry approach that provides complete information on precursor and fragment ions. However, owing to the convoluted structure of DIA data sets, confident, systematic identification and quantification of peptidoforms has remained challenging. Here, we present inference of peptidoforms (IPF), a fully automated algorithm that uses spectral libraries to query, validate and quantify peptidoforms in DIA data sets. The method was developed on data acquired by the DIA method SWATH-MS and benchmarked using a synthetic phosphopeptide reference data set and phosphopeptide-enriched samples. IPF reduced false site-localization by more than sevenfold compared with previous approaches, while recovering 85.4% of the true signals. Using IPF, we quantified peptidoforms in DIA data acquired from >200 samples of blood plasma of a human twin cohort and assessed the contribution of heritable, environmental and longitudinal effects on their PTMs.
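The abstract points to the core difficulty of site localization: a modified peptide can exist as several positional isomers ("peptidoforms") that share a precursor mass and most fragment ions. The following is a minimal sketch of that combinatorial problem. It is not the IPF implementation, and the function names are hypothetical. It assumes only S/T/Y phosphorylation, enumerates every placement of the phosphate groups, and treats a b or y fragment ion as site-determining when two isomers carry different numbers of phosphates on the residues that ion covers.

```python
from itertools import combinations

# Assumption: only serine, threonine and tyrosine are considered phosphorylatable.
PHOSPHO_SITES = set("STY")


def enumerate_peptidoforms(sequence: str, n_phospho: int) -> list[tuple[int, ...]]:
    """Hypothetical helper: return every placement of n_phospho groups on S/T/Y.

    Each peptidoform is a sorted tuple of 0-based residue positions.
    """
    candidates = [i for i, aa in enumerate(sequence) if aa in PHOSPHO_SITES]
    return list(combinations(candidates, n_phospho))


def site_determining_ions(sequence: str, form_a: tuple[int, ...],
                          form_b: tuple[int, ...]) -> list[str]:
    """Hypothetical helper: list b/y ions whose mass differs between two peptidoforms.

    A b_i ion covers residues 0..i-1 and a y_i ion covers the last i residues.
    The ion distinguishes the two forms when they carry a different number of
    phosphate groups on the residues it covers.
    """
    n = len(sequence)
    ions = []
    for i in range(1, n):
        # b_i: N-terminal prefix of length i
        if sum(p < i for p in form_a) != sum(p < i for p in form_b):
            ions.append(f"b{i}")
        # y_i: C-terminal suffix of length i
        if sum(p >= n - i for p in form_a) != sum(p >= n - i for p in form_b):
            ions.append(f"y{i}")
    return ions
```

For a hypothetical singly phosphorylated peptide PEPTSIDEK, `enumerate_peptidoforms("PEPTSIDEK", 1)` yields two isomers (phosphate on T4 or on S5), and `site_determining_ions("PEPTSIDEK", (3,), (4,))` returns only `["b4", "y5"]`. Every other fragment is shared, which illustrates why only a small subset of the signal in a DIA data set can discriminate between peptidoforms.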

Acknowledgements

H.L.R. was funded by the Swiss National Science Foundation (SNSF grant P2EZP3 162268). R.A. was supported by ERC Proteomics v3.0 (AdG-233226 Proteomics v.3.0 and AdG-670821 Proteomics 4D) and the Swiss National Science Foundation (SNSF) (31003A_166435). We thank L. Gillet and A. Leitner for insightful discussions on post-translational modifications and SWATH-MS. We are grateful to all twin registry participants recruited in this study. For the Twins UK unit, this study was funded by the Wellcome Trust and the EC's Seventh Framework Programme (FP7/2007-2013) and also received support from the National Institute for Health Research-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. We further thank the Scientific IT Support team of ETH Zurich for supporting and maintaining the lab-internal computing infrastructure, the HPC team (Brutus), and the OpenMS and PyProphet developers for including IPF in the OpenMS and PyProphet frameworks. We thank the PRIDE team for proteomic data deposition.

Author information

Author notes

    George Rosenberger & Yansheng Liu

    These authors contributed equally to this work.

Affiliations

  1. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.

    • George Rosenberger
    • Yansheng Liu
    • Hannes L Röst
    • Christina Ludwig
    • Ariel Bensimon
    • Ben C Collins
    • Lars Malmström
    • Ruedi Aebersold

  2. PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland.

    • George Rosenberger

  3. Department of Genetics, Stanford University, Stanford, California, USA.

    • Hannes L Röst

  4. Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University Munich, Freising, Germany.

    • Christina Ludwig

  5. Research Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Roskilde, Denmark.

    • Alfonso Buil

  6. Department of Biology, Institute of Biochemistry, ETH Zurich, Zurich, Switzerland.

    • Martin Soste

  7. Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas' Hospital Campus, London, UK.

    • Tim D Spector

  8. Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

    • Emmanouil T Dermitzakis

  9. S3IT, University of Zurich, Zurich, Switzerland.

    • Lars Malmström

  10. Faculty of Science, University of Zurich, Zurich, Switzerland.

    • Ruedi Aebersold

Authors

George Rosenberger, Yansheng Liu, Hannes L Röst, Christina Ludwig, Alfonso Buil, Ariel Bensimon, Martin Soste, Tim D Spector, Emmanouil T Dermitzakis, Ben C Collins, Lars Malmström & Ruedi Aebersold

Contributions

G.R. developed and implemented IPF and analyzed the synthetic phosphopeptide reference, enriched phosphopeptide, 14-3-3β and twin study data. Y.L. provided and analyzed the twin study data. H.L.R. developed and implemented the MS1 scoring and quantification in OpenSWATH. C.L. provided the synthetic phosphopeptide reference sample and acquired the data. A. Buil and G.R. conducted the heritability analysis of the twin study. A. Bensimon and Y.L. conducted the enriched phosphopeptide experiment and acquired the data. M.S. provided the synthetic phosphopeptide reference sample. B.C.C. analyzed the 14-3-3β data. T.D.S. and E.T.D. designed and supervised the twin study. All authors provided critical input on the project. G.R., Y.L. and R.A. wrote the paper with feedback from all authors. L.M. supervised the development of IPF and conducted the protein-level PTM meta-analysis. R.A. designed and supervised the study.

Competing interests

R.A. holds shares of Biognosys AG, which operates in the field covered by the article. The remaining authors declare no competing financial interests.

Corresponding author

Correspondence to Ruedi Aebersold.

Supplementary information

PDF files

  1. Supplementary Text and Figures. Supplementary Figures 1–14 and Supplementary Notes

Text files

  1. Supplementary Table 1. Synthetic phosphopeptide reference data set: peptide sequences

CSV files

  1. Supplementary Table 2. Twin plasma data set: summary statistics of spectral library coverage and IPF detectability
  2. Supplementary Table 3. Twin plasma data set: quantitative variance components of peptidoforms
  3. Supplementary Table 4. Twin plasma data set: peptide coverage of proteins

Zip files

  1. Supplementary Data 1. Twin plasma data set: protein-level graphical reports

About this article

DOI

https://doi.org/10.1038/nbt.3908
