A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments

Abstract

We present a framework for the analysis of multiplexed mass spectrometry proteomics data that reduces estimation error when combining multiple isobaric batches. Variations in the number and quality of observations have long complicated the analysis of isobaric proteomics data. Here we show that the power to detect statistical associations is substantially improved by utilizing models that directly account for known sources of variation in the number and quality of observations that occur across batches.

In a multibatch benchmarking experiment, our open-source software (msTrawler) increases the power to detect changes, especially in the range of less than twofold changes, while simultaneously increasing quantitative proteome coverage by utilizing more low-signal observations. Further analyses of previously published multiplexed datasets of 4 and 23 batches highlight both increased power and the ability to navigate complex missing data patterns without relying on unverifiable imputations or discarding reliable measurements.

Fig. 1: msTrawler workflow.
Fig. 2: The interbatch benchmarking experiment.
Fig. 3: Benchmarking performance.
Fig. 4: Application of msTrawler to a 4-batch TMT senescence experiment.
Fig. 5: msTrawler enables complete case analyses of a 23 TMT batch study without discarding data.

Data availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (ref. 33) with the dataset identifier PXD036799.

Code availability

The msTrawler R package is available at www.github.com/Calico/msTrawler, and the supplementary data and code used to generate the analyses in this paper are available at https://console.cloud.google.com/storage/browser/mstrawler_paper. A patent has been filed for the msTrawler data analysis framework and workflows.

References

  1. Gaun, A. et al. Automated 16-plex plasma proteomics with real-time search and ion mobility mass spectrometry enables large-scale profiling in naked mole-rats and mice. J. Proteome Res. 20, 1280–1295 (2021).

  2. Muntel, J. et al. Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time. J. Proteome Res. 18, 1340–1351 (2019).

  3. Li, J. et al. TMTpro-18plex: the expanded and complete set of TMTpro reagents for sample multiplexing. J. Proteome Res. 20, 2964–2972 (2021).

  4. Nusinow, D. P. et al. Quantitative proteomics of the cancer cell line encyclopedia. Cell 180, 387–402.e16 (2020).

  5. Petralia, F. et al. Integrated proteogenomic characterization across major histological types of pediatric brain cancer. Cell 183, 1962–1985.e31 (2020).

  6. Keele, G. R. et al. Regulation of protein abundance in genetically diverse mouse populations. Cell Genom. 1, 100003 (2021).

  7. Brenes, A., Hukelmann, J., Bensaddek, D. & Lamond, A. I. Multibatch TMT reveals false positives, batch effects and missing values. Mol. Cell. Proteomics 18, 1967–1980 (2019).

  8. O’Brien, J. J. et al. Compositional proteomics: effects of spatial constraints on protein quantification utilizing isobaric tags. J. Proteome Res. 17, 590–599 (2018).

  9. Huang, T. et al. MSstatsTMT: statistical detection of differentially abundant proteins in experiments with isobaric labeling and multiple mixtures. Mol. Cell. Proteomics 19, 1706–1723 (2020).

  10. Clark, D. J. et al. Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell 179, 964–983.e31 (2019).

  11. O’Brien, J. J. et al. The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann. Appl. Stat. 12, 2075–2095 (2018).

  12. Lazar, C. et al. Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J. Proteome Res. 15, 1116–1125 (2016).

  13. O’Connell, J. D., Paulo, J. A., O’Brien, J. J. & Gygi, S. P. Proteome-wide evaluation of two common protein quantification methods. J. Proteome Res. 17, 1934–1942 (2018).

  14. Chan, M. et al. Novel insights from a multiomics dissection of the Hayflick limit. eLife 11, e70283 (2022).

  15. Schweppe, D. K. et al. Characterization and optimization of multiplexed quantitative analyses using high-field asymmetric-waveform ion mobility mass spectrometry. Anal. Chem. 91, 4010–4016 (2019).

  16. Navarrete-Perea, J., Gygi, S. P. & Paulo, J. A. HYpro16: a two-proteome mixture to assess interference in isobaric tag-based sample multiplexing experiments. J. Am. Soc. Mass Spectrom. 32, 247–254 (2021).

  17. Paulo, J. A. et al. Quantitative mass spectrometry-based multiplexing compares the abundance of 5000 S. cerevisiae proteins across 10 carbon sources. J. Proteomics 148, 85–93 (2016).

  18. Peshkin, L., Gupta, M., Ryazanova, L. & Wühr, M. Bayesian confidence intervals for multiplexed proteomics integrate ion-statistics with peptide quantification concordance. Mol. Cell. Proteomics 18, 2108–2120 (2019).

  19. Karp, N. A. et al. Addressing accuracy and precision issues in iTRAQ quantitation. Mol. Cell. Proteomics 9, 1885–1897 (2010).

  20. McAlister, G. C. et al. MultiNotch MS3 enables accurate, sensitive, and multiplexed detection of differential expression across cancer cell line proteomes. Anal. Chem. 86, 7150–7158 (2014).

  21. Herbrich, S. M. et al. Statistical inference from multiple iTRAQ experiments without using common reference standards. J. Proteome Res. 12, 594–604 (2013).

  22. Erickson, B. K. et al. Evaluating multiplexed quantitative phosphopeptide analysis on a hybrid quadrupole mass filter/linear ion trap/orbitrap mass spectrometer. Anal. Chem. 87, 1241–1249 (2015).

  23. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B (Methodol.) 57, 289–300 (1995).

  24. Wagner, K. D. & Wagner, N. The senescence markers p16INK4A, p14ARF/p19ARF, and p21 in organ development and homeostasis. Cells 11, 1966 (2022).

  25. Ma, W. et al. DreamAI: algorithm for the imputation of proteomics data. Preprint at bioRxiv https://doi.org/10.1101/2020.07.21.214205 (2020).

  26. Pereira, M. S. L., Klamt, F., Thomé, C. C., Worm, P. V. & de Oliveira, D. L. Metabotropic glutamate receptors as a new therapeutic target for malignant gliomas. Oncotarget 8, 22279–22298 (2017).

  27. O’Brien, J. J., Gunawardena, H. P. & Qaqish, B. F. Row versus column correlations: avoiding the ecological fallacy in RNA/protein expression studies. Brief Bioinform. 19, 946–953 (2017).

  28. Schweppe, D. K. et al. Full-featured, real-time database searching platform enables fast and accurate multiplexed quantitative proteomics. J. Proteome Res. 19, 2026–2034 (2020).

  29. Huttlin, E. L. et al. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010).

  30. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).

  31. Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).

  32. Käll, L., Canterbury, J. D., Weston, J., Noble, W. S. & MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 4, 923–925 (2007).

  33. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Acknowledgements

We thank all members of the mass spectrometry and computational teams at Calico Life Sciences LLC for assistance and helpful discussions, in particular E. Melamud, B. Bennett, L. Chan, T. Nguyen, P. Seitzer and J. Xu. Also, we thank the IT teams for their help with the support of in-house data analysis software and in particular A. Chekholko. We also thank S. Gygi, D. Schweppe, J. Mintseris, E. Huttlin, M. Wühr and B. Qaqish for helping us to clarify the key messages in the paper. Funding for this work was provided by Calico Life Sciences LLC.

Author information

Contributions

Experiments were conceived and planned by J.J.O., A.R., A.W. and F.E.M. Experiments were carried out by A.W., A.G. and N.O. The new algorithms were created by J.J.O. The software was developed by J.J.O. and W.L. Data analyses and interpretations were performed by J.J.O., W.L., A.R., A.G., D.G.H., N.O. and F.E.M. The paper was written by J.J.O. with input from all authors.

Corresponding authors

Correspondence to Jonathon J. O’Brien or Fiona E. McAllister.

Ethics declarations

Competing interests

All authors were employees of Calico Life Sciences LLC at the time of submission.

Peer review

Peer review information

Nature Methods thanks Samuel Payne and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Appendix 1 (Supplementary Table 1 and Fig. 1), Appendix 2 (Supplementary Figs. 2 and 3), Appendix 3 (Supplementary Fig. 4 and Tables 2 and 3), Appendix 4 (Supplementary Fig. 5), Appendix 5 (Supplementary Fig. 6), Supplementary Figs. 7–10, Appendix 6 (supplementary methods and Table 4) and references.

Reporting Summary

Peer Review File

Supplementary Dataset 1

msTrawler results from the interbatch experiment.

Supplementary Dataset 2

Worksheet containing results from the msTrawler reanalysis of the Hayflick time course.

Supplementary Dataset 3

msTrawler results from the reanalysis of the pediatric brain tumor data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

O’Brien, J.J., Raj, A., Gaun, A. et al. A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments. Nat Methods 21, 290–300 (2024). https://doi.org/10.1038/s41592-023-02120-6

  • DOI: https://doi.org/10.1038/s41592-023-02120-6
