A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments

O’Brien, Jonathon J.; Raj, Anil; Gaun, Aleksandr; Waite, Adam; Li, Wenzhou; Hendrickson, David G.; Olsson, Niclas; McAllister, Fiona E.

doi:10.1038/s41592-023-02120-6

Article
Published: 18 December 2023

Nature Methods volume 21, pages 290–300 (2024)

Abstract

We present a framework for the analysis of multiplexed mass spectrometry proteomics data that reduces estimation error when combining multiple isobaric batches. Variations in the number and quality of observations have long complicated the analysis of isobaric proteomics data. Here we show that the power to detect statistical associations is substantially improved by utilizing models that directly account for known sources of variation in the number and quality of observations that occur across batches.

In a multibatch benchmarking experiment, our open-source software (msTrawler) increases the power to detect changes, especially in the range of less than twofold changes, while simultaneously increasing quantitative proteome coverage by utilizing more low-signal observations. Further analyses of previously published multiplexed datasets of 4 and 23 batches highlight both increased power and the ability to navigate complex missing data patterns without relying on unverifiable imputations or discarding reliable measurements.
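As a generic illustration of why accounting for observation quality can reduce estimation error, the sketch below compares an unweighted mean of repeated measurements with a precision-weighted (inverse-variance) mean. It is not the msTrawler model: the function names and the example variances are hypothetical, and the calculation assumes independent, unbiased observations with known variances.

```python
from typing import Sequence


def variance_of_unweighted_mean(variances: Sequence[float]) -> float:
    """Variance of the plain average of independent observations."""
    n = len(variances)
    return sum(variances) / (n * n)


def variance_of_weighted_mean(variances: Sequence[float]) -> float:
    """Variance of the inverse-variance weighted average."""
    return 1.0 / sum(1.0 / v for v in variances)


def precision_weighted_mean(values: Sequence[float],
                            variances: Sequence[float]) -> float:
    """Average in which each value is weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical variances of log2 ratios for a high-, medium- and
# low-signal observation of the same protein (illustrative values only).
example_variances = [0.01, 0.04, 0.25]

print("unweighted:", variance_of_unweighted_mean(example_variances))
print("weighted:  ", variance_of_weighted_mean(example_variances))
```

With these illustrative values, the unweighted mean has a variance of about 0.033, while the weighted mean has a variance of about 0.0078. The weighted variance is also below 0.01, the variance of the best single observation, so under these assumptions the low-signal observation still adds information when it is down-weighted rather than discarded.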

**Fig. 2: The interbatch benchmarking experiment.**

**Fig. 4: Application of msTrawler to a 4-batch TMT senescence experiment.**

**Fig. 5: msTrawler enables complete case analyses of a 23 TMT batch study without discarding data.**

Data availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE³³ partner repository with the dataset identifier PXD036799.

Code availability

The msTrawler R package is available at www.github.com/Calico/msTrawler, and the supplementary data and code used to generate the analyses in this paper are available at https://console.cloud.google.com/storage/browser/mstrawler_paper. A patent has been filed for the msTrawler data analysis framework and workflows.

References

1. Gaun, A. et al. Automated 16-plex plasma proteomics with real-time search and ion mobility mass spectrometry enables large-scale profiling in naked mole-rats and mice. J. Proteome Res. 20, 1280–1295 (2021).
2. Muntel, J. et al. Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time. J. Proteome Res. 18, 1340–1351 (2019).
3. Li, J. et al. TMTpro-18plex: the expanded and complete set of TMTpro reagents for sample multiplexing. J. Proteome Res. 20, 2964–2972 (2021).
4. Nusinow, D. P. et al. Quantitative proteomics of the cancer cell line encyclopedia. Cell 180, 387–402.e16 (2020).
5. Petralia, F. et al. Integrated proteogenomic characterization across major histological types of pediatric brain cancer. Cell 183, 1962–1985.e31 (2020).
6. Keele, G. R. et al. Regulation of protein abundance in genetically diverse mouse populations. Cell Genom. 1, 100003 (2021).
7. Brenes, A., Hukelmann, J., Bensaddek, D. & Lamond, A. I. Multibatch TMT reveals false positives, batch effects and missing values. Mol. Cell. Proteomics 18, 1967–1980 (2019).
8. O’Brien, J. J. et al. Compositional proteomics: effects of spatial constraints on protein quantification utilizing isobaric tags. J. Proteome Res. 17, 590–599 (2018).
9. Huang, T. et al. MSstatsTMT: statistical detection of differentially abundant proteins in experiments with isobaric labeling and multiple mixtures. Mol. Cell. Proteomics 19, 1706–1723 (2020).
10. Clark, D. J. et al. Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell 179, 964–983.e31 (2019).
11. O’Brien, J. J. et al. The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann. Appl. Stat. 12, 2075–2095 (2018).
12. Lazar, C. et al. Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J. Proteome Res. 15, 1116–1125 (2016).
13. O’Connell, J. D., Paulo, J. A., O’Brien, J. J. & Gygi, S. P. Proteome-wide evaluation of two common protein quantification methods. J. Proteome Res. 17, 1934–1942 (2018).
14. Chan, M. et al. Novel insights from a multiomics dissection of the Hayflick limit. eLife 11, e70283 (2022).
15. Schweppe, D. K. et al. Characterization and optimization of multiplexed quantitative analyses using high-field asymmetric-waveform ion mobility mass spectrometry. Anal. Chem. 91, 4010–4016 (2019).
16. Navarrete-Perea, J., Gygi, S. P. & Paulo, J. A. HYpro16: a two-proteome mixture to assess interference in isobaric tag-based sample multiplexing experiments. J. Am. Soc. Mass Spectrom. 32, 247–254 (2021).
17. Paulo, J. A. et al. Quantitative mass spectrometry-based multiplexing compares the abundance of 5000 S. cerevisiae proteins across 10 carbon sources. J. Proteomics 148, 85–93 (2016).
18. Peshkin, L., Gupta, M., Ryazanova, L. & Wühr, M. Bayesian confidence intervals for multiplexed proteomics integrate ion-statistics with peptide quantification concordance. Mol. Cell. Proteomics 18, 2108–2120 (2019).
19. Karp, N. A. et al. Addressing accuracy and precision issues in iTRAQ quantitation. Mol. Cell. Proteomics 9, 1885–1897 (2010).
20. McAlister, G. C. et al. MultiNotch MS3 enables accurate, sensitive, and multiplexed detection of differential expression across cancer cell line proteomes. Anal. Chem. 86, 7150–7158 (2014).
21. Herbrich, S. M. et al. Statistical inference from multiple iTRAQ experiments without using common reference standards. J. Proteome Res. 12, 594–604 (2013).
22. Erickson, B. K. et al. Evaluating multiplexed quantitative phosphopeptide analysis on a hybrid quadrupole mass filter/linear ion trap/orbitrap mass spectrometer. Anal. Chem. 87, 1241–1249 (2015).
23. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B (Methodol.) 57, 289–300 (1995).
24. Wagner, K. D. & Wagner, N. The senescence markers p16INK4A, p14ARF/p19ARF, and p21 in organ development and homeostasis. Cells 11, 1966 (2022).
25. Ma, W. et al. DreamAI: algorithm for the imputation of proteomics data. Preprint at bioRxiv https://doi.org/10.1101/2020.07.21.214205 (2020).
26. Pereira, M. S. L., Klamt, F., Thomé, C. C., Worm, P. V. & de Oliveira, D. L. Metabotropic glutamate receptors as a new therapeutic target for malignant gliomas. Oncotarget 8, 22279–22298 (2017).
27. O’Brien, J. J., Gunawardena, H. P. & Qaqish, B. F. Row versus column correlations: avoiding the ecological fallacy in RNA/protein expression studies. Brief Bioinform. 19, 946–953 (2017).
28. Schweppe, D. K. et al. Full-featured, real-time database searching platform enables fast and accurate multiplexed quantitative proteomics. J. Proteome Res. 19, 2026–2034 (2020).
29. Huttlin, E. L. et al. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010).
30. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
31. Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
32. Käll, L., Canterbury, J. D., Weston, J., Noble, W. S. & MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 4, 923–925 (2007).
33. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Acknowledgements

We thank all members of the mass spectrometry and computational teams at Calico Life Sciences LLC for assistance and helpful discussions, in particular E. Melamud, B. Bennett, L. Chan, T. Nguyen, P. Seitzer and J. Xu. Also, we thank the IT teams for their help with the support of in-house data analysis software and in particular A. Chekholko. We also thank S. Gygi, D. Schweppe, J. Mintseris, E. Huttlin, M. Wühr and B. Qaqish for helping us to clarify the key messages in the paper. Funding for this work was provided by Calico Life Sciences LLC.

Author information

Authors and Affiliations

Calico Life Sciences LLC, South San Francisco, CA, USA
Jonathon J. O’Brien, Anil Raj, Aleksandr Gaun, Adam Waite, Wenzhou Li, David G. Hendrickson, Niclas Olsson & Fiona E. McAllister

Contributions

Experiments were conceived and planned by J.J.O., A.R., A.W. and F.E.M. Experiments were carried out by A.W., A.G. and N.O. The new algorithms were created by J.J.O. The software was developed by J.J.O. and W.L. Data analyses and interpretations were performed by J.J.O., W.L., A.R., A.G., D.G.H., N.O. and F.E.M. The paper was written by J.J.O. with input from all authors.

Corresponding authors

Correspondence to Jonathon J. O’Brien or Fiona E. McAllister.

Ethics declarations

Competing interests

All authors were employees of Calico Life Sciences LLC at the time of submission.

Peer review

Peer review information

Nature Methods thanks Samuel Payne and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Appendix 1 (Supplementary Table 1 and Fig. 1), Appendix 2 (Supplementary Figs. 2 and 3), Appendix 3 (Supplementary Fig. 4 and Tables 2 and 3), Appendix 4 (Supplementary Fig. 5), Appendix 5 (Supplementary Fig. 6), Supplementary Figs. 7–10, Appendix 6 (supplementary methods and Table 4) and references.

Reporting Summary

Peer Review File

Supplementary Dataset 1

msTrawler results from the interbatch experiment.

Supplementary Dataset 2

Worksheet containing results from the msTrawler reanalysis of the Hayflick time course.

Supplementary Dataset 3

msTrawler results from the reanalysis of the pediatric brain tumor data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

O’Brien, J.J., Raj, A., Gaun, A. et al. A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments. Nat Methods 21, 290–300 (2024). https://doi.org/10.1038/s41592-023-02120-6

Received: 13 October 2022
Accepted: 31 October 2023
Published: 18 December 2023
Issue Date: February 2024
DOI: https://doi.org/10.1038/s41592-023-02120-6
