A modification-centric assessment tool for the performance of chemoproteomic probes

Abstract

Chemoproteomics has emerged as a key technology to expand the functional space in complex proteomes for probing fundamental biology and for discovering new small-molecule-based therapies. Here we report a modification-centric computational tool termed pChem to provide a streamlined pipeline for unbiased performance assessment of chemoproteomic probes. The pipeline starts with an experimental setting for isotopically coding probe-derived modifications that can be automatically recognized by pChem, with masses accurately calculated and sites precisely localized. pChem exports on-demand reports by scoring the profiling efficiency, modification homogeneity and proteome-wide residue selectivity of a tested probe. The performance and robustness of pChem were benchmarked by applying it to eighteen bioorthogonal probes. These analyses reveal that the formation of unexpected probe-derived modifications can be driven by endogenous reactive metabolites (for example, bioactive aldehydes and glutathione). pChem is a powerful and user-friendly tool that aims to facilitate the development of probes for the ever-growing field of chemoproteomics.

Fig. 1: Principle and work flow of pChem.
Fig. 2: Performance of pChem.
Fig. 3: Benchmarking pChem with residue-reactive probes.
Fig. 4: Benchmarking pChem with SOH probes.
Fig. 5: Benchmarking pChem with lipid electrophile-derived probes.
Fig. 6: Comparison of pChem with FragPipe for chemoproteomics.

Data availability

The newly generated chemoproteomic datasets have been deposited to the ProteomeXchange Consortium via the PRIDE (ref. 59) partner repository with the dataset identifiers PXD027755, PXD027758, PXD030064, PXD027789 and PXD027756; previously published data used to benchmark pChem are available in repositories with identifiers PXD027591, PXD027764, PXD027762, PXD027760 and PXD032392. The FASTA canonical protein sequence databases from various species are embedded in the pChem software and can also be obtained from UniProt (ref. 60). Source data are provided with this paper.

Code availability

pChem is open-source and is freely available at https://github.com/pFindStudio/pChem under a permissive license.

References

  1. Parker, C. G. & Pratt, M. R. Click chemistry in proteomic investigations. Cell 180, 605–632 (2020).
  2. Grammel, M. & Hang, H. C. Chemical reporters for biological discovery. Nat. Chem. Biol. 9, 475–484 (2013).
  3. Grams, R. J. & Hsu, K. L. Reactive chemistry for covalent probe and therapeutic development. Trends Pharmacol. Sci. 43, 249–262 (2022).
  4. den Ridder, M., Daran-Lapujade, P. & Pabst, M. Shot-gun proteomics: why thousands of unidentified signals matter. FEMS Yeast Res. 20, foz088 (2020).
  5. Dasari, S. et al. TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 9, 1716–1726 (2010).
  6. Akter, S. et al. Chemical proteomics reveals new targets of cysteine sulfinic acid reductase. Nat. Chem. Biol. 14, 995–1004 (2018).
  7. Gupta, V., Yang, J., Liebler, D. C. & Carroll, K. S. Diverse redoxome reactivity profiles of carbon nucleophiles. J. Am. Chem. Soc. 139, 5588–5595 (2017).
  8. Tian, C., Liu, K., Sun, R., Fu, L. & Yang, J. Chemoproteomics reveals unexpected lysine/arginine-specific cleavage of peptide chains as a potential protein degradation machinery. Anal. Chem. 90, 794–800 (2018).
  9. Sun, R. et al. Chemoproteomics reveals chemical diversity and dynamics of 4-Oxo-2-nonenal modifications in cells. Mol. Cell Proteomics 16, 1789–1800 (2017).
  10. Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
  11. Zanon, P. R. A. et al. Profiling the proteome-wide selectivity of diverse electrophiles. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv.14186561.v1 (2021).
  12. Chi, H. et al. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat. Biotechnol. 36, 1059–1061 (2018).
  13. Creasy, D. M. & Cottrell, J. S. Unimod: protein modifications for mass spectrometry. Proteomics 4, 1534–1536 (2004).
  14. Yang, H. et al. pSite: amino acid confidence evaluation for quality control of de novo peptide sequencing and modification site localization. J. Proteome Res. 17, 119–128 (2018).
  15. Dancik, V., Addona, T. A., Clauser, K. R., Vath, J. E. & Pevzner, P. A. De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 6, 327–342 (1999).
  16. Fu, L. et al. A quantitative thiol reactivity profiling platform to analyze redox and electrophile reactive cysteine proteomes. Nat. Protoc. 15, 2891–2919 (2020).
  17. Chambers, M. C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).
  18. Wu, C. W. & Yarbrough, L. R. N-(1-pyrene)maleimide: a fluorescent cross-linking reagent. Biochemistry 15, 2863–2868 (1976).
  19. Halloran, M. W. & Lumb, J. P. Recent applications of diazirines in chemical proteomics. Chemistry 25, 4885–4898 (2019).
  20. West, A. V. et al. Labeling preferences of diazirines with protein biomolecules. J. Am. Chem. Soc. 143, 6691–6700 (2021).
  21. Qin, W., Cho, K. F., Cavanagh, P. E. & Ting, A. Y. Deciphering molecular interactions by proximity labeling. Nat. Methods 18, 133–143 (2021).
  22. Bar, D. Z. et al. Biotinylation by antibody recognition—a method for proximity labeling. Nat. Methods 15, 127–133 (2018).
  23. Udeshi, N. D. et al. Antibodies to biotin enable large-scale detection of biotinylation sites on proteins. Nat. Methods 14, 1167–1170 (2017).
  24. Massari, J. et al. Acetyl radical production by the methylglyoxal-peroxynitrite system: a possible route for l-lysine acetylation. Chem. Res. Toxicol. 23, 1762–1770 (2010).
  25. Sadler, A., Subrahmanyam, V. V. & Ross, D. Oxidation of catechol by horseradish peroxidase and human leukocyte peroxidase: reactions of o-benzoquinone and o-benzosemiquinone. Toxicol. Appl. Pharmacol. 93, 62–71 (1988).
  26. Paulsen, C. E. & Carroll, K. S. Cysteine-mediated redox signaling: chemistry, biology, and tools for discovery. Chem. Rev. 113, 4633–4679 (2013).
  27. Alcock, L. J., Perkins, M. V. & Chalker, J. M. Chemical methods for mapping cysteine oxidation. Chem. Soc. Rev. 47, 231–268 (2018).
  28. Huang, J. et al. Mining for protein S-sulfenylation in Arabidopsis uncovers redox-sensitive sites. Proc. Natl Acad. Sci. USA 116, 21256–21261 (2019).
  29. Meng, J. et al. Global profiling of distinct cysteine redox forms reveals wide-ranging redox regulation in C. elegans. Nat. Commun. 12, 1415 (2021).
  30. Pople, J. M. M. & Chalker, J. M. A critical evaluation of probes for cysteine sulfenic acid. Curr. Opin. Chem. Biol. 60, 55–65 (2021).
  31. Shi, Y. & Carroll, K. S. Comments on ‘A critical evaluation of probes for cysteine sulfenic acid’. Curr. Opin. Chem. Biol. 60, 131–133 (2021).
  32. Yang, J. et al. Global, in situ, site-specific analysis of protein S-sulfenylation. Nat. Protoc. 10, 1022–1037 (2015).
  33. Fu, L., Liu, K., Ferreira, R. B., Carroll, K. S. & Yang, J. Proteome-wide analysis of cysteine S-sulfenylation using a benzothiazine-based probe. Curr. Protoc. Protein Sci. 95, e76 (2019).
  34. Shi, Y., Fu, L., Yang, J. & Carroll, K. S. Wittig reagents for chemoselective sulfenic acid ligation enables global site stoichiometry analysis and redox-controlled mitochondrial targeting. Nat. Chem. 13, 1140–1150 (2021).
  35. Ohata, J., Bruemmer, K. J. & Chang, C. J. Activity-based sensing methods for monitoring the reactive carbon species carbon monoxide and formaldehyde in living systems. Acc. Chem. Res. 52, 2841–2848 (2019).
  36. Ramasamy, R., Yan, S. F. & Schmidt, A. M. Methylglyoxal comes of AGE. Cell 124, 258–260 (2006).
  37. Palaniappan, K. K. & Bertozzi, C. R. Chemical glycoproteomics. Chem. Rev. 116, 14277–14306 (2016).
  38. Qin, W. et al. Artificial cysteine S-glycosylation induced by per-O-acetylated unnatural monosaccharides during metabolic glycan labeling. Angew. Chem. Int. Ed. Engl. 57, 1817–1820 (2018).
  39. Sayre, L. M., Lin, D., Yuan, Q., Zhu, X. & Tang, X. Protein adducts generated from products of lipid oxidation: focus on HNE and ONE. Drug Metab. Rev. 38, 651–675 (2006).
  40. Yang, J., Tallman, K. A., Porter, N. A. & Liebler, D. C. Quantitative chemoproteomics for site-specific analysis of protein alkylation by 4-hydroxy-2-nonenal in cells. Anal. Chem. 87, 2535–2541 (2015).
  41. Zhu, X., Gallogly, M. M., Mieyal, J. J., Anderson, V. E. & Sayre, L. M. Covalent cross-linking of glutathione and carnosine to proteins by 4-oxo-2-nonenal. Chem. Res. Toxicol. 22, 1050–1059 (2009).
  42. Zanon, P. R. A., Lewald, L. & Hacker, S. M. Isotopically labeled desthiobiotin azide (isoDTB) tags enable global profiling of the bacterial cysteinome. Angew. Chem. Int. Ed. Engl. 59, 2829–2836 (2020).
  43. Qin, K. et al. Quantitative profiling of protein O-GlcNAcylation sites by an isotope-tagged cleavable linker. ACS Chem. Biol. 13, 1983–1989 (2018).
  44. Abo, M., Li, C. & Weerapana, E. Isotopically-labeled iodoacetamide-alkyne probes for quantitative cysteine-reactivity profiling. Mol. Pharm. 15, 743–749 (2018).
  45. Li, J. et al. An isotope-coded photocleavable probe for quantitative profiling of protein O-GlcNAcylation. ACS Chem. Biol. 14, 4–10 (2019).
  46. Patiny, L. & Borel, A. ChemCalc: a building block for tomorrow’s chemical infrastructure. J. Chem. Inf. Model. 53, 1223–1228 (2013).
  47. Dave, K. A. et al. Sulfonation and phosphorylation of regions of the dioxin receptor susceptible to methionine modifications. Mol. Cell Proteomics 8, 706–719 (2009).
  48. Wang, M. R., He, J. Y., He, J. X., Liu, K. K. & Yang, J. A Paal–Knorr agent for chemoproteomic profiling of targets of isoketals in cells. Chem. Sci. 12, 14557–14563 (2021).
  49. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  50. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
  51. Petrova, B. et al. Dynamic redox balance directs the oocyte-to-embryo transition via developmentally controlled reactive cysteine changes. Proc. Natl Acad. Sci. USA 115, E7978–E7986 (2018).
  52. Wang, W. et al. An Arabidopsis secondary metabolite directly targets expression of the bacterial type III secretion system to inhibit bacterial virulence. Cell Host Microbe 27, 601–613.e7 (2020).
  53. Sun, R. et al. A chemoproteomic platform to assess bioactivation potential of drugs. Chem. Res. Toxicol. 30, 1797–1803 (2017).
  54. Lin, D., Saleh, S. & Liebler, D. C. Reversibility of covalent electrophile-protein adducts and chemical toxicity. Chem. Res. Toxicol. 21, 2361–2369 (2008).
  55. Weerapana, E., Simon, G. M. & Cravatt, B. F. Disparate proteome reactivity profiles of carbon electrophiles. Nat. Chem. Biol. 4, 405–407 (2008).
  56. Pan, S. et al. A suite of “minimalist” photo-crosslinkers for live-cell imaging and chemical proteomics: case study with BRD4 inhibitors. Angew. Chem. Int. Ed. Engl. 56, 11816–11821 (2017).
  57. Yoo, S. D., Cho, Y. H. & Sheen, J. Arabidopsis mesophyll protoplasts: a versatile cell system for transient gene expression analysis. Nat. Protoc. 2, 1565–1572 (2007).
  58. Liu, C. et al. pQuant improves quantitation by keeping out interfering signals and evaluating the accuracy of calculated ratios. Anal. Chem. 86, 5286–5294 (2014).
  59. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
  60. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

Acknowledgements

We thank S.-X. Gao and C. C. L. Wong from Peking University for their help on the use of timsTOF; Z.-Q. Li from Jinan University for providing the alkyl diazirine probe and for performing beta-testing; K. Qin and X. Chen from Peking University for providing the Ac4ManNAz-based glycoproteomic dataset; P.-Y. Gong and C. Liu for their help on the quantitative analyses of aHNE/aONE datasets; H.-F. Jin from Peking University for providing rat CRL-1444 cells; X. Xiao from Shanghai Jiaotong University for providing E. coli strain MG1655; G.-Zhi Bi from Institute of Genetics and Developmental Biology, CAS, for his help in preparation of Arabidopsis protoplasts; C. Wang and X.-G. Lei from Peking University, G. Li from Shenzhen Bay Laboratory, H. Ye from China Pharmaceutical University, Y.-Y. Zhang from Shanghai Institute of Organic Chemistry, CAS, M.-L. Ye from Dalian Institute of Chemical Physics, CAS, N. Chen from the ChomIX Biotech, and U. Seneviratne from Pfizer for performing beta-testing; C.-R. Liu from Central China Normal University, Y.-L. Shi and K. S. Carroll from the Scripps Research Institute, P. Zou from Peking University, Q. Zhang from Fudan University, S.-M. He from Institute of Computing Technology, CAS, Y. Fu from Academy of Mathematics and Systems Science, CAS, and R.-X. Sun from National Institute of Biological Sciences, Beijing, for many insightful discussions and/or for proofreading the manuscript; S. M. Hacker from Technical University of Munich and A. I. Nesvizhskii from University of Michigan for their generous guidance in the use of FragPipe; and K. Vicari for designing the graphical abstract. The work was supported by grants from the Natural Science Foundation of China (21922702, 81973279, and 31770885) to J.Y., (32022046) to H.C., and (32088101) to F.C.H., the National Key R&D Program of China (2016YFA0501303) to J.Y., and (2016YFA0501301) to H.C., and the State Key Laboratory of Proteomics (SKLP-K201703 and SKLP-K201804) to J.Y.

Author information

Contributions

J.X.H. performed the experiments, analyzed the data, and wrote the protocol; Z.C.F. designed and implemented pChem, analyzed the data, and wrote the protocol; L.F. performed the QTRP experiments for various species; C.P.T. generated the AP and Diazir datasets; F.C.H. acquired funding; H.C. supervised the work, advised on pChem design and revised the manuscript; J.Y. conceived the project, supervised the work, advised on pChem design, analyzed data and wrote the manuscript with input from all the other authors.

Corresponding authors

Correspondence to Hao Chi or Jing Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Chemical Biology thanks Alexander Leitner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Identification of an unexpected but minor modification that is derived from IPM.

a, Heatmaps showing the amino acid localization distribution of the IPM-derived modification of ∆268.12. ∆268.12 on cysteine can be a sulfoxide product derived from the targeted PDM of ∆252.12, which is further supported by the DFLs (Supplementary Data 1). Regardless, ∆268.12 is often assigned to methionine when it is adjacent to a cysteine, and such PSMs are prone to yield a classic oxidation[M]-derived neutral loss of 64.0 Da. b, Representative MS/MS spectrum wrongly assigning the targeted PDM (∆252.12) to oxidized methionine due to the presence of an adjacent cysteine. c, Representative MS/MS spectrum of a peptide bearing the PDM ∆268.12 assigned to a non-Cys peptide of MIF protein. This initially puzzling outcome was explained when we noted that this peptide contains a catalytic N-terminal proline with a high nucleophilicity (that is, with an unusually low pKa of 5.6) and an adjacent methionine.

Extended Data Fig. 2 Robustness of pChem.

Representative heatmaps generated by pChem showing the amino acid localization distribution of the IPM-derived modifications. a, IPM-based QTRP data for pChem search were produced on eight different LC-MS/MS instruments from three vendors. The running times of pChem search are shown under each instrument. All analyses were performed on a desktop PC running Microsoft Windows 10 Home (v.19041.110), with one 2.90 GHz Intel i7-10700 processor and 64 GB of installed RAM. b, IPM-based QTRP data for pChem search were produced previously from light and heavy IPM-tagged samples mixed in different ratios (L/H = 1:10, 1:5, 1:2, 1:1, 2:1, 5:1 and 10:1). MS data were acquired on a Q-Exactive Plus instrument. c, Data for pChem search were produced from the IPM-based QTRP applications in various species. MS data were acquired on either a Q-Exactive Plus or a Q-Exactive HF-X instrument.

Extended Data Fig. 3 The effect of statistics-based filtering for localization refinement on pChem search of negative control data sets.

a, Workflows for generating negative control data sets. b, The increase of statistical power (that is, the decrease of the p value) reduced the chances of outputting false-positive identifications.

Extended Data Fig. 4 Benchmarking pChem with DiaAlk, a SO2H probe.

a, Chemical structure of DiaAlk. b, Representative heatmaps showing the amino acid localization distribution of the pChem-defined PDMs for DiaAlk. c, Pie charts showing the abundance distribution (that is, number of PSMs) of PDMs from DiaAlk. d, Representative MS/MS spectrum of a peptide bearing the PDM of ∆471.23 on tryptophan. e, Representative MS/MS spectrum of a peptide bearing the PDM of ∆371.18 on tryptophan. f, Plausible mechanism for the formation of two tryptophan-targeting PDMs (that is, ∆471.23 and ∆371.18) both derived from DiaAlk.

Extended Data Fig. 5 Re-analyses of protein adduction by aHNE and aONE in RKO cells.

Raw data sets for such re-analyses were retrieved from ref. 40 (Yang et al., Anal. Chem., 2015) and ref. 9 (Sun et al., Mol. Cell Proteomics, 2017). a, Schematic of the workflow for quantitative chemoproteomic analyses of dynamic aHNE/aONE-derived protein adducts in RKO cells. Cells were first treated with either aHNE or aONE. After treatment, cells were either harvested immediately and used as controls or placed in probe-free medium for another 1 and 4 h recovery period. The probe-labeled proteomes were digested with trypsin and then biotinylated by click chemistry with the light (L, recovery) or heavy (H, control) labeled UV-cleavable azido-biotin, followed by streptavidin enrichment, photorelease, and LC-MS/MS analysis. Identification and quantification were performed using the pFind studio (see Methods for more details). b-c, Venn diagrams revealing that pChem-based identification of previously unknown PDMs substantially expanded the target spectrum of aHNE (b) and aONE (c). Note that cysteines on those N-terminal ketoamide peptide adducts (∆307.15) are assigned as aHNE-modified sites, since such a PDM is most likely generated through an intramolecular rearrangement from Cys to the N terminus. d-e, Dynamics of aHNE- and aONE-based PDMs in RKO cells. d, Violin plots of L/H ratios determined from two types of aHNE-derived protein adducts in dynamic adduction analyses. e, Violin plots of L/H ratios determined from four types of aONE-derived protein adducts in dynamic adduction analyses. Dashed lines denote median values and dotted lines denote the 25th and 75th percentiles.

Extended Data Fig. 6 pChem automatically defines high-confidence PDMs.

Overlaps of unfiltered mass shifts higher than 200 Da, isotope-paired PDMs (no cutoffs for heavy to light mass tolerance and PSM counts) and high-confidence PDMs (Heavy to light mass tolerance ≤ 0.005 Da, %PSM ≥ 5%).
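The two cutoffs quoted in this caption can be illustrated with a minimal Python sketch. It is not pChem's code: the function name `high_confidence_pdms`, the input layout, the choice of total PSMs as the %PSM denominator, and the example masses and isotope spacing are all assumptions made for illustration.

```python
from typing import List, Tuple


def high_confidence_pdms(
    shifts: List[Tuple[float, int]],   # (mass shift in Da, PSM count); hypothetical layout
    isotope_delta: float,              # heavy minus light tag mass in Da (assumed input)
    mass_tol: float = 0.005,           # heavy-to-light mass tolerance from the caption
    min_psm_fraction: float = 0.05,    # %PSM cutoff from the caption
) -> List[Tuple[float, float, float]]:
    """Return (light mass, heavy mass, PSM fraction) for pairs passing both cutoffs."""
    total_psms = sum(count for _, count in shifts)
    if total_psms == 0:
        return []
    pairs = []
    for light_mass, light_count in shifts:
        for heavy_mass, heavy_count in shifts:
            # A pair is kept only if its spacing matches the isotope code.
            deviation = abs((heavy_mass - light_mass) - isotope_delta)
            if deviation > mass_tol:
                continue
            # Assumption: %PSM is the pair's share of all PSMs supplied.
            fraction = (light_count + heavy_count) / total_psms
            if fraction >= min_psm_fraction:
                pairs.append((light_mass, heavy_mass, fraction))
    return pairs


# Illustrative numbers only, not data from the paper.
example = [
    (252.1222, 900), (258.1424, 850),   # abundant isotope pair
    (268.1171, 20), (274.1373, 15),     # isotope pair below the %PSM cutoff
    (15.9949, 400), (301.1000, 30),     # unpaired mass shifts
]
result = high_confidence_pdms(example, isotope_delta=6.0201)
```

With these inputs only the abundant pair survives; the second pair matches the isotope spacing but falls below the 5% cutoff.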

Extended Data Fig. 7 The predominance of high-confidence PDMs.

Stacked column plots showing that, for all probes tested herein, the PSMs of the corresponding high-confidence PDMs (heavy-to-light mass tolerance ≤ 0.005 Da, %PSM ≥ 5%) account for 88.4 ± 6.3% of those of all isotope-paired PDMs (no cutoffs for heavy-to-light mass tolerance and PSM counts). Note that the PSMs used for these statistics are provided by the blind search of each data set.

Extended Data Fig. 8 Comparison of different approaches for accurate mass calculation.

a-c, Histograms showing the distribution of mass shifts determined at the PSM level using three different methods, as follows. a, Mass shift average: all unknown mass shifts are assigned to the corresponding mass labels, which are kept to a fixed two decimal places. Those with the same mass label are unified by averaging the mass shifts from multiple PSMs as follows: \(\bar{M}_t = \frac{1}{K}\sum_{i=1}^{K} M_i\), where K denotes the number of spectra that include the target mass label. Note that the modification mass \(M_i\) employed here has been corrected for system error. b, Mass range average: unknown mass shifts within a fixed-size window (0.01 Da of \(M_w\) by default) are considered the same unknown modification and are averaged to give the accurate mass of the candidate modification as follows: \(\bar{M}_t = \frac{1}{Q}\sum_{i=1}^{Q} M_i, \quad \forall\, M_t - M_w < M_i < M_t + M_w\), where \(M_w\) is the window radius, \(M_t\) is the target mass shift and Q is the number of spectra that fall within the mass range. c, Window-based iterative refinement builds upon the mass range average method: the mass shifts are initially unified within a predefined window (0.01 Da by default) and gradually converge to stable and accurate values through multiple iterations (see Methods for details). For a-c, the orange vertical line represents the accurate mass of the ground-truth unknown modification, while the red dashed line denotes the mass estimated by each approach. d, Violin plots showing the mass accuracy achieved by the above methods. Middle lines denote median values, while left- and right-end lines denote the 25th and 75th percentiles, respectively. Note that only the data sets where the number of spectra used for mass calculation is larger than 500 were used in this comparative analysis.
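The three estimators compared in this caption can be sketched in a few lines of Python. This is a hedged illustration rather than pChem's implementation: the function names are hypothetical, labels are formed by rounding (the caption does not say whether rounding or truncation is used), and the iterative variant assumes that the window is simply re-centred on the running mean until it stops moving. The actual procedure is defined in the paper's Methods.

```python
from collections import defaultdict
from statistics import mean


def mass_shift_average(masses, decimals=2):
    """Method a: group shifts by a fixed-decimal label, then average each group."""
    groups = defaultdict(list)
    for m in masses:
        groups[f"{m:.{decimals}f}"].append(m)   # assumption: labels formed by rounding
    return {label: mean(vals) for label, vals in groups.items()}


def mass_range_average(masses, target, window=0.01):
    """Method b: average all shifts with target - window < M_i < target + window."""
    inside = [m for m in masses if target - window < m < target + window]
    return mean(inside) if inside else None


def iterative_refinement(masses, target, window=0.01, max_iter=50, eps=1e-6):
    """Method c (assumed form): re-centre the window on the running mean until stable."""
    estimate = target
    for _ in range(max_iter):
        updated = mass_range_average(masses, estimate, window)
        if updated is None:        # no spectra in the window; keep the last estimate
            break
        converged = abs(updated - estimate) < eps
        estimate = updated
        if converged:
            break
    return estimate


# Illustrative mass shifts in Da, not data from the paper.
observed = [252.121, 252.122, 252.123, 252.140]
by_label = mass_shift_average(observed)
in_window = mass_range_average(observed, target=252.12)
refined = iterative_refinement(observed, target=252.12)
```

In this toy input the shift at 252.140 lies outside the 0.01 Da window, so methods b and c both average only the three clustered values.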

Extended Data Fig. 9 Box plots showing the influence of unknown modification mass error on the molecular formula (MF) inference.

The number of MF candidates for light and heavy modifications (n=60) derived from 18 tested probes was calculated for allowable error ranges (ppm values from 1 to 20). Note that when the error is 1.26 ppm (the median mass accuracy achieved by pChem, highlighted in red), the number of MF candidates is typically less than three, and when the error rises to 2.95 ppm (the median mass accuracy achieved by FragPipe, highlighted in blue), more candidate MFs are generated. Elements of each box plot indicate the median (orange line segment), the 25th and 75th percentiles (lower end and upper end of the box, respectively), and the 1.5× interquartile range (whiskers).
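A rough way to see why the candidate count grows with mass error is to enumerate elemental compositions that fall within a given ppm window. The sketch below is a simplified illustration, not the procedure used in the paper: it considers only C, H, N and O, applies no chemical validity rules, and the function names and element count limits are assumptions. The ppm error is taken as (observed − theoretical) / theoretical × 10^6.

```python
# Monoisotopic masses in Da for the four elements considered here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidate_formulas(observed, ppm_tol, max_c=30, max_n=8, max_o=10, max_h=80):
    """List (C, H, N, O) counts whose monoisotopic mass is within ppm_tol of observed."""
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy_atoms = (c * MONOISOTOPIC["C"] + n * MONOISOTOPIC["N"]
                               + o * MONOISOTOPIC["O"])
                # The remaining mass must be made up by hydrogens; at ppm-level
                # tolerances only the nearest integer count can match.
                h = round((observed - heavy_atoms) / MONOISOTOPIC["H"])
                if h < 0 or h > max_h:
                    continue
                theoretical = heavy_atoms + h * MONOISOTOPIC["H"]
                if theoretical > 0 and abs(ppm_error(observed, theoretical)) <= ppm_tol:
                    hits.append((c, h, n, o))
    return hits


# An arbitrary composition chosen for illustration (C13 H16 N2 O3), not a PDM from the paper.
target = (13 * MONOISOTOPIC["C"] + 16 * MONOISOTOPIC["H"]
          + 2 * MONOISOTOPIC["N"] + 3 * MONOISOTOPIC["O"])
tight = candidate_formulas(target, ppm_tol=1.0)
loose = candidate_formulas(target, ppm_tol=20.0)
```

Every formula accepted at the tight tolerance is also accepted at the loose one, so the candidate list can only grow as the allowed error widens, which is the trend the box plots describe.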

Extended Data Fig. 10 Quality control of modification site localization.

a, Prior to the p-value filtering process, modifications were mis-localized to many ‘inert’ amino acids, with p values ranging from 0.203 to 1.0, although it is theoretically impossible for these residues to react with the probes as indicated. The size of each dot is proportional to the p value (right-tailed hypothesis test, see Methods for details) of the corresponding PDM site localization. b, Comparison of localization probability on ‘inert’ amino acids for every tested probe before (bottom-left triangles) and after (top-right triangles) a statistics-based curation. A default cutoff (p value < 0.001) is sufficient to reduce the localization probability of such mis-localized sites to zero.
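The caption refers to a right-tailed hypothesis test whose definition is given in the paper's Methods and is not reproduced here. As a stand-in only, the sketch below uses a right-tailed binomial test to show how a p-value cutoff of 0.001 could discard sparsely supported localizations. The null model (a site is hit at some background frequency), the function names and the example counts are all assumptions, not the authors' method.

```python
from math import comb


def right_tailed_binomial_p(k, n, background):
    """P(X >= k) for X ~ Binomial(n, background): the chance of seeing at least
    k localizations on a residue type if they occurred only at the background rate."""
    return sum(
        comb(n, i) * background**i * (1.0 - background) ** (n - i)
        for i in range(k, n + 1)
    )


def keep_localization(k, n, background, cutoff=0.001):
    """Keep a residue type only if its p value is below the cutoff quoted in the caption."""
    return right_tailed_binomial_p(k, n, background) < cutoff


# Hypothetical counts: 1,000 PSMs carrying the PDM, assumed background frequency of 2%.
strong = right_tailed_binomial_p(k=600, n=1000, background=0.02)   # heavily enriched residue
weak = right_tailed_binomial_p(k=15, n=1000, background=0.02)      # near-background residue
```

Under these assumed numbers the enriched residue passes the cutoff and the near-background residue does not, mirroring the before and after comparison in panel b.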

Supplementary information

Supplementary Information

Legends to Supplementary Data 1–10, Supplementary Figs 1–13 and Supplementary Tables 1 and 2.

Reporting Summary

Supplementary Data 1

Reporting summaries from pChem searches of the datasets generated from three independent IPM-based QTRP experiments. pChem search results at a PSM level are provided for each replicate.

Supplementary Data 2

Reporting summaries from pChem searches of the IPM-based QTRP datasets for benchmarking the robustness of pChem. The sheet named ‘Diff instrument’ shows the representative reporting summaries of pChem searches of IPM-based QTRP data produced on eight different LC–MS/MS instruments from three vendors. The sheet named ‘Diff ratios’ shows the representative reporting summaries of pChem searches of IPM-based QTRP data produced previously from light- and heavy-IPM-tagged samples mixed in different ratios (light-to-heavy ratios of 1:10, 1:5, 1:2, 1:1, 2:1, 5:1 and 10:1). The sheet named ‘Diff species’ shows the representative reporting summaries by pChem searches of those data generated from the IPM-based QTRP applications in various species.

Supplementary Data 3

Reporting summaries from pChem searches of the datasets generated from chemoproteomic analyses using residue-reactive probes. Representative pChem search results at a PSM level are provided for each probe as indicated.

Supplementary Data 4

Reporting summaries from pChem search of the datasets generated from chemoproteomic analyses using oxoform-specific probes. Representative pChem search results at a PSM level are provided for each probe as indicated.

Supplementary Data 5

Reporting summaries from pChem search of the datasets generated from chemoproteomic analyses using metabolite-derived probes. Representative pChem search results at a PSM level are provided for each probe as indicated.

Supplementary Data 6

Reanalyses of protein adduction by aHNE and aONE in RKO cells. Raw datasets for such reanalyses were retrieved from ref. 9 and ref. 40. RKO cells were first treated with either aHNE or aONE. After treatment, cells were either harvested immediately and used as controls or placed in probe-free medium for another 1 and 4 h recovery period. The probe-labeled proteomes were digested with trypsin and then biotinylated by click chemistry with the light (recovery)- or heavy (control)-labeled UV-cleavable azido-biotin, followed by streptavidin enrichment, photorelease, and LC–MS/MS analysis. Identification and quantification were performed using the pFind studio (Methods). Quantification results were obtained from three biological replicates with two LC–MS/MS runs for each. Note that cysteines on those N-terminal ketoamide peptide adducts (∆309.17) are assigned as aHNE-modified sites, since such a PDM is most likely generated through an intramolecular rearrangement from cysteine to N terminus.

Supplementary Data 7

Representative FragPipe outputs. The same MS datasets used in this study were searched with FragPipe (v17.0) and the results are retrieved from the output files for each probe as indicated.

Supplementary Data 8

Representative data files used for benchmarking pChem. Among them, 20 raw files are retrieved from previous publications, while 24 raw files are newly generated for this study.

Supplementary Data 9

LC–MS/MS setups. The whole study involves ten different instruments by three vendors from three independent laboratories. A = 0.1% formic acid in water, B = 0.1% formic acid in acetonitrile.

Supplementary Data 10

Statistical source data for Supplementary Figs 2 and 13b,c.

Source data

Source Data Fig. 6

Statistical source data for Fig. 6.

Source Data Extended Data Fig. 3

Statistical source data for Extended Data Fig. 3b.

Source Data Extended Data Fig. 5

Statistical source data for Extended Data Fig. 5d,e.

Source Data Extended Data Fig. 7

Statistical source data for Extended Data Fig. 7.

Source Data Extended Data Fig. 8

Statistical source data for Extended Data Fig. 8.

Source Data Extended Data Fig. 9

Statistical source data for Extended Data Fig. 9.

Source Data Extended Data Fig. 10

Statistical source data for Extended Data Fig. 10a.

Cite this article

He, JX., Fei, ZC., Fu, L. et al. A modification-centric assessment tool for the performance of chemoproteomic probes. Nat Chem Biol 18, 904–912 (2022). https://doi.org/10.1038/s41589-022-01074-8

  • DOI: https://doi.org/10.1038/s41589-022-01074-8
