Structure-based validation can drastically underestimate error rate in proteome-wide cross-linking mass spectrometry studies

Yugandhar, Kumar; Wang, Ting-Yi; Wierbowski, Shayne D.; Shayhidin, Elnur Elyar; Yu, Haiyuan

doi:10.1038/s41592-020-0959-9

Brief Communication
Published: 29 September 2020

Nature Methods volume 17, pages 985–988 (2020)


Abstract

Thorough quality assessment of novel interactions identified by proteome-wide cross-linking mass spectrometry (XL-MS) studies is critical. Almost all current XL-MS studies have validated cross-links against known three-dimensional structures of representative protein complexes. Here, we provide theoretical and experimental evidence demonstrating that this approach can drastically underestimate error rates for proteome-wide XL-MS datasets, and propose a comprehensive set of four data-quality metrics to address this issue.
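The conventional approach questioned here checks whether cross-linked residues that map onto a representative structure (for example, the human 26S proteasome, PDB 5GJQ, in Extended Data Fig. 1) lie within the cross-linker's reach. The minimal sketch below assumes a C-alpha to C-alpha Euclidean distance and a hypothetical 30 Å cutoff; the residue keys, the cutoff and the function name are illustrative assumptions, not the authors' code. Cross-links whose residues are absent from the structure drop out of both numerator and denominator. In Extended Data Fig. 1, for instance, only 49 cross-links at 1% FDR enter this comparison, which is why the returned count matters as much as the fraction.

```python
import math

# Hypothetical C-alpha to C-alpha cutoff in angstroms. This value is an
# assumption for illustration; the article's distance criterion is not given here.
MAX_CA_CA_DISTANCE = 30.0


def fraction_validated(crosslinks, coords, cutoff=MAX_CA_CA_DISTANCE):
    """Return (fraction within cutoff, number of mapped cross-links).

    crosslinks: iterable of (residue_key_1, residue_key_2) pairs.
    coords: dict mapping residue keys, e.g. (chain, residue_number), to the
        (x, y, z) C-alpha coordinates of residues resolved in the structure.
    Cross-links with either residue missing from `coords` are skipped, so
    they affect neither the numerator nor the denominator.
    """
    mapped = [(r1, r2) for r1, r2 in crosslinks if r1 in coords and r2 in coords]
    if not mapped:
        return float("nan"), 0
    within = sum(math.dist(coords[r1], coords[r2]) <= cutoff for r1, r2 in mapped)
    return within / len(mapped), len(mapped)


# Toy example with made-up coordinates (not real structure data).
coords = {("A", 10): (0.0, 0.0, 0.0), ("B", 5): (10.0, 0.0, 0.0), ("B", 50): (40.0, 0.0, 0.0)}
crosslinks = [
    (("A", 10), ("B", 5)),   # 10 Å apart: counted as validated
    (("A", 10), ("B", 50)),  # 40 Å apart: counted as violating
    (("A", 10), ("C", 1)),   # chain C is not in the structure: skipped
]
fraction, n_mapped = fraction_validated(crosslinks, coords)
print(fraction, n_mapped)  # 0.5 2
```

The third cross-link never reaches the calculation, however wrong it might be; that silent exclusion is the blind spot the metrics proposed here aim to address.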


**Fig. 1: Evaluation of the conventional 3D structure-based validation approach for proteome-wide XL-MS using human K562 DSSO XL-MS data².**

**Fig. 2: Demonstration of our set of validation metrics on a publicly available *E. coli* proteome-wide XL-MS dataset¹³.**

Data availability

The human K562 XL-MS raw files analyzed in this study (122 raw files, comprising 97 HILIC and 25 SCX fractions, from our recent proteome-wide human K562 XL-MS study²) have been deposited to the ProteomeXchange Consortium via the PRIDE⁴⁰ partner repository with the dataset identifier PXD018771. Raw data from our PCA experiments are available from the corresponding author upon request. Protein sequences were obtained from the UniProt database (https://www.uniprot.org/). Residue-level mapping was performed using data from the SIFTS database (https://www.ebi.ac.uk/pdbe/docs/sifts/index.html). Protein three-dimensional structures used in this study were obtained from the PDB (accession codes: 5GJQ, 1EUC, 1T9G, 5LNK, 1ZOY, 1NTM, 1V54, 5MY1, 5ADY, 5ME0, 2RDO, 2VRH, 4JK2, 4YLN, 4YLO, 4XO2, 4YFH and 4YF0). Source data are provided with this paper.

References

1. Yu, C. & Huang, L. Cross-linking mass spectrometry: an emerging technology for interactomics and structural biology. Anal. Chem. 90, 144–165 (2018).
2. Yugandhar, K. et al. MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity. Mol. Cell. Proteomics 19, 554–568 (2020).
3. Iacobucci, C., Götze, M. & Sinz, A. Cross-linking/mass spectrometry to get a closer view on protein interaction networks. Curr. Opin. Biotechnol. 63, 48–53 (2020).
4. Ferber, M. et al. Automated structure modeling of large protein assemblies using crosslinks as distance restraints. Nat. Methods 13, 515–520 (2016).
5. Karaca, E., Rodrigues, J. P. G. L. M., Graziadei, A., Bonvin, A. M. J. J. & Carlomagno, T. M3: an integrative framework for structure determination of molecular machines. Nat. Methods 14, 897–902 (2017).
6. Hauri, S. et al. Rapid determination of quaternary protein structures in complex biological samples. Nat. Commun. 10, 192 (2019).
7. Fischer, L. & Rappsilber, J. Quirks of error estimation in cross-linking/mass spectrometry. Anal. Chem. 89, 3829–3833 (2017).
8. O’Reilly, F. J. & Rappsilber, J. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 25, 1000–1008 (2018).
9. Klykov, O. et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 13, 2964–2990 (2018).
10. Liu, F., Lössl, P., Rabbitts, B. M., Balaban, R. S. & Heck, A. J. R. The interactome of intact mitochondria by cross-linking mass spectrometry provides evidence for coexisting respiratory supercomplexes. Mol. Cell. Proteomics 17, 216–232 (2018).
11. Keller, A., Chavez, J. D., Felt, K. C. & Bruce, J. E. Prediction of an upper limit for the fraction of interprotein cross-links in large-scale in vivo cross-linking studies. J. Proteome Res. 18, 3077–3085 (2019).
12. Bartolec, T. K. et al. Cross-linking mass spectrometry analysis of the yeast nucleus reveals extensive protein–protein interactions not detected by systematic two-hybrid or affinity purification-mass spectrometry. Anal. Chem. 92, 1874–1882 (2020).
13. Liu, F., Lössl, P., Scheltema, R., Viner, R. & Heck, A. J. R. Optimized fragmentation schemes and data analysis strategies for proteome-wide cross-link identification. Nat. Commun. 8, 15473 (2017).
14. Chen, Z.-L. et al. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat. Commun. 10, 3404 (2019).
15. Götze, M., Iacobucci, C., Ihling, C. H. & Sinz, A. A simple cross-linking/mass spectrometry workflow for studying system-wide protein interactions. Anal. Chem. 91, 10236–10244 (2019).
16. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).
17. Vo, T. V. et al. A proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human. Cell 164, 310–323 (2016).
18. Nyfeler, B., Michnick, S. W. & Hauri, H.-P. Capturing protein interactions in the secretory pathway of living cells. Proc. Natl Acad. Sci. USA 102, 6350–6355 (2005).
19. Braun, P. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods 6, 91–97 (2008).
20. Beveridge, R., Stadlmann, J., Penninger, J. M. & Mechtler, K. A synthetic peptide library for benchmarking crosslinking-mass spectrometry search engines for proteins and protein complexes. Nat. Commun. 11, 742 (2020).
21. Rual, J.-F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005).
22. Makowski, M. M., Willems, E., Jansen, P. W. T. C. & Vermeulen, M. Cross-linking immunoprecipitation-MS (xIP-MS): topological analysis of chromatin-associated protein complexes using single affinity purification. Mol. Cell. Proteomics 15, 854–865 (2016).
23. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
24. Dana, J. M. et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 47, D482–D489 (2018).
25. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
26. Gupta, N., Bandeira, N., Keich, U. & Pevzner, P. A. Target-decoy approach and false discovery rate: when things may go wrong. J. Am. Soc. Mass Spectrom. 22, 1111–1120 (2011).
27. Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods 9, 345–350 (2012).
28. Kerrien, S. et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40, D841–D846 (2012).
29. Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
30. Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).
31. Chatr-aryamontri, A. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).
32. Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).
33. Pagel, P. et al. The MIPS mammalian protein–protein interaction database. Bioinformatics 21, 832–834 (2005).
34. Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).
35. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 38, D497–D501 (2010).
36. Alfarano, C. et al. The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Res. 33, D418–D424 (2005).
37. Brown, K. R. & Jurisica, I. Online predicted human interaction database. Bioinformatics 21, 2076–2082 (2005).
38. Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011).
39. Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2008).
40. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2018).

Acknowledgements

We thank R. Viner for support in data processing with XlinkX workflow in Proteome Discoverer. K.Y. thanks the Sam and Nancy Fleming Research Fellowship. This work was supported by grants from the National Institutes of Health (grant nos. GM124559 and GM125639) and the National Science Foundation (grant no. DBI-1661380) to H.Y.

Author information

Authors and Affiliations

Department of Computational Biology, Cornell University, Ithaca, NY, USA
Kumar Yugandhar, Ting-Yi Wang, Shayne D. Wierbowski, Elnur Elyar Shayhidin & Haiyuan Yu
Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
Kumar Yugandhar, Ting-Yi Wang, Shayne D. Wierbowski, Elnur Elyar Shayhidin & Haiyuan Yu

Contributions

H.Y. conceived and oversaw all aspects of the study. K.Y. performed the computational analyses with assistance from S.D.W. T.-Y.W. performed laboratory experiments with assistance from E.E.S. K.Y. and H.Y. wrote the manuscript with inputs from all of the authors.

Corresponding author

Correspondence to Haiyuan Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Editor recognition statement: Allison Doerr was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Analysis of the human proteome-wide XL-MS dataset using MaXLinker software.

(a) Table showing the number of interprotein cross-links obtained at different filtering criteria and upon mapping to a representative 3D structure of the human 26S proteasome (PDB ID: 5GJQ). (b) Comparison of the fraction of validated cross-links using the conventional structure-based approach (n = 49 XLs for ‘1% FDR’; n = 65 XLs for ‘10% FDR’). (c) Comparison using the fraction of structure-corroborating identifications (FSI) (n = 63 XLs for ‘1% FDR’; n = 125 XLs for ‘10% FDR’). (d) Comparison using the fraction of mis-identifications (FMI) (n = 8127 XLs for ‘1% FDR’; n = 15110 XLs for ‘10% FDR’). (e) Comparison using the fraction of interprotein cross-links from known interactions (FKI) (n = 1144 XLs for ‘1% FDR’; n = 5158 XLs for ‘10% FDR’). For (b–e), P values were calculated using a two-sided Z-test and error bars indicate ± SE of proportion.
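Every panel in these legends compares fractions (of validated, structure-corroborating, mis-identified or known-interaction cross-links) between filtering conditions, with error bars of ± SE of proportion and a two-sided Z-test. The legends do not say which Z-test variant was used; the sketch below assumes the common pooled two-proportion test, and the counts in it are hypothetical, not the article's data.

```python
import math


def proportion_se(successes, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled Z-test of H0: p1 == p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all successes or all failures: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (not the article's data):
# 45 of 50 cross-links pass a check under one filter, 60 of 100 under another.
z, p = two_proportion_z_test(45, 50, 60, 100)
print(f"z = {z:.2f}, two-sided P = {p:.2g}")
print(f"SE bars: ±{proportion_se(45, 50):.3f} and ±{proportion_se(60, 100):.3f}")
```

Using `math.erfc` avoids a SciPy dependency; `scipy.stats.norm.sf` would give the same tail probability.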

Extended Data Fig. 2 Demonstration of the utility of our comprehensive set of validation metrics on a publicly available mouse mitochondrial XL-MS dataset.

(a) Table showing the number of interprotein cross-links obtained at different filtering criteria and upon mapping to representative 3D structures. (b) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’). (c) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’). (d) Fraction of mis-identifications (FMI) (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). (e) Fraction of interprotein cross-links from known interactions (FKI) (n = 2368 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 11418 XLs for ‘1% FDR’; n = 19665 XLs for ‘10% FDR’). P values in (b–e) were calculated using a two-sided Z-test and error bars indicate ± SE of proportion.
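Read literally from its name, FKI is the share of interprotein cross-links whose two proteins form a previously known interaction. The sketch below assumes exactly that, counted per cross-link (the legends report n in XLs), with an unordered comparison of protein pairs. The reference set of known interactions is an input here; interaction databases such as IntAct and BioGRID appear in the reference list, but which sources the authors combined is not stated in this section. Protein identifiers in the example are hypothetical.

```python
def fraction_known_interactions(interprotein_xls, known_ppis):
    """Share of interprotein cross-links whose protein pair is a known interaction.

    interprotein_xls: list of (protein_a, protein_b), one entry per cross-link.
    known_ppis: iterable of (protein_a, protein_b) pairs from a reference set.
    Pairs are compared without regard to order, so (A, B) matches (B, A).
    """
    if not interprotein_xls:
        return float("nan")
    known = {frozenset(pair) for pair in known_ppis}
    hits = sum(frozenset((a, b)) in known for a, b in interprotein_xls)
    return hits / len(interprotein_xls)


# Hypothetical protein identifiers, for illustration only.
known = [("P1", "P2"), ("P3", "P4")]
xls = [("P2", "P1"), ("P1", "P3"), ("P4", "P3"), ("P5", "P6")]
print(fraction_known_interactions(xls, known))  # 0.5
```

Storing pairs as `frozenset`s makes lookups order-independent and constant-time, which matters when tens of thousands of cross-links are checked against large interaction databases.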

Extended Data Fig. 3 Estimated precision using PCA experiments for the three datasets of different quality from our human K562 proteome-wide XL-MS study.

Derived from Fig. 1g (n = 3 independent experiments; see Methods). Error bars indicate ± SE of proportion (see Supplementary Note 2 for a detailed description of the methodology).

Extended Data Fig. 4 Structure-based mapping analysis at 20% FDR, extension to the analysis shown in Fig. 1, Fig. 2, and Extended Data Fig. 2.

(a) Human proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 43 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 72 XLs for ‘1% FDR’; n = 73 XLs for ‘10% FDR’; n = 73 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 52 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 262 XLs for ‘1% FDR’; n = 426 XLs for ‘10% FDR’; n = 605 XLs for ‘20% FDR’). (b) E. coli proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 14 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 17 XLs for ‘1% FDR’; n = 17 XLs for ‘10% FDR’; n = 17 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 31 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 55 XLs for ‘1% FDR’; n = 101 XLs for ‘10% FDR’; n = 123 XLs for ‘20% FDR’). (c) Mouse mitochondrial XL-MS study: (i) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’; n = 63 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’; n = 2751 XLs for ‘20% FDR’). P values in all panels were calculated using a two-sided Z-test and error bars indicate ± SE of proportion.
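The datasets above are compared at 1%, 10% and 20% FDR. As a generic illustration of how such a cutoff is usually derived with a target–decoy strategy (discussed in the cited work of Gupta et al.), the sketch below picks the lowest score whose estimated FDR, taken as decoys divided by targets at or above that score, stays within a limit. This is a simplified assumption, not the MaXLinker or XlinkX procedure. In particular, it ignores the cross-link-specific corrections used by dedicated XL-MS tools, such as separate handling of target–decoy and decoy–decoy matches. The function name and example scores are hypothetical.

```python
from itertools import groupby


def score_threshold_at_fdr(identifications, max_fdr):
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    identifications: list of (score, is_decoy) tuples; higher scores are better.
    Returns the cutoff score, or None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(identifications, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    # Group tied scores so each candidate cutoff includes every hit at that score.
    for score, group in groupby(ranked, key=lambda item: item[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scored identifications: (score, is_decoy).
ids = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True), (4, True)]
print(score_threshold_at_fdr(ids, 0.2))  # 9
print(score_threshold_at_fdr(ids, 0.5))  # 5
```

Loosening the limit admits more identifications along with proportionally more expected errors, which is the trade-off the 1%, 10% and 20% FDR comparisons probe.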

Extended Data Fig. 5 Corrected FMI for the three datasets analyzed in the study (calculated using Equation 3 in Methods).

(a) Human proteome-wide XL-MS (n = 668 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 3029 XLs for ‘1% FDR’; n = 4957 XLs for ‘10% FDR’). (b) E. coli proteome-wide XL-MS (n = 340 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 553 XLs for ‘1% FDR’; n = 755 XLs for ‘10% FDR’). (c) Mouse mitochondrial XL-MS (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). P values in all panels were calculated using a two-sided Z-test and error bars indicate ± SE of proportion.

Supplementary information

Supplementary Information

Supplementary Notes 1–5 and Table 1.

Reporting Summary

Source data

Source Data Fig. 1: Statistical Source Data
Source Data Fig. 2: Statistical Source Data
Source Data Extended Data Fig. 1: Statistical Source Data
Source Data Extended Data Fig. 2: Statistical Source Data
Source Data Extended Data Fig. 3: Statistical Source Data
Source Data Extended Data Fig. 4: Statistical Source Data
Source Data Extended Data Fig. 5: Statistical Source Data

About this article

Cite this article

Yugandhar, K., Wang, TY., Wierbowski, S.D. et al. Structure-based validation can drastically underestimate error rate in proteome-wide cross-linking mass spectrometry studies. Nat Methods 17, 985–988 (2020). https://doi.org/10.1038/s41592-020-0959-9

Received: 03 April 2019
Accepted: 20 August 2020
Published: 29 September 2020
Issue Date: October 2020
DOI: https://doi.org/10.1038/s41592-020-0959-9
