Structure-based validation can drastically underestimate error rate in proteome-wide cross-linking mass spectrometry studies

Abstract

Thorough quality assessment of novel interactions identified by proteome-wide cross-linking mass spectrometry (XL-MS) studies is critical. Almost all current XL-MS studies have validated cross-links against known three-dimensional structures of representative protein complexes. Here, we provide theoretical and experimental evidence demonstrating that this approach can drastically underestimate error rates for proteome-wide XL-MS datasets, and propose a comprehensive set of four data-quality metrics to address this issue.
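The conventional approach described above can be illustrated with a minimal Python sketch. It assumes each cross-link has already been mapped onto a representative structure and reduced to a single Cα–Cα distance, or `None` when it cannot be mapped. The class name, the field names and the 30 Å cutoff are illustrative assumptions, not values or code from this study.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed C-alpha to C-alpha distance cutoff in angstroms, chosen for
# illustration only; it is not a parameter reported in this study.
DISTANCE_CUTOFF_A = 30.0


@dataclass(frozen=True)
class MappedCrossLink:
    protein_a: str
    protein_b: str
    # None when the cross-link cannot be placed on the reference structure.
    ca_distance: Optional[float]


def structure_validated_fraction(
    xls: Sequence[MappedCrossLink], cutoff: float = DISTANCE_CUTOFF_A
) -> Tuple[float, int]:
    """Return (fraction within cutoff, number of mappable cross-links).

    Unmappable cross-links are dropped from numerator and denominator alike,
    so the fraction describes only the subset that maps onto the structure.
    """
    mapped = [xl for xl in xls if xl.ca_distance is not None]
    if not mapped:
        raise ValueError("no cross-links map onto the reference structure")
    within = sum(1 for xl in mapped if xl.ca_distance <= cutoff)
    return within / len(mapped), len(mapped)


if __name__ == "__main__":
    toy = [
        MappedCrossLink("PROT_A", "PROT_B", 12.5),
        MappedCrossLink("PROT_A", "PROT_C", 41.0),
        MappedCrossLink("PROT_D", "PROT_E", None),
        MappedCrossLink("PROT_F", "PROT_G", None),
    ]
    print(structure_validated_fraction(toy))  # (0.5, 2)
```

In the toy input, half of the cross-links never enter the calculation because they cannot be mapped, so the reported fraction speaks only for the mappable subset of the dataset.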

Fig. 1: Evaluation of the conventional 3D structure-based validation approach for proteome-wide XL-MS using human K562 DSSO XL-MS data (ref. 2).
Fig. 2: Demonstration of our set of validation metrics on a publicly available E. coli proteome-wide XL-MS dataset (ref. 13).

Data availability

The 122 human K562 XL-MS raw files analyzed in this study (97 HILIC and 25 SCX fractions), which come from our recent proteome-wide human K562 XL-MS study (ref. 2), have been deposited to the ProteomeXchange Consortium via the PRIDE (ref. 40) partner repository with the dataset identifier PXD018771. Raw data from our PCA experiments are available from the corresponding author upon request. Protein sequences were obtained from the UniProt database (https://www.uniprot.org/). Residue-level mapping was performed using data from the SIFTS database (https://www.ebi.ac.uk/pdbe/docs/sifts/index.html). Protein three-dimensional structures used in this study were obtained from the PDB (accession codes: 5GJQ, 1EUC, 1T9G, 5LNK, 1ZOY, 1NTM, 1V54, 5MY1, 5ADY, 5ME0, 2RDO, 2VRH, 4JK2, 4YLN, 4YLO, 4XO2, 4YFH and 4YF0). Source data are provided with this paper.
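As a hedged illustration of the residue-level mapping step, the sketch below looks up two cross-linked UniProt positions in a SIFTS-style residue map and measures the Cα–Cα distance between the corresponding PDB residues. The input dictionaries, accession strings and chain IDs are hypothetical; building them from the actual SIFTS and PDB files is not shown.

```python
import math
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]        # (UniProt accession, UniProt residue number)
PdbResidue = Tuple[str, int]  # (PDB chain ID, PDB residue number)
Xyz = Tuple[float, float, float]


def ca_distance(
    site_a: Site,
    site_b: Site,
    residue_map: Dict[Site, PdbResidue],
    ca_coords: Dict[PdbResidue, Xyz],
) -> Optional[float]:
    """C-alpha distance (angstroms) between two cross-linked sites, or None.

    residue_map stands in for a SIFTS-derived UniProt-to-PDB residue lookup
    and ca_coords for C-alpha coordinates read from a PDB entry; both are
    hypothetical inputs that would have to be built from the real files.
    """
    pdb_a = residue_map.get(site_a)
    pdb_b = residue_map.get(site_b)
    if pdb_a is None or pdb_b is None:
        return None  # residue not covered by the structure
    xyz_a = ca_coords.get(pdb_a)
    xyz_b = ca_coords.get(pdb_b)
    if xyz_a is None or xyz_b is None:
        return None  # residue mapped but not resolved in the model
    return math.dist(xyz_a, xyz_b)


if __name__ == "__main__":
    residue_map = {("PROT_A", 10): ("A", 5), ("PROT_B", 42): ("B", 40)}
    ca_coords = {("A", 5): (0.0, 0.0, 0.0), ("B", 40): (3.0, 4.0, 0.0)}
    print(ca_distance(("PROT_A", 10), ("PROT_B", 42), residue_map, ca_coords))  # 5.0
    print(ca_distance(("PROT_A", 11), ("PROT_B", 42), residue_map, ca_coords))  # None
```

Returning `None` rather than raising keeps unmappable sites visible to the caller. For complexes that contain several copies of a protein, a real pipeline would need to consider every chain mapping, which this sketch does not do.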

References

1. Yu, C. & Huang, L. Cross-linking mass spectrometry: an emerging technology for interactomics and structural biology. Anal. Chem. 90, 144–165 (2018).

2. Yugandhar, K. et al. MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity. Mol. Cell. Proteomics 19, 554–568 (2020).

3. Iacobucci, C., Götze, M. & Sinz, A. Cross-linking/mass spectrometry to get a closer view on protein interaction networks. Curr. Opin. Biotechnol. 63, 48–53 (2020).

4. Ferber, M. et al. Automated structure modeling of large protein assemblies using crosslinks as distance restraints. Nat. Methods 13, 515–520 (2016).

5. Karaca, E., Rodrigues, J. P. G. L. M., Graziadei, A., Bonvin, A. M. J. J. & Carlomagno, T. M3: an integrative framework for structure determination of molecular machines. Nat. Methods 14, 897–902 (2017).

6. Hauri, S. et al. Rapid determination of quaternary protein structures in complex biological samples. Nat. Commun. 10, 192 (2019).

7. Fischer, L. & Rappsilber, J. Quirks of error estimation in cross-linking/mass spectrometry. Anal. Chem. 89, 3829–3833 (2017).

8. O’Reilly, F. J. & Rappsilber, J. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 25, 1000–1008 (2018).

9. Klykov, O. et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 13, 2964–2990 (2018).

10. Liu, F., Lössl, P., Rabbitts, B. M., Balaban, R. S. & Heck, A. J. R. The interactome of intact mitochondria by cross-linking mass spectrometry provides evidence for coexisting respiratory supercomplexes. Mol. Cell. Proteomics 17, 216–232 (2018).

11. Keller, A., Chavez, J. D., Felt, K. C. & Bruce, J. E. Prediction of an upper limit for the fraction of interprotein cross-links in large-scale in vivo cross-linking studies. J. Proteome Res. 18, 3077–3085 (2019).

12. Bartolec, T. K. et al. Cross-linking mass spectrometry analysis of the yeast nucleus reveals extensive protein–protein interactions not detected by systematic two-hybrid or affinity purification-mass spectrometry. Anal. Chem. 92, 1874–1882 (2020).

13. Liu, F., Lössl, P., Scheltema, R., Viner, R. & Heck, A. J. R. Optimized fragmentation schemes and data analysis strategies for proteome-wide cross-link identification. Nat. Commun. 8, 15473 (2017).

14. Chen, Z.-L. et al. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat. Commun. 10, 3404 (2019).

15. Götze, M., Iacobucci, C., Ihling, C. H. & Sinz, A. A simple cross-linking/mass spectrometry workflow for studying system-wide protein interactions. Anal. Chem. 91, 10236–10244 (2019).

16. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).

17. Vo, T. V. et al. A proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human. Cell 164, 310–323 (2016).

18. Nyfeler, B., Michnick, S. W. & Hauri, H.-P. Capturing protein interactions in the secretory pathway of living cells. Proc. Natl Acad. Sci. USA 102, 6350–6355 (2005).

19. Braun, P. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods 6, 91–97 (2008).

20. Beveridge, R., Stadlmann, J., Penninger, J. M. & Mechtler, K. A synthetic peptide library for benchmarking crosslinking-mass spectrometry search engines for proteins and protein complexes. Nat. Commun. 11, 742 (2020).

21. Rual, J.-F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005).

22. Makowski, M. M., Willems, E., Jansen, P. W. T. C. & Vermeulen, M. Cross-linking immunoprecipitation-MS (xIP-MS): topological analysis of chromatin-associated protein complexes using single affinity purification. Mol. Cell. Proteomics 15, 854–865 (2016).

23. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).

24. Dana, J. M. et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 47, D482–D489 (2018).

25. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

26. Gupta, N., Bandeira, N., Keich, U. & Pevzner, P. A. Target-decoy approach and false discovery rate: when things may go wrong. J. Am. Soc. Mass Spectrom. 22, 1111–1120 (2011).

27. Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods 9, 345–350 (2012).

28. Kerrien, S. et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40, D841–D846 (2012).

29. Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).

30. Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).

31. Chatr-aryamontri, A. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).

32. Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).

33. Pagel, P. et al. The MIPS mammalian protein–protein interaction database. Bioinformatics 21, 832–834 (2005).

34. Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).

35. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 38, D497–D501 (2010).

36. Alfarano, C. et al. The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Res. 33, D418–D424 (2005).

37. Brown, K. R. & Jurisica, I. Online predicted human interaction database. Bioinformatics 21, 2076–2082 (2005).

38. Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011).

39. Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2008).

40. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2018).

Acknowledgements

We thank R. Viner for support with data processing using the XlinkX workflow in Proteome Discoverer. K.Y. acknowledges support from the Sam and Nancy Fleming Research Fellowship. This work was supported by grants from the National Institutes of Health (grant nos. GM124559 and GM125639) and the National Science Foundation (grant no. DBI-1661380) to H.Y.

Author information

Contributions

H.Y. conceived and oversaw all aspects of the study. K.Y. performed the computational analyses with assistance from S.D.W. T.-Y.W. performed laboratory experiments with assistance from E.E.S. K.Y. and H.Y. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Haiyuan Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Editor recognition statement: Allison Doerr was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Analysis of the human proteome-wide XL-MS dataset using MaXLinker software.

(a) Table showing the number of interprotein cross-links obtained at different filtering criteria, and upon mapping to a representative 3D structure of the human 26S proteasome (PDB ID: 5GJQ). (b) Comparison of the fraction of validated cross-links using the conventional structure-based approach (n = 49 XLs for ‘1% FDR’; n = 65 XLs for ‘10% FDR’). (c) Comparison using the fraction of structure-corroborating identifications (FSI) (n = 63 XLs for ‘1% FDR’; n = 125 XLs for ‘10% FDR’). (d) Comparison using the fraction of mis-identifications (FMI) (n = 8127 XLs for ‘1% FDR’; n = 15110 XLs for ‘10% FDR’). (e) Comparison using the fraction of interprotein cross-links from known interactions (FKI) (n = 1144 XLs for ‘1% FDR’; n = 5158 XLs for ‘10% FDR’). For (b–e), P values were calculated using a two-sided Z-test and error bars indicate ± SE of proportion. Source data
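The captions report two-sided Z-tests with error bars of ± SE of proportion. A minimal sketch of both quantities follows. It assumes the standard two-proportion Z-test with a pooled standard error, since the exact variant is not stated here, and the counts in the example are made up rather than taken from this study.

```python
import math
from typing import Tuple


def se_of_proportion(successes: int, n: int) -> float:
    """Standard error of a sample proportion, sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided Z-test for p1 == p2 using the pooled-proportion standard error.

    Returns (z, two-sided P value). Pooling is an assumption of this sketch.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups are all successes or all failures")
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not values from this study.
    z, p = two_proportion_z_test(45, 50, 30, 50)
    print(round(z, 3), p < 0.001)  # 3.464 True
    print(round(se_of_proportion(45, 50), 4))  # 0.0424
```

`math.erfc(|z| / sqrt(2))` equals the two-sided normal tail probability, which avoids a dependency on a statistics library for this one calculation.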

Extended Data Fig. 2 Demonstration of the utility of our comprehensive set of validation metrics on a publicly available mouse mitochondrial XL-MS dataset.

(a) Table showing the number of interprotein cross-links obtained at different filtering criteria, and upon mapping to representative 3D structures. (b) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’). (c) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’). (d) Fraction of mis-identifications (FMI) (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). (e) Fraction of interprotein cross-links from known interactions (FKI) (n = 2368 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 11418 XLs for ‘1% FDR’; n = 19665 XLs for ‘10% FDR’). P values in (b–e) were calculated using a two-sided Z-test and error bars indicate ± SE of proportion. Source data
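FKI, as named in panel (e), compares interprotein cross-links with a reference set of known interactions. The sketch below is one plausible way to compute such a fraction. The counting rules (each cross-link counted once, unordered protein pairs, self-links excluded) and the reference set are assumptions, and the denominator used in the study may differ.

```python
from typing import Iterable, Tuple

ProteinPair = Tuple[str, str]


def fraction_known_interactions(
    xl_pairs: Iterable[ProteinPair], known_pairs: Iterable[ProteinPair]
) -> float:
    """Share of interprotein cross-links whose protein pair is a known interaction.

    Each cross-link is counted individually, pairs are unordered (A-B == B-A),
    and self-links (A-A) are excluded. These are assumptions of the sketch.
    """
    known = {frozenset(pair) for pair in known_pairs}
    inter = [frozenset(pair) for pair in xl_pairs if pair[0] != pair[1]]
    if not inter:
        raise ValueError("no interprotein cross-links supplied")
    hits = sum(1 for pair in inter if pair in known)
    return hits / len(inter)


if __name__ == "__main__":
    xls = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "E"), ("A", "A")]
    reference = [("A", "B"), ("E", "D")]  # hypothetical reference set
    print(fraction_known_interactions(xls, reference))  # 0.75
```

Using `frozenset` for each pair makes the lookup order-independent, so a reference entry listed as E–D still matches a cross-link reported as D–E.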

Extended Data Fig. 3 Estimated precision using PCA experiments for the three datasets of different quality from our human K562 proteome-wide XL-MS study.

Derived from Fig. 1g (n = 3 independent experiments; see Methods). Error bars indicate ± SE of proportion (see Supplementary Note 2 for a detailed description of the methodology). Source data

Extended Data Fig. 4 Structure-based mapping analysis at 20% FDR, extending the analyses shown in Fig. 1, Fig. 2 and Extended Data Fig. 2.

a. Human proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 43 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 72 XLs for ‘1% FDR’; n = 73 XLs for ‘10% FDR’; n = 73 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 52 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 262 XLs for ‘1% FDR’; n = 426 XLs for ‘10% FDR’; n = 605 XLs for ‘20% FDR’). b. E. coli proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 14 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 17 XLs for ‘1% FDR’; n = 17 XLs for ‘10% FDR’; n = 17 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 31 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 55 XLs for ‘1% FDR’; n = 101 XLs for ‘10% FDR’; n = 123 XLs for ‘20% FDR’). c. Mouse mitochondrial XL-MS study: (i) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’; n = 63 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’; n = 2751 XLs for ‘20% FDR’). P values in all panels were calculated using a two-sided Z-test and error bars indicate ± SE of proportion. Source data
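The datasets above are filtered at fixed FDR thresholds, optionally combined with a ΔXlinkX score cutoff. The sketch below shows one generic way to apply such filters with a target-decoy approach: it ranks identifications by score, converts the running decoys/targets ratio into q-values and then applies a minimum delta score. It is not the XlinkX implementation; cross-link search engines often use more elaborate, cross-link-specific FDR estimators, and `delta_score` is a hypothetical stand-in for the ΔXlinkX score.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Identification:
    score: float        # search-engine score; higher is better
    delta_score: float  # hypothetical field standing in for the ΔXlinkX score
    is_decoy: bool


def filter_at_fdr(
    ids: Sequence[Identification], fdr: float, min_delta: Optional[float] = None
) -> List[Identification]:
    """Keep target identifications whose q-value is <= fdr, then apply an
    optional minimum delta-score filter.

    FDR at each rank is estimated as decoys / targets, a simple generic
    target-decoy estimate; score ties are not treated specially.
    """
    ranked = sorted(ids, key=lambda ident: ident.score, reverse=True)
    targets = decoys = 0
    running_fdr = []
    for ident in ranked:
        if ident.is_decoy:
            decoys += 1
        else:
            targets += 1
        running_fdr.append(decoys / targets if targets else float("inf"))
    # q-value: the lowest FDR reachable at this rank or any lower-scoring rank.
    q_values = [0.0] * len(ranked)
    best = float("inf")
    for k in range(len(ranked) - 1, -1, -1):
        best = min(best, running_fdr[k])
        q_values[k] = best
    accepted = [i for i, q in zip(ranked, q_values) if not i.is_decoy and q <= fdr]
    if min_delta is not None:
        accepted = [i for i in accepted if i.delta_score >= min_delta]
    return accepted


if __name__ == "__main__":
    toy = [
        Identification(10, 60, False), Identification(9, 40, False),
        Identification(8, 0, True), Identification(7, 70, False),
        Identification(6, 55, False), Identification(5, 0, True),
    ]
    print(len(filter_at_fdr(toy, 0.01)), len(filter_at_fdr(toy, 0.01, 50)))  # 2 1
    print(len(filter_at_fdr(toy, 0.30)), len(filter_at_fdr(toy, 0.30, 50)))  # 4 3
```

In this toy example, tightening the threshold from 30% to 1% FDR shrinks the accepted set from four to two targets, and the additional delta-score cutoff removes one more.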

Extended Data Fig. 5 Corrected FMI for the three datasets analyzed in the study (calculated using Equation 3 in Methods).

(a) Human proteome-wide XL-MS (n = 668 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 3029 XLs for ‘1% FDR’; n = 4957 XLs for ‘10% FDR’). (b) E. coli proteome-wide XL-MS (n = 340 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 553 XLs for ‘1% FDR’; n = 755 XLs for ‘10% FDR’). (c) Mouse mitochondrial XL-MS (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). P values in all panels were calculated using a two-sided Z-test and error bars indicate ± SE of proportion. Source data

Supplementary information

Supplementary Information

Supplementary Notes 1–5 and Table 1.

Reporting Summary

Source data

Source Data Fig. 1

Statistical Source Data

Source Data Fig. 2

Statistical Source Data

Source Data Extended Data Fig. 1

Statistical Source Data

Source Data Extended Data Fig. 2

Statistical Source Data

Source Data Extended Data Fig. 3

Statistical Source Data

Source Data Extended Data Fig. 4

Statistical Source Data

Source Data Extended Data Fig. 5

Statistical Source Data

About this article

Cite this article

Yugandhar, K., Wang, T., Wierbowski, S.D. et al. Structure-based validation can drastically underestimate error rate in proteome-wide cross-linking mass spectrometry studies. Nat Methods 17, 985–988 (2020). https://doi.org/10.1038/s41592-020-0959-9
