Abstract
Thorough quality assessment of novel interactions identified by proteome-wide cross-linking mass spectrometry (XL-MS) studies is critical. Almost all current XL-MS studies have validated cross-links against known three-dimensional structures of representative protein complexes. Here, we provide theoretical and experimental evidence demonstrating that this approach can drastically underestimate error rates for proteome-wide XL-MS datasets, and propose a comprehensive set of four data-quality metrics to address this issue.
Data availability
The human K562 XL-MS raw files analyzed in this study (122 raw files, comprising 97 HILIC and 25 SCX fractions, from our recent proteome-wide human K562 XL-MS study; ref. 2) have been deposited to the ProteomeXchange Consortium via the PRIDE (ref. 40) partner repository with the dataset identifier PXD018771. Raw data from our PCA experiments are available from the corresponding author upon request. Protein sequences were obtained from the UniProt database (https://www.uniprot.org/). Residue-level mapping was performed using data from the SIFTS database (https://www.ebi.ac.uk/pdbe/docs/sifts/index.html). Protein three-dimensional structures utilized in this study were obtained from the PDB (accession codes: 5GJQ, 1EUC, 1T9G, 5LNK, 1ZOY, 1NTM, 1V54, 5MY1, 5ADY, 5ME0, 2RDO, 2VRH, 4JK2, 4YLN, 4YLO, 4XO2, 4YFH and 4YF0). Source data are provided with this paper.
References
Yu, C. & Huang, L. Cross-linking mass spectrometry: an emerging technology for interactomics and structural biology. Anal. Chem. 90, 144–165 (2018).
Yugandhar, K. et al. MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity. Mol. Cell. Proteomics 19, 554–568 (2020).
Iacobucci, C., Götze, M. & Sinz, A. Cross-linking/mass spectrometry to get a closer view on protein interaction networks. Curr. Opin. Biotechnol. 63, 48–53 (2020).
Ferber, M. et al. Automated structure modeling of large protein assemblies using crosslinks as distance restraints. Nat. Methods 13, 515–520 (2016).
Karaca, E., Rodrigues, J. P. G. L. M., Graziadei, A., Bonvin, A. M. J. J. & Carlomagno, T. M3: an integrative framework for structure determination of molecular machines. Nat. Methods 14, 897–902 (2017).
Hauri, S. et al. Rapid determination of quaternary protein structures in complex biological samples. Nat. Commun. 10, 192 (2019).
Fischer, L. & Rappsilber, J. Quirks of error estimation in cross-linking/mass spectrometry. Anal. Chem. 89, 3829–3833 (2017).
O’Reilly, F. J. & Rappsilber, J. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 25, 1000–1008 (2018).
Klykov, O. et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 13, 2964–2990 (2018).
Liu, F., Lössl, P., Rabbitts, B. M., Balaban, R. S. & Heck, A. J. R. The interactome of intact mitochondria by cross-linking mass spectrometry provides evidence for coexisting respiratory supercomplexes. Mol. Cell. Proteomics 17, 216–232 (2018).
Keller, A., Chavez, J. D., Felt, K. C. & Bruce, J. E. Prediction of an upper limit for the fraction of interprotein cross-links in large-scale in vivo cross-linking studies. J. Proteome Res. 18, 3077–3085 (2019).
Bartolec, T. K. et al. Cross-linking mass spectrometry analysis of the yeast nucleus reveals extensive protein–protein interactions not detected by systematic two-hybrid or affinity purification-mass spectrometry. Anal. Chem. 92, 1874–1882 (2020).
Liu, F., Lössl, P., Scheltema, R., Viner, R. & Heck, A. J. R. Optimized fragmentation schemes and data analysis strategies for proteome-wide cross-link identification. Nat. Commun. 8, 15473 (2017).
Chen, Z.-L. et al. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat. Commun. 10, 3404 (2019).
Götze, M., Iacobucci, C., Ihling, C. H. & Sinz, A. A simple cross-linking/mass spectrometry workflow for studying system-wide protein interactions. Anal. Chem. 91, 10236–10244 (2019).
Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).
Vo, T. V. et al. A proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human. Cell 164, 310–323 (2016).
Nyfeler, B., Michnick, S. W. & Hauri, H.-P. Capturing protein interactions in the secretory pathway of living cells. Proc. Natl Acad. Sci. USA 102, 6350–6355 (2005).
Braun, P. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods 6, 91–97 (2008).
Beveridge, R., Stadlmann, J., Penninger, J. M. & Mechtler, K. A synthetic peptide library for benchmarking crosslinking-mass spectrometry search engines for proteins and protein complexes. Nat. Commun. 11, 742 (2020).
Rual, J.-F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005).
Makowski, M. M., Willems, E., Jansen, P. W. T. C. & Vermeulen, M. Cross-linking immunoprecipitation-MS (xIP-MS): topological analysis of chromatin-associated protein complexes using single affinity purification. Mol. Cell. Proteomics 15, 854–865 (2016).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
Dana, J. M. et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 47, D482–D489 (2018).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Gupta, N., Bandeira, N., Keich, U. & Pevzner, P. A. Target-decoy approach and false discovery rate: when things may go wrong. J. Am. Soc. Mass Spectrom. 22, 1111–1120 (2011).
Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods 9, 345–350 (2012).
Kerrien, S. et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40, D841–D846 (2012).
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).
Chatr-aryamontri, A. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).
Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).
Pagel, P. et al. The MIPS mammalian protein–protein interaction database. Bioinformatics 21, 832–834 (2005).
Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023–baq023 (2010).
Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 38, D497–D501 (2010).
Alfarano, C. et al. The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Res. 33, D418–D424 (2005).
Brown, K. R. & Jurisica, I. Online predicted human interaction database. Bioinformatics 21, 2076–2082 (2005).
Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011).
Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2008).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2018).
Acknowledgements
We thank R. Viner for support in data processing with XlinkX workflow in Proteome Discoverer. K.Y. thanks the Sam and Nancy Fleming Research Fellowship. This work was supported by grants from the National Institutes of Health (grant nos. GM124559 and GM125639) and the National Science Foundation (grant no. DBI-1661380) to H.Y.
Author information
Contributions
H.Y. conceived and oversaw all aspects of the study. K.Y. performed the computational analyses with assistance from S.D.W. T.-Y.W. performed laboratory experiments with assistance from E.E.S. K.Y. and H.Y. wrote the manuscript with inputs from all of the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Editor recognition statement Allison Doerr was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Analysis of the human proteome-wide XL-MS dataset using MaXLinker software.
(a) Table showing the number of interprotein cross-links obtained at different filtering criteria, and upon mapping to a representative 3D structure of a human 26S proteasome (PDB ID: 5GJQ). (b) Comparison of the fraction of validated cross-links using the conventional structure-based approach (n = 49 XLs for ‘1% FDR’; n = 65 XLs for ‘10% FDR’). (c) Comparison using the fraction of structure-corroborating identifications (FSI) (n = 63 XLs for ‘1% FDR’; n = 125 XLs for ‘10% FDR’). (d) Comparison using the fraction of mis-identifications (FMI) (n = 8127 XLs for ‘1% FDR’; n = 15110 XLs for ‘10% FDR’). (e) Comparison using the fraction of interprotein cross-links from known interactions (FKI) (n = 1144 XLs for ‘1% FDR’; n = 5158 XLs for ‘10% FDR’). P values in (b–e) were calculated using a two-sided Z-test and the error bars indicate +/- SE of proportion.
Extended Data Fig. 2 Demonstration of the utility of our comprehensive set of validation metrics on a publicly available mouse mitochondrial XL-MS dataset.
(a) Table showing the number of interprotein cross-links obtained at different filtering criteria, and upon mapping to representative 3D structures. (b) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’). (c) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’). (d) Fraction of mis-identifications (FMI) (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). (e) Fraction of interprotein cross-links from known interactions (FKI) (n = 2368 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 11418 XLs for ‘1% FDR’; n = 19665 XLs for ‘10% FDR’). P values in (b–e) were calculated using a two-sided Z-test and the error bars indicate +/- SE of proportion.
Extended Data Fig. 3 Estimated precision using PCA experiments for the three datasets of different quality from our human K562 proteome-wide XL-MS study.
Derived from Fig. 1g (n = 3 independent experiments; see Methods). The error bars indicate +/- SE of proportion (see Supplementary Note 2 for a detailed description of the methodology).
Extended Data Fig. 4 Structure-based mapping analysis at 20% FDR, extension to the analysis shown in Fig. 1, Fig. 2, and Extended Data Fig. 2.
a. Human proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 43 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 72 XLs for ‘1% FDR’; n = 73 XLs for ‘10% FDR’; n = 73 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 52 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 262 XLs for ‘1% FDR’; n = 426 XLs for ‘10% FDR’; n = 605 XLs for ‘20% FDR’). b. E. coli proteome-wide XL-MS study: (i) Conventional structure-based validation (n = 14 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 17 XLs for ‘1% FDR’; n = 17 XLs for ‘10% FDR’; n = 17 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 31 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 55 XLs for ‘1% FDR’; n = 101 XLs for ‘10% FDR’; n = 123 XLs for ‘20% FDR’). c. Mouse mitochondrial XL-MS study: (i) Conventional structure-based validation (n = 47 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 59 XLs for ‘1% FDR’; n = 63 XLs for ‘10% FDR’; n = 63 XLs for ‘20% FDR’). (ii) Fraction of structure-corroborating identifications (FSI) (n = 360 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 1402 XLs for ‘1% FDR’; n = 2097 XLs for ‘10% FDR’; n = 2751 XLs for ‘20% FDR’). P values in all the panels were calculated using a two-sided Z-test and the error bars indicate +/- SE of proportion.
Extended Data Fig. 5 Corrected FMI for the three datasets analyzed in the study (Utilizing Equation 3 from Methods section).
(a) Human proteome-wide XL-MS (n = 668 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 3029 XLs for ‘1% FDR’; n = 4957 XLs for ‘10% FDR’). (b) E. coli proteome-wide XL-MS (n = 340 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 553 XLs for ‘1% FDR’; n = 755 XLs for ‘10% FDR’). (c) Mouse mitochondrial XL-MS (n = 4814 XLs for ‘1% FDR with ΔXlinkX score≥50’; n = 15323 XLs for ‘1% FDR’; n = 24317 XLs for ‘10% FDR’). P values in all the panels were calculated using a two-sided Z-test and the error bars indicate +/- SE of proportion.
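The Extended Data legends report error bars as the SE of a proportion and P values from a two-sided Z-test. The legends do not specify the exact form of the test, so the following is only a minimal sketch that assumes a pooled two-proportion Z-test with a normal approximation. The function names and the example counts are hypothetical and are not taken from the paper's source data.

```python
import math


def proportion_se(successes: int, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    return math.sqrt(p * (1.0 - p) / n)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided Z-test for the difference between two proportions.

    Assumption: a pooled-variance test under the normal approximation;
    the source only states that a 'two-sided Z-test' was used.
    Returns (z, p_value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts for two filtering criteria (placeholders, not real data):
# 45 of 50 cross-links validated versus 60 of 100 validated.
z, p = two_proportion_z_test(45, 50, 60, 100)
print(f"SE(1) = {proportion_se(45, 50):.4f}, SE(2) = {proportion_se(60, 100):.4f}")
print(f"z = {z:.3f}, two-sided P = {p:.2e}")
```

With small n, as in several of the structure-mapped panels, the normal approximation becomes rough, which is one reason the sample sizes are stated alongside each comparison.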
Supplementary information
Supplementary Information
Supplementary Notes 1–5 and Table 1.
Source data
Source Data Fig. 1
Statistical Source Data
Source Data Fig. 2
Statistical Source Data
Source Data Extended Data Fig. 1
Statistical Source Data
Source Data Extended Data Fig. 2
Statistical Source Data
Source Data Extended Data Fig. 3
Statistical Source Data
Source Data Extended Data Fig. 4
Statistical Source Data
Source Data Extended Data Fig. 5
Statistical Source Data
About this article
Cite this article
Yugandhar, K., Wang, T.-Y., Wierbowski, S. D. et al. Structure-based validation can drastically underestimate error rate in proteome-wide cross-linking mass spectrometry studies. Nat. Methods 17, 985–988 (2020). https://doi.org/10.1038/s41592-020-0959-9
DOI: https://doi.org/10.1038/s41592-020-0959-9