Abstract
Co-fractionation mass spectrometry (CF-MS) has emerged as a powerful technique for interactome mapping. However, there is little consensus on optimal strategies for the design of CF-MS experiments or their computational analysis. Here, we reanalyzed a total of 206 CF-MS experiments to generate a uniformly processed resource containing over 11 million measurements of protein abundance. We used this resource to benchmark experimental designs for CF-MS studies and systematically optimize computational approaches to network inference. We then applied this optimized methodology to reconstruct a draft-quality human interactome by CF-MS and predict over 700,000 protein–protein interactions across 27 eukaryotic species or clades. Our work defines new resources to illuminate proteome organization over evolutionary timescales and establishes best practices for the design and analysis of CF-MS studies.
Data availability
A list of all raw mass spectrometry files analyzed in this study and their accession numbers in PRIDE or MassIVE repositories is provided in Supplementary Table 1. All data generated in this study are available at multiple levels of analysis from the following sources: protein chromatograms and protein–protein interaction networks for up to ten proteins can be visualized and downloaded via an interactive web application at http://cf-ms-browser.msl.ubc.ca; processed chromatograms and MaxQuant proteinGroups.txt files are available via Zenodo at https://doi.org/10.5281/zenodo.4499320; complete MaxQuant outputs for all 206 experiments were deposited to the PRIDE repository (ref. 83) with the dataset identifier PXD022048; predicted interactomes for 27 species and clades, including the consensus human CF-MS interactome, are available via Zenodo at https://doi.org/10.5281/zenodo.4245282. An overview of all publicly available resources generated in this study is provided at the supporting website (https://fosterlab.github.io/CF-MS-analysis).
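The proteinGroups.txt files listed above are tab-separated MaxQuant tables. As a minimal sketch, assuming each fraction appears as its own intensity column, the snippet below turns such a table into a dictionary of chromatograms. The helper read_protein_groups, the example column names ("iBAQ F01", "iBAQ F02") and the set of flag columns are assumptions for illustration: flag-column names differ between MaxQuant versions, so check them against the actual file header before use.

```python
import csv
import io
import math

# Columns in which MaxQuant marks decoy, contaminant and site-only groups with "+".
# Assumption: older MaxQuant versions name the contaminant column differently.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def read_protein_groups(handle, fraction_columns, id_column="Majority protein IDs"):
    """Hypothetical helper: parse a proteinGroups.txt table into chromatograms.

    Returns {protein group ID: [intensity or None for each fraction]}; zero and
    NaN intensities are treated as missing, and flagged rows are dropped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    chromatograms = {}
    for row in reader:
        if any((row.get(col) or "") == "+" for col in FLAG_COLUMNS):
            continue
        profile = []
        for col in fraction_columns:
            value = float(row[col])
            profile.append(value if value > 0 and not math.isnan(value) else None)
        chromatograms[row[id_column]] = profile
    return chromatograms


# Toy two-fraction table; the second row is a decoy and is filtered out.
example = io.StringIO(
    "Majority protein IDs\tiBAQ F01\tiBAQ F02\tReverse\tPotential contaminant\n"
    "P12345\t100\t0\t\t\n"
    "REV__P99999\t5\t5\t+\t\n"
)
print(read_protein_groups(example, ["iBAQ F01", "iBAQ F02"]))  # {'P12345': [100.0, None]}
```

Passing the fraction columns explicitly, rather than matching a prefix such as "iBAQ ", avoids accidentally picking up summary columns that share the prefix.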
Code availability
Source code used to download and reanalyze publicly available CF-MS data using MaxQuant is available at https://github.com/skinnider/CF-MS-searches (https://doi.org/10.5281/zenodo.4774750). Source code used to carry out analyses presented in the paper, with relevant intermediate data files, is available at https://github.com/skinnider/CF-MS-analysis (https://doi.org/10.5281/zenodo.4774754). Source code for the CF-MS browser web application is available at https://github.com/skinnider/CF-MS-browser (https://doi.org/10.5281/zenodo.4774752). The CFTK R package is available at https://github.com/fosterlab/CFTK (https://doi.org/10.5281/zenodo.4774771).
References
Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).
Kim, Y., Jung, J. P., Pack, C.-G. & Huh, W.-K. Global analysis of protein homomerization in Saccharomyces cerevisiae. Genome Res. 29, 135–145 (2019).
Werner, J. N. et al. Quantitative genome-scale analysis of protein localization in an asymmetric bacterium. Proc. Natl Acad. Sci. USA 106, 7858–7863 (2009).
Kristensen, A. R., Gsponer, J. & Foster, L. J. A high-throughput approach for measuring temporal changes in the interactome. Nat. Methods 9, 907–909 (2012).
Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012).
Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).
McWhite, C. D. et al. A pan-plant protein complex map reveals deep conservation and novel assemblies. Cell 181, 460–474 (2020).
Rosenberger, G. et al. SECAT: quantifying protein complex dynamics across cell states by network-centric analysis of SEC-SWATH-MS profiles. Cell Syst. https://doi.org/10.1016/j.cels.2020.11.006 (2020).
Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods https://doi.org/10.1038/s41592-021-01107-5 (2020).
Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).
Tyanova, S., Temu, T. & Cox, J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 11, 2301–2319 (2016).
Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 47, D559–D563 (2019).
Skinnider, M. A. et al. An atlas of protein–protein interactions across mammalian tissues. Preprint at bioRxiv https://doi.org/10.1101/351247 (2018).
Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
Ochoa, D. et al. The functional landscape of the human phosphoproteome. Nat. Biotechnol. 38, 365–373 (2020).
Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).
Mellacheruvu, D. et al. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nat. Methods 10, 730–736 (2013).
Romanov, N. et al. Disentangling genetic and environmental effects on the proteotypes of individuals. Cell 177, 1308–1318 (2019).
Skinnider, M. A., Squair, J. W. & Foster, L. J. Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).
Stacey, R. G., Skinnider, M. A., Scott, N. E. & Foster, L. J. A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE). BMC Bioinformatics 18, 457 (2017).
Bludau, I. et al. Complex-centric proteome profiling by SEC-SWATH-MS for the parallel detection of hundreds of protein complexes. Nat. Protoc. 15, 2341–2386 (2020).
Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).
Liu, H., Sadygov, R. G. & Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193–4201 (2004).
Al Shweiki, M. R. et al. Assessment of label-free quantification in discovery proteomics and impact of technological factors and natural variability of protein abundance. J. Proteome Res. 16, 1410–1424 (2017).
McIlwain, S. et al. Estimating relative abundances of proteins from shotgun proteomics data. BMC Bioinformatics 13, 308 (2012).
Scott, N. E., Brown, L. M., Kristensen, A. R. & Foster, L. J. Development of a computational framework for the analysis of protein correlation profiling and spatial proteomics experiments. J. Proteomics 118, 112–129 (2015).
Scott, N. E. et al. Interactome disassembly during apoptosis occurs independent of caspase cleavage. Mol. Syst. Biol. 13, 906 (2017).
Pourhaghighi, R. et al. BraInMap elucidates the macromolecular connectivity landscape of mammalian brain. Cell Syst. 10, 333–350 (2020).
Kastritis, P. L. et al. Capturing protein communities by structural proteomics in a thermophilic eukaryote. Mol. Syst. Biol. 13, 936 (2017).
Drew, K. et al. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 13, 932 (2017).
Drew, K., Wallingford, J. B. & Marcotte, E. M. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 17, e10016 (2021).
Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
Lapek, J. D. et al. Detection of dysregulated protein-association networks by high-throughput proteomics predicts cancer vulnerabilities. Nat. Biotechnol. 35, 983–989 (2017).
Orre, L. M. et al. SubCellBarCode: proteome-wide mapping of protein localization and relocalization. Mol. Cell 73, 166–182 (2019).
Geladaki, A. et al. Combining LOPIT with differential ultracentrifugation for high-resolution spatial proteomics. Nat. Commun. 10, 331 (2019).
Cusick, M. E. et al. Literature-curated protein interaction datasets. Nat. Methods 6, 39–46 (2009).
Heide, H. et al. Complexome profiling identifies TMEM126B as a component of the mitochondrial complex I assembly complex. Cell Metab. 16, 538–549 (2012).
von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).
Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2009).
Stacey, R. G., Skinnider, M. A. & Foster, L. J. On the robustness of graph-based clustering to random network alterations. Mol. Cell. Proteomics 20, 100002 (2020).
McBride, Z. et al. A label-free mass spectrometry method to predict endogenous protein complex composition. Mol. Cell. Proteomics 18, 1588–1606 (2019).
Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
Salas, D., Stacey, R. G., Akinlaja, M. & Foster, L. J. Next-generation interactomics: considerations for the use of co-elution to measure protein interaction networks. Mol. Cell. Proteomics 19, 1–10 (2020).
Pang, C. N. I. et al. Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes. Mol. Cell. Proteomics 19, 1876–1895 (2020).
Gorka, M. et al. Protein complex identification and quantitative complexome by CN-PAGE. Sci. Rep. 9, 11523 (2019).
Mallam, A. L. et al. Systematic discovery of endogenous human ribonucleoprotein complexes. Cell Rep. 29, 1351–1368 (2019).
Drew, K. et al. A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating machinery. Dev. Biol. 467, 108–117 (2020).
Bludau, I. et al. Systematic detection of functional proteoform groups from bottom–up proteomic datasets. Preprint at bioRxiv https://doi.org/10.1101/2020.12.22.423928 (2020).
Garzón, J. I. et al. A computational interactome and functional annotation for the human proteome. eLife 5, e18715 (2016).
Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).
Cunningham, J. M., Koytiger, G., Sorger, P. K. & AlQuraishi, M. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat. Methods 17, 175–183 (2020).
Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).
Wang, M., Herrmann, C. J., Simonovic, M., Szklarczyk, D. & von Mering, C. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168 (2015).
Kovalchik, K. A. et al. RawTools: rapid and dynamic interrogation of Orbitrap data files for mass spectrometer system management. J. Proteome Res. 18, 700–708 (2019).
Bogdanow, B., Zauber, H. & Selbach, M. Systematic errors in peptide and protein identification and quantification by modified peptides. Mol. Cell. Proteomics 15, 2791–2801 (2016).
Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).
Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).
Sánchez-Taltavull, D., Ramachandran, P., Lau, N. & Perkins, T. J. Bayesian correlation analysis for sequence count data. PLoS ONE 11, e0163595 (2016).
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).
Székely, G. J., Rizzo, M. L. & Bakirov, N. K. Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007).
Simon, N. & Tibshirani, R. Comment on “Detecting novel associations in large data sets” by Reshef et al., Science Dec. 16, 2011. Preprint at https://arxiv.org/abs/1401.7645 (2014).
Kinney, J. B. & Atwal, G. S. Equitability, mutual information, and the maximal information coefficient. Proc. Natl Acad. Sci. USA 111, 3354–3359 (2014).
Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
Foroushani, A. et al. Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications. BMC Med. Genomics 10, 16 (2017).
Brunner, E. & Munzel, U. The nonparametric Behrens–Fisher problem: asymptotic theory and a small-sample approximation. Biom. J. 42, 17–25 (2000).
Munzel, U. & Brunner, E. An exact paired rank test. Biom. J. 44, 584–593 (2002).
Skinnider, M. A., Cai, C., Stacey, R. G. & Foster, L. J. PrInCE: an R/Bioconductor package for protein–protein interaction network inference from co-fractionation mass spectrometry data. Bioinformatics https://doi.org/10.1093/bioinformatics/btab022 (2021).
Larance, M. et al. Global membrane protein interactome analysis using in vivo crosslinking and mass spectrometry-based protein correlation profiling. Mol. Cell. Proteomics 15, 2476–2490 (2016).
Crozier, T. W. M., Tinti, M., Larance, M., Lamond, A. I. & Ferguson, M. A. J. Prediction of protein complexes in Trypanosoma brucei by protein correlation profiling mass spectrometry and machine learning. Mol. Cell. Proteomics 16, 2254–2267 (2017).
Hillier, C. et al. Landscape of the Plasmodium interactome reveals both conserved and species-specific functionality. Cell Rep. 28, 1635–1647 (2019).
Kerr, C. H. et al. Dynamic rewiring of the human interactome by interferon signaling. Genome Biol. 21, 140 (2020).
Liebeskind, B. J., Aldrich, R. W. & Marcotte, E. M. Ancestral reconstruction of protein interaction networks. PLoS Comput. Biol. 15, e1007396 (2019).
Skinnider, M. A., Stacey, R. G. & Foster, L. J. Genomic data integration systematically biases interactome mapping. PLoS Comput. Biol. 14, e1006474 (2018).
Carlson, M. L. et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. eLife 8, e46615 (2019).
Oliver, S. Guilt-by-association goes global. Nature 403, 601–603 (2000).
Schwikowski, B., Uetz, P. & Fields, S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261 (2000).
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
Acknowledgements
This work was supported by funding from Genome Canada and Genome British Columbia (project 214PRO). M.A.S. acknowledges support from a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a University of British Columbia (UBC) Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. This work was enabled in part by the support provided by WestGrid and Compute Canada and through computational resources and services provided by Advanced Research Computing at the UBC. We thank Microsoft for the donation of cloud computing resources that enabled part of this work, T. Clark and J. Moon for advice on MaxQuant searches and D. Vavilov for assistance with the web server.
Author information
Contributions
M.A.S. and L.J.F. designed experiments. M.A.S. performed experiments. M.A.S. and L.J.F. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Methods thanks Fridtjof Lund-Johansen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 A uniformly processed resource of CF–MS data.
a, Approaches to protein quantification employed by published CF–MS experiments. SILAC, stable isotope labelling by amino acids in cell culture; iBAQ, intensity-based absolute quantification. b, Proportion of the organismal proteome quantified in each CF–MS experiment (grey lines, individual datasets; blue line, mean across all datasets). c, Cumulative distribution of the number of proteins quantified per dataset in between one and 25 fractions. d, GO term enrichment among CORUM proteins detected in at least one CF–MS fraction, left, or never detected, right. e, PaxDb consensus protein abundance of mouse proteins detected or never detected by CF–MS. f, Coverage of high, moderate, and low abundance proteins (expressed as a mean proportion of fractions in which these proteins were detected) in published human CF–MS experiments (n = 46). g, As in f, but with CF–MS experiments divided into three groups based on the length of the liquid chromatography gradient. ***, p < 0.001, two-sided Spearman rank correlation. h, PaxDb consensus protein abundance of human proteins in the CORUM database and non-CORUM proteins. i, Difference in the number of protein groups quantified in each CF–MS experiment, compared to the processed chromatogram data accompanying the original publications (grey lines, individual datasets; blue line, mean across all datasets).
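Panels c and f summarize detection by counting the fractions in which each protein was quantified. A minimal Python sketch of both summaries is below, assuming chromatograms are stored as a dictionary of per-fraction intensities (None or 0 meaning not detected) and that abundance tiers were assigned beforehand, for example from PaxDb consensus abundances. The function names, the tier labels and the "at least k fractions" reading of panel c are assumptions for illustration.

```python
def detected_fractions(profile):
    """Number of fractions in which a protein was quantified (non-missing, > 0)."""
    return sum(1 for x in profile if x is not None and x > 0)


def coverage(profile):
    """Proportion of fractions in which a protein was detected."""
    return detected_fractions(profile) / len(profile)


def mean_coverage_by_tier(chromatograms, tiers):
    """Mean coverage per abundance tier; proteins without a tier are skipped."""
    totals, counts = {}, {}
    for protein, profile in chromatograms.items():
        tier = tiers.get(protein)
        if tier is None:
            continue
        totals[tier] = totals.get(tier, 0.0) + coverage(profile)
        counts[tier] = counts.get(tier, 0) + 1
    return {tier: totals[tier] / counts[tier] for tier in totals}


def proteins_in_at_least(chromatograms, max_fractions=25):
    """Cumulative count of proteins quantified in at least k fractions, k = 1..max."""
    n_detected = [detected_fractions(p) for p in chromatograms.values()]
    return {k: sum(n >= k for n in n_detected) for k in range(1, max_fractions + 1)}


# Toy data: four fractions, hypothetical tier labels.
chromatograms = {
    "P1": [1.0, 2.0, None, 0.0],
    "P2": [1.0, 1.0, 1.0, 1.0],
    "P3": [None, None, None, 5.0],
}
tiers = {"P1": "high", "P2": "high", "P3": "low"}
print(mean_coverage_by_tier(chromatograms, tiers))  # {'high': 0.75, 'low': 0.25}
print(proteins_in_at_least(chromatograms, max_fractions=4))  # {1: 3, 2: 2, 3: 1, 4: 1}
```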
Extended Data Fig. 2 Benchmarking computational analysis of individual CF–MS datasets.
a, Measures of association used to quantify the similarity of two protein chromatograms in published CF–MS studies. Bottom row indicates the incorporation of external genomic datasets (ref. 77). b, Ranks of each measure of association in identifying protein pairs in the same protein complex, left, or annotated to the same GO term, right, across individual CF–MS datasets. c, Number of peaks detected in 20 CF–MS datasets by fitting a mixture of Gaussians to each protein chromatogram. d, Recovery of known protein complexes in the 20 CF–MS datasets from c, scoring only chromatograms that could be fit with a mixture of Gaussians (r² ≥ 0.5) and comparing the 24 different measures of association shown in Fig. 2 with the co-apex score. Inset text shows the median AUC for each measure of association. e, As in d, but for proteins annotated to the same GO term. f, Recovery of known protein complexes, top, and proportion of originally quantified proteins, bottom, when filtering out profiles detected in fewer than a minimum number of fractions, using mutual information as a measure of profile similarity. g, Mean number of protein groups identified, top, and recovery of proteins annotated to the same GO term, bottom, for three approaches to label-free quantification implemented in MaxQuant.
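Panels b, d and f share one procedure: score every pair of chromatograms with a measure of association, optionally after filtering out sparsely detected proteins, then measure how well the scores separate pairs in the same CORUM complex from all other pairs (ROC AUC). The sketch below shows this for a single measure, Pearson correlation over fractions where both proteins were quantified, with the AUC computed from the Mann–Whitney rank-sum identity. The pairwise-complete handling of missing values and all helper names are assumptions for illustration; the study compared many measures and missing-value strategies, and panel f uses mutual information rather than Pearson correlation.

```python
import itertools
import math


def pearson(a, b):
    """Pearson r over fractions where both profiles are non-missing (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 3:
        return None
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a flat profile has undefined correlation
    return sxy / math.sqrt(sxx * syy)


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def score_pairs(chromatograms, min_fractions=3):
    """Score all protein pairs after dropping proteins seen in < min_fractions."""
    kept = {p: v for p, v in chromatograms.items()
            if sum(x is not None for x in v) >= min_fractions}
    scored = []
    for p1, p2 in itertools.combinations(sorted(kept), 2):
        r = pearson(kept[p1], kept[p2])
        if r is not None:
            scored.append((p1, p2, r))
    return scored


def complex_auc(scored, complexes):
    """ROC AUC for intra-complex pairs (positives) versus all other pairs."""
    labels = [any({p1, p2} <= c for c in complexes) for p1, p2, _ in scored]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative pairs")
    ranks = midranks([r for _, _, r in scored])
    pos_rank_sum = sum(rank for rank, is_pos in zip(ranks, labels) if is_pos)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


chromatograms = {
    "A": [1.0, 4.0, 9.0, 4.0, 1.0, 0.5],
    "B": [2.0, 8.0, 18.0, 8.0, 2.0, 1.0],  # co-elutes with A
    "C": [9.0, 4.0, 1.0, 0.5, 1.0, 4.0],
    "D": [None, None, 3.0, None, None, None],  # removed by the detection filter
}
scored = score_pairs(chromatograms, min_fractions=3)
print(complex_auc(scored, [{"A", "B"}]))  # 1.0
```

Swapping in another measure of association only requires replacing pearson in score_pairs; the AUC step is unchanged.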
Extended Data Fig. 3 Univariate statistical analysis of computational approaches to individual CF–MS datasets.
a, Difference in the median protein complex AUC between each pair of measures of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner–Munzel test. The difference in median AUCs is capped at [–0.1, +0.1] to improve visualization. b, As in a, but for GO terms. c, Difference in the median protein complex AUC between each pair of missing value handling strategies. Asterisks indicate pairs of missing value handling strategies with a p-value less than 0.05 in a two-sided Brunner–Munzel test. d, As in c, but for GO terms. e, Difference in the median protein complex AUC between each pair of chromatogram normalization approaches. Asterisks indicate pairs of chromatogram normalization approaches with a p-value less than 0.05 in a two-sided Brunner–Munzel test. f, As in e, but for GO terms. g, Difference in the median protein complex AUC between each pair of measures of association, considering only the single best combination of missing value handling and chromatogram normalization for each measure of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner–Munzel test. h, As in g, but for GO terms. i, Median difference in the protein complex AUC between matched datasets with label-free protein quantification performed by one of three algorithms within MaxQuant. Asterisks indicate pairs of quantification algorithms with a p-value less than 0.05 in a two-sided paired Brunner–Munzel test. j, As in i, but for GO terms.
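The pairwise comparisons above use the two-sided Brunner–Munzel test, a rank-based test that, unlike the Mann–Whitney U test, does not assume equal variances in the two groups. A minimal pure-Python sketch of the test statistic W, its Satterthwaite-type degrees of freedom, and the estimated effect p_hat = P(X < Y) + 0.5 P(X = Y) follows. For brevity the p-value uses a large-sample normal approximation, which is an assumption of this sketch: the test as usually applied refers W to a t distribution with the computed degrees of freedom (scipy.stats.brunnermunzel does this by default). The paired variant used in panels i and j is not shown.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def brunner_munzel(x, y):
    """Return (W, df, p_hat, two-sided p-value from a normal approximation)."""
    nx, ny = len(x), len(y)
    combined = midranks(list(x) + list(y))
    rcx, rcy = combined[:nx], combined[nx:]
    rx, ry = midranks(list(x)), midranks(list(y))
    mean_cx, mean_cy = sum(rcx) / nx, sum(rcy) / ny
    # Variance estimates from placements (combined rank minus within-sample rank);
    # within-sample ranks always average (n + 1) / 2.
    sx = sum((c - r - mean_cx + (nx + 1) / 2) ** 2 for c, r in zip(rcx, rx)) / (nx - 1)
    sy = sum((c - r - mean_cy + (ny + 1) / 2) ** 2 for c, r in zip(rcy, ry)) / (ny - 1)
    denom = nx * sx + ny * sy
    if denom == 0:
        raise ValueError("statistic undefined: samples are completely separated")
    w = nx * ny * (mean_cy - mean_cx) / ((nx + ny) * math.sqrt(denom))
    df = denom ** 2 / ((nx * sx) ** 2 / (nx - 1) + (ny * sy) ** 2 / (ny - 1))
    p_hat = (mean_cy - (ny + 1) / 2) / nx
    p_normal = math.erfc(abs(w) / math.sqrt(2))  # P(|Z| > |W|) for standard normal Z
    return w, df, p_hat, p_normal


# Toy values standing in for per-dataset AUCs of two analysis choices.
w, df, p_hat, p = brunner_munzel([1, 2, 3, 5], [4, 6, 7, 8])
print(round(w, 3), round(df, 3), p_hat)  # 4.95 6.0 0.9375
```

A positive W (p_hat above 0.5) indicates that values in the second sample tend to be larger than those in the first.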
Extended Data Fig. 4 Analysis pipelines for individual CF–MS datasets.
a, Recovery of known protein complexes with 163 valid combinations of measures of association, missing value handling, and normalization strategies. b, As in a, but for proteins annotated to the same GO term.
Extended Data Fig. 5 Downsampling analysis of published CF–MS experiments.
a, Recovery of proteins annotated to the same GO term after downsampling CF–MS chromatograms to a fixed number of fractions. b–c, Recovery of protein complexes, b, and GO terms, c, in downsampled CF–MS chromatograms, using mutual information as the measure of profile similarity. d–e, Recovery of protein complexes (AUC, d, and change in AUC, e), in downsampled CF–MS chromatograms, with experiments divided based on the separation method used (IEX, ion exchange chromatography; N-PAGE, native polyacrylamide gel electrophoresis; SEC, size exclusion chromatography). f, Recovery of protein complexes when downsampling windows of adjacent fractions of fixed length, rather than downsampling fractions randomly from the chromatogram matrix. g–h, Comparison of protein complex recovery (AUC, g, and change in AUC, h), in downsampled CF–MS chromatograms when drawing samples of random fractions or adjacent fractions from the chromatogram matrix. a–h, Shaded area shows the standard error.
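The panels above compare two ways of reducing a chromatogram matrix to a fixed number of fractions: drawing fractions at random, or drawing a window of adjacent fractions. A minimal sketch of both samplers, together with the mean and standard error over repeated draws that the shaded areas summarize, is below. The function names, the uniformly random window start and the use of Python's random module are assumptions for illustration, not the study's code.

```python
import math
import random
import statistics


def random_fractions(n_fractions, k, rng):
    """k distinct fraction indices drawn uniformly, returned in elution order."""
    return sorted(rng.sample(range(n_fractions), k))


def adjacent_window(n_fractions, k, rng):
    """k consecutive fraction indices starting at a uniformly random position."""
    start = rng.randrange(n_fractions - k + 1)
    return list(range(start, start + k))


def downsample(chromatograms, indices):
    """Keep only the selected fractions of every protein's profile."""
    return {protein: [profile[i] for i in indices]
            for protein, profile in chromatograms.items()}


def mean_and_se(values):
    """Mean and standard error of a metric (e.g. AUC) across repeated draws."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(42)
chromatograms = {"A": [0.0, 1.0, 5.0, 9.0, 5.0, 1.0, 0.0, 0.0]}
for sampler in (random_fractions, adjacent_window):
    indices = sampler(len(chromatograms["A"]), 4, rng)
    print(sampler.__name__, indices, downsample(chromatograms, indices)["A"])
```

In a full analysis, each draw would be scored (for example, with a complex-recovery AUC) and mean_and_se applied to the scores from many draws at each fraction count.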
Extended Data Fig. 6 Downsampling analysis of published CF–MS experiments incorporating multiple biological replicates.
a–b, Recovery of known protein complexes, a, and proteins annotated to the same GO term, b, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, as shown in Fig. 3b but visualized here as a line graph. Error bars show the standard error of the mean. c–d, Recovery of known protein complexes, c, and proteins annotated to the same GO term, d, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, within individual CF–MS datasets.
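In these panels a fixed number of fractions is drawn from between one and five replicates. One way to sketch this, assuming that replicates of a dataset have equal-length profiles and that fractions are pooled across the chosen replicates before sampling (the study's exact scheme may differ), is shown below; sample_across_replicates is a hypothetical helper.

```python
import random


def sample_across_replicates(replicates, n_replicates, k, seed=0):
    """Draw k fractions in total from n_replicates randomly chosen replicates.

    replicates: list of {protein: [intensity per fraction]} dictionaries, with all
    profiles within one replicate of equal length. Returns concatenated profiles
    for proteins quantified in every chosen replicate.
    """
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(replicates)), n_replicates))
    # Pool (replicate, fraction) coordinates from the chosen replicates.
    pool = [(r, i) for r in chosen
            for i in range(len(next(iter(replicates[r].values()))))]
    picks = sorted(rng.sample(pool, k))
    proteins = set.intersection(*(set(replicates[r]) for r in chosen))
    return {p: [replicates[r][p][i] for r, i in picks] for p in sorted(proteins)}


replicates = [
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    {"A": [7, 8, 9], "B": [1, 1, 1], "C": [2, 2, 2]},
]
print(sample_across_replicates(replicates, n_replicates=2, k=4))
```

Holding the total number of fractions fixed while varying the number of replicates is what lets the comparison separate the benefit of replication from that of simply measuring more fractions.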
Extended Data Fig. 7 Protein quantification and chromatographic separation.
a, Comparison of GO term recovery and proteome coverage between SILAC ratios and iBAQ intensities from individual isotopologue channels in 20 SILAC datasets. Caption and error bars show the mean and standard deviation of the differences in the number of protein groups quantified and the AUC between SILAC ratios and iBAQ intensities. b, Recovery of proteins annotated to the same GO term in CF–MS experiments grouped by fractionation method. c, Difference in the median protein complex AUC between each pair of fractionation methods. Asterisks indicate pairs of fractionation methods with a p-value less than 0.05 in a two-sided Brunner–Munzel test. d, Regression coefficients for fractionation methods in multivariable statistical analysis, estimated by a linear model fit to the protein complex AUC and including terms for measures of association, missing value handling strategies, approaches to chromatogram normalization, and interactions between them. e, As in c, but for GO terms. f, As in d, but for GO terms. g, Recovery of known protein complexes in published CF–MS experiments grouped by fractionation method, with each measure of association shown separately. h, As in g, but for proteins annotated to the same GO term. i, Recovery of individual protein complexes in published CF–MS experiments grouped by fractionation method. j, Number of protein complexes with at least three subunits detected exclusively by one separation method, left, and resolved significantly better by one of the four separation methods, right, across 67 human and mouse CF–MS datasets. k, Examples of protein complexes resolved best by each of the four separation methods. Inset text shows the median AUC.
Extended Data Fig. 8 Machine learning workflows for the integration of multiple CF–MS replicates.
a, Schematic overview of cross-validation approaches for CF–MS data. b, Comparison of cross-validation by protein pairs or individual proteins in network inference from two to four CF–MS experiments using a naive Bayes classifier, with AUCs calculated in cross-validation or in an independent set of held-out protein complexes. c, Impact of feature selection on network inference from two to four CF–MS experiments, comparing between one and six top-performing features, an equivalent number of random features, or five features computed in PrInCE. d, Comparison of top-performing or random features in network inference from two to ten CF–MS experiments, using between one and ten top-performing features. e, Comparison of network inference with features calculated from concatenated matrices of two to four CF–MS experiments, or with features calculated from individual experiments. f, Comparison of network inference from two to four CF–MS experiments using a naive Bayes classifier before and after median imputation of missing values. g, Impact of the number of top-performing or random features provided as input on network inference from two to ten CF–MS experiments. h, Comparison of random forest and naive Bayes classifiers in network inference from two to ten CF–MS replicates, using between one and ten features. i, Network inference from human CF–MS data when integrating varying proportions of SEC and IEX experiments. The total number of CF–MS datasets is shown above the plots, and the number of SEC datasets is shown on the x-axis.
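Panel b contrasts two ways of splitting labelled protein pairs for cross-validation. When pairs are split at random, the same protein's chromatogram can contribute to both training and test pairs, which can make cross-validated performance look better than performance on independent complexes; splitting by protein avoids this. The sketch below assumes one such scheme (proteins are assigned to folds, training pairs contain no held-out protein, and test pairs contain at least one) and also includes the per-feature median imputation compared in panel f. Function names and the fold assignment are assumptions for illustration.

```python
import random
import statistics


def protein_folds(proteins, n_folds, seed=0):
    """Randomly partition proteins into n_folds disjoint sets."""
    shuffled = sorted(proteins)
    random.Random(seed).shuffle(shuffled)
    return [set(shuffled[i::n_folds]) for i in range(n_folds)]


def split_by_protein(pairs, held_out):
    """Train on pairs with no held-out protein; test on pairs with at least one."""
    train, test = [], []
    for a, b in pairs:
        (test if a in held_out or b in held_out else train).append((a, b))
    return train, test


def split_by_pair(pairs, n_folds, seed=0):
    """Naive alternative: randomly partition the pairs themselves into folds."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def median_impute(rows):
    """Replace None in each feature column by the median of its observed values."""
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fallback of 0.0 for an all-missing column is an arbitrary choice here.
        medians.append(statistics.median(observed) if observed else 0.0)
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
pairs = [(a, b) for i, a in enumerate(proteins) for b in proteins[i + 1:]]
for held_out in protein_folds(proteins, n_folds=3):
    train, test = split_by_protein(pairs, held_out)
    # No held-out protein ever contributes to a training pair.
    assert not any(a in held_out or b in held_out for a, b in train)
print(median_impute([[1, None], [3, 4], [None, 8]]))  # [[1, 6.0], [3, 4], [2.0, 8]]
```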
Extended Data Fig. 9 Synergistic and antagonistic feature combinations in network inference from CF–MS data.
a, Performance (AUC, left, and rank, right) of naive Bayes and random forest classifiers trained on 24 measures of association in network inference from combinations of between two and six CF–MS datasets. Each cell reflects the mean AUC from 10 random combinations of datasets. b, Summary of synergistic and antagonistic interactions between features in CF–MS network inference, as shown in detail in panels c–f. Fill reflects the number of times a synergistic (magenta) or antagonistic (cyan) interaction was detected between two features. Network inference was performed using all possible combinations of 24 measures of association from ten random combinations of three or six CF–MS datasets, using either a random forest or naive Bayes classifier. Rows and columns are arranged by the mean performance of individual features across all combinations shown in a (both classifiers, two to six datasets). c, Performance (AUC) of networks inferred from combinations of three CF–MS datasets using a random forest classifier. Rows and columns are arranged by the mean performance of individual features in the same scenario. Text highlights significantly synergistic (+) and antagonistic (–) interactions. Each cell shows the mean AUC from 10 random combinations of datasets. d, As in c, but using a naive Bayes classifier. e, As in c, but for networks inferred from combinations of six CF–MS datasets. f, As in c, but for networks inferred from combinations of six CF–MS datasets, using a naive Bayes classifier.
Extended Data Fig. 10 Saturation analysis of network inference from CF–MS data.
a, Saturation analysis of network inference from two to 40 CF–MS experiments, using variable numbers of top-performing features. Boxplots show n = 10 independent samples. b, Impact of downsampling training set complexes on network inference from two to four CF–MS replicates.
Supplementary information
Supplementary Information
Supplementary Figs. 1–3
Supplementary Table 1
Complete list of all CF-MS experiments, a, and raw mass spectrometry files, b, analyzed in this study.
Supplementary Table 2
ROC analysis of protein complexes, a, and GO term membership, b, within individual CF-MS datasets.
Supplementary Table 3
Univariate statistical analysis of computational approaches to individual CF-MS datasets. a,c, protein complexes; b,d, GO term membership.
Supplementary Table 4
Multivariate statistical analysis of computational approaches to individual CF-MS datasets. a, protein complexes; b, GO term membership.
Supplementary Table 5
Univariate and multivariate statistical analysis of methods for protein complex separation.
About this article
Cite this article
Skinnider, M.A., Foster, L.J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat Methods 18, 806–815 (2021). https://doi.org/10.1038/s41592-021-01194-4
DOI: https://doi.org/10.1038/s41592-021-01194-4
This article is cited by
- DIP-MS: ultra-deep interaction proteomics for the deconvolution of protein complexes. Nature Methods (2024)
- Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts. Nature Methods (2024)
- Deciphering protein interaction network dynamics with a machine learning-based framework. Nature Methods (2024)
- Next-generation proteomics for quantitative Jumbophage-bacteria interaction mapping. Nature Communications (2023)
- Mapping protein states and interactions across the tree of life with co-fractionation mass spectrometry. Nature Communications (2023)