Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments

Skinnider, Michael A.; Foster, Leonard J.

doi:10.1038/s41592-021-01194-4

Article
Published: 01 July 2021

Nature Methods volume 18, pages 806–815 (2021)

Abstract

Co-fractionation mass spectrometry (CF-MS) has emerged as a powerful technique for interactome mapping. However, there is little consensus on optimal strategies for the design of CF-MS experiments or their computational analysis. Here, we reanalyzed a total of 206 CF-MS experiments to generate a uniformly processed resource containing over 11 million measurements of protein abundance. We used this resource to benchmark experimental designs for CF-MS studies and systematically optimize computational approaches to network inference. We then applied this optimized methodology to reconstruct a draft-quality human interactome by CF-MS and predict over 700,000 protein–protein interactions across 27 eukaryotic species or clades. Our work defines new resources to illuminate proteome organization over evolutionary timescales and establishes best practices for the design and analysis of CF-MS studies.

**Fig. 1: A comprehensive reanalysis of published CF-MS data.**

**Fig. 2: Benchmarking analysis of individual CF-MS experiments.**

**Fig. 3: Design of CF-MS experiments.**

**Fig. 4: Network inference from multiple CF-MS replicates.**

**Fig. 5: Meta-analysis defines a consensus human CF-MS interactome.**

**Fig. 6: CF-MS interactomes throughout the eukaryotic evolutionary tree.**

Data availability

A list of all raw mass spectrometry files analyzed in this study and their accession numbers in PRIDE or MassIVE repositories is provided in Supplementary Table 1. All data generated in this study are available at multiple levels of analysis from the following sources: protein chromatograms and protein–protein interaction networks for up to ten proteins can be visualized and downloaded via an interactive web application at http://cf-ms-browser.msl.ubc.ca; processed chromatograms and MaxQuant proteinGroups.txt files are available via Zenodo at https://doi.org/10.5281/zenodo.4499320; complete MaxQuant outputs for all 206 experiments were deposited to the PRIDE repository⁸³ with the dataset identifier PXD022048; predicted interactomes for 27 species and clades, including the consensus human CF-MS interactome, are available via Zenodo at https://doi.org/10.5281/zenodo.4245282. An overview of all publicly available resources generated in this study is provided at the supporting website (https://fosterlab.github.io/CF-MS-analysis).

Code availability

Source code used to download and reanalyze publicly available CF-MS data using MaxQuant is available at https://github.com/skinnider/CF-MS-searches (https://doi.org/10.5281/zenodo.4774750). Source code used to carry out analyses presented in the paper, with relevant intermediate data files, is available at https://github.com/skinnider/CF-MS-analysis (https://doi.org/10.5281/zenodo.4774754). Source code for the CF-MS browser web application is available at https://github.com/skinnider/CF-MS-browser (https://doi.org/10.5281/zenodo.4774752). The CFTK R package is available at https://github.com/fosterlab/CFTK (https://doi.org/10.5281/zenodo.4774771).

References

1. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).
2. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
3. Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
4. Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
5. Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).
6. Kim, Y., Jung, J. P., Pack, C.-G. & Huh, W.-K. Global analysis of protein homomerization in Saccharomyces cerevisiae. Genome Res. 29, 135–145 (2019).
7. Werner, J. N. et al. Quantitative genome-scale analysis of protein localization in an asymmetric bacterium. Proc. Natl Acad. Sci. USA 106, 7858–7863 (2009).
8. Kristensen, A. R., Gsponer, J. & Foster, L. J. A high-throughput approach for measuring temporal changes in the interactome. Nat. Methods 9, 907–909 (2012).
9. Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012).
10. Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).
11. McWhite, C. D. et al. A pan-plant protein complex map reveals deep conservation and novel assemblies. Cell 181, 460–474 (2020).
12. Rosenberger, G. et al. SECAT: quantifying protein complex dynamics across cell states by network-centric analysis of SEC-SWATH-MS profiles. Cell Syst. https://doi.org/10.1016/j.cels.2020.11.006 (2020).
13. Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods https://doi.org/10.1038/s41592-021-01107-5 (2020).
14. Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).
15. Tyanova, S., Temu, T. & Cox, J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 11, 2301–2319 (2016).
16. Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 47, D559–D563 (2019).
17. Skinnider, M. A. et al. An atlas of protein–protein interactions across mammalian tissues. Preprint at bioRxiv https://doi.org/10.1101/351247 (2018).
18. Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
19. Ochoa, D. et al. The functional landscape of the human phosphoproteome. Nat. Biotechnol. 38, 365–373 (2020).
20. Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).
21. Mellacheruvu, D. et al. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nat. Methods 10, 730–736 (2013).
22. Romanov, N. et al. Disentangling genetic and environmental effects on the proteotypes of individuals. Cell 177, 1308–1318 (2019).
23. Skinnider, M. A., Squair, J. W. & Foster, L. J. Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).
24. Stacey, R. G., Skinnider, M. A., Scott, N. E. & Foster, L. J. A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE). BMC Bioinformatics 18, 457 (2017).
25. Bludau, I. et al. Complex-centric proteome profiling by SEC-SWATH-MS for the parallel detection of hundreds of protein complexes. Nat. Protoc. 15, 2341–2386 (2020).
26. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).
27. Liu, H., Sadygov, R. G. & Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193–4201 (2004).
28. Al Shweiki, M. R. et al. Assessment of label-free quantification in discovery proteomics and impact of technological factors and natural variability of protein abundance. J. Proteome Res. 16, 1410–1424 (2017).
29. McIlwain, S. et al. Estimating relative abundances of proteins from shotgun proteomics data. BMC Bioinformatics 13, 308 (2012).
30. Scott, N. E., Brown, L. M., Kristensen, A. R. & Foster, L. J. Development of a computational framework for the analysis of protein correlation profiling and spatial proteomics experiments. J. Proteomics 118, 112–129 (2015).
31. Scott, N. E. et al. Interactome disassembly during apoptosis occurs independent of caspase cleavage. Mol. Syst. Biol. 13, 906 (2017).
32. Pourhaghighi, R. et al. BraInMap elucidates the macromolecular connectivity landscape of mammalian brain. Cell Syst. 10, 333–350 (2020).
33. Kastritis, P. L. et al. Capturing protein communities by structural proteomics in a thermophilic eukaryote. Mol. Syst. Biol. 13, 936 (2017).
34. Drew, K. et al. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 13, 932 (2017).
35. Drew, K., Wallingford, J. B. & Marcotte, E. M. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 17, e10016 (2021).
36. Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
37. Lapek, J. D. et al. Detection of dysregulated protein-association networks by high-throughput proteomics predicts cancer vulnerabilities. Nat. Biotechnol. 35, 983–989 (2017).
38. Orre, L. M. et al. SubCellBarCode: proteome-wide mapping of protein localization and relocalization. Mol. Cell 73, 166–182 (2019).
39. Geladaki, A. et al. Combining LOPIT with differential ultracentrifugation for high-resolution spatial proteomics. Nat. Commun. 10, 331 (2019).
40. Cusick, M. E. et al. Literature-curated protein interaction datasets. Nat. Methods 6, 39–46 (2009).
41. Heide, H. et al. Complexome profiling identifies TMEM126B as a component of the mitochondrial complex I assembly complex. Cell Metab. 16, 538–549 (2012).
42. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).
43. Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2009).
44. Stacey, R. G., Skinnider, M. A. & Foster, L. J. On the robustness of graph-based clustering to random network alterations. Mol. Cell. Proteomics 20, 100002 (2020).
45. McBride, Z. et al. A label-free mass spectrometry method to predict endogenous protein complex composition. Mol. Cell. Proteomics 18, 1588–1606 (2019).
46. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
47. Salas, D., Stacey, R. G., Akinlaja, M. & Foster, L. J. Next-generation interactomics: considerations for the use of co-elution to measure protein interaction networks. Mol. Cell. Proteomics 19, 1–10 (2020).
48. Pang, C. N. I. et al. Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes. Mol. Cell. Proteomics 19, 1876–1895 (2020).
49. Gorka, M. et al. Protein Complex Identification and quantitative complexome by CN-PAGE. Sci. Rep. 9, 11523 (2019).
50. Mallam, A. L. et al. Systematic discovery of endogenous human ribonucleoprotein complexes. Cell Rep. 29, 1351–1368 (2019).
51. Drew, K. et al. A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating machinery. Dev. Biol. 467, 108–117 (2020).
52. Bludau, I. et al. Systematic detection of functional proteoform groups from bottom–up proteomic datasets. Preprint at bioRxiv https://doi.org/10.1101/2020.12.22.423928 (2020).
53. Garzón, J. I. et al. A computational interactome and functional annotation for the human proteome. eLife 5, e18715 (2016).
54. Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).
55. Cunningham, J. M., Koytiger, G., Sorger, P. K. & AlQuraishi, M. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat. Methods 17, 175–183 (2020).
56. Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).
57. Wang, M., Herrmann, C. J., Simonovic, M., Szklarczyk, D. & von Mering, C. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168 (2015).
58. Kovalchik, K. A. et al. RawTools: rapid and dynamic interrogation of Orbitrap data files for mass spectrometer system management. J. Proteome Res. 18, 700–708 (2019).
59. Bogdanow, B., Zauber, H. & Selbach, M. Systematic errors in peptide and protein identification and quantification by modified peptides. Mol. Cell. Proteomics 15, 2791–2801 (2016).
60. Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).
61. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).
62. Sánchez-Taltavull, D., Ramachandran, P., Lau, N. & Perkins, T. J. Bayesian correlation analysis for sequence count data. PLoS ONE 11, e0163595 (2016).
63. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).
64. Székely, G. J., Rizzo, M. L. & Bakirov, N. K. Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007).
65. Simon, N. & Tibshirani, R. Comment on “Detecting novel associations in large data sets” by Reshef et al., Science Dec. 16, 2011. Preprint at https://arxiv.org/abs/1401.7645 (2014).
66. Kinney, J. B. & Atwal, G. S. Equitability, mutual information, and the maximal information coefficient. Proc. Natl Acad. Sci. USA 111, 3354–3359 (2014).
67. Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
68. Foroushani, A. et al. Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications. BMC Med. Genomics 10, 16 (2017).
69. Brunner, E. & Munzel, U. The nonparametric Behrens–Fisher problem: asymptotic theory and a small-sample approximation. Biomed. J. 42, 17–25 (2000).
70. Munzel, U. & Brunner, E. An exact paired rank test. Biomed. J. 44, 584–593 (2002).
71. Skinnider, M. A., Cai, C., Stacey, R. G. & Foster, L. J. PrInCE: an R/Bioconductor package for protein–protein interaction network inference from co-fractionation mass spectrometry data. Bioinformatics https://doi.org/10.1093/bioinformatics/btab022 (2021).
72. Larance, M. et al. Global membrane protein interactome analysis using in vivo crosslinking and mass spectrometry-based protein correlation profiling. Mol. Cell. Proteomics 15, 2476–2490 (2016).
73. Crozier, T. W. M., Tinti, M., Larance, M., Lamond, A. I. & Ferguson, M. A. J. Prediction of protein complexes in Trypanosoma brucei by protein correlation profiling mass spectrometry and machine learning. Mol. Cell. Proteomics 16, 2254–2267 (2017).
74. Hillier, C. et al. Landscape of the Plasmodium interactome reveals both conserved and species-specific functionality. Cell Rep. 28, 1635–1647 (2019).
75. Kerr, C. H. et al. Dynamic rewiring of the human interactome by interferon signaling. Genome Biol. 21, 140 (2020).
76. Liebeskind, B. J., Aldrich, R. W. & Marcotte, E. M. Ancestral reconstruction of protein interaction networks. PLoS Comput. Biol. 15, e1007396 (2019).
77. Skinnider, M. A., Stacey, R. G. & Foster, L. J. Genomic data integration systematically biases interactome mapping. PLoS Comput. Biol. 14, e1006474 (2018).
78. Carlson, M. L. et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. eLife 8, e46615 (2019).
79. Oliver, S. Guilt-by-association goes global. Nature 403, 601–603 (2000).
80. Schwikowski, B., Uetz, P. & Fields, S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261 (2000).
81. Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
82. Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
83. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Acknowledgements

This work was supported by funding from Genome Canada and Genome British Columbia (project 214PRO). M.A.S. acknowledges support from a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a University of British Columbia (UBC) Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. This work was enabled in part by the support provided by WestGrid and Compute Canada and through computational resources and services provided by Advanced Research Computing at the UBC. We thank Microsoft for the donation of cloud computing resources that enabled part of this work, T. Clark and J. Moon for advice on MaxQuant searches and D. Vavilov for assistance with the web server.

Author information

Authors and Affiliations

Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
Michael A. Skinnider & Leonard J. Foster
Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada
Leonard J. Foster

Authors

Michael A. Skinnider
Leonard J. Foster

Contributions

M.A.S. and L.J.F. designed experiments. M.A.S. performed experiments. M.A.S. and L.J.F. wrote the manuscript.

Corresponding author

Correspondence to Leonard J. Foster.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Methods thanks Fridtjof Lund-Johansen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 A uniformly processed resource of CF–MS data.

a, Approaches to protein quantification employed by published CF–MS experiments. SILAC, stable isotope labelling by amino acids in cell culture; iBAQ, intensity-based absolute quantification. b, Proportion of the organismal proteome quantified in each CF–MS experiment (grey lines, individual datasets; blue line, mean across all datasets). c, Cumulative distribution of the number of proteins quantified per dataset in between one and 25 fractions. d, GO term enrichment among CORUM proteins detected in at least one CF–MS fraction, left, or never detected, right. e, PaxDb consensus protein abundance of mouse proteins detected or never detected by CF–MS. f, Coverage of high, moderate, and low abundance proteins (expressed as a mean proportion of fractions in which these proteins were detected) in published human CF–MS experiments (n = 46). g, As in f, but with CF–MS experiments divided into three groups based on the length of the liquid chromatography gradient. ***, p < 0.001, two-sided Spearman rank correlation. h, PaxDb consensus protein abundance of human proteins in the CORUM database and non-CORUM proteins. i, Difference in the number of protein groups quantified in each CF–MS experiment, compared to the processed chromatogram data accompanying the original publications (grey lines, individual datasets; blue line, mean across all datasets).

Extended Data Fig. 2 Benchmarking computational analysis of individual CF–MS datasets.

a, Measures of association used to quantify the similarity of two protein chromatograms in published CF–MS studies. Bottom row indicates the incorporation of external genomic datasets⁷⁷. b, Ranks of each measure of association in identifying protein pairs in the same protein complex, left, or annotated to the same GO term, right, across individual CF–MS datasets. c, Number of peaks detected in 20 CF–MS datasets by fitting a mixture of Gaussians to each protein chromatogram. d, Recovery of known protein complexes in the 20 CF–MS datasets from c, scoring only chromatograms that could be fit with a mixture of Gaussians (r² ≥ 0.5) and comparing the 24 different measures of association shown in Fig. 2 with the co-apex score. Inset text shows the median AUC for each measure of association. e, As in d, but for proteins annotated to the same GO term. f, Recovery of known protein complexes, top, and proportion of originally quantified proteins, bottom, when filtering profiles not detected in some minimum number of fractions, using mutual information as a measure of profile similarity. g, Mean number of protein groups identified, top, and recovery of proteins annotated to the same GO term, bottom, for three approaches to label-free quantification implemented in MaxQuant.
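As an illustration of the benchmarking idea in this caption, the following is a minimal sketch, not the authors' implementation: it scores every protein pair with one measure of association (Pearson correlation over fractions where both proteins were quantified) and summarizes how well the scores separate same-complex pairs from all other pairs using a ROC AUC. The function names, the `min_pairs` threshold, and the toy chromatograms are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y, min_pairs=3):
    """Pearson correlation over fractions where both values are present.

    Missing measurements are encoded as None. Returns None when fewer than
    `min_pairs` shared fractions exist or when either profile is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_pairs:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy / sqrt(sxx * syy)


def auc(scores, labels):
    """ROC AUC: probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def complex_recovery_auc(chromatograms, complexes):
    """Score all protein pairs and compare against same-complex membership."""
    scores, labels = [], []
    for a, b in combinations(sorted(chromatograms), 2):
        r = pearson(chromatograms[a], chromatograms[b])
        if r is None:  # pair could not be scored; skip it
            continue
        scores.append(r)
        labels.append(any(a in c and b in c for c in complexes))
    return auc(scores, labels)


# Hypothetical toy data: A/B co-elute early, C/D co-elute late.
toy_chromatograms = {
    "A": [0, 1, 5, 1, 0, 0],
    "B": [0, 2, 9, 2, 0, None],
    "C": [0, 0, 0, 1, 6, 1],
    "D": [None, 0, 0, 2, 8, 2],
}
toy_complexes = [{"A", "B"}, {"C", "D"}]
print(complex_recovery_auc(toy_chromatograms, toy_complexes))
```

Any other measure of association could be swapped in for `pearson` without changing the evaluation step, which is what makes this kind of comparison across measures possible.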

Extended Data Fig. 3 Univariate statistical analysis of computational approaches to individual CF-MS datasets.

a, Difference in the median protein complex AUC between each pair of measures of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. The difference in median AUCs is capped at [–0.1, +0.1] to improve visualization. b, As in a, but for GO terms. c, Difference in the median protein complex AUC between each pair of missing value handling strategies. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. d, As in c, but for GO terms. e, Difference in the median protein complex AUC between each pair of chromatogram normalization approaches. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. f, As in e, but for GO terms. g, Difference in the median protein complex AUC between each pair of measures of association, considering only the single best combination of missing value handling and chromatogram normalization for each measure of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. h, As in g, but for GO terms. i, Median difference in the protein complex AUC between matched datasets with label-free protein quantification performed by one of three algorithms within MaxQuant. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided paired Brunner-Munzel test. j, As in i, but for GO terms.

Extended Data Fig. 4 Analysis pipelines for individual CF–MS datasets.

a, Recovery of known protein complexes with 163 valid combinations of measures of association, missing value handling, and normalization strategies. b, As in a, but for proteins annotated to the same GO term.

Extended Data Fig. 5 Downsampling analysis of published CF–MS experiments.

a, Recovery of proteins annotated to the same GO term after downsampling CF–MS chromatograms to a fixed number of fractions. b–c, Recovery of protein complexes, b, and GO terms, c, in downsampled CF–MS chromatograms, using mutual information as the measure of profile similarity. d–e, Recovery of protein complexes (AUC, d, and change in AUC, e), in downsampled CF–MS chromatograms, with experiments divided based on the separation method used (IEX, ion exchange chromatography; N-PAGE, native polyacrylamide gel electrophoresis; SEC, size exclusion chromatography). f, Recovery of protein complexes when downsampling windows of adjacent fractions of fixed length, rather than downsampling fractions randomly from the chromatogram matrix. g–h, Comparison of protein complex recovery (AUC, g, and change in AUC, h), in downsampled CF–MS chromatograms when drawing samples of random fractions or adjacent fractions from the chromatogram matrix. a–h, Shaded area shows the standard error.
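The two downsampling schemes contrasted in panels f–h can be sketched as follows. This is an illustrative example under stated assumptions rather than the code used in the study: chromatograms are represented as a dictionary mapping protein identifiers to equal-length lists of per-fraction abundances, and the function names are hypothetical.

```python
import random


def _n_fractions(chromatograms):
    """Number of fractions, assuming every profile has the same length."""
    return len(next(iter(chromatograms.values())))


def downsample_random(chromatograms, n_keep, rng):
    """Keep `n_keep` fractions drawn at random, preserving elution order."""
    total = _n_fractions(chromatograms)
    if n_keep > total:
        raise ValueError("cannot keep more fractions than exist")
    keep = sorted(rng.sample(range(total), n_keep))
    return {p: [v[i] for i in keep] for p, v in chromatograms.items()}


def downsample_adjacent(chromatograms, n_keep, rng):
    """Keep one contiguous window of `n_keep` adjacent fractions."""
    total = _n_fractions(chromatograms)
    if n_keep > total:
        raise ValueError("cannot keep more fractions than exist")
    start = rng.randrange(total - n_keep + 1)
    return {p: v[start:start + n_keep] for p, v in chromatograms.items()}


# Hypothetical toy matrix: values encode the fraction index for clarity.
toy = {"P": list(range(10)), "Q": list(range(10, 20))}
rng = random.Random(0)  # fixed seed so the draw is reproducible
print(downsample_random(toy, 4, rng))
print(downsample_adjacent(toy, 4, rng))
```

The same columns are kept for every protein in both schemes, so pairwise profile similarity can still be computed on the downsampled matrix.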

Extended Data Fig. 6 Downsampling analysis of published CF–MS experiments incorporating multiple biological replicates.

a–b, Recovery of known protein complexes, a, and proteins annotated to the same GO term, b, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, as shown in Fig. 3b but visualized here as a line graph instead. Error bars show the standard error of the mean. c–d, Recovery of known protein complexes, c, and proteins annotated to the same GO term, d, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, within individual CF–MS datasets.

Extended Data Fig. 7 Protein quantification and chromatographic separation.

a, Comparison of GO term recovery and proteome coverage between SILAC ratios and iBAQ intensities from individual isotopologue channels in 20 SILAC datasets. Caption and error bars show the mean and standard deviation of the differences in the number of protein groups quantified and the AUC between SILAC ratios and iBAQ intensities. b, Recovery of proteins annotated to the same GO term in CF–MS experiments grouped by fractionation method. c, Difference in the median protein complex AUC between each pair of fractionation methods. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. d, Regression coefficients for fractionation methods in multivariable statistical analysis, estimated by a linear model fit to the protein complex AUC and including terms for measures of association, missing value handling strategies, approaches to chromatogram normalization, and interactions between them. e, As in c, but for GO terms. f, As in d, but for GO terms. g, Recovery of known protein complexes in published CF–MS experiments grouped by fractionation method, with each measure of association shown separately. h, As in g, but for proteins annotated to the same GO term. i, Recovery of individual protein complexes in published CF–MS experiments grouped by fractionation method. j, Number of protein complexes with at least three subunits detected exclusively by one separation method, left, and resolved significantly better by one of the four separation methods, right, across 67 human and mouse CF–MS datasets. k, Examples of protein complexes resolved best by each of the four separation methods. Inset text shows the median AUC.

Extended Data Fig. 8 Machine learning workflows for the integration of multiple CF–MS replicates.

a, Schematic overview of cross-validation approaches for CF–MS data. b, Comparison of cross-validation by protein pairs or individual proteins in network inference from two to four CF–MS experiments using a naive Bayes classifier, with AUCs calculated in cross-validation or in an independent set of held-out protein complexes. c, Impact of feature selection on network inference from two to four CF–MS experiments, comparing between one and six top-performing features, an equivalent number of random features, or five features computed in PrInCE. d, Comparison of top-performing or random features in network inference from two to ten CF–MS experiments, using between one and ten top-performing features. e, Comparison of network inference with features calculated from concatenated matrices of two to four CF–MS experiments, or with features calculated from individual experiments. f, Comparison of network inference from two to four CF–MS experiments using a naive Bayes classifier before and after median imputation of missing values. g, Impact of the number of top-performing or random features provided as input on network inference from two to ten CF–MS experiments. h, Comparison of random forest and naive Bayes classifiers in network inference from two to ten CF–MS replicates, using between one and ten features. i, Network inference from human CF–MS data when integrating varying proportions of SEC and IEX experiments. The total number of CF–MS datasets is shown above the plots, and the number of SEC datasets is shown on the x-axis.
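The distinction in panel b between cross-validating by protein pairs and by individual proteins can be illustrated with a short sketch. The exact splitting scheme used in the study is not given in this caption, so the version below is an assumption: when splitting by proteins, pairs that bridge the training and test proteins are discarded so that no protein appears on both sides. All function names are hypothetical.

```python
import random
from itertools import combinations


def folds_by_pairs(pairs, k, seed=0):
    """Assign labelled protein pairs to k folds at random.

    The same protein can occur in both training and test folds.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def split_by_proteins(pairs, k, fold, seed=0):
    """Hold out a set of proteins instead of a set of pairs.

    Test pairs have both members in the held-out proteins; training pairs
    have neither. Pairs with one member on each side are dropped.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    random.Random(seed).shuffle(proteins)  # same seed gives the same folds
    held_out = set(proteins[fold::k])
    train = [(a, b) for a, b in pairs
             if a not in held_out and b not in held_out]
    test = [(a, b) for a, b in pairs if a in held_out and b in held_out]
    return train, test


# Hypothetical toy example: all pairs among six proteins.
toy_pairs = list(combinations(["P1", "P2", "P3", "P4", "P5", "P6"], 2))
train_pairs, test_pairs = split_by_proteins(toy_pairs, k=3, fold=0, seed=1)
print(len(train_pairs), len(test_pairs))
```

Splitting by proteins is the stricter of the two, because a classifier cannot score well on the test fold merely by recognizing proteins it has already seen in training.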

Extended Data Fig. 9 Synergistic and antagonistic feature combinations in network inference from CF–MS data.

a, Performance (AUC, left, and rank, right) of naive Bayes and random forest classifiers trained on 24 measures of association in network inference from combinations of between two and six CF–MS datasets. Each cell reflects the mean AUC from 10 random combinations of datasets. b, Summary of synergistic and antagonistic interactions between features in CF–MS network inference, as shown in detail in panels c–f. Fill reflects the number of times a synergistic (magenta) or antagonistic (cyan) interaction was detected between two features. Network inference was performed using all possible combinations of 24 measures of association from ten random combinations of three or six CF–MS datasets, using either a random forest or naive Bayes classifier. Rows and columns are arranged by the mean performance of individual features across all combinations shown in a (both classifiers, two to six datasets). c, Performance (AUC) of networks inferred from combinations of three CF–MS datasets using a random forest classifier. Rows and columns are arranged by the mean performance of individual features in the same scenario. Text highlights significantly synergistic (+) and antagonistic (–) interactions. Each cell shows the mean AUC from 10 random combinations of datasets. d, As in c, but using a naive Bayes classifier. e, As in c, but for networks inferred from combinations of six CF–MS datasets. f, As in c, but for networks inferred from combinations of six CF–MS datasets, using a naive Bayes classifier.

Extended Data Fig. 10 Saturation analysis of network inference from CF–MS data.

a, Saturation analysis of network inference from two to 40 CF–MS experiments, using variable numbers of top-performing features. Boxplots show n = 10 independent samples. b, Impact of downsampling training set complexes on network inference from two to four CF–MS replicates.

Supplementary information

Supplementary Information

Supplementary Figs. 1–3

Reporting Summary

Supplementary Table 1

Complete list of all CF-MS experiments, a, and raw mass spectrometry files, b, analyzed in this study.

Supplementary Table 2

ROC analysis of protein complexes, a, and GO term membership, b, within individual CF-MS datasets.

Supplementary Table 3

Univariate statistical analysis of computational approaches to individual CF-MS datasets. a,c, protein complexes; b,d, GO term membership.

Supplementary Table 4

Multivariate statistical analysis of computational approaches to individual CF-MS datasets. a, protein complexes; b, GO term membership.

Supplementary Table 5

Univariate and multivariate statistical analysis of methods for protein complex separation.

About this article

Cite this article

Skinnider, M.A., Foster, L.J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat Methods 18, 806–815 (2021). https://doi.org/10.1038/s41592-021-01194-4

Received: 20 November 2020
Accepted: 20 May 2021
Published: 01 July 2021
Issue Date: July 2021
DOI: https://doi.org/10.1038/s41592-021-01194-4
