Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments

Abstract

Co-fractionation mass spectrometry (CF-MS) has emerged as a powerful technique for interactome mapping. However, there is little consensus on optimal strategies for the design of CF-MS experiments or their computational analysis. Here, we reanalyzed a total of 206 CF-MS experiments to generate a uniformly processed resource containing over 11 million measurements of protein abundance. We used this resource to benchmark experimental designs for CF-MS studies and systematically optimize computational approaches to network inference. We then applied this optimized methodology to reconstruct a draft-quality human interactome by CF-MS and predict over 700,000 protein–protein interactions across 27 eukaryotic species or clades. Our work defines new resources to illuminate proteome organization over evolutionary timescales and establishes best practices for the design and analysis of CF-MS studies.

Fig. 1: A comprehensive reanalysis of published CF-MS data.
Fig. 2: Benchmarking analysis of individual CF-MS experiments.
Fig. 3: Design of CF-MS experiments.
Fig. 4: Network inference from multiple CF-MS replicates.
Fig. 5: Meta-analysis defines a consensus human CF-MS interactome.
Fig. 6: CF-MS interactomes throughout the eukaryotic evolutionary tree.

Data availability

A list of all raw mass spectrometry files analyzed in this study and their accession numbers in PRIDE or MassIVE repositories is provided in Supplementary Table 1. All data generated in this study are available at multiple levels of analysis from the following sources: protein chromatograms and protein–protein interaction networks for up to ten proteins can be visualized and downloaded via an interactive web application at http://cf-ms-browser.msl.ubc.ca; processed chromatograms and MaxQuant proteinGroups.txt files are available via Zenodo at https://doi.org/10.5281/zenodo.4499320; complete MaxQuant outputs for all 206 experiments were deposited to the PRIDE repository (ref. 83) with the dataset identifier PXD022048; predicted interactomes for 27 species and clades, including the consensus human CF-MS interactome, are available via Zenodo at https://doi.org/10.5281/zenodo.4245282. An overview of all publicly available resources generated in this study is provided at the supporting website (https://fosterlab.github.io/CF-MS-analysis).

Code availability

Source code used to download and reanalyze publicly available CF-MS data using MaxQuant is available at https://github.com/skinnider/CF-MS-searches (https://doi.org/10.5281/zenodo.4774750). Source code used to carry out analyses presented in the paper, with relevant intermediate data files, is available at https://github.com/skinnider/CF-MS-analysis (https://doi.org/10.5281/zenodo.4774754). Source code for the CF-MS browser web application is available at https://github.com/skinnider/CF-MS-browser (https://doi.org/10.5281/zenodo.4774752). The CFTK R package is available at https://github.com/fosterlab/CFTK (https://doi.org/10.5281/zenodo.4774771).

References

  1. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).

  2. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).

  3. Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  4. Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).

  5. Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).

  6. Kim, Y., Jung, J. P., Pack, C.-G. & Huh, W.-K. Global analysis of protein homomerization in Saccharomyces cerevisiae. Genome Res. 29, 135–145 (2019).

  7. Werner, J. N. et al. Quantitative genome-scale analysis of protein localization in an asymmetric bacterium. Proc. Natl Acad. Sci. USA 106, 7858–7863 (2009).

  8. Kristensen, A. R., Gsponer, J. & Foster, L. J. A high-throughput approach for measuring temporal changes in the interactome. Nat. Methods 9, 907–909 (2012).

  9. Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012).

  10. Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).

  11. McWhite, C. D. et al. A pan-plant protein complex map reveals deep conservation and novel assemblies. Cell 181, 460–474 (2020).

  12. Rosenberger, G. et al. SECAT: quantifying protein complex dynamics across cell states by network-centric analysis of SEC-SWATH-MS profiles. Cell Syst. https://doi.org/10.1016/j.cels.2020.11.006 (2020).

  13. Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods https://doi.org/10.1038/s41592-021-01107-5 (2020).

  14. Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).

  15. Tyanova, S., Temu, T. & Cox, J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 11, 2301–2319 (2016).

  16. Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 47, D559–D563 (2019).

  17. Skinnider, M. A. et al. An atlas of protein–protein interactions across mammalian tissues. Preprint at bioRxiv https://doi.org/10.1101/351247 (2018).

  18. Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).

  19. Ochoa, D. et al. The functional landscape of the human phosphoproteome. Nat. Biotechnol. 38, 365–373 (2020).

  20. Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).

  21. Mellacheruvu, D. et al. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nat. Methods 10, 730–736 (2013).

  22. Romanov, N. et al. Disentangling genetic and environmental effects on the proteotypes of individuals. Cell 177, 1308–1318 (2019).

  23. Skinnider, M. A., Squair, J. W. & Foster, L. J. Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).

  24. Stacey, R. G., Skinnider, M. A., Scott, N. E. & Foster, L. J. A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE). BMC Bioinformatics 18, 457 (2017).

  25. Bludau, I. et al. Complex-centric proteome profiling by SEC-SWATH-MS for the parallel detection of hundreds of protein complexes. Nat. Protoc. 15, 2341–2386 (2020).

  26. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).

  27. Liu, H., Sadygov, R. G. & Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193–4201 (2004).

  28. Al Shweiki, M. R. et al. Assessment of label-free quantification in discovery proteomics and impact of technological factors and natural variability of protein abundance. J. Proteome Res. 16, 1410–1424 (2017).

  29. McIlwain, S. et al. Estimating relative abundances of proteins from shotgun proteomics data. BMC Bioinformatics 13, 308 (2012).

  30. Scott, N. E., Brown, L. M., Kristensen, A. R. & Foster, L. J. Development of a computational framework for the analysis of protein correlation profiling and spatial proteomics experiments. J. Proteomics 118, 112–129 (2015).

  31. Scott, N. E. et al. Interactome disassembly during apoptosis occurs independent of caspase cleavage. Mol. Syst. Biol. 13, 906 (2017).

  32. Pourhaghighi, R. et al. BraInMap elucidates the macromolecular connectivity landscape of mammalian brain. Cell Syst. 10, 333–350 (2020).

  33. Kastritis, P. L. et al. Capturing protein communities by structural proteomics in a thermophilic eukaryote. Mol. Syst. Biol. 13, 936 (2017).

  34. Drew, K. et al. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 13, 932 (2017).

  35. Drew, K., Wallingford, J. B. & Marcotte, E. M. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 17, e10016 (2021).

  36. Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).

  37. Lapek, J. D. et al. Detection of dysregulated protein-association networks by high-throughput proteomics predicts cancer vulnerabilities. Nat. Biotechnol. 35, 983–989 (2017).

  38. Orre, L. M. et al. SubCellBarCode: proteome-wide mapping of protein localization and relocalization. Mol. Cell 73, 166–182 (2019).

  39. Geladaki, A. et al. Combining LOPIT with differential ultracentrifugation for high-resolution spatial proteomics. Nat. Commun. 10, 331 (2019).

  40. Cusick, M. E. et al. Literature-curated protein interaction datasets. Nat. Methods 6, 39–46 (2009).

  41. Heide, H. et al. Complexome profiling identifies TMEM126B as a component of the mitochondrial complex I assembly complex. Cell Metab. 16, 538–549 (2012).

  42. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).

  43. Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2009).

  44. Stacey, R. G., Skinnider, M. A. & Foster, L. J. On the robustness of graph-based clustering to random network alterations. Mol. Cell. Proteomics 20, 100002 (2020).

  45. McBride, Z. et al. A label-free mass spectrometry method to predict endogenous protein complex composition. Mol. Cell. Proteomics 18, 1588–1606 (2019).

  46. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).

  47. Salas, D., Stacey, R. G., Akinlaja, M. & Foster, L. J. Next-generation interactomics: considerations for the use of co-elution to measure protein interaction networks. Mol. Cell. Proteomics 19, 1–10 (2020).

  48. Pang, C. N. I. et al. Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes. Mol. Cell. Proteomics 19, 1876–1895 (2020).

  49. Gorka, M. et al. Protein complex identification and quantitative complexome by CN-PAGE. Sci. Rep. 9, 11523 (2019).

  50. Mallam, A. L. et al. Systematic discovery of endogenous human ribonucleoprotein complexes. Cell Rep. 29, 1351–1368 (2019).

  51. Drew, K. et al. A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating machinery. Dev. Biol. 467, 108–117 (2020).

  52. Bludau, I. et al. Systematic detection of functional proteoform groups from bottom–up proteomic datasets. Preprint at bioRxiv https://doi.org/10.1101/2020.12.22.423928 (2020).

  53. Garzón, J. I. et al. A computational interactome and functional annotation for the human proteome. eLife 5, e18715 (2016).

  54. Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).

  55. Cunningham, J. M., Koytiger, G., Sorger, P. K. & AlQuraishi, M. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat. Methods 17, 175–183 (2020).

  56. Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).

  57. Wang, M., Herrmann, C. J., Simonovic, M., Szklarczyk, D. & von Mering, C. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168 (2015).

  58. Kovalchik, K. A. et al. RawTools: rapid and dynamic interrogation of Orbitrap data files for mass spectrometer system management. J. Proteome Res. 18, 700–708 (2019).

  59. Bogdanow, B., Zauber, H. & Selbach, M. Systematic errors in peptide and protein identification and quantification by modified peptides. Mol. Cell. Proteomics 15, 2791–2801 (2016).

  60. Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).

  61. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).

  62. Sánchez-Taltavull, D., Ramachandran, P., Lau, N. & Perkins, T. J. Bayesian correlation analysis for sequence count data. PLoS ONE 11, e0163595 (2016).

  63. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).

  64. Székely, G. J., Rizzo, M. L. & Bakirov, N. K. Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007).

  65. Simon, N. & Tibshirani, R. Comment on “Detecting novel associations in large data sets” by Reshef et al., Science Dec. 16, 2011. Preprint at https://arxiv.org/abs/1401.7645 (2014).

  66. Kinney, J. B. & Atwal, G. S. Equitability, mutual information, and the maximal information coefficient. Proc. Natl Acad. Sci. USA 111, 3354–3359 (2014).

  67. Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).

  68. Foroushani, A. et al. Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications. BMC Med. Genomics 10, 16 (2017).

  69. Brunner, E. & Munzel, U. The nonparametric Behrens–Fisher problem: asymptotic theory and a small-sample approximation. Biomed. J. 42, 17–25 (2000).

  70. Munzel, U. & Brunner, E. An exact paired rank test. Biomed. J. 44, 584–593 (2002).

  71. Skinnider, M. A., Cai, C., Stacey, R. G. & Foster, L. J. PrInCE: an R/Bioconductor package for protein–protein interaction network inference from co-fractionation mass spectrometry data. Bioinformatics https://doi.org/10.1093/bioinformatics/btab022 (2021).

  72. Larance, M. et al. Global membrane protein interactome analysis using in vivo crosslinking and mass spectrometry-based protein correlation profiling. Mol. Cell. Proteomics 15, 2476–2490 (2016).

  73. Crozier, T. W. M., Tinti, M., Larance, M., Lamond, A. I. & Ferguson, M. A. J. Prediction of protein complexes in Trypanosoma brucei by protein correlation profiling mass spectrometry and machine learning. Mol. Cell. Proteomics 16, 2254–2267 (2017).

  74. Hillier, C. et al. Landscape of the Plasmodium interactome reveals both conserved and species-specific functionality. Cell Rep. 28, 1635–1647 (2019).

  75. Kerr, C. H. et al. Dynamic rewiring of the human interactome by interferon signaling. Genome Biol. 21, 140 (2020).

  76. Liebeskind, B. J., Aldrich, R. W. & Marcotte, E. M. Ancestral reconstruction of protein interaction networks. PLoS Comput. Biol. 15, e1007396 (2019).

  77. Skinnider, M. A., Stacey, R. G. & Foster, L. J. Genomic data integration systematically biases interactome mapping. PLoS Comput. Biol. 14, e1006474 (2018).

  78. Carlson, M. L. et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. eLife 8, e46615 (2019).

  79. Oliver, S. Guilt-by-association goes global. Nature 403, 601–603 (2000).

  80. Schwikowski, B., Uetz, P. & Fields, S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261 (2000).

  81. Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).

  82. Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).

  83. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Acknowledgements

This work was supported by funding from Genome Canada and Genome British Columbia (project 214PRO). M.A.S. acknowledges support from a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a University of British Columbia (UBC) Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. This work was enabled in part by the support provided by WestGrid and Compute Canada and through computational resources and services provided by Advanced Research Computing at the UBC. We thank Microsoft for the donation of cloud computing resources that enabled part of this work, T. Clark and J. Moon for advice on MaxQuant searches and D. Vavilov for assistance with the web server.

Author information

Contributions

M.A.S. and L.J.F. designed experiments. M.A.S. performed experiments. M.A.S. and L.J.F. wrote the manuscript.

Corresponding author

Correspondence to Leonard J. Foster.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Methods thanks Fridtjof Lund-Johansen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 A uniformly processed resource of CF–MS data.

a, Approaches to protein quantification employed by published CF–MS experiments. SILAC, stable isotope labelling by amino acids in cell culture; iBAQ, intensity-based absolute quantification. b, Proportion of the organismal proteome quantified in each CF–MS experiment (grey lines, individual datasets; blue line, mean across all datasets). c, Cumulative distribution of the number of proteins quantified per dataset in between one and 25 fractions. d, GO term enrichment among CORUM proteins detected in at least one CF–MS fraction, left, or never detected, right. e, PaxDb consensus protein abundance of mouse proteins detected or never detected by CF–MS. f, Coverage of high, moderate, and low abundance proteins (expressed as a mean proportion of fractions in which these proteins were detected) in published human CF–MS experiments (n = 46). g, As in f, but with CF–MS experiments divided into three groups based on the length of the liquid chromatography gradient. ***, p < 0.001, two-sided Spearman rank correlation. h, PaxDb consensus protein abundance of human proteins in the CORUM database and non-CORUM proteins. i, Difference in the number of protein groups quantified in each CF–MS experiment, compared to the processed chromatogram data accompanying the original publications (grey lines, individual datasets; blue line, mean across all datasets).

Extended Data Fig. 2 Benchmarking computational analysis of individual CF–MS datasets.

a, Measures of association used to quantify the similarity of two protein chromatograms in published CF–MS studies. Bottom row indicates the incorporation of external genomic datasets (ref. 77). b, Ranks of each measure of association in identifying protein pairs in the same protein complex, left, or annotated to the same GO term, right, across individual CF–MS datasets. c, Number of peaks detected in 20 CF–MS datasets by fitting a mixture of Gaussians to each protein chromatogram. d, Recovery of known protein complexes in the 20 CF–MS datasets from c, scoring only chromatograms that could be fit with a mixture of Gaussians (r² ≥ 0.5) and comparing the 24 different measures of association shown in Fig. 2 with the co-apex score. Inset text shows the median AUC for each measure of association. e, As in d, but for proteins annotated to the same GO term. f, Recovery of known protein complexes, top, and proportion of originally quantified proteins, bottom, when filtering profiles not detected in some minimum number of fractions, using mutual information as a measure of profile similarity. g, Mean number of protein groups identified, top, and recovery of proteins annotated to the same GO term, bottom, for three approaches to label-free quantification implemented in MaxQuant.
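
As a rough illustration of what a measure of association between two protein chromatograms computes, the following Python sketch scores a pair of elution profiles with the Pearson correlation and with a simplified apex distance. It is not the code used in the study (that code is in R, in the repositories listed under Code availability). The function names and toy profiles are hypothetical, and the apex distance here compares raw intensity maxima, whereas the co-apex score in panel d is described as operating on chromatograms fitted with a mixture of Gaussians.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length chromatograms."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("chromatograms must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def apex_distance(x, y):
    """Simplified co-apex measure (assumption): number of fractions
    separating the maximum-intensity fraction of each profile."""
    return abs(x.index(max(x)) - y.index(max(y)))


# Toy profiles over eight fractions (made-up intensities).
protein_a = [0, 1, 5, 9, 4, 1, 0, 0]
protein_b = [0, 2, 6, 8, 3, 1, 0, 0]  # co-elutes with protein_a
protein_c = [0, 0, 0, 1, 2, 6, 9, 3]  # elutes three fractions later

score_ab = pearson(protein_a, protein_b)
score_ac = pearson(protein_a, protein_c)
```

In this toy case the co-eluting pair receives a high correlation and an apex distance of zero, while the offset pair receives a negative correlation.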

Extended Data Fig. 3 Univariate statistical analysis of computational approaches to individual CF-MS datasets.

a, Difference in the median protein complex AUC between each pair of measures of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. The difference in median AUCs is capped at [–0.1, +0.1] to improve visualization. b, As in a, but for GO terms. c, Difference in the median protein complex AUC between each pair of missing value handling strategies. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. d, As in c, but for GO terms. e, Difference in the median protein complex AUC between each pair of chromatogram normalization approaches. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. f, As in e, but for GO terms. g, Difference in the median protein complex AUC between each pair of measures of association, considering only the single best combination of missing value handling and chromatogram normalization for each measure of association. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. h, As in g, but for GO terms. i, Median difference in the protein complex AUC between matched datasets with label-free protein quantification performed by one of three algorithms within MaxQuant. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided paired Brunner-Munzel test. j, As in i, but for GO terms.

Extended Data Fig. 4 Analysis pipelines for individual CF–MS datasets.

a, Recovery of known protein complexes with 163 valid combinations of measures of association, missing value handling, and normalization strategies. b, As in a, but for proteins annotated to the same GO term.

Extended Data Fig. 5 Downsampling analysis of published CF–MS experiments.

a, Recovery of proteins annotated to the same GO term after downsampling CF–MS chromatograms to a fixed number of fractions. b–c, Recovery of protein complexes, b, and GO terms, c, in downsampled CF–MS chromatograms, using mutual information as the measure of profile similarity. d–e, Recovery of protein complexes (AUC, d, and change in AUC, e), in downsampled CF–MS chromatograms, with experiments divided based on the separation method used (IEX, ion exchange chromatography; N-PAGE, native polyacrylamide gel electrophoresis; SEC, size exclusion chromatography). f, Recovery of protein complexes when downsampling windows of adjacent fractions of fixed length, rather than downsampling fractions randomly from the chromatogram matrix. g–h, Comparison of protein complex recovery (AUC, g, and change in AUC, h), in downsampled CF–MS chromatograms when drawing samples of random fractions or adjacent fractions from the chromatogram matrix. a–h, Shaded area shows the standard error.
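
The two downsampling schemes contrasted in panels f–h (random fractions versus windows of adjacent fractions) can be sketched as follows. This is a minimal Python illustration under assumed conventions: the chromatogram matrix is represented as a dictionary mapping a protein identifier to a list of per-fraction intensities, and all function names are hypothetical rather than taken from the study's code.

```python
import random


def sample_random_fractions(n_fractions, k, rng):
    """Pick k distinct fraction indices anywhere in the gradient."""
    if not 0 < k <= n_fractions:
        raise ValueError("k must be between 1 and n_fractions")
    return sorted(rng.sample(range(n_fractions), k))


def sample_adjacent_fractions(n_fractions, k, rng):
    """Pick a contiguous window of k fractions at a random start."""
    if not 0 < k <= n_fractions:
        raise ValueError("k must be between 1 and n_fractions")
    start = rng.randrange(n_fractions - k + 1)
    return list(range(start, start + k))


def downsample(chromatograms, columns):
    """Keep only the selected fraction columns for every protein."""
    return {
        protein: [profile[i] for i in columns]
        for protein, profile in chromatograms.items()
    }


# Toy chromatogram matrix with ten fractions (made-up intensities).
rng = random.Random(0)
matrix = {
    "P1": [0, 1, 4, 9, 4, 1, 0, 0, 0, 0],
    "P2": [0, 0, 0, 0, 1, 3, 8, 3, 1, 0],
}
random_subset = downsample(matrix, sample_random_fractions(10, 4, rng))
window_subset = downsample(matrix, sample_adjacent_fractions(10, 4, rng))
```

Repeating either sampling step with different seeds and re-scoring the downsampled profiles would give one curve per scheme, which is the kind of comparison the panels describe.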

Extended Data Fig. 6 Downsampling analysis of published CF–MS experiments incorporating multiple biological replicates.

a–b, Recovery of known protein complexes, a, and proteins annotated to the same GO term, b, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, as shown in Fig. 3b but visualized here as a line graph instead. Error bars show the standard error of the mean. c–d, Recovery of known protein complexes, c, and proteins annotated to the same GO term, d, in downsampled CF–MS chromatograms with fractions sampled from one to five replicates, within individual CF–MS datasets.

Extended Data Fig. 7 Protein quantification and chromatographic separation.

a, Comparison of GO term recovery and proteome coverage between SILAC ratios and iBAQ intensities from individual isotopologue channels in 20 SILAC datasets. Caption and error bars show the mean and standard deviation of the differences in the number of protein groups quantified and the AUC between SILAC ratios and iBAQ intensities. b, Recovery of proteins annotated to the same GO term in CF–MS experiments grouped by fractionation method. c, Difference in the median protein complex AUC between each pair of fractionation methods. Asterisks indicate pairs of measures of association with a p-value less than 0.05 in a two-sided Brunner-Munzel test. d, Regression coefficients for fractionation methods in multivariable statistical analysis, estimated by a linear model fit to the protein complex AUC and including terms for measures of association, missing value handling strategies, approaches to chromatogram normalization, and interactions between them. e, As in c, but for GO terms. f, As in d, but for GO terms. g, Recovery of known protein complexes in published CF–MS experiments grouped by fractionation method, with each measure of association shown separately. h, As in g, but for proteins annotated to the same GO term. i, Recovery of individual protein complexes in published CF–MS experiments grouped by fractionation method. j, Number of protein complexes with at least three subunits detected exclusively by one separation method, left, and resolved significantly better by one of the four separation methods, right, across 67 human and mouse CF–MS datasets. k, Examples of protein complexes resolved best by each of the four separation methods. Inset text shows the median AUC.

Extended Data Fig. 8 Machine learning workflows for the integration of multiple CF–MS replicates.

a, Schematic overview of cross-validation approaches for CF–MS data. b, Comparison of cross-validation by protein pairs or individual proteins in network inference from two to four CF–MS experiments using a naive Bayes classifier, with AUCs calculated in cross-validation or in an independent set of held-out protein complexes. c, Impact of feature selection on network inference from two to four CF–MS experiments, comparing between one and six top-performing features, an equivalent number of random features, or five features computed in PrInCE. d, Comparison of top-performing or random features in network inference from two to ten CF–MS experiments, using between one and ten top-performing features. e, Comparison of network inference with features calculated from concatenated matrices of two to four CF–MS experiments, or with features calculated from individual experiments. f, Comparison of network inference from two to four CF–MS experiments using a naive Bayes classifier before and after median imputation of missing values. g, Impact of the number of top-performing or random features provided as input on network inference from two to ten CF–MS experiments. h, Comparison of random forest and naive Bayes classifiers in network inference from two to ten CF–MS replicates, using between one and ten features. i, Network inference from human CF–MS data when integrating varying proportions of SEC and IEX experiments. The total number of CF–MS datasets is shown above the plots, and the number of SEC datasets is shown on the x-axis.
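
The distinction in panels a–b between cross-validation by protein pairs and by individual proteins can be illustrated with a short Python sketch. It is one plausible reading of the two schemes, not the study's implementation: the function names are hypothetical, and the rule used for the protein-level split (a test pair needs both proteins in the held-out group, a training pair needs neither) is an assumption chosen so that no protein appears on both sides.

```python
import random


def split_by_pairs(pairs, n_folds, rng):
    """Assign labelled protein pairs to folds at random. The same
    protein can appear in both training and test pairs."""
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def split_by_proteins(pairs, n_folds, rng):
    """Assign proteins (not pairs) to folds. For each fold, test pairs
    have both proteins held out and training pairs have neither, so
    pairs that straddle the boundary are dropped (assumed rule)."""
    proteins = sorted({p for pair in pairs for p in pair})
    rng.shuffle(proteins)
    fold_of = {p: i % n_folds for i, p in enumerate(proteins)}
    splits = []
    for fold in range(n_folds):
        test = [(a, b) for a, b in pairs
                if fold_of[a] == fold and fold_of[b] == fold]
        train = [(a, b) for a, b in pairs
                 if fold_of[a] != fold and fold_of[b] != fold]
        splits.append((train, test))
    return splits


# Toy set of labelled pairs (hypothetical protein identifiers).
rng = random.Random(1)
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"),
         ("E", "F"), ("D", "F"), ("A", "D"), ("C", "F")]
pair_folds = split_by_pairs(pairs, 3, rng)
protein_splits = split_by_proteins(pairs, 2, rng)
```

The protein-level split discards pairs that straddle the held-out boundary, so it trades training data for a cleaner separation between training and test proteins.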

Extended Data Fig. 9 Synergistic and antagonistic feature combinations in network inference from CF–MS data.

a, Performance (AUC, left, and rank, right) of naive Bayes and random forest classifiers trained on 24 measures of association in network inference from combinations of between two and six CF–MS datasets. Each cell reflects the mean AUC from 10 random combinations of datasets. b, Summary of synergistic and antagonistic interactions between features in CF–MS network inference, as shown in detail in panels c–f. Fill reflects the number of times a synergistic (magenta) or antagonistic (cyan) interaction was detected between two features. Network inference was performed using all possible combinations of 24 measures of association from ten random combinations of three or six CF–MS datasets, using either a random forest or naive Bayes classifier. Rows and columns are arranged by the mean performance of individual features across all combinations shown in a (both classifiers, two to six datasets). c, Performance (AUC) of networks inferred from combinations of three CF–MS datasets using a random forest classifier. Rows and columns are arranged by the mean performance of individual features in the same scenario. Text highlights significantly synergistic (+) and antagonistic (–) interactions. Each cell shows the mean AUC from 10 random combinations of datasets. d, As in c, but using a naive Bayes classifier. e, As in c, but for networks inferred from combinations of six CF–MS datasets. f, As in c, but for networks inferred from combinations of six CF–MS datasets, using a naive Bayes classifier.

Extended Data Fig. 10 Saturation analysis of network inference from CF–MS data.

a, Saturation analysis of network inference from two to 40 CF–MS experiments, using variable numbers of top-performing features. Boxplots show n = 10 independent samples. b, Impact of downsampling training set complexes on network inference from two to four CF–MS replicates.

Supplementary information

Supplementary Information

Supplementary Figs. 1–3

Reporting Summary

Supplementary Table 1

Complete list of all CF-MS experiments, a, and raw mass spectrometry files, b, analyzed in this study.

Supplementary Table 2

ROC analysis of protein complexes, a, and GO term membership, b, within individual CF-MS datasets.

Supplementary Table 3

Univariate statistical analysis of computational approaches to individual CF-MS datasets. a,c, protein complexes; b,d, GO term membership.

Supplementary Table 4

Multivariate statistical analysis of computational approaches to individual CF-MS datasets. a, protein complexes; b, GO term membership.

Supplementary Table 5

Univariate and multivariate statistical analysis of methods for protein complex separation.

About this article

Cite this article

Skinnider, M.A., Foster, L.J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat Methods 18, 806–815 (2021). https://doi.org/10.1038/s41592-021-01194-4

  • DOI: https://doi.org/10.1038/s41592-021-01194-4