EPIC: software toolkit for elution profile-based inference of protein complexes

Hu, Lucas ZhongMing; Goebels, Florian; Tan, June H.; Wolf, Eric; Kuzmanov, Uros; Wan, Cuihong; Phanse, Sadhna; Xu, Changjiang; Schertzberg, Mike; Fraser, Andrew G.; Bader, Gary D.; Emili, Andrew

doi:10.1038/s41592-019-0461-4

Article
Published: 15 July 2019

Lucas ZhongMing Hu^1,2,na1,
Florian Goebels^1,na1,
June H. Tan^1,2 (ORCID: orcid.org/0000-0001-6597-3952),
Eric Wolf^1,2,
Uros Kuzmanov^1,
Cuihong Wan^1,nAff4,
Sadhna Phanse^1,
Changjiang Xu^1,
Mike Schertzberg^1,
Andrew G. Fraser^1,2,
Gary D. Bader^1,2 (ORCID: orcid.org/0000-0003-0185-8861) &
Andrew Emili^1,2,3 (ORCID: orcid.org/0000-0001-8995-246X)

Nature Methods volume 16, pages 737–742 (2019)

Abstract

Protein complexes are key macromolecular machines of the cell, but their description remains incomplete. We and others previously reported an experimental strategy for global characterization of native protein assemblies based on chromatographic fractionation of biological extracts coupled to precision mass spectrometry analysis (chromatographic fractionation–mass spectrometry, CF–MS), but the resulting data are challenging to process and interpret. Here, we describe EPIC (elution profile-based inference of complexes), a software toolkit for automated scoring of large-scale CF–MS data to define high-confidence multi-component macromolecules from diverse biological specimens. As a case study, we used EPIC to map the global interactome of Caenorhabditis elegans, defining 612 putative worm protein complexes linked to diverse biological processes. These included novel subunits and assemblies unique to nematodes that we validated using orthogonal methods. The open source EPIC software is freely available as a Jupyter notebook packaged in a Docker container (https://hub.docker.com/r/baderlab/bio-epic/).

**Fig. 3: Prediction, benchmarking and analysis of *C. elegans* protein complexes.**

Data availability

The supporting co-fractionation data are available via ProteomeXchange with the identifier PXD011182. The entire WormMap network (Cytoscape format) is available on GitHub (https://github.com/BaderLab/EPIC/tree/master/WormMap) and has been submitted to the BioGRID database. Source Data for Fig. 2 are available online.

Code availability

EPIC is available via a Docker container (https://hub.docker.com/r/baderlab/bio-epic/). The EPIC software code is publicly available on GitHub (https://github.com/BaderLab/EPIC).

Acknowledgements

This study was supported by a Foundation Grant (FDN no. 148399) from the Canadian Institutes of Health Research (CIHR, to A.E.) and by US National Institutes of Health grants (nos. P41 GM103504 and GM070743, to G.D.B.). L.Z.M.H. was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Mass Spectrometry-Enabled Science and Engineering (MS-ESE) program. C. elegans strain access was supported by the NIH Office of Research Infrastructure Programs (P40 OD010440).

Author information

Cuihong Wan
Present address: School of Life Science, Central China Normal University, Wuhan, China
These authors contributed equally: Lucas ZhongMing Hu, Florian Goebels.

Authors and Affiliations

Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
Lucas ZhongMing Hu, Florian Goebels, June H. Tan, Eric Wolf, Uros Kuzmanov, Cuihong Wan, Sadhna Phanse, Changjiang Xu, Mike Schertzberg, Andrew G. Fraser, Gary D. Bader & Andrew Emili
Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
Lucas ZhongMing Hu, June H. Tan, Eric Wolf, Andrew G. Fraser, Gary D. Bader & Andrew Emili
Departments of Biochemistry and Biology, Boston University, Boston, MA, USA
Andrew Emili

Contributions

A.E. and G.D.B. conceived the project. L.Z.M.H. and F.G. wrote the software, performed computational analysis and wrote the manuscript. L.Z.M.H. and C.W. performed the co-fractionation experiments. J.H.T. performed the protein GFP tagging in C. elegans with assistance and guidance from M.S. and A.G.F. The AP–MS experiments were performed by E.W. with assistance from U.K. S.P. and C.X. provided technical support. G.D.B. and A.E. supervised the study and edited the paper.

Corresponding authors

Correspondence to Gary D. Bader or Andrew Emili.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Allison Doerr was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Pre-enrichment improves the dynamic range of CF/MS studies.

a) Schematic workflow of bead-based sample pre-enrichment. b) Venn diagram showing improved proteome coverage by pre-enrichment. c) Bar chart showing improved detection of low abundance proteins. d) Bar chart showing improved detection of small (low molecular mass) proteins. e) Bar chart showing the distribution of identified proteins across top 8 biological processes in GO. f) Bar chart showing the distribution of identified proteins across top 13 cellular localizations in GO. g) Bar chart showing distribution of identified proteins across top 13 molecular functions in GO.

Supplementary Figure 2 Schematic workflow for generating a training set of macromolecules.

Previously reported protein complexes, collected from the CORUM, GO and IntAct curation databases, are first mapped to the target species based on InParanoid orthology predictions. Redundancy is then minimized to generate a final set of reference assemblies.
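
The workflow in this legend can be sketched as follows. This is a minimal illustration and not EPIC's implementation: the function names, the ortholog dictionary, the two-subunit minimum and the Jaccard-based merging rule with its 0.5 threshold are all assumptions, since the legend states only that complexes are mapped by orthology and that redundancy is minimized.

```python
def map_complexes(reference_complexes, orthologs):
    """Map each reference complex onto target-species proteins.

    reference_complexes: iterable of sets of source-species protein IDs.
    orthologs: dict from source-species ID to target-species ID (hypothetical format).
    """
    mapped = []
    for members in reference_complexes:
        target = {orthologs[m] for m in members if m in orthologs}
        if len(target) >= 2:  # assumption: keep only complexes with >= 2 mapped subunits
            mapped.append(target)
    return mapped


def merge_redundant(complexes, max_jaccard=0.5):
    """Greedily merge complexes whose Jaccard overlap exceeds max_jaccard (assumed rule)."""
    merged = []
    for candidate in sorted(complexes, key=len, reverse=True):
        for kept in merged:
            overlap = len(candidate & kept) / len(candidate | kept)
            if overlap > max_jaccard:
                kept |= candidate  # fold the redundant complex into the kept one
                break
        else:
            merged.append(set(candidate))
    return merged


# Toy example with made-up identifiers (H* = source species, W* = target species).
reference = [{"H1", "H2", "H3"}, {"H1", "H2"}, {"H4", "H5"}, {"H6", "H9"}]
orthologs = {"H1": "W1", "H2": "W2", "H3": "W3", "H4": "W4", "H5": "W5", "H6": "W6"}
training_set = merge_redundant(map_complexes(reference, orthologs))
```

In the toy example, the complex with only one mappable subunit is dropped and the two overlapping complexes collapse into one.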

Supplementary Figure 3 Co-elution profile similarity predicts PPIs.

Plots showing the Pearson correlation coefficients (distribution density curves) obtained for a representative worm protein co-fractionation experiment; positive (CORUM derived; blue) and negative (randomized; orange) co-complex interactions, as well as the positive/negative ratio (green), are shown.
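
The idea behind this figure, that proteins in the same complex tend to have similar elution profiles, can be illustrated with a short sketch. The function name, the toy profiles and the choice to return 0 for a flat profile are assumptions made for illustration; this is not code from the EPIC package.

```python
import numpy as np


def pearson_profile_similarity(profile_a, profile_b):
    """Pearson correlation between two elution profiles (one value per fraction)."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        # Correlation is undefined for a flat profile; returning 0 is an assumption.
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


# Toy spectral counts across six fractions (made-up numbers).
subunit_1 = [0, 2, 10, 3, 0, 0]
subunit_2 = [0, 3, 12, 2, 0, 0]   # peaks in the same fraction as subunit_1
unrelated = [0, 0, 0, 1, 9, 2]    # peaks two fractions later

co_eluting = pearson_profile_similarity(subunit_1, subunit_2)
separate = pearson_profile_similarity(subunit_1, unrelated)
```

Here the co-eluting pair scores close to 1, whereas the pair with separated peaks scores below 0, which is the kind of separation between positive and negative pairs that the density curves describe.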

Supplementary Figure 4 Correlation score cut-off setting.

Histogram of maximal correlation scores of positive protein-protein interaction pairs among all seven different correlation metrics across all 16 co-fractionation experiments. The red line indicates the cutoff chosen for EPIC.
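
A minimal sketch of the quantity plotted here: for each protein pair, take the maximum score over all experiments and metrics, then compare it with a cutoff. The array layout, the function name and the cutoff value of 0.5 are assumptions; the legend does not state the cutoff that EPIC uses.

```python
import numpy as np


def max_score_filter(scores, cutoff):
    """Return the per-pair maximal score and a mask of pairs reaching the cutoff.

    scores: array of shape (n_pairs, n_experiments, n_metrics); NaN marks
    pairs that were not scored in a given experiment (assumed layout).
    """
    best = np.nanmax(scores, axis=(1, 2))
    return best, best >= cutoff


# Two toy pairs, two experiments, three metrics (made-up values).
scores = np.array([
    [[0.2, 0.8, 0.1], [0.4, np.nan, 0.3]],
    [[0.1, 0.3, 0.2], [0.0, 0.1, np.nan]],
])
best, keep = max_score_filter(scores, cutoff=0.5)  # 0.5 is a hypothetical cutoff
```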

Supplementary Figure 5 Composite score comparison for original and optimized features integrated with different sources of functional evidence.

Composite score analysis demonstrates that, for predicting complexes based on EPIC analysis of CF/MS data, integration of functional associations from WormNet outperforms STRING and GeneMANIA evidence. The analysis also shows that an optimized set of EPIC-derived co-elution scores predicts protein complex membership better than the scores reported previously.

Supplementary Figure 6 ROC curve and Precision-recall curve for co-complex PPI prediction from different input data.

The plot demonstrates that the best co-complex interaction predictions were obtained after integrating experimental data with supporting functional evidence (that is, WormNet).

Supplementary Figure 7

Pie chart showing overlap of predicted co-complex interactions with PPIs from BioGRID, iRefIndex and our previously reported conserved metazoan complex map.

Supplementary Figure 8

Detailed overview of the EPIC computational pipeline.

Supplementary Figure 9 Comparison of peptides identified using different search tools.

a) Number of peptides identified in one co-fractionation experiment by each search tool, before and after removing ‘one-hit-wonders’. There are 16 co-fractionation experiments (n = 16). b) Percentage of one-hit-wonders for each search engine. There are 16 co-fractionation experiments (n = 16). In each box plot, the red line is the median, and the lower and upper lines of the box indicate the first and third quartiles. The upper and lower whiskers extend to the largest value less than the third quartile plus 1.5 times the interquartile range (IQR) and the smallest value greater than the first quartile minus 1.5 times the IQR, respectively. All data points beyond the whiskers are plotted as individual points.
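
The box plot convention described in this legend can be reproduced with a few lines of NumPy. This is a sketch with a hypothetical function name and toy data; it treats values lying exactly on a fence as inside the whiskers, which is an assumption, and it uses NumPy's default (linear) percentile interpolation, which may differ from the plotting software the authors used.

```python
import numpy as np


def tukey_box_stats(values):
    """Quartiles, whisker ends and outliers following the 1.5 x IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Assumption: points exactly on a fence count as inside the whiskers.
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up values
```

For this toy input the upper whisker stops at 9 and the value 30 is reported as an individual outlier point.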

Supplementary Figure 10 Comparison of the number of Poisson noise iterations.

Precision-recall (PR) curves (a) and receiver-operating-characteristic (ROC) curves (b) for different numbers of Poisson noise iterations used in the Pearson correlation coefficient feature.
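
A noise-added Pearson score of the kind compared in this figure might look like the sketch below. The legend states only that Poisson noise is added over a number of iterations; the noise rate of 1/n counts per fraction, the averaging over iterations, the treatment of flat profiles and the function name are all assumptions made for illustration.

```python
import numpy as np


def noisy_pearson(counts_a, counts_b, iterations=100, seed=0):
    """Average Pearson correlation over repeated additions of Poisson noise."""
    rng = np.random.default_rng(seed)
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    n = a.size
    total = 0.0
    for _ in range(iterations):
        # Assumed noise model: independent Poisson counts with mean 1/n per fraction.
        noisy_a = a + rng.poisson(1.0 / n, size=n)
        noisy_b = b + rng.poisson(1.0 / n, size=n)
        if noisy_a.std() == 0 or noisy_b.std() == 0:
            continue  # flat profile: contributes 0 to the average (assumption)
        total += np.corrcoef(noisy_a, noisy_b)[0, 1]
    return total / iterations


# Toy profiles (made-up spectral counts across six fractions).
strong = noisy_pearson([0, 20, 100, 30, 0, 0], [0, 30, 120, 20, 0, 0])
sparse = noisy_pearson([0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0])
```

The motivation for such noise is that sparse profiles, like the single-count pair above, would otherwise reach a perfect correlation from one shared observation, whereas well-sampled profiles are barely affected. More iterations give a more stable average at a higher computational cost.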

Supplementary Figure 11 Comparison of different Bayes correlation priors.

Precision-recall (PR) curves (a) and Receiver-operating-characteristic (ROC) curves (b) for different Bayes correlation priors: uniform (Bayes1), Dirichlet-marginalized (Bayes2) and zero count-motivated (Bayes3).

Supplementary Figure 12 Global optimization of EPIC parameters by nested cross-validation.

(a) Box plot showing the complex prediction performance (composite score) from two different machine-learning classifiers (random forest, n = 1014, versus support vector machine, n = 945). (b) Box plot showing the complex prediction performance (composite score) based on the 234 results from each of four different protein search/quantification tools. (c) Box plot showing the relationship between the number of correlation scores used and complex prediction performance (that is, composite score). n = 28, 110, 224, 280, 224, 112, 32 and 4 are the numbers of composite score results obtained with 1 to 8 correlation scores, respectively. The red arrow indicates the set of (five) correlation scores producing the highest composite score. In each box plot, the red line is the median, and the lower and upper lines of the box indicate the first and third quartiles. The upper and lower whiskers extend to the largest value less than the third quartile plus 1.5 times the interquartile range (IQR) and the smallest value greater than the first quartile minus 1.5 times the IQR, respectively. All data points beyond the whiskers are plotted as individual points.

Supplementary Figure 13 Exploring the value of additional experiments.

(a). Line plot of the number of experiments and corresponding averaged composite score. (b). Line plot of the number of experiments and the corresponding averaged value of composite score times the number of predicted protein complexes.

Supplementary information

Supplementary Information

Supplementary Figs. 1–13 and Supplementary Tables 1, 8 and 9

Reporting Summary

Supplementary Table 2

Complete list of predicted worm PPIs.

Supplementary Table 3

Complete list of predicted worm protein complexes in WormMap.

Supplementary Table 4

Results of AP–MS validation experiments.

Supplementary Table 5

Functional (GO term) enrichment on assemblies in WormMap.

Supplementary Table 6

Phenotypic enrichment analysis of complexes in WormMap.

Supplementary Table 7

Disease enrichment for human orthologs of worm protein macromolecules in WormMap.

Source data

Source Data Fig. 2

About this article

Cite this article

Hu, L.Z., Goebels, F., Tan, J.H. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat Methods 16, 737–742 (2019). https://doi.org/10.1038/s41592-019-0461-4

Received: 16 February 2018
Accepted: 15 May 2019
Published: 15 July 2019
Issue Date: August 2019
DOI: https://doi.org/10.1038/s41592-019-0461-4
