Panorama of ancient metazoan macromolecular complexes

Journal name: Nature
Volume: 525
Pages: 339–344
DOI: 10.1038/nature14877

Abstract

Macromolecular complexes are essential to conserved biological processes, but their prevalence across animals is unclear. By combining extensive biochemical fractionation with quantitative mass spectrometry, here we directly examined the composition of soluble multiprotein complexes among diverse metazoan models. Using an integrative approach, we generated a draft conservation map consisting of more than one million putative high-confidence co-complex interactions for species with fully sequenced genomes that encompasses functional modules present broadly across all extant animals. Clustering reveals a spectrum of conservation, ranging from ancient eukaryotic assemblies that have probably served cellular housekeeping roles for at least one billion years, ancestral complexes that have accrued contemporary components, and rarer metazoan innovations linked to multicellularity. We validated these projections by independent co-fractionation experiments in evolutionarily distant species, affinity purification and functional analyses. The comprehensiveness, centrality and modularity of these reconstructed interactomes reflect their fundamental mechanistic importance and adaptive value to animal cell systems.

At a glance

Figures

  1. Workflow.
    Figure 1: Workflow.

    a, Phylogenetic relationships of organisms analysed in this study. We fractionated soluble protein complexes from worm (C. elegans) larvae, fly (D. melanogaster) S2 cells, mouse (M. musculus) embryonic stem cells, sea urchin (S. purpuratus) eggs and human (HEK293/HeLa) cell lines. Holdout species (‘T’, for test) likewise analysed were frog (X. laevis), an amphibian; sea anemone (N. vectensis), a cnidarian with primitive eumetazoan tissue organization; slime mould (D. discoideum), an amoeba; and yeast (S. cerevisiae), a unicellular eukaryote. b, Protein fractions were digested and analysed by high-performance liquid chromatography tandem mass spectrometry (LC–MS/MS), measuring peptide spectral counts and precursor ion intensities. c, Integrative computational analysis. After orthologue mapping to human, correlation scores of co-eluting protein pairs detected in each ‘input’ species were subjected to machine learning together with additional external association evidence, using the CORUM complex database as a reference standard for training. High-confidence interactions were clustered to define co-complex membership.
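The correlation scoring of co-eluting protein pairs mentioned in panel c can be illustrated with a minimal sketch. It assumes the score is a plain Pearson correlation between two proteins' spectral-count profiles across fractions; the protein names and counts below are hypothetical, and the published pipeline combines several scores and external evidence that are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2 fractions)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; this sketch returns 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def pairwise_scores(profiles):
    """Score every unordered protein pair by profile correlation."""
    names = sorted(profiles)
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical spectral counts across six fractions (illustrative only).
profiles = {
    "PROT_A": [0, 2, 10, 4, 0, 0],
    "PROT_B": [0, 1, 5, 2, 0, 0],   # co-elutes with PROT_A
    "PROT_C": [6, 3, 0, 0, 1, 7],   # elutes in different fractions
}

scores = pairwise_scores(profiles)
for pair, r in sorted(scores.items()):
    print(pair, round(r, 3))
```

In this toy input, the pair that peaks in the same fractions scores near 1, while the pair with non-overlapping peaks scores below 0.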

  2. Derivation and projection of protein co-complex associations across taxa.
    Figure 2: Derivation and projection of protein co-complex associations across taxa.

    a, Expanded coverage via experimental scale-up relative to our previous human study6. Chart shows number of proteins detected, most (63%) in two or more species. b, Performance benchmarks, measuring precision and recall of our method and data in identifying known co-complex interactions (annotated human complexes from CORUM39). Complexes were split into training and withheld test sets; fivefold cross-validation against 4,528 interactions derived from the withheld test set shows strong performance gains, beyond baselines achieved using only co-fractionation or external evidence alone. TP, true positive; FP, false positive; FN, false negative. c, Plots showing high enrichment (probability ratio of interacting) of predicted interacting orthologous protein pairs (relative to non-interacting pairs) among highly correlated fractionation profiles, in both the holdout validation (test, T) and input species (colours reflect clade memberships). d, Left, representative co-fractionation data (normalized spectral counts shown for portions of 3 of 42 experimental profiles) from human, fly and sea urchin showing characteristic profiles of proteasome core, base and lid sub-complexes. Hierarchical clustering (right) of pan-species pairwise Pearson correlation scores (centre) is consistent with accepted structural models (Protein Data Bank ID: 4CR2; core, red; base, blue; lid, green; out-clusters, white). e, Projection of conserved co-complex interactions across 122 eukaryotic species, indicating overlap with leading public PPI reference databases39, 40, 41. STRING bars indicate excess over CORUM; GeneMANIA bars indicate excess over both; component and interaction occurrences across clades indicated at bottom.

  3. Prevalence of conservation of protein complexes across Metazoa and beyond.
    Figure 3: Prevalence of conservation of protein complexes across Metazoa and beyond.

    a, Conserved multiprotein complexes, identified by clustering, arranged according to average estimated component age (see Supplementary Methods and ref. 25). Proteins (nodes) classified as metazoan (green) or ancient (orange); assemblies showing divergent phylogenetic trajectories termed ‘mixed’. b, Example complexes with different proportions of old and new subunits. c, Presumed origins of metazoan (new), mixed and old complexes; ‘?’ indicates variable origins of new genes. d, Heat map showing prevalence of selected complexes across phyla. Colour reflects fraction of components with detectable orthologues (absence, dark blue). Sea anemone (N. vectensis) is the most distant metazoan (cnidarian) analysed biochemically.

  4. Physical validation of complexes.
    Figure 4: Physical validation of complexes.

    a, Verification of complexes from tagged human cell lines and transgenic worms (see Supplementary Methods; complexes drawn as in Fig. 3). Inset reports spectral counts obtained in replicate AP/MS analyses of indicated bait protein (header). MIB2–VPS4 complex confirmed by co-immunoprecipitation (co-IP; Extended Data Fig. 6a). b, Conserved complexes significantly overlap large-scale AP/MS data reported for human cell lines (E. L. Huttlin et al., BioGRID preprint 166968) to a comparable extent as literature reference sets39, 42, using three measures of complex-level agreement (see Supplementary Methods, Extended Data Fig. 6b); ***P < 0.001, determined by shuffling (grey distributions). c, Agreement of inferred molecular weights (MW) of human protein complexes with size-exclusion chromatography profiles (data in c, d, from ref. 43). d, Co-elution of human Commander complex subunits by size-exclusion chromatography consistent with an approximately 500-kDa particle.

  5. Functional validation of complexes.
    Figure 5: Functional validation of complexes.

    a, Morpholino (MO(ATG), targeting start codon to block translation) knockdown of COMMD2 (n = 55 animals, 2 clutches, 1 eye each) or COMMD3 (n = 64) in X. laevis embryos causes defective head and eye development (control n = 57; Extended Data Fig. 9f, h). ***P < 0.0001, 2-sided Mann–Whitney test. b, COMMD2/3 knockdown animals (five embryos per treatment examined) show altered neural patterning, including posterior shift or loss of expression of mid-brain marker EN2 and KROX20 (EGR1), the latter in rhombomeres R3/R5 (compare to Extended Data Fig. 9g, h). c, Enhanced embryonic lethality (epistasis) following RNAi knockdown in C. elegans of B0035.1 (ZNF207) and bub-3 together (eggs laid: HT115, 1,308; B0035.1, 1,096; bub-3, 445; bub-3 + B0035.1, 341). d, Enhanced sensitivity (mean ± s.d. across four cell culture experiments) of two independent CCDC97-knockout lines to the SF3b inhibitor pladienolide B (PB) relative to control HEK293 cells. e, Enrichment (permutation test P value) for interactions among sequential pathway components and metabolic enzymes relative to shuffled controls (n refers to enzyme index, where n, n + 1 denotes sequential enzymes, n, n + 2 sequential-but-one, and so on, as described in Supplementary Information). f, Metabolic channelling as opposed to traditional (typical) two-step cascade model. g, Conserved interactions among consecutively acting enzymes involved in purine biosynthesis (two representative co-fractionation profiles of the 69 total generated are shown).

  6. Performance measures.
    Extended Data Fig. 1: Performance measures.

    a, Performance benchmarks, measuring the precision and recall of our method and data in identifying known co-complex interactions from a withheld reference set of annotated human complexes (from CORUM39; as in Fig. 2b). Fivefold cross-validation against this withheld set shows strong performance gains, beyond a baseline achieved using only human and mouse co-fractionation data along with additional evidence from independent protein interaction screens5, 19 and a functional gene network20 (far-left curve), made by integrating co-fractionation data from the additional non-human animal species (as indicated). ‘All data’ and ‘Fractionation data only’ curves include biochemical fractionation data from all five input species: human, mouse, urchin, fly and worm; the latter curve omits all external data. In all cases, at least two species were required to show supporting biochemical evidence. Recall refers to the fraction of 4,528 total positive interactions derived from the withheld human CORUM complexes. b, All 16,655 interactions were identified at least in two species, half (49%, 8,121) found in three or more species. c, Among these high-confidence co-complex interactions, 8,981 (54%) were not reported in iRefWeb44 (v13.0), BioGRID45 (v3.2.119) or CORUM reference (Supplementary Table 2) for any of the five input species or in yeast; half (46%, 4,128) of these novel co-complex interactions display evidence of co-fractionation in three or more species. d, Final precision/recall performance on withheld interaction test set. A support vector machine classifier was trained using interactions derived from our training set of CORUM complexes, then ~1 million protein pairs found to co-elute in at least two of the five input species were scored by the classifier. 
Black curve shows precision and recall for ranked list of co-eluting pairs, with recall representing the fraction recovered of 4,528 total positive interactions derived from the withheld set of merged human CORUM complexes, and precision measured using co-eluting pairs where both members of the pair are contained in the set of proteins represented in the CORUM withheld set. The top 16,655 pairs, giving a cumulative precision of 67.5% and recall of 23.0% on this withheld test set, form the high-confidence set of co-complex protein–protein interactions (blue circle). The highest-scoring interactions were clustered using the two-stage approach described in the Supplementary Methods, yielding a final set of 7,669 interactions, which form the 981 identified complexes (red circle; precision = 90.0%, recall = 20.8%).
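The precision and recall bookkeeping described in panel d can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: recall is taken relative to a fixed count of withheld positive interactions, and precision only counts ranked pairs whose two members both appear among the proteins of the withheld reference set. All protein names and the tiny reference set are hypothetical.

```python
def precision_recall_curve(ranked_pairs, positives, eligible_proteins, total_positives):
    """Walk down a score-ranked list of pairs and record (precision, recall).

    ranked_pairs: list of (protein_a, protein_b), best score first.
    positives: set of frozensets holding the withheld reference interactions.
    eligible_proteins: proteins represented in the withheld reference set.
    total_positives: denominator for recall.
    """
    tp = fp = 0
    curve = []
    for a, b in ranked_pairs:
        if a not in eligible_proteins or b not in eligible_proteins:
            # Pairs involving proteins outside the reference set cannot be
            # judged true or false, so they are skipped for scoring.
            continue
        if frozenset((a, b)) in positives:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / total_positives))
    return curve

# Hypothetical toy reference: three known interactions among four proteins.
positives = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("A", "C")]}
eligible = {"A", "B", "C", "D"}
ranked = [("A", "B"), ("A", "X"), ("B", "C"), ("C", "D")]

curve = precision_recall_curve(ranked, positives, eligible, total_positives=len(positives))
for precision, recall in curve:
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

Choosing a score cutoff then amounts to picking one point on this curve, which is how the caption's blue and red circles can be read.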

  7. Properties of protein elution profiles.
    Extended Data Fig. 2: Properties of protein elution profiles.

    a, Distribution of global protein tissue expression pattern similarity, measured as the Pearson correlation coefficient of protein abundance across 30 human tissues23, showing markedly higher correlations for 16,468 protein–protein pairs of putative co-complex interaction partners compared to the same number of randomized pairs of proteins in the network which were not predicted to interact. b, Heat map illustrating the low to moderate cross-species Spearman’s rank correlation coefficients in the elution profiles observed between orthologous proteins during mixed-bed ion exchange chromatography under standardized conditions, highlighting the shift in absolute chromatographic retention times in different species. This variation indicates that the conservation of co-fractionation by putatively interacting proteins is not merely a trivial result stemming from fixed column-retention times. c, The degree of co-fractionation is measured as the correlation coefficient between elution profiles. Spatial proximity is calculated from the mean of residue pair distances between components of multisubunit complexes with known three-dimensional structures (see Supplementary Methods).

  8. Derivation of complexes.
    Extended Data Fig. 3: Derivation of complexes.

    a, The 2,153 proteins present in the 981 derived metazoan complexes participate in multiple assemblies (‘moonlighting’) to an extent comparable to the sharing of subunits reported for literature-derived complexes (CORUM). For comparison, we examined the 1,550 unique proteins from the full CORUM set of 1,216 human complexes passing our selection criteria for supporting evidence (‘Unmerged’) and the 1,461 unique proteins from the non-redundant set of 501 merged complexes used as the reference for splitting our training and testing sets, with some of the largest complexes removed to avoid bias in training (‘Merged’; see ‘Optimizing the two-stage clustering’ in Supplementary Methods for details). b, Schematic of 981 identified complexes containing 2,153 unique proteins. In this graphical representation, 7,669 co-complex interactions are shown as lines, and proteins as nodes. Red and green interactions were previously annotated in CORUM. Red interactions were used in training the classifier and/or clustering procedure, while green interactions were held out for validation purposes. Grey interactions were not previously annotated in CORUM.

  9. Properties of new and old proteins and complexes.
    Extended Data Fig. 4: Properties of new and old proteins and complexes.

    a, The 2,153 protein components in the conserved animal complexes tend to be more ancient than the 2,301 proteins reported in the CORUM reference complexes or in two recent large-scale protein interaction assays, based on either the 7,062 proteins found by affinity purification/mass spectrometry (AP/MS; E. L. Huttlin et al., BioGRID preprint 166968, http://thebiogrid.org/166968/publication/) or the 3,667 proteins analysed by yeast two-hybrid assays (Y2H)10. Ages are derived from OMA (Orthologous Matrix database) as in ref. 25. b, Annotation rates (mean count of annotation terms per protein) of old and new proteins in the derived complexes and pairwise PPIs, compared with proteins in the CORUM reference complex set. Old proteins (defined by OMA) from the complexes generally exhibited higher annotation rates than new proteins. c, Differential enrichment of old, mixed and metazoan-specific protein complexes for functional annotations (select GO-slim biological process terms shown, top) and protein domains (Pfam, bottom).

  10. Abundance and expression trends for proteins in complexes.
    Extended Data Fig. 5: Abundance and expression trends for proteins in complexes.

    Proteins within the identified complexes tend to be ubiquitously expressed across human tissues. a, b, Pie charts show the proportions of proteins with varying tissue expression patterns, from a recently published human tissue proteome map46, comparing the full set of 20,258 human proteins (a) with the 2,131 proteins within the identified complexes (b). Consistent with these observations, 91% of the protein components in the complexes were expressed in >15 tissues in data from a reference human proteome23, compared to less than half (46%) of the 17,294 proteins in the overall reference set (Z-test P < 0.001). c, d, The distributions of average mRNA (c, data from EBI accession E-MTAB-1733) and protein (d, data from PaxDb integrated data set, 9606-H.sapiens_whole_organism-integrated_data set) abundances for all proteins identified and those within complexes. Evolutionarily old proteins (defined by OMA as described in ref. 25 and mentioned earlier) tend towards higher abundances, even for proteins in reference complexes.

  11. Additional validation data.
    Extended Data Fig. 6: Additional validation data.

    a, Confirmation of MIB2 interactions by co-immunoprecipitation. Extract (~10 mg protein) from cultured human HCT116 cells expressing Flag-tagged MIB2 or control (WT) cells was incubated with 100 µl anti-Flag M2 resin for 4 h while gently rotating at 4 °C. After extensive washing with RIPA buffer, co-purifying proteins bound to the beads were eluted by the addition of 25 µl Laemmli loading buffer at 95 °C. Polypeptides were separated by SDS–PAGE and immunoblotted using Flag, VPS4A, VPS4B or IST1 antibodies as indicated (expanded gel images provided in Supplementary Information). b, Protein co-complex interactions reported in the CYC2008 yeast protein complex database42 are reconstructed accurately from the co-fractionation data, regardless of whether the full set of co-fractionation plus external data are used to derive protein interactions (‘All data’, see also Fig. 4b) or if the external yeast data was specifically excluded from the analyses (‘All data, excluding yeast’).

  12. Agreement of derived complexes’ molecular weights with measurement by HPLC and density centrifugation.
    Extended Data Fig. 7: Agreement of derived complexes’ molecular weights with measurement by HPLC and density centrifugation.

    a, CORUM reference complexes’ inferred molecular weights (MW) are consistent with their components’ average cumulative size-exclusion chromatograms. The molecular weight of each complex was calculated as the sum of putative component molecular weights, assuming 1:1 stoichiometry. Data from ref. 43 were analysed as in Fig. 4c and show a similar trend as for the derived complexes. b, Derived complexes’ inferred molecular weights are broadly consistent with their components’ average cumulative ultracentrifugation profiles on a sucrose density gradient. Average profiles are plotted for X. laevis orthologues, based on a preparation of haemoglobin-depleted heart and liver proteins separated on a 7–47% sucrose density gradient, as described in the Supplementary Methods.
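The molecular weight inference stated in panel a (sum of component molecular weights, assuming 1:1 stoichiometry) reduces to a simple sum. The sketch below uses hypothetical subunit names and masses purely for illustration.

```python
def complex_mw_kda(subunits, masses_kda):
    """Inferred complex mass: one copy of each subunit (1:1 stoichiometry)."""
    missing = [s for s in subunits if s not in masses_kda]
    if missing:
        raise KeyError(f"no mass available for: {missing}")
    return sum(masses_kda[s] for s in subunits)

# Hypothetical subunit masses in kDa (not taken from the paper).
masses_kda = {"SUB1": 20.0, "SUB2": 35.5, "SUB3": 50.0}

inferred = complex_mw_kda(["SUB1", "SUB2", "SUB3"], masses_kda)
print(f"inferred complex mass: {inferred} kDa")
```

Because real assemblies may contain several copies of a subunit, a sum of this kind is best read as a lower-bound estimate to compare against the size-exclusion or density-gradient profiles.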

  13. Distribution of uncharacterized proteins and novel interactions across the 981 derived complexes.
    Extended Data Fig. 8: Distribution of uncharacterized proteins and novel interactions across the 981 derived complexes.

    Complexes were sorted by median age (defined by OMA). Among 2,153 unique proteins, 293 (red) lack Gene Ontology (GO) functional annotations, while 1,756 of 7,665 co-complex interactions are novel (light green) (not listed in iRefWeb curation database).

  14. Properties of the Commander complex.
    Extended Data Fig. 9: Properties of the Commander complex.

    The automatically derived 8 subunit Commander complex (Fig. 3b) was subsequently extended to 13 subunits (COMMD1 to 10, CCDC22, CCDC93, and SH3GLB1) based on combined analysis of AP/MS (Fig. 4a), size-exclusion chromatograms43 (Fig. 4d), published pairwise interactions30, 47, 48, and analysis of elution profiles of the remaining COMM-domain-containing proteins, as shown here. Example protein elution profiles are plotted for Commander complex subunits observed from: HEK293 cell nuclear extract (a); sea urchin embryonic (5 days post-fertilization) extract (b); and fly SL2 cell nuclear extract (c); each fractionated by heparin affinity chromatography. d, Co-expression of Commander complex subunits during embryonic development of X. tropicalis (plotting mean ± s.d. of three clutches; data from ref. 49). e, Messenger RNA expression patterns of Commander complex subunits in stage 15 X. laevis embryos. Images show coordinated spatial expression in early vertebrate embryogenesis, as measured by in situ hybridization (three embryos examined). f, Knockdown of Commd2 induced marked head and eye defects in developing X. laevis. Top, Commd2 antisense knockdown significantly decreased eye size, shown for stage 38 tadpoles (from three clutches; control n = 47 animals, one eye each; ***P < 0.0001, two-sided Mann–Whitney test); phenotypes were consistent between translation blocking (MOatg; n = 60) morpholino reagents, splice site blocking (MOsp; n = 50) morpholinos, and knockdowns of interaction partner Commd3 (see Fig. 5a). Bottom, Commd2-knockdown induced altered Pax6 patterning in the embryonic eye (control n = 8 animals, two eyes each; MO n = 11). g, Commd2/3-knockdown animals show altered neural patterning. Changes in stage 15 X. laevis embryos, measured by in situ hybridization (assayed in duplicates; five embryos per treatment), seen upon knockdown but not on controls: the forebrain marker PAX6 was expanded, while the mid-brain marker EN2 was strongly reduced. 
Notably, while expression of KROX20/EGR1 in rhombomere R3 was shifted posteriorly, expression in R5 was strongly reduced or entirely absent. Panels in Fig. 5b are reproduced from this figure and are directly comparable. h, Confirmation of splice-blocking Commd2 morpholino activity. Images and schematic show the basis and results of RT–PCR and agarose gel electrophoresis obtained with the corresponding X. laevis knockdown tadpoles.

  15. Supporting data for BUB3 and CCDC97 experiments.
    Extended Data Fig. 10: Supporting data for BUB3 and CCDC97 experiments.

    a, Sequence alignment showing conservation of ZNF207 GLEBS domain. b, Targeted CRISPR/Cas9-induced knockout of CCDC97 in two independent lines of human HEK293 cells, as verified by western blotting (expanded gel images provided in Supplementary Information). c, Loss of CCDC97 impairs cell growth. Lines show growth curves of control versus knockout cell lines in two biological replicate assays.

References

  1. Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature 402, C47–C52 (1999)
  2. Alberts, B. The cell as a collection of protein machines: Preparing the next generation of molecular biologists. Cell 92, 291–294 (1998)
  3. Butland, G. et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 433, 531–537 (2005)
  4. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006)
  5. Guruharsha, K. G. et al. A protein complex network of Drosophila melanogaster. Cell 147, 690–703 (2011)
  6. Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012)
  7. Stelzl, U. et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell 122, 957–968 (2005)
  8. Li, S. et al. A map of the interactome network of the metazoan C. elegans. Science 303, 540–543 (2004)
  9. Hu, P. et al. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol. 7, e1000096 (2009)
  10. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014)
  11. Sharan, R. et al. Conserved patterns of protein interaction in multiple species. Proc. Natl Acad. Sci. USA 102, 1974–1979 (2005)
  12. Gandhi, T. K. B. et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genet. 38, 285–293 (2006)
  13. Tan, K., Shlomi, T., Feizi, H., Ideker, T. & Sharan, R. Transcriptional regulation of protein complexes within and across species. Proc. Natl Acad. Sci. USA 104, 1283–1288 (2007)
  14. Singh, R., Xu, J. B. & Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc. Natl Acad. Sci. USA 105, 12763–12768 (2008)
  15. Yu, H. et al. Annotation transfer between genomes: protein–protein interologs and protein–DNA regulogs. Genome Res. 14, 1107–1118 (2004)
  16. Ideker, T. & Krogan, N. J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012)
  17. Kiemer, L. & Cesareni, G. Comparative interactomics: comparing apples and pears? Trends Biotechnol. 25, 448–454 (2007)
  18. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002)
  19. Malovannaya, A. et al. Analysis of the human endogenous coregulator complexome. Cell 145, 787–799 (2011)
  20. Lee, I., Blom, U. M., Wang, P. I., Shim, J. E. & Marcotte, E. M. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21, 1109–1121 (2011)
  21. Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nature Biotechnol. 28, 1248–1250 (2010)
  22. McKusick, V. A. Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders. (Johns Hopkins Univ. Press, 1998)
  23. Kim, M. S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014)
  24. Rubin, G. M. et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000)
  25. Bezginov, A., Clark, G. W., Charlebois, R. L., Dar, V. U. N. & Tillier, E. R. M. Coevolution reveals a network of human proteins originating with multicellularity. Mol. Biol. Evol. 30, 332–346 (2013)
  26. Stumpf, M. P. H. et al. Estimating the size of the human interactome. Proc. Natl Acad. Sci. USA 105, 6959–6964 (2008)
  27. Hart, G. T., Ramani, A. K. & Marcotte, E. M. How complete are current yeast and human protein-interaction networks? Genome Biol. 7, 120 (2006)
  28. Eisenberg, E. & Levanon, E. Y. Preferential attachment in the protein network evolution. Phys. Rev. Lett. 91, 138701 (2003)
  29. Knoll, A. H. The early evolution of eukaryotes: a geological perspective. Science 256, 622–627 (1992)
  30. Burstein, E. et al. COMMD proteins, a novel family of structural and functional homologs of MURR1. J. Biol. Chem. 280, 22222–22232 (2005)
  31. van de Sluis, B., Rothuizen, J., Pearson, P. L., van Oost, B. A. & Wijmenga, C. Identification of a new copper metabolism gene by positional cloning in a purebred dog population. Hum. Mol. Genet. 11, 165–173 (2002)
  32. McDonald, F. J. COMMD1 and ion transport proteins: what is the COMMection? Focus on “COMMD1 interacts with the COOH terminus of NKCC1 in Calu-3 airway epithelial cells to modulate NKCC1 ubiquitination”. Am. J. Physiol. Cell Physiol. 305, C129–C130 (2013)
  33. Kolanczyk, M. et al. Missense variant in CCDC22 causes X-linked recessive intellectual disability with features of Ritscher-Schinzel/3C syndrome. Eur. J. Hum. Genet. 109, 16 (2014)
  34. Voineagu, I. et al. CCDC22: a novel candidate gene for syndromic X-linked intellectual disability. Mol. Psychiatry 17, 4–7 (2012)
  35. Toledo, C. M. et al. BuGZ is required for Bub3 stability, Bub1 kinetochore function, and chromosome alignment. Dev. Cell 28, 282–294 (2014)
  36. Kotake, Y. et al. Splicing factor SF3b as a target of the antitumor natural product pladienolide. Nature Chem. Biol. 3, 570–575 (2007)
  37. Croft, D. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 42, D472–D477 (2014)
  38. Ovádi, J. Cell Architecture and Metabolite Channeling. (RG Landes Company, 1995)
  39. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res. 38, D497–D501 (2010)
  40. Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010)
  41. Franceschini, A. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013)
  42. Pu, S., Wong, J., Turner, B., Cho, E. & Wodak, S. J. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 37, 825–831 (2009)
  43. Kirkwood, K. J., Ahmad, Y., Larance, M. & Lamond, A. I. Characterization of native protein complexes and protein isoform variation using size-fractionation-based quantitative proteomics. Mol. Cell. Proteomics 12, 3851–3873 (2013)
  44. Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010)
  45. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006)
  46. Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 6220 (2015)
  47. de Bie, P. et al. Characterization of COMMD protein–protein interactions in NF-κB signalling. Biochem. J. 398, 63–71 (2006)
  48. Phillips-Krawczak, C. A. et al. COMMD1 is linked to the WASH complex and regulates endosomal trafficking of the copper transporter ATP7A. Mol. Biol. Cell 26, 91–103 (2015)
  49. Yanai, I., Peshkin, L., Jorgensen, P. & Kirschner, M. W. Mapping gene expression in two Xenopus species: evolutionary constraints and developmental flexibility. Dev. Cell 20, 483–496 (2011)


Author information

  1. These authors contributed equally to this work.

    • Cuihong Wan &
    • Blake Borgeson

Affiliations

  1. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada

    • Cuihong Wan,
    • Sadhna Phanse,
    • Olga Kagan,
    • Julian Kwan,
    • Zuyao Ni,
    • Snejana Stoilova,
    • Pierre C. Havugimana,
    • Xinghua Guo,
    • Jack Greenblatt &
    • Andrew Emili
  2. Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712, USA

    • Cuihong Wan,
    • Blake Borgeson,
    • Fan Tu,
    • Kevin Drew,
    • Ophelia Papoulas,
    • Daniel R. Boutz,
    • John B. Wallingford &
    • Edward M. Marcotte
  3. Department of Medical Biophysics, Toronto, Ontario M5G 1L7, Canada

    • Greg Clark,
    • Alexandr Bezginov &
    • Elisabeth R. Tillier
  4. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada

    • Xuejian Xiong,
    • Julian Kwan,
    • Kyle Chessman,
    • Graham Cromar,
    • Jack Greenblatt,
    • W. Brent Derry,
    • John Parkinson &
    • Andrew Emili
  5. Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada

    • Xuejian Xiong,
    • Kyle Chessman,
    • Swati Pal,
    • Graham Cromar,
    • W. Brent Derry &
    • John Parkinson
  6. Department of Biochemistry, University of Regina, Regina, Saskatchewan S4S 0A2, Canada

    • Ramy H. Malty &
    • Mohan Babu
  7. Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany

    • Mihail Sarov
  8. Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA

    • John B. Wallingford &
    • Edward M. Marcotte

Contributions

A.E. and E.M.M. designed and co-supervised the project. C.W. performed proteomic experiments, aided by P.C.H. B.B. coordinated data analysis, aided by S.Ph., K.D. and S.S., and guided by E.M.M. E.R.T., G.Cl., A.B., J.P., X.X., K.C., G.Cr., C.W. and S.Ph. analysed network and conservation data. C.W., F.T., O.K., J.K., S.Pa., O.P., Z.N., D.R.B., X.G., R.H.M., M.S., J.G., M.B., W.B.D. and J.B.W. contributed validation experiments. S.Ph. designed the web portal. C.W., B.B., E.M.M. and A.E. drafted the manuscript. All authors discussed results and contributed edits.

Competing financial interests

The authors declare no competing financial interests.


Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: Performance measures. (381 KB)

    a, Performance benchmarks, measuring the precision and recall of our method and data in identifying known co-complex interactions from a withheld reference set of annotated human complexes (from CORUM, ref. 39; as in Fig. 2b). Fivefold cross-validation against this withheld set shows that integrating co-fractionation data from the additional non-human animal species (as indicated) yields strong performance gains beyond a baseline (far-left curve) achieved using only human and mouse co-fractionation data along with additional evidence from independent protein interaction screens (refs 5, 19) and a functional gene network (ref. 20). ‘All data’ and ‘Fractionation data only’ curves include biochemical fractionation data from all five input species: human, mouse, urchin, fly and worm; the latter curve omits all external data. In all cases, at least two species were required to show supporting biochemical evidence. Recall refers to the fraction of 4,528 total positive interactions derived from the withheld human CORUM complexes. b, All 16,655 interactions were identified in at least two species, and about half (49%, 8,121) were found in three or more species. c, Among these high-confidence co-complex interactions, 8,981 (54%) were not reported in iRefWeb (ref. 44; v13.0), BioGRID (ref. 45; v3.2.119) or the CORUM reference (Supplementary Table 2) for any of the five input species or in yeast; nearly half (46%, 4,128) of these novel co-complex interactions display evidence of co-fractionation in three or more species. d, Final precision/recall performance on the withheld interaction test set. A support vector machine classifier was trained using interactions derived from our training set of CORUM complexes, then ~1 million protein pairs found to co-elute in at least two of the five input species were scored by the classifier. The black curve shows precision and recall for the ranked list of co-eluting pairs, with recall representing the fraction recovered of the 4,528 total positive interactions derived from the withheld set of merged human CORUM complexes, and precision measured using co-eluting pairs where both members of the pair are contained in the set of proteins represented in the CORUM withheld set. The top 16,655 pairs, giving a cumulative precision of 67.5% and recall of 23.0% on this withheld test set, form the high-confidence set of co-complex protein–protein interactions (blue circle). The highest-scoring interactions were clustered using the two-stage approach described in the Supplementary Methods, yielding a final set of 7,669 interactions, which form the 981 identified complexes (red circle; precision = 90.0%, recall = 20.8%).

  2. Extended Data Figure 2: Properties of protein elution profiles. (133 KB)

    a, Distribution of global protein tissue expression pattern similarity, measured as the Pearson correlation coefficient of protein abundance across 30 human tissues (ref. 23), showing markedly higher correlations for 16,468 protein–protein pairs of putative co-complex interaction partners compared to the same number of randomized pairs of proteins in the network that were not predicted to interact. b, Heat map illustrating the low to moderate cross-species Spearman’s rank correlation coefficients in the elution profiles observed between orthologous proteins during mixed-bed ion exchange chromatography under standardized conditions, highlighting the shift in absolute chromatographic retention times in different species. This variation indicates that the conservation of co-fractionation by putatively interacting proteins is not merely a trivial result stemming from fixed column-retention times. c, The degree of co-fractionation is measured as the correlation coefficient between elution profiles. Spatial proximity is calculated from the mean of residue pair distances between components of multisubunit complexes with known three-dimensional structures (see Supplementary Methods).

  3. Extended Data Figure 3: Derivation of complexes. (656 KB)

    a, The 2,153 proteins present in the 981 derived metazoan complexes participate in multiple assemblies (‘moonlighting’) to an extent comparable to the sharing of subunits reported for literature-derived complexes (CORUM). For comparison, we examined the 1,550 unique proteins from the full CORUM set of 1,216 human complexes passing our selection criteria for supporting evidence (‘Unmerged’) and the 1,461 unique proteins from the non-redundant set of 501 merged complexes used as the reference for splitting our training and testing sets, with some of the largest complexes removed to avoid bias in training (‘Merged’; see ‘Optimizing the two-stage clustering’ in Supplementary Methods for details). b, Schematic of 981 identified complexes containing 2,153 unique proteins. In this graphical representation, 7,669 co-complex interactions are shown as lines, and proteins as nodes. Red and green interactions were previously annotated in CORUM. Red interactions were used in training the classifier and/or clustering procedure, while green interactions were held out for validation purposes. Grey interactions were not previously annotated in CORUM.

  4. Extended Data Figure 4: Properties of new and old proteins and complexes. (209 KB)

    a, The 2,153 protein components in the conserved animal complexes tend to be more ancient than the 2,301 proteins reported in the CORUM reference complexes or in two recent large-scale protein interaction assays, based on either the 7,062 proteins found by affinity purification/mass spectrometry (AP/MS; E. L. Huttlin et al., BioGRID preprint 166968, http://thebiogrid.org/166968/publication/) or the 3,667 proteins analysed by yeast two-hybrid assays (Y2H; ref. 10). Ages are derived from OMA (Orthologous Matrix database) as in ref. 25. b, Annotation rates (mean count of annotation terms per protein) of old and new proteins in the derived complexes and pairwise PPIs, compared with proteins in the CORUM reference complex set. Old proteins (defined by OMA) from the complexes generally exhibited higher annotation rates than new proteins. c, Differential enrichment of old, mixed and metazoan-specific protein complexes for functional annotations (select GO-slim biological process terms shown, top) and protein domains (Pfam, bottom).

  5. Extended Data Figure 5: Abundance and expression trends for proteins in complexes. (204 KB)

    Proteins within the identified complexes tend to be ubiquitously expressed across human tissues. a, b, Pie charts show the proportions of proteins with varying tissue expression patterns, from a recently published human tissue proteome map (ref. 46), comparing the full set of 20,258 human proteins (a) with the 2,131 proteins within the identified complexes (b). Consistent with these observations, 91% of the protein components in the complexes were expressed in >15 tissues in data from a reference human proteome (ref. 23), compared to less than half (46%) of the 17,294 proteins in the overall reference set (Z-test P < 0.001). c, d, The distributions of average mRNA (c, data from EBI accession E-MTAB-1733) and protein (d, data from PaxDb integrated data set, 9606-H.sapiens_whole_organism-integrated_data set) abundances for all proteins identified and those within complexes. Evolutionarily old proteins (defined by OMA as described in ref. 25 and mentioned earlier) tend towards higher abundances, even for proteins in reference complexes.

  6. Extended Data Figure 6: Additional validation data. (186 KB)

    a, Confirmation of MIB2 interactions by co-immunoprecipitation. Extract (~10 mg protein) from cultured human HCT116 cells expressing Flag-tagged MIB2 or control (WT) cells was incubated with 100 µl anti-Flag M2 resin for 4 h while gently rotating at 4 °C. After extensive washing with RIPA buffer, co-purifying proteins bound to the beads were eluted by the addition of 25 µl Laemmli loading buffer at 95 °C. Polypeptides were separated by SDS–PAGE and immunoblotted using Flag, VPS4A, VPS4B or IST1 antibodies as indicated (expanded gel images provided in Supplementary Information). b, Protein co-complex interactions reported in the CYC2008 yeast protein complex database (ref. 42) are reconstructed accurately from the co-fractionation data, regardless of whether the full set of co-fractionation plus external data is used to derive protein interactions (‘All data’, see also Fig. 4b) or the external yeast data were specifically excluded from the analyses (‘All data, excluding yeast’).

  7. Extended Data Figure 7: Agreement of derived complexes’ molecular weights with measurement by HPLC and density centrifugation. (515 KB)

    a, CORUM reference complexes’ inferred molecular weights (MW) are consistent with their components’ average cumulative size-exclusion chromatograms. The molecular weight of each complex was calculated as the sum of putative component molecular weights, assuming 1:1 stoichiometry. Data from ref. 43 were analysed as in Fig. 4c and show a similar trend as for the derived complexes. b, Derived complexes’ inferred molecular weights are broadly consistent with their components’ average cumulative ultracentrifugation profiles on a sucrose density gradient. Average profiles are plotted for X. laevis orthologues, based on a preparation of haemoglobin-depleted heart and liver proteins separated on a 7–47% sucrose density gradient, as described in the Supplementary Methods.

  8. Extended Data Figure 8: Distribution of uncharacterized proteins and novel interactions across the 981 derived complexes. (309 KB)

    Complexes were sorted by median age (defined by OMA). Among 2,153 unique proteins, 293 (red) lack Gene Ontology (GO) functional annotations, while 1,756 of 7,665 co-complex interactions are novel (light green; not listed in the iRefWeb curation database).

  9. Extended Data Figure 9: Properties of the Commander complex. (438 KB)

    The automatically derived eight-subunit Commander complex (Fig. 3b) was subsequently extended to 13 subunits (COMMD1 to 10, CCDC22, CCDC93, and SH3GLB1) based on combined analysis of AP/MS (Fig. 4a), size-exclusion chromatograms (ref. 43; Fig. 4d), published pairwise interactions (refs 30, 47, 48), and analysis of elution profiles of the remaining COMM-domain-containing proteins, as shown here. Example protein elution profiles are plotted for Commander complex subunits observed from: HEK293 cell nuclear extract (a); sea urchin embryonic (5 days post-fertilization) extract (b); and fly SL2 cell nuclear extract (c); each fractionated by heparin affinity chromatography. d, Co-expression of Commander complex subunits during embryonic development of X. tropicalis (plotting mean ± s.d. of three clutches; data from ref. 49). e, Messenger RNA expression patterns of Commander complex subunits in stage 15 X. laevis embryos. Images show coordinated spatial expression in early vertebrate embryogenesis, as measured by in situ hybridization (three embryos examined). f, Knockdown of Commd2 induced marked head and eye defects in developing X. laevis. Top, Commd2 antisense knockdown significantly decreased eye size, shown for stage 38 tadpoles (from three clutches; control n = 47 animals, one eye each; ***P < 0.0001, two-sided Mann–Whitney test); phenotypes were consistent among translation-blocking morpholinos (MOatg; n = 60), splice-site-blocking morpholinos (MOsp; n = 50) and knockdowns of the interaction partner Commd3 (see Fig. 5a). Bottom, Commd2 knockdown induced altered Pax6 patterning in the embryonic eye (control n = 8 animals, two eyes each; MO n = 11). g, Commd2/3-knockdown animals show altered neural patterning. Changes in stage 15 X. laevis embryos, measured by in situ hybridization (assayed in duplicate; five embryos per treatment), were seen upon knockdown but not in controls: the forebrain marker PAX6 was expanded, while the mid-brain marker EN2 was strongly reduced. Notably, while expression of KROX20/EGR1 in rhombomere R3 was shifted posteriorly, expression in R5 was strongly reduced or entirely absent. Panels in Fig. 5b are reproduced from this figure and are directly comparable. h, Confirmation of splice-blocking Commd2 morpholino activity. Images and schematic show the basis and results of RT–PCR and agarose gel electrophoresis obtained with the corresponding X. laevis knockdown tadpoles.

  10. Extended Data Figure 10: Supporting data for BUB3 and CCDC97 experiments. (176 KB)

    a, Sequence alignment showing conservation of ZNF207 GLEBS domain. b, Targeted CRISPR/Cas9-induced knockout of CCDC97 in two independent lines of human HEK293 cells, as verified by western blotting (expanded gel images provided in Supplementary Information). c, Loss of CCDC97 impairs cell growth. Lines show growth curves of control versus knockout cell lines in two biological replicate assays.
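
The precision and recall definitions given for Extended Data Figure 1d can be illustrated with a minimal sketch. It assumes a list of protein pairs already ranked by classifier score, a set of positive pairs from a withheld reference, and the set of proteins represented in that reference. The function name, data structures and toy protein labels are hypothetical and are not taken from the authors' pipeline; the support vector machine scoring itself is not reproduced here.

```python
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple


def precision_recall_curve(
    ranked_pairs: Sequence[Tuple[str, str]],
    positives: Set[FrozenSet[str]],
    reference_proteins: Set[str],
) -> List[Tuple[Optional[float], float]]:
    """Cumulative (precision, recall) after each pair in a ranked list.

    Recall is the fraction of all reference positives recovered so far.
    Precision is computed only over pairs whose two members both appear
    in the reference protein set, mirroring the legend's definition.
    Precision is None until at least one such pair has been seen.
    """
    total_positives = len(positives)
    true_positives = 0
    evaluable = 0  # pairs with both members in the reference protein set
    curve: List[Tuple[Optional[float], float]] = []
    for a, b in ranked_pairs:
        if a in reference_proteins and b in reference_proteins:
            evaluable += 1
            if frozenset((a, b)) in positives:
                true_positives += 1
        precision = true_positives / evaluable if evaluable else None
        recall = true_positives / total_positives if total_positives else 0.0
        curve.append((precision, recall))
    return curve


# Toy data (hypothetical labels): four reference positives, four ranked pairs.
positives = {
    frozenset(("A", "B")),
    frozenset(("A", "C")),
    frozenset(("B", "C")),
    frozenset(("D", "E")),
}
reference_proteins = {"A", "B", "C", "D", "E"}
ranked = [("A", "B"), ("X", "Y"), ("A", "D"), ("B", "C")]
curve = precision_recall_curve(ranked, positives, reference_proteins)
```

In this toy example the pair ("X", "Y") leaves precision unchanged because neither protein is in the reference set, whereas ("A", "D") counts against precision because both proteins are in the reference but the pair is not an annotated positive.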

Supplementary information

PDF files

  1. Supplementary Information (389 KB)

    This file contains Supplementary Methods and Data and additional references.

  2. Supplementary Information (1.7 MB)

    This file contains the western blot gel images for Extended Data Figures 6a and 10b.

Zip files

  1. Supplementary Tables (16.1 MB)

    This file contains Supplementary Tables 1–9 as follows: (1) Sample information; (2) Human protein interactions + interologs detected in the other 8 experimentally studied species + CORUM interaction reference standards; (3) List of co-complex interactions projected for 122 sequenced eukaryotic species; (4) Final set of (981) conserved animal protein complexes; (5) Protein age and conservation profile across 122 sequenced eukaryotic species; (6) GO-slim, domain, disease and phenotype enrichment results; (7) Human disease annotations; (8) Consecutive pathway and metabolic enzymes; and (9) 36 common metabolites excluded from Recon2.

Additional data