Abstract
Protein–protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi’s sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts.
Data availability
The mass spectrometry proteomics data reported in this paper, excluding the PRM data, have been deposited in the ProteomeXchange Consortium (ref. 78) via the PRIDE (ref. 79) partner repository with the dataset identifier PXD041152 and 10.6019/PXD041152. The PRM data were uploaded to Panorama (ref. 80) and can be accessed at panoramaweb.org/HfwV6S.url. All Tapioca predictions can be downloaded from tapioca.princeton.edu/, where predictions with scores ≥0.15 can also be viewed. Source data are provided with this paper.
Code availability
The code to run or modify Tapioca is provided at github.com/FunctionLab/tapioca and on Code Ocean (codeocean.com/capsule/7217908). There are no restrictions on access to this code.
References
Braun, P. & Gingras, A.-C. History of protein–protein interactions: from egg-white to complex networks. Proteomics 12, 1478–1498 (2012).
Taylor, I. W. & Wrana, J. L. Protein interaction networks in medicine and disease. Proteomics 12, 1706–1716 (2012).
Tsitsiridis, G. et al. CORUM: the comprehensive resource of mammalian protein complexes–2022. Nucleic Acids Res. 51, D539–D545 (2023).
Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
Jean Beltran, P. M., Federspiel, J. D., Sheng, X. & Cristea, I. M. Proteomics and integrative omic approaches for understanding host–pathogen interactions and infectious diseases. Mol. Syst. Biol. 13, 922 (2017).
Greco, T. M., Kennedy, M. A. & Cristea, I. M. Proteomic technologies for deciphering local and global protein interactions. Trends Biochem. Sci. 45, 454–455 (2020).
Truong, K. & Ikura, M. The use of FRET imaging microscopy to detect protein–protein interactions and protein conformational changes in vivo. Curr. Opin. Struct. Biol. 11, 573–578 (2001).
Brückner, A., Polge, C., Lentze, N., Auerbach, D. & Schlattner, U. Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 10, 2763–2788 (2009).
Yu, X., Petritis, B. & LaBaer, J. Advancing translational research with next-generation protein microarrays. Proteomics 16, 1238–1250 (2016).
Dionne, U. & Gingras, A.-C. Proximity-dependent biotinylation approaches to explore the dynamic compartmentalized proteome. Front. Mol. Biosci. 9, 852911 (2022).
Miteva, Y. V., Budayeva, H. G. & Cristea, I. M. Proteomics-based methods for discovery, quantification, and validation of protein–protein interactions. Anal. Chem. 85, 749–768 (2013).
Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods 18, 520–527 (2021).
Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).
Skinnider, M. A. & Foster, L. J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat. Methods 18, 806–815 (2021).
Franken, H. et al. Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nat. Protoc. 10, 1567–1593 (2015).
Mateus, A., Määttä, T. A. & Savitski, M. M. Thermal proteome profiling: unbiased assessment of protein state through heat-induced stability changes. Proteome Sci. 15, 13 (2017).
Savitski, M. M. et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, 1255784 (2014).
Tan, C. S. H. et al. Thermal proximity coaggregation for system-wide profiling of protein complex dynamics in cells. Science 359, 1170–1177 (2018).
Beusch, C. M., Sabatier, P. & Zubarev, R. A. Ion-based proteome-integrated solubility alteration assays for systemwide profiling of protein–molecule interactions. Anal. Chem. 94, 7066–7074 (2022).
Arias, C. et al. KSHV 2.0: a comprehensive annotation of the Kaposi’s sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features. PLoS Pathog. 10, e1003847 (2014).
Davis, Z. H. et al. Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol. Cell 57, 349–360 (2015).
Wen, K. W. & Damania, B. Kaposi sarcoma-associated herpesvirus (KSHV): molecular biology and oncogenesis. Cancer Lett. 289, 140–150 (2010).
Justice, J. L. et al. Systematic profiling of protein complex dynamics reveals DNA-PK phosphorylation of IFI16 en route to herpesvirus immunity. Sci. Adv. 7, eabg6680 (2021).
Hashimoto, Y., Sheng, X., Murray-Nerger, L. A. & Cristea, I. M. Temporal dynamics of protein complex formation and dissociation during human cytomegalovirus infection. Nat. Commun. 11, 806 (2020).
Selkrig, J. et al. SARS-CoV-2 infection remodels the host protein thermal stability landscape. Mol. Syst. Biol. 17, e10188 (2021).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Menon, R. et al. Single cell transcriptomics identifies focal segmental glomerulosclerosis remission endothelial biomarker. JCI Insight 5, e133267 (2020).
Meyer, M. et al. Attenuated activation of pulmonary immune cells in mRNA-1273-vaccinated hamsters after SARS-CoV-2 infection. J. Clin. Invest. 131, e148036 (2021).
Zhou, J. et al. Whole-genome deep learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362.e6 (2021).
Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).
Roussarie, J.-P. et al. Selective neuronal vulnerability in Alzheimer’s disease: a network-based analysis. Neuron 107, 821–835.e12 (2020).
Zhang, Z. et al. Blood RNA alternative splicing events as diagnostic biomarkers for infectious disease. Cell Rep. Methods 3, 100395 (2023).
George, A. L. et al. Comparison of quantitative mass spectrometric methods for drug target identification by thermal proteome profiling. J. Proteome Res. 22, 2629–2640 (2023).
Becher, I. et al. Pervasive protein thermal stability variation during the cell cycle. Cell 173, 1495–1507.e18 (2018).
Skinnider, M. A. et al. An atlas of protein–protein interactions across mouse tissues. Cell 184, 4073–4089.e17 (2021).
Heusel, M. et al. A global screen for assembly state changes of the mitotic proteome by SEC-SWATH-MS. Cell Syst. 10, 133–155.e6 (2020).
Stacey, R. G., Skinnider, M. A., Chik, J. H. L. & Foster, L. J. Context-specific interactions in literature-curated protein interaction databases. BMC Genomics 19, 758 (2018).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Banerjee, A., Lee, A., Campbell, E. & MacKinnon, R. Structure of a pore-blocking toxin in complex with a eukaryotic voltage-dependent K+ channel. eLife 2, e00594 (2013).
Luche, S., Santoni, V. & Rabilloud, T. Evaluation of nonionic and zwitterionic detergents as membrane protein solubilizers in two-dimensional electrophoresis. Proteomics 3, 249–253 (2003).
Betsinger, C. N. et al. The human cytomegalovirus protein pUL13 targets mitochondrial cristae architecture to increase cellular respiration during infection. Proc. Natl Acad. Sci. USA 118, e2101675118 (2021).
Federspiel, J. D., Greco, T. M., Lum, K. K. & Cristea, I. M. Hdac4 interactions in Huntington’s disease viewed through the prism of multiomics. Mol. Cell. Proteom. 18, S92–S113 (2019).
Liuzzi, M. et al. A potent peptidomimetic inhibitor of HSV ribonucleotide reductase with antiviral activity in vivo. Nature 372, 695–698 (1994).
Newcomb, W. W. & Brown, J. C. Structure and capsid association of the herpesvirus large tegument protein UL36. J. Virol. 84, 9408–9414 (2010).
Owen, D. J., Crump, C. M. & Graham, S. C. Tegument assembly and secondary envelopment of alphaherpesviruses. Viruses 7, 5084–5114 (2015).
Scrima, N. et al. Insights into herpesvirus tegument organization from structural analyses of the 970 central residues of HSV-1 UL36 protein. J. Biol. Chem. 290, 8820–8833 (2015).
Vittone, V. et al. Determination of interactions between tegument proteins of herpes simplex virus type 1. J. Virol. 79, 9566–9571 (2005).
Draganova, E. B., Valentin, J. & Heldwein, E. E. The ins and outs of herpesviral capsids: divergent structures and assembly mechanisms across the three subfamilies. Viruses 13, 1913 (2021).
Grzesik, P. et al. Incorporation of the Kaposi’s sarcoma-associated herpesvirus capsid vertex-specific component (CVSC) into self-assembled capsids. Virus Res. 236, 9–13 (2017).
Huang, P., Cai, Y., Zhao, B. & Cui, L. Roles of NUCKS1 in diseases: susceptibility, potential biomarker, and regulatory mechanisms. BioMed. Res. Int. 2018, e7969068 (2018).
Østvold, A. C., Grundt, K. & Wiese, C. NUCKS1 is a highly modified, chromatin-associated protein involved in a diverse set of biological and pathophysiological processes. Biochem. J. 479, 1205–1220 (2022).
Kim, H.-Y. et al. NUCKS1, a novel Tat coactivator, plays a crucial role in HIV-1 replication by increasing Tat-mediated viral transcription on the HIV-1 LTR promoter. Retrovirology 11, 67 (2014).
Cannon, J. S., Hamzeh, F., Moore, S., Nicholas, J. & Ambinder, R. F. Human herpesvirus 8-encoded thymidine kinase and phosphotransferase homologues confer sensitivity to ganciclovir. J. Virol. 73, 4786–4793 (1999).
Jordan, A. & Reichard, P. Ribonucleotide reductases. Annu. Rev. Biochem. 67, 71–98 (1998).
Kuang, E., Tang, Q., Maul, G. G. & Zhu, F. Activation of p90 ribosomal S6 kinase by ORF45 of Kaposi’s sarcoma-associated herpesvirus and its role in viral lytic replication. J. Virol. 82, 1838–1850 (2008).
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
Hernández Durán, A., Grünewald, K. & Topf, M. Conserved central intraviral protein interactome of the Herpesviridae family. mSystems 4, e00295-19 (2019).
Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
Wong, A. K., Krishnan, A. & Troyanskaya, O. G. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res. 46, W65–W70 (2018).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 56–61 (2010); https://doi.org/10.25080/Majora-92bf1922-00a
The Pandas Development Team. pandas-dev/pandas: Pandas (v.2.2.0rc0). Zenodo https://doi.org/10.5281/zenodo.3509134 (2023).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Kennedy, M. A. et al. A TRUSTED targeted mass spectrometry assay for pan-herpesvirus protein detection. Cell Rep. 39, 110810 (2022).
MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
Mateus, A. et al. Thermal proteome profiling for interrogating protein interactions. Mol. Syst. Biol. 16, e9232 (2020).
Diner, B. A., Lum, K. K., Javitt, A. & Cristea, I. M. Interactions of the antiviral factor interferon gamma-inducible protein 16 (IFI16) mediate immune signaling and herpes simplex virus-1 immunosuppression. Mol. Cell. Proteom. 14, 2341–2356 (2015).
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Deutsch, E. W. et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).
Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).
Sharma, V. et al. Panorama Public: a public repository for quantitative data sets processed in Skyline. Mol. Cell. Proteom. 17, 1239–1244 (2018).
Acknowledgements
We thank J. E. Hutton III for mass spectrometry support, J. L. Justice for the creation of the Tapioca logo, and all members of the Cristea laboratory and Troyanskaya laboratory at Princeton University and the Flatiron Institute for helpful discussions. We are grateful for funding from the NIH NIGMS (R01GM114141, I.M.C.; T32GM007388, M.D.T.; R01GM071966, O.G.T.), NIAID (AI174515, I.M.C.), NHGRI (R01HG005998, O.G.T.), Stand Up To Cancer Convergence (3.1416, I.M.C.), Simons Foundation grant (395506) to O.G.T., the CHDI Foundation (I.M.C.) and a Pre-Doctoral Fellowship from the New Jersey Commission on Cancer Research (COCR23PRF019) to M.D.T. This material is based on work supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-2039656 (awarded to T.J.R.). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
T.J.R., O.G.T. and I.M.C. designed the research. T.J.R. designed and developed Tapioca. T.J.R. performed TPCA experiments, M.D.T. performed virology assays and T.J.R. and M.D.T. performed IP experiments. T.J.R. performed all data analysis. A.T. developed the Tapioca website. Paper writing was done by T.J.R., M.D.T., O.G.T. and I.M.C.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Ben Collins, Mikhail Savitski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Evaluation of Tapioca, its sub-models, and Euclidean distance (Euc.) based PPI predictions.
a, 5-fold cross-validation-based evaluation by the area under the precision-recall curve (AUPRC), n = 48 biologically independent samples. b, 5-fold cross-validation-based evaluation by one minus the false positive rate (1-FPR), n = 48 biologically independent samples. c, d, 5-fold cross-validation-based evaluation by (c) 1-FPR and (d) AUPRC, n = 48 biologically independent samples for both panels. Logistic regression outperforms the other machine learning methods by 1-FPR. Although random forest outperforms the other sub-models by AUPRC, Tapioca integration of random forest sub-models gives inconsistent results, showing that this machine learning method fails to generalize. For all boxplots in a–d (see also Methods), boxes span the 25th to 75th percentile values, the line within each box represents the median value, whiskers represent ±1.5 interquartile range, and points are outliers. e, f, Performance of (e) naive Bayes and (f) random forest-based models. The standard deviation (SD) of the difference in PPI scores between 0 and 15 hours post infection in HSV-1 infection was used to assess dynamics. The random forest models often showed negative correlation with the dynamics data used in Tapioca sub-model integration, resulting in a high SD for the random forest Tapioca model. Given these negative correlations, the random forest Tapioca model is likely failing to capture the true system dynamics and is instead predicting artificial fluctuations. The 1-FPR values shown are based only on the 0 and 15 hours post infection HSV-1 datasets (that is, not all 48 datasets used in Fig. 1 and Extended Data Fig. 1a–d).
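The two metrics named in this legend can be illustrated with a minimal, standard-library sketch. It is not the Tapioca evaluation code: the function names, the toy labels and scores, and the 0.5 score threshold used for 1-FPR are all assumptions made for illustration, and AUPRC is approximated here by step-wise average precision.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank predictions from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            area += true_pos / rank  # precision at this recall step
    return area / total_pos


def one_minus_fpr(labels, scores, threshold):
    """Fraction of true negatives that score below the threshold (1 - FPR)."""
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not negatives:
        raise ValueError("need at least one negative label")
    false_pos = sum(1 for s in negatives if s >= threshold)
    return 1.0 - false_pos / len(negatives)


# Toy gold standard (1 = known interaction) and hypothetical PPI scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
ap = average_precision(labels, scores)
spec = one_minus_fpr(labels, scores, threshold=0.5)  # threshold is an assumption
print(round(ap, 3), round(spec, 3))
```

In a cross-validation setting, each function would be applied to the held-out fold of every dataset, giving one value per dataset and fold for the boxplots.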
Extended Data Fig. 2 KSHV TPCA data quality, number of predicted PPIs, and CORUM complex assembly.
a, Density plots comparing the log abundance for all temperatures, all time points, and all proteins (observed in both compared replicates) between all possible replicate pairs. b, The number of total predicted protein-protein interactions and the number of protein-protein interactions involving at least one KSHV protein at each time point. c, Depicted are 355 CORUM complexes (of a total of 1250 detected CORUM complexes) identified as assembled during at least one time point during KSHV reactivation from latency. Complexes are included in the heatmap if they were observed with ≥ 50% subunit detection at all time points and achieved a minimum Tapioca score of 0.4 in at least one time point. The list of complexes within the heatmap can be found in Supplementary Table 8a.
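The inclusion rule for panel c can be written as a short filter. This is a sketch under stated assumptions: the function name, the dictionary layout keyed by time point, and the example values are hypothetical; only the two cutoffs (≥ 50% subunit detection at every time point, Tapioca score of at least 0.4 at one or more time points) come from the legend.

```python
def complex_is_shown(subunit_fraction_by_time, score_by_time,
                     min_fraction=0.5, min_score=0.4):
    """Return True if a complex meets both heatmap inclusion criteria."""
    # Criterion 1: at least half of the subunits detected at every time point.
    detected_throughout = all(
        fraction >= min_fraction for fraction in subunit_fraction_by_time.values()
    )
    # Criterion 2: the complex score reaches the cutoff at one or more time points.
    assembled_at_some_point = any(
        score >= min_score for score in score_by_time.values()
    )
    return detected_throughout and assembled_at_some_point


# Hypothetical complex measured at three time points (hours post reactivation).
fractions = {0: 0.75, 24: 0.50, 48: 1.00}
complex_scores = {0: 0.10, 24: 0.45, 48: 0.30}
shown = complex_is_shown(fractions, complex_scores)
print(shown)
```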
Extended Data Fig. 3 Temporal dynamics of KSHV biological process interactions in KSHV reactivation from latency.
K-means clusters of the temporal dynamics of GO term enrichment of host interactors of KSHV proteins; the solid line represents the mean value, and the shaded region represents the 95% confidence interval.
Extended Data Fig. 4 NUCKS IP-MS Experiment.
Volcano plots of NUCKS IP-MS experiments performed using different combinations of two antibodies, two lysis buffers, and two strengths of nucleases. Interactors that pass significance thresholds, represented by dotted lines (fold change ≥ 2, p-value ≤ 0.05), are colored red (non-interactors that pass fold change ≤ −2, p-value ≤ 0.05 are also colored red). KSHV proteins are colored blue, regardless of significance. NUCKS is shown as a black ‘X’. All KSHV proteins and the top five interactors (ranked by p-value) are labeled. There was a wide distribution in the number of statistically significant PPIs identified by the different IP conditions, ranging from 1 to 304 proteins. In total, 392 interactors of NUCKS were identified. Significance was determined by two-tailed Student’s t test.
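The significance thresholds described for the volcano plots can be expressed as a small classifier. This is an illustrative sketch, not the analysis code: the function names and category labels are hypothetical, and because the legend does not state whether the fold change is linear or log-transformed, the cutoffs are simply applied on the signed scale shown on the plot axis.

```python
import math


def classify_protein(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Assign a volcano-plot category from the legend's two thresholds."""
    if p_value <= p_cutoff and fold_change >= fc_cutoff:
        return "interactor"       # passes both thresholds on the enriched side
    if p_value <= p_cutoff and fold_change <= -fc_cutoff:
        return "non-interactor"   # passes both thresholds on the depleted side
    return "not significant"


def neg_log10(p_value):
    """y-axis value of a volcano plot."""
    return -math.log10(p_value)


# Hypothetical measurements: (fold change, p-value).
calls = [classify_protein(3.1, 0.001),
         classify_protein(-2.5, 0.01),
         classify_protein(4.0, 0.2)]
print(calls, round(neg_log10(0.05), 3))
```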
Extended Data Fig. 5 Comparison of IP-MS and Tapioca predicted NUCKS interactors.
a, UpSet plot comparing predicted NUCKS interactors by each IP-MS condition and Tapioca. Between 0% and 31% of interactors were shared between any given pair of IP-MS conditions. The results of IPs are known to be heavily influenced by the lysis conditions and antibodies used, so it is unsurprising that different interactomes were identified between conditions. Nevertheless, the observed interactomes were largely thematically similar, capturing NUCKS interactors involved in chromosome regulation, RNA transport and processing, and DNA biosynthetic processes. Thus, these IP conditions appear to capture different subpopulations of the same overall NUCKS interactome. Significance was determined by two-tailed Student’s t test. b, Tapioca score versus the negative base 10 log of the IP-MS p-value for all proteins predicted to interact with NUCKS at 48 HPR that were detected by both Tapioca and IP-MS. We observed little correlation (r = 0.037) between Tapioca score and the IP -log10(p-value) for proteins in both interactomes (for proteins observed in both IP and TPCA buffer conditions, the maximum -log10(p-value) was used). Significance was determined by two-tailed Student’s t test. c, The Tapioca score distributions for Tapioca-predicted NUCKS interactors detected and not detected by IP-MS. There was a statistically significant difference between the two distributions, with interactors detected by IP-MS tending to have a higher Tapioca score. Significance was determined by two-tailed Student’s t test. d, Venn diagram of Tapioca and IP-MS NUCKS interactors showing the direct overlap, the overlap of IP-MS unique proteins that are known interactors of Tapioca unique interactors (and vice versa; see Methods), and the proteins with no connection between Tapioca and IP-MS.
There was an overlap of 61 Tapioca and IP-MS identified NUCKS interactors, and 314 IP-MS unique proteins had known interactions with Tapioca unique proteins. Thus, similar to the use of different IP conditions between IP-MS experiments, Tapioca and IP-MS likely capture different subpopulations of the NUCKS interactome, here likely owing to the major methodological differences between IP and TPCA (Tapioca’s source of dynamics data in the KSHV experiments).
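The three categories in the Venn diagram of panel d (direct overlap, method-unique proteins linked through known interactions, and unconnected proteins) can be sketched with set operations. The function name, the placeholder protein identifiers and the form of the known-interaction list are assumptions for illustration; the source defines the actual procedure in its Methods.

```python
def compare_interactomes(tapioca, ip_ms, known_ppis):
    """Split two interactor sets into shared, connected and unconnected groups."""
    shared = tapioca & ip_ms
    tapioca_only = tapioca - ip_ms
    ip_only = ip_ms - tapioca

    # Build an undirected lookup of known interaction partners.
    partners = {}
    for a, b in known_ppis:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    # Method-unique proteins with a known partner unique to the other method.
    ip_connected = {p for p in ip_only if partners.get(p, set()) & tapioca_only}
    tapioca_connected = {p for p in tapioca_only if partners.get(p, set()) & ip_only}
    unconnected = (ip_only - ip_connected) | (tapioca_only - tapioca_connected)
    return shared, ip_connected, tapioca_connected, unconnected


# Placeholder identifiers, not real proteins.
tapioca_hits = {"A", "B", "C"}
ip_hits = {"A", "D", "E"}
known = [("B", "D")]
shared, ip_conn, tap_conn, unconnected = compare_interactomes(tapioca_hits, ip_hits, known)
print(shared, ip_conn, tap_conn, unconnected)
```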
Extended Data Fig. 6 Temporal dynamics of NUCKS biological process interactions in KSHV reactivation from latency, Tapioca randomization, and Tapioca versus sub-model scores.
a, K-means clusters of the temporal dynamics of GO term enrichment of host interactors of the host protein NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. b, Evaluation by area under the true positive rate versus false positive rate curve (AUC) of Tapioca predictions on datasets with randomized dynamics data, n = 48 biologically independent samples. Because the static interaction data were not randomized, many sub-models perform better than random predictions, as intended. This is a byproduct of using a static gold standard. See Fig. 2 for evaluation of system dynamics. For boxplots, boxes span the 25th to 75th percentile values, the line within each box represents the median value, whiskers represent ±1.5 interquartile range, and points are outliers. c, The distribution of score differences for a given PPI between Tapioca and a given sub-model, for PPIs that achieve a score of at least 0.5 by the sub-model and/or Tapioca, n = 48 biologically independent samples. For violin plots (see also Methods), the white dot represents the median, the thick black bar represents the ±1.5 interquartile range, and the thin grey line represents the total range, excluding outliers. d, UpSet plot comparing, between sub-models, the proteins that most often have highly differentially predicted interactomes (99.9th percentile) between the given sub-model and Tapioca. For brevity, only comparisons with an intersection size of ≥ 30 are shown.
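As a point of reference for panel b, the AUC of a true positive rate versus false positive rate curve equals the probability that a randomly chosen positive pair outscores a randomly chosen negative pair, so uninformative predictions sit near 0.5. The sketch below computes it by direct pairwise comparison; the function name and toy data are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy gold standard and hypothetical scores.
auc_informative = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
auc_uninformative = roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(auc_informative, auc_uninformative)
```

The pairwise loop is quadratic in the number of pairs, which is acceptable for a small illustration; a rank-based formulation would be preferable for proteome-scale inputs.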
Extended Data Fig. 7 GO Term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and one of its sub-models.
a, GO term enrichments from HumanBase (https://hb.flatironinstitute.org/) for the given sub-models. For an explanation of the HumanBase GO term enrichment nodes see Methods. The rest of the associated GO terms for each module (for example, M1) can be found in Supplementary Table 3a–h.
Extended Data Fig. 8 Differences in TPCA and IP-MS sample prep and detected interactomes.
IP-MS and TPCA methodologies take distinct approaches to determining the interactors of a given protein, resulting in the detection of different subsets of true (biologically relevant) and false (biologically irrelevant) interactions. In IP-MS, cells are lysed prior to the detection of PPIs, which can result in the loss of interactors that do not survive the stringency of the lysis conditions. In addition, the loss of cellular compartments can lead to interactions that do not occur in the true biological system, contaminating the MS-determined interactome. Furthermore, during immunoaffinity purification, contaminants can form non-specific interactions with the antibodies and beads used. In contrast, in TPCA, PPIs are ‘captured’ prior to cell lysis, during thermal denaturation, preventing non-specific interactions with contaminants and limiting the loss of interactions. Instead, interactome contaminants and loss of true PPIs are introduced when protein melting curves are interpreted as PPIs. The random overlap of curves can lead to the prediction of false interactions, and true interactors with distant melting curves can be missed.
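The last point, that curve similarity is used as a proxy for interaction, can be illustrated with the Euclidean distance between melting curves, the baseline named in Extended Data Fig. 1. This is a minimal sketch with hypothetical soluble-fraction values at four temperatures; the function name and the data are assumptions, and Tapioca itself combines such dynamics data with other features.

```python
import math


def curve_distance(curve_a, curve_b):
    """Euclidean distance between two melting curves sampled at the same temperatures."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled at the same temperatures")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


# Hypothetical soluble fractions at increasing temperatures.
protein_x = [1.0, 0.9, 0.5, 0.1]
protein_y = [1.0, 0.8, 0.5, 0.2]   # melts similarly to protein_x
protein_z = [1.0, 1.0, 0.9, 0.6]   # far more thermostable

d_xy = curve_distance(protein_x, protein_y)
d_xz = curve_distance(protein_x, protein_z)
print(round(d_xy, 3), round(d_xz, 3))
```

A small distance is only suggestive: unrelated proteins can have overlapping curves by chance, and genuine interactors can have distant curves, which are the two error modes described in the legend.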
Extended Data Fig. 9 NUCKS melting curve behavior and 48 HPR interactome.
a, Diagram of how protein-non-protein interactions can influence the observed melting curve of a protein. b, The melting curve of NUCKS at 0 HPR and the melting curves of three histone proteins that NUCKS interacts with (by both Tapioca and IP-MS); the solid line represents the mean value, and the shaded region represents the 95% confidence interval. c, Example of how NUCKS interactors become stabilized upon interaction with NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. d, Example of the distinct curve shapes of CORUM complexes with which NUCKS interacts (Tapioca-predicted interaction with ≥ 50% of complex subunits and interaction with at least one subunit confirmed by IP-MS). The median melting curve for all proteins is represented by the dotted blue line, and all melting curves (clipped to those with a maximum value of less than 1.3 for visualization) are shown in grey. The median melting curves per replicate are also plotted, with the shaded region representing the 95% confidence interval. e, Leiden community clustering on the NUCKS 48 HPR interactome (only NUCKS interactors are shown, not NUCKS; edges shown are those with scores in the top 97th percentile amongst NUCKS interactors).
Extended Data Fig. 10 NUCKS1 knockout efficiency.
a, b, c, Western blots showing NUCKS1 knockout efficiency in iSLK.219, HFF, and MRC5 cells. d, e, f, Results of TUNEL assays, used to evaluate cell viability, on iSLK.219, HFF, and MRC5 NUCKS1 knockout cells. Error bars represent the standard error.
Supplementary information
Supplementary Information
Legends for Supplementary Tables 1–11.
Supplementary Table 1
Tapioca evaluation values per dataset used in evaluation, as well as tables describing the features used by Tapioca to predict PPIs.
Supplementary Table 2
Tapioca evaluation of systems dynamic capture.
Supplementary Table 3
Protein–protein interactions with substantially higher scores by submodel than by Tapioca and HumanBase GO term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and its submodels.
Supplementary Table 4
HumanBase GO term enrichment of proteins with unique PPIs predicted by CF-TPCA or I-PISA-TPCA compared to CF, I-PISA and TPCA datasets alone.
Supplementary Table 5
Evaluation of Euclidean distance-based PPI predictions for TPCA temperature optimization and lysis optimization experiments.
Supplementary Table 6
Spreadsheet containing raw and normalized TPCA data for temperature optimization, lysis optimization and KSHV experiments, as well as an example of calculating ratios from TPCA test mix samples.
Supplementary Table 7
CORUM complex Tapioca scores and percentage of subunits detected throughout KSHV reactivation experiment.
Supplementary Table 8
NUCKS IP–MS and IP–PRM data and HumanBase GO term enrichment of NUCKS IP–MS.
Supplementary Table 9
Interactome similarity scores between HSV-1, HCMV and KSHV proteins.
Supplementary Table 10
HumanBase GO term enrichment of NUCKS1 interactors throughout HSV-1 and HCMV infections.
Supplementary Table 11
UniProt accessions of proteins in NUCKS 48 HPR Leiden communities.
Source data
Source Data Extended Data Fig. 10
Unprocessed western blot.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Reed, T.J., Tyl, M.D., Tadych, A. et al. Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts. Nat Methods 21, 488–500 (2024). https://doi.org/10.1038/s41592-024-02179-9
DOI: https://doi.org/10.1038/s41592-024-02179-9