Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts

Abstract

Protein–protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi’s sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts.

Fig. 1: Tapioca, a machine learning method for predicting global protein–protein interaction networks in dynamic contexts.
Fig. 2: Tapioca balances accurate PPI prediction with the preservation and capture of system dynamics.
Fig. 3: Tapioca reliably predicts PPIs from heterogeneous dynamics data and identifies unique subsets of PPIs.
Fig. 4: Optimizing the thermal denaturation and lysis conditions in TPCA for studying PPIs.
Fig. 5: Leveraging Tapioca to characterize global PPI network dynamics during KSHV reactivation from latency.
Fig. 6: NUCKS plays a proviral role across herpesvirus infections.

Data availability

The mass spectrometry proteomics data reported in this paper, excluding the PRM data, have been deposited in the ProteomeXchange Consortium (ref. 78) via the PRIDE (ref. 79) partner repository with the dataset identifier PXD041152 (DOI: 10.6019/PXD041152). The PRM data were uploaded to Panorama (ref. 80) and can be accessed at panoramaweb.org/HfwV6S.url. All Tapioca predictions can be downloaded from tapioca.princeton.edu/, where scores ≥0.15 can also be viewed. Source data are provided with this paper.

Code availability

The code to run or modify Tapioca is provided at github.com/FunctionLab/tapioca and on Code Ocean (codeocean.com/capsule/7217908). There are no restrictions on access to this code.

References

  1. Braun, P. & Gingras, A.-C. History of protein–protein interactions: from egg-white to complex networks. Proteomics 12, 1478–1498 (2012).
  2. Taylor, I. W. & Wrana, J. L. Protein interaction networks in medicine and disease. Proteomics 12, 1706–1716 (2012).
  3. Tsitsiridis, G. et al. CORUM: the comprehensive resource of mammalian protein complexes–2022. Nucleic Acids Res. 51, D539–D545 (2023).
  4. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
  5. Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
  6. Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
  7. Jean Beltran, P. M., Federspiel, J. D., Sheng, X. & Cristea, I. M. Proteomics and integrative omic approaches for understanding host–pathogen interactions and infectious diseases. Mol. Syst. Biol. 13, 922 (2017).
  8. Greco, T. M., Kennedy, M. A. & Cristea, I. M. Proteomic technologies for deciphering local and global protein interactions. Trends Biochem. Sci. 45, 454–455 (2020).
  9. Truong, K. & Ikura, M. The use of FRET imaging microscopy to detect protein–protein interactions and protein conformational changes in vivo. Curr. Opin. Struct. Biol. 11, 573–578 (2001).
  10. Brückner, A., Polge, C., Lentze, N., Auerbach, D. & Schlattner, U. Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 10, 2763–2788 (2009).
  11. Yu, X., Petritis, B. & LaBaer, J. Advancing translational research with next-generation protein microarrays. Proteomics 16, 1238–1250 (2016).
  12. Dionne, U. & Gingras, A.-C. Proximity-dependent biotinylation approaches to explore the dynamic compartmentalized proteome. Front. Mol. Biosci. 9, 852911 (2022).
  13. Miteva, Y. V., Budayeva, H. G. & Cristea, I. M. Proteomics-based methods for discovery, quantification, and validation of protein–protein interactions. Anal. Chem. 85, 749–768 (2013).
  14. Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods 18, 520–527 (2021).
  15. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
  16. Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).
  17. Skinnider, M. A. & Foster, L. J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat. Methods 18, 806–815 (2021).
  18. Franken, H. et al. Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nat. Protoc. 10, 1567–1593 (2015).
  19. Mateus, A., Määttä, T. A. & Savitski, M. M. Thermal proteome profiling: unbiased assessment of protein state through heat-induced stability changes. Proteome Sci. 15, 13 (2017).
  20. Savitski, M. M. et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, 1255784 (2014).
  21. Tan, C. S. H. et al. Thermal proximity coaggregation for system-wide profiling of protein complex dynamics in cells. Science 359, 1170–1177 (2018).
  22. Beusch, C. M., Sabatier, P. & Zubarev, R. A. Ion-based proteome-integrated solubility alteration assays for systemwide profiling of protein–molecule interactions. Anal. Chem. 94, 7066–7074 (2022).
  23. Arias, C. et al. KSHV 2.0: a comprehensive annotation of the Kaposi’s sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features. PLoS Pathog. 10, e1003847 (2014).
  24. Davis, Z. H. et al. Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol. Cell 57, 349–360 (2015).
  25. Wen, K. W. & Damania, B. Kaposi sarcoma-associated herpesvirus (KSHV): molecular biology and oncogenesis. Cancer Lett. 289, 140–150 (2010).
  26. Justice, J. L. et al. Systematic profiling of protein complex dynamics reveals DNA-PK phosphorylation of IFI16 en route to herpesvirus immunity. Sci. Adv. 7, eabg6680 (2021).
  27. Hashimoto, Y., Sheng, X., Murray-Nerger, L. A. & Cristea, I. M. Temporal dynamics of protein complex formation and dissociation during human cytomegalovirus infection. Nat. Commun. 11, 806 (2020).
  28. Selkrig, J. et al. SARS-CoV-2 infection remodels the host protein thermal stability landscape. Mol. Syst. Biol. 17, e10188 (2021).
  29. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
  30. Menon, R. et al. Single cell transcriptomics identifies focal segmental glomerulosclerosis remission endothelial biomarker. JCI Insight 5, e133267 (2020).
  31. Meyer, M. et al. Attenuated activation of pulmonary immune cells in mRNA-1273-vaccinated hamsters after SARS-CoV-2 infection. J. Clin. Invest. 131, e148036 (2021).
  32. Zhou, J. et al. Whole-genome deep learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
  33. Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362.e6 (2021).
  34. Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).
  35. Roussarie, J.-P. et al. Selective neuronal vulnerability in Alzheimer’s disease: a network-based analysis. Neuron 107, 821–835.e12 (2020).
  36. Zhang, Z. et al. Blood RNA alternative splicing events as diagnostic biomarkers for infectious disease. Cell Rep. Methods 3, 100395 (2023).
  37. George, A. L. et al. Comparison of quantitative mass spectrometric methods for drug target identification by thermal proteome profiling. J. Proteome Res. 22, 2629–2640 (2023).
  38. Becher, I. et al. Pervasive protein thermal stability variation during the cell cycle. Cell 173, 1495–1507.e18 (2018).
  39. Skinnider, M. A. et al. An atlas of protein–protein interactions across mouse tissues. Cell 184, 4073–4089.e17 (2021).
  40. Heusel, M. et al. A global screen for assembly state changes of the mitotic proteome by SEC-SWATH-MS. Cell Syst. 10, 133–155.e6 (2020).
  41. Stacey, R. G., Skinnider, M. A., Chik, J. H. L. & Foster, L. J. Context-specific interactions in literature-curated protein interaction databases. BMC Genomics 19, 758 (2018).
  42. Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
  43. Banerjee, A., Lee, A., Campbell, E. & MacKinnon, R. Structure of a pore-blocking toxin in complex with a eukaryotic voltage-dependent K+ channel. eLife 2, e00594 (2013).
  44. Luche, S., Santoni, V. & Rabilloud, T. Evaluation of nonionic and zwitterionic detergents as membrane protein solubilizers in two-dimensional electrophoresis. Proteomics 3, 249–253 (2003).
  45. Betsinger, C. N. et al. The human cytomegalovirus protein pUL13 targets mitochondrial cristae architecture to increase cellular respiration during infection. Proc. Natl Acad. Sci. USA 118, e2101675118 (2021).
  46. Federspiel, J. D., Greco, T. M., Lum, K. K. & Cristea, I. M. Hdac4 interactions in Huntington’s disease viewed through the prism of multiomics. Mol. Cell. Proteom. 18, S92–S113 (2019).
  47. Liuzzi, M. et al. A potent peptidomimetic inhibitor of HSV ribonucleotide reductase with antiviral activity in vivo. Nature 372, 695–698 (1994).
  48. Newcomb, W. W. & Brown, J. C. Structure and capsid association of the herpesvirus large tegument protein UL36. J. Virol. 84, 9408–9414 (2010).
  49. Owen, D. J., Crump, C. M. & Graham, S. C. Tegument assembly and secondary envelopment of alphaherpesviruses. Viruses 7, 5084–5114 (2015).
  50. Scrima, N. et al. Insights into herpesvirus tegument organization from structural analyses of the 970 central residues of HSV-1 UL36 protein. J. Biol. Chem. 290, 8820–8833 (2015).
  51. Vittone, V. et al. Determination of interactions between tegument proteins of herpes simplex virus type 1. J. Virol. 79, 9566–9571 (2005).
  52. Draganova, E. B., Valentin, J. & Heldwein, E. E. The ins and outs of herpesviral capsids: divergent structures and assembly mechanisms across the three subfamilies. Viruses 13, 1913 (2021).
  53. Grzesik, P. et al. Incorporation of the Kaposi’s sarcoma-associated herpesvirus capsid vertex-specific component (CVSC) into self-assembled capsids. Virus Res. 236, 9–13 (2017).
  54. Huang, P., Cai, Y., Zhao, B. & Cui, L. Roles of NUCKS1 in diseases: susceptibility, potential biomarker, and regulatory mechanisms. BioMed. Res. Int. 2018, e7969068 (2018).
  55. Østvold, A. C., Grundt, K. & Wiese, C. NUCKS1 is a highly modified, chromatin-associated protein involved in a diverse set of biological and pathophysiological processes. Biochem. J. 479, 1205–1220 (2022).
  56. Kim, H.-Y. et al. NUCKS1, a novel Tat coactivator, plays a crucial role in HIV-1 replication by increasing Tat-mediated viral transcription on the HIV-1 LTR promoter. Retrovirology 11, 67 (2014).
  57. Cannon, J. S., Hamzeh, F., Moore, S., Nicholas, J. & Ambinder, R. F. Human herpesvirus 8-encoded thymidine kinase and phosphotransferase homologues confer sensitivity to ganciclovir. J. Virol. 73, 4786–4793 (1999).
  58. Jordan, A. & Reichard, P. Ribonucleotide reductases. Annu. Rev. Biochem. 67, 71–98 (1998).
  59. Kuang, E., Tang, Q., Maul, G. G. & Zhu, F. Activation of p90 ribosomal S6 kinase by ORF45 of Kaposi’s sarcoma-associated herpesvirus and its role in viral lytic replication. J. Virol. 82, 1838–1850 (2008).
  60. Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
  61. Hernández Durán, A., Grünewald, K. & Topf, M. Conserved central intraviral protein interactome of the Herpesviridae family. mSystems 4, e00295-19 (2019).
  62. Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
  63. Wong, A. K., Krishnan, A. & Troyanskaya, O. G. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res. 46, W65–W70 (2018).
  64. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
  65. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  66. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
  67. McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 56–61 (2010); https://doi.org/10.25080/Majora-92bf1922-00a
  68. The Pandas Development Team. pandas-dev/pandas: Pandas (v.2.2.0rc0). Zenodo https://doi.org/10.5281/zenodo.3509134 (2023).
  69. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
  70. Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
  71. Kennedy, M. A. et al. A TRUSTED targeted mass spectrometry assay for pan-herpesvirus protein detection. Cell Rep. 39, 110810 (2022).
  72. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
  73. Mateus, A. et al. Thermal proteome profiling for interrogating protein interactions. Mol. Syst. Biol. 16, e9232 (2020).
  74. Diner, B. A., Lum, K. K., Javitt, A. & Cristea, I. M. Interactions of the antiviral factor interferon gamma-inducible protein 16 (IFI16) mediate immune signaling and herpes simplex virus-1 immunosuppression. Mol. Cell. Proteom. 14, 2341–2356 (2015).
  75. Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
  76. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
  77. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
  78. Deutsch, E. W. et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).
  79. Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).
  80. Sharma, V. et al. Panorama Public: a public repository for quantitative data sets processed in Skyline. Mol. Cell. Proteom. 17, 1239–1244 (2018).

Acknowledgements

We thank J. E. Hutton III for mass spectrometry support, J. L. Justice for the creation of the Tapioca logo, and all members of the Cristea laboratory and Troyanskaya laboratory at Princeton University and the Flatiron Institute for helpful discussions. We are grateful for funding from the NIH NIGMS (R01GM114141, I.M.C.; T32GM007388, M.D.T.; R01GM071966, O.G.T.), NIAID (AI174515, I.M.C.), NHGRI (R01HG005998, O.G.T.), Stand Up To Cancer Convergence (3.1416, I.M.C.), Simons Foundation grant (395506) to O.G.T., the CHDI Foundation (I.M.C.) and a Pre-Doctoral Fellowship from the New Jersey Commission on Cancer Research (COCR23PRF019) to M.D.T. This material is based on work supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-2039656 (awarded to T.J.R.). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

T.J.R., O.G.T. and I.M.C. designed the research. T.J.R. designed and developed Tapioca. T.J.R. performed TPCA experiments, M.D.T. performed virology assays and T.J.R. and M.D.T. performed IP experiments. T.J.R. performed all data analysis. A.T. developed the Tapioca website. Paper writing was done by T.J.R., M.D.T., O.G.T. and I.M.C.

Corresponding authors

Correspondence to Olga G. Troyanskaya or Ileana M. Cristea.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ben Collins, Mikhail Savitski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of Tapioca, its sub-models, and Euclidean distance (Euc.) based PPI predictions.

a, 5-fold cross-validation-based evaluation by the area under a precision-recall curve (AUPRC), n = 48 biologically independent samples. b, 5-fold cross-validation-based evaluation by one minus the false positive rate (1-FPR), n = 48 biologically independent samples. c, d, 5-fold cross-validation-based evaluation by (c) 1-FPR and (d) AUPRC. Logistic regression outperforms other machine learning methods by 1-FPR. Although random forest outperforms other sub-models by AUPRC, Tapioca integration of random forest sub-models gives inconsistent results, showing that this machine learning method fails to generalize. For both c and d, n = 48 biologically independent samples. For all boxplots (a–d), boxes show the 25th and 75th percentile values, with the line within the box representing the median value; whiskers represent +/− 1.5 interquartile range, and points are outliers. e, f, Performance of (e) naïve Bayes and (f) random forest based models. The standard deviation (SD) of the difference in PPI scores between 0 and 15 hours post infection with HSV-1 was used to assess dynamics. The random forest models often showed negative correlation with the dynamics data used in Tapioca sub-model integration, resulting in a high SD for the random forest Tapioca model. Given these negative correlations, it is likely that the random forest Tapioca model is failing to capture the true system dynamics, instead predicting inappropriate and artificial fluctuations. The 1-FPR values shown are based only on the 0 and 15 hours post infection HSV-1 datasets (that is, not all 48 datasets used in Fig. 1 and Extended Data Fig. 1a–d).
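
As a rough illustration of two quantities named in this legend, the sketch below computes 1-FPR at a score cutoff and the SD of per-PPI score differences between two time points. The function names, the 0.5 cutoff, the dictionary layout and the choice of the sample SD are assumptions made for illustration; none of them is taken from the Tapioca code.

```python
from statistics import stdev


def one_minus_fpr(scores, labels, threshold=0.5):
    """Return 1 - FPR: the fraction of known negatives scored below the cutoff.

    `labels` uses 1 for gold-standard interactions and 0 for non-interactions.
    The 0.5 cutoff is an assumed default, not a value from the paper.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one negative example")
    false_positives = sum(s >= threshold for s in negatives)
    return 1.0 - false_positives / len(negatives)


def dynamics_sd(scores_t0, scores_t1):
    """Sample SD of score differences for PPIs scored at both time points.

    Each argument maps a PPI identifier to its score at one time point.
    """
    shared = sorted(set(scores_t0) & set(scores_t1))
    diffs = [scores_t1[ppi] - scores_t0[ppi] for ppi in shared]
    return stdev(diffs)  # raises StatisticsError if fewer than two shared PPIs
```

A larger SD from `dynamics_sd` means the scores move more between the two time points; as the legend notes, that alone does not show that the movement reflects true system dynamics.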

Extended Data Fig. 2 KSHV TPCA data quality, number of predicted PPIs, and CORUM complex assembly.

a, Density plots comparing the log abundance for all temperatures, all time points, and all proteins (observed in both compared replicates) compared between all possible replicate pairs. b, The number of total predicted protein-protein interactions and the number of protein-protein interactions involving at least one KSHV protein at each time point. c, Depicted are 355 CORUM complexes (of a total of 1250 detected CORUM complexes) identified as assembled during at least one time point during KSHV reactivation from latency. Complexes are included in the heatmap if they were always observed with ≥ 50% subunit detection at all time points and achieved a minimum Tapioca score of 0.4 in at least one time point. The list of complexes within the heatmap can be found in Supplementary Table 8a.
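
The inclusion rule for the heatmap in c can be sketched as a simple filter. This is a minimal illustration that assumes per-complex detection fractions and per-complex Tapioca scores are already available for every time point; the function name and data layout are hypothetical, and how a complex-level score is derived from PPI scores is not specified here.

```python
def assembled_complexes(detection, scores, min_detect=0.5, min_score=0.4):
    """Select complexes that pass both criteria stated in the legend.

    detection[name][t]: fraction of subunits detected at time point t.
    scores[name][t]:    complex-level Tapioca score at time point t (assumed input).
    A complex is kept if detection is >= min_detect at all time points and
    its score reaches min_score in at least one time point.
    """
    kept = []
    for name, detected_by_time in detection.items():
        always_detected = all(f >= min_detect for f in detected_by_time.values())
        peak_score = max(scores.get(name, {}).values(), default=0.0)
        if always_detected and peak_score >= min_score:
            kept.append(name)
    return sorted(kept)
```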

Extended Data Fig. 3 Temporal dynamics of KSHV biological process interactions in KSHV reactivation from latency.

K-means clusters of the temporal dynamics of GO term enrichment of host interactors of KSHV proteins; the solid line represents the mean value, and the shaded region represents the 95% confidence interval.

Extended Data Fig. 4 NUCKS IP-MS Experiment.

Volcano plots of NUCKS IP-MS experiments performed using different combinations of two antibodies, two lysis buffers, and two strengths of nucleases. Interactors that pass significance thresholds, represented by dotted lines (fold change ≥ 2, p-value ≤ 0.05), are colored red (non-interactors that pass fold change ≤ −2, p-value ≤ 0.05 are also colored red). KSHV proteins are colored blue, regardless of significance. NUCKS is shown as a black ‘X’. All KSHV proteins and the top five interactors (ranked by p-value) are labeled. There was a wide distribution in the number of statistically significant PPIs identified by the different IP conditions, ranging from 1 to 304 proteins. In total, 392 interactors of NUCKS were identified. Significance was determined by two-tailed Student’s t test.
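
The colouring thresholds described above can be written as a small classifier. This is a sketch only: the function name and category labels are hypothetical, and it assumes the fold change is passed on the same signed scale as the plotted axis.

```python
def classify_protein(fold_change, p_value, fc_cut=2.0, p_cut=0.05):
    """Classify one protein against the volcano-plot thresholds.

    Returns "interactor" for fold change >= fc_cut, "non-interactor" for
    fold change <= -fc_cut (both require p <= p_cut), else "not significant".
    """
    if p_value <= p_cut and fold_change >= fc_cut:
        return "interactor"
    if p_value <= p_cut and fold_change <= -fc_cut:
        return "non-interactor"
    return "not significant"
```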

Extended Data Fig. 5 Comparison of IP-MS and Tapioca predicted NUCKS interactors.

a, UpSet plot comparing predicted NUCKS interactors by each IP-MS condition and Tapioca. Between 0% and 31% of interactors were commonly identified between any given pair of IP-MS conditions. The results of IPs are known to be heavily influenced by the lysis conditions and antibodies used; thus, it is unsurprising that differential interactomes were identified between conditions. Nevertheless, the observed interactomes were largely thematically similar, capturing NUCKS interactors involved in chromosome regulation, RNA transport and processing, and DNA biosynthetic processes. Thus, it would seem that these IP conditions are leading to the capture of different subpopulations from the same overall NUCKS interactome. Significance was determined by two-tailed Student’s t test. b, Tapioca score versus the negative base 10 log of the IP-MS p-value for all proteins predicted to interact with NUCKS at 48 HPR that were detected by both Tapioca and IP-MS. We observed little correlation (r = 0.037) between Tapioca score and the IP -log10(p-value) for proteins in both interactomes (for proteins observed in both IP and TPCA buffer conditions, the maximum -log10(p-value) was used). Significance was determined by two-tailed Student’s t test. c, The Tapioca score distributions for Tapioca predicted NUCKS interactors detected and not detected by IP-MS. There was a statistically significant difference in the distribution of Tapioca scores of NUCKS interactors detected and not detected by IP-MS, with those detected by IP-MS tending to have a higher Tapioca score. Significance was determined by two-tailed Student’s t test. d, Venn diagram of Tapioca and IP-MS NUCKS interactors showing the direct overlap, the overlap of IP-MS unique proteins which are known interactors of Tapioca unique interactors (and vice versa; see Methods), and the proteins with no connection between Tapioca and IP-MS. There was an overlap of 61 Tapioca and IP-MS identified NUCKS interactors, and 314 IP-MS unique proteins had known interactions with Tapioca unique proteins. Thus, it would seem that, similar to using different IP conditions between IP-MS experiments, Tapioca and IP-MS likely capture different subpopulations of the NUCKS interactome, here likely due to the major differences between the IP and TPCA (Tapioca’s source of dynamics data in the KSHV experimentation) methodologies.
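
The comparison in b can be sketched as a Pearson correlation between Tapioca scores and the per-protein maximum -log10(p-value). The helper names below are hypothetical and the input values in any use are placeholders; the legend does not state which correlation coefficient was used, so Pearson's r is an assumption.

```python
import math


def max_neg_log10_p(p_values):
    """Largest -log10(p) across the conditions in which a protein was seen."""
    return max(-math.log10(p) for p in p_values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In use, each protein detected by both methods contributes one pair: its Tapioca score and `max_neg_log10_p` of its IP-MS p-values.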

Extended Data Fig. 6 Temporal dynamics of NUCKS biological process interactions in KSHV reactivation from latency, Tapioca randomization, and Tapioca versus sub-model scores.

a, K-means clusters of the temporal dynamics of GO term enrichment of host interactors of the host protein NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. b, Evaluation by area under a true positive rate versus false positive rate curve (AUC) of Tapioca predictions on datasets with randomized dynamics data, n = 48 biologically independent samples. Due to the use of static interaction data (which was not randomized), many sub-models perform better than random predictions, as intended. This is a byproduct of using a static gold standard. See Fig. 2 for evaluation of systems dynamics. For boxplots, boxes show the 25th and 75th percentile values, with the line within the box representing the median value, whiskers represent +/− 1.5 interquartile range, and points are outliers. c, The distribution of score difference for a given PPI between Tapioca and a given sub-model, for PPIs that achieve a score of at least 0.5 by the sub-model and/or Tapioca. Violin plot definitions can be found in the methods, n = 48 biologically independent samples. d, UpSet plot comparing, between sub-models, the proteins that most often have highly differentially predicted interactomes (99.9th percentile) between the given sub-model and Tapioca. For brevity, only comparisons with an intersection size of ≥ 30 are shown. For violin plots, the white dot represents the median, the thick black bar represents the +/− 1.5 interquartile range, and the thin grey line represents the total range, excluding outliers.
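
For reference, the AUC used in b equals the probability that a randomly chosen gold-standard interaction scores higher than a randomly chosen non-interaction. The sketch below computes it by direct pairwise comparison; the function name is hypothetical, and this quadratic-time form is chosen for clarity rather than efficiency.

```python
def roc_auc(scores, labels):
    """Area under the TPR-versus-FPR curve via pairwise comparison.

    `labels` uses 1 for positives and 0 for negatives. Ties between a
    positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

A value near 0.5 corresponds to random ranking, which is the baseline against which the randomized-dynamics predictions in b are read.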

Extended Data Fig. 7 GO Term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and one of its sub-models.

a, GO term enrichments from HumanBase (https://hb.flatironinstitute.org/) for the given sub-models. For an explanation of the HumanBase GO term enrichment nodes see Methods. The rest of the associated GO terms for each module (for example, M1) can be found in Supplementary Table 3a–h.

Extended Data Fig. 8 Differences in TPCA and IP-MS sample prep and detected interactomes.

IP-MS and TPCA methodologies take distinct approaches to determining the interactors of a given protein, resulting in the detection of different subsets of true (biologically relevant) and false (biologically irrelevant) interactions. In IP-MS, cells are lysed prior to the detection of PPIs, which can result in the loss of interactors that do not survive the stringency of the lysis conditions. In addition, the loss of cellular compartments can lead to interactions that do not occur in the true biological system, leading to contamination of the MS-determined interactome. Additionally, during the immunoaffinity purification, contaminants can form non-specific interactions with the antibodies and beads used. In contrast, in TPCA, PPIs are ‘captured’ prior to cell lysis, during thermal denaturation, preventing non-specific interactions with contaminants and limiting the loss of interactions. Instead, interactome contaminants and loss of true PPIs are introduced upon interpretation of protein melting curves into PPIs. The random overlap of curves can lead to the prediction of false interactions, and true interactors with distant melting curves can be missed.

Extended Data Fig. 9 NUCKS melting curve behavior and 48 HPR interactome.

a, Diagram of how protein-non-protein interactions can influence the observed melting curve of a protein. b, The melting curve of NUCKS at 0 HPR and the melting curves of three histone proteins that NUCKS interacts with (by both Tapioca and IP-MS); the solid line represents the mean value, and the shaded region represents the 95% confidence interval. c, Example of how NUCKS interactors become stabilized upon interaction with NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. d, Example of the distinct curve shapes of CORUM complexes with which NUCKS interacts (Tapioca predicted interaction with ≥ 50% of complex subunits and interaction with at least one subunit confirmed by IP-MS). The median melting curve for all proteins is represented by the dotted blue line, and all melting curves (clipped to those with a maximum value of less than 1.3 for visualization) are shown in grey. The median melting curves per replicate are also plotted, with the shaded region representing the 95% confidence interval. e, Leiden community clustering on the NUCKS 48 HPR interactome (only NUCKS interactors are shown, not NUCKS; edges shown are those with scores in the top 97th percentile amongst NUCKS interactors).

Extended Data Fig. 10 NUCKS1 knockout efficiency.

a, b, c, Western blots showing NUCKS1 knockout efficiency in iSLK.219, HFF, and MRC5 cells. d, e, f, Results of TUNEL assays, used to evaluate cell viability, on iSLK.219, HFF, and MRC5 NUCKS1 knockout cells. Error bars represent the standard error.

Source data

Supplementary information

Supplementary Information

Legends for Supplementary Tables 1–11.

Reporting Summary

Peer Review File

Supplementary Table 1

Tapioca evaluation values per dataset used in evaluation, as well as tables describing the features used by Tapioca to predict PPIs.

Supplementary Table 2

Tapioca evaluation of systems dynamic capture.

Supplementary Table 3

Protein–protein interactions with substantially higher scores by submodel than by Tapioca and HumanBase GO term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and its submodels.

Supplementary Table 4

HumanBase GO term enrichment of proteins with unique PPIs predicted by CF-TPCA or I-PISA-TPCA compared to CF, I-PISA and TPCA datasets alone.

Supplementary Table 5

Evaluation of Euclidean distance-based PPI predictions for TPCA temperature optimization and lysis optimization experiments.

Supplementary Table 6

Spreadsheet containing raw and normalized TPCA data for temperature optimization, lysis optimization and KSHV experiments, as well as an example of calculating ratios from TPCA test mix samples.

Supplementary Table 7

CORUM complex Tapioca scores and percentage of subunits detected throughout KSHV reactivation experiment.

Supplementary Table 8

NUCKS IP–MS and IP–PRM data and HumanBase GO term enrichment of NUCKS IP–MS.

Supplementary Table 9

Interactome similarity scores between HSV-1, HCMV and KSHV proteins.

Supplementary Table 10

HumanBase GO term enrichment of NUCKS1 interactors throughout HSV-1 and HCMV infections.

Supplementary Table 11

UniProt accessions of proteins in NUCKS 48 HPR Leiden communities.

Source data

Source Data Extended Data Fig. 10

Unprocessed western blot.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Reed, T.J., Tyl, M.D., Tadych, A. et al. Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts. Nat Methods 21, 488–500 (2024). https://doi.org/10.1038/s41592-024-02179-9

  • DOI: https://doi.org/10.1038/s41592-024-02179-9
