Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts

Reed, Tavis J.; Tyl, Matthew D.; Tadych, Alicja; Troyanskaya, Olga G.; Cristea, Ileana M.

doi:10.1038/s41592-024-02179-9

Article
Published: 15 February 2024

Nature Methods volume 21, pages 488–500 (2024)

Abstract

Protein–protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi’s sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts.

**Fig. 1: Tapioca, a machine learning method for predicting global protein–protein interaction networks in dynamic contexts.**

**Fig. 2: Tapioca balances accurate PPI prediction with the preservation and capture of system dynamics.**

**Fig. 3: Tapioca reliably predicts PPIs from heterogeneous dynamics data and identifies unique subsets of PPIs.**

**Fig. 4: Optimizing the thermal denaturation and lysis conditions in TPCA for studying PPIs.**

**Fig. 5: Leveraging Tapioca to characterize global PPI network dynamics during KSHV reactivation from latency.**

**Fig. 6: NUCKS plays a proviral role across herpesvirus infections.**

Data availability

The mass spectrometry proteomics data reported in this paper, excluding the PRM data, have been deposited in the ProteomeXchange Consortium⁷⁸ via the PRIDE⁷⁹ partner repository with the dataset identifier PXD041152 and 10.6019/PXD041152. The PRM data were uploaded to Panorama⁸⁰ and can be accessed at panoramaweb.org/HfwV6S.url. All Tapioca predictions can be downloaded from (and scores ≥0.15 viewed) at tapioca.princeton.edu/. Source data are provided with this paper.

Code availability

The code to run or modify Tapioca is provided at github.com/FunctionLab/tapioca and on Code Ocean (codeocean.com/capsule/7217908). There are no restrictions on access to this code.

References

Braun, P. & Gingras, A.-C. History of protein–protein interactions: from egg-white to complex networks. Proteomics 12, 1478–1498 (2012).
Taylor, I. W. & Wrana, J. L. Protein interaction networks in medicine and disease. Proteomics 12, 1706–1716 (2012).
Tsitsiridis, G. et al. CORUM: the comprehensive resource of mammalian protein complexes–2022. Nucleic Acids Res. 51, D539–D545 (2023).
Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
Jean Beltran, P. M., Federspiel, J. D., Sheng, X. & Cristea, I. M. Proteomics and integrative omic approaches for understanding host–pathogen interactions and infectious diseases. Mol. Syst. Biol. 13, 922 (2017).
Greco, T. M., Kennedy, M. A. & Cristea, I. M. Proteomic technologies for deciphering local and global protein interactions. Trends Biochem. Sci. 45, 454–455 (2020).
Truong, K. & Ikura, M. The use of FRET imaging microscopy to detect protein–protein interactions and protein conformational changes in vivo. Curr. Opin. Struct. Biol. 11, 573–578 (2001).
Brückner, A., Polge, C., Lentze, N., Auerbach, D. & Schlattner, U. Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 10, 2763–2788 (2009).
Yu, X., Petritis, B. & LaBaer, J. Advancing translational research with next-generation protein microarrays. Proteomics 16, 1238–1250 (2016).
Dionne, U. & Gingras, A.-C. Proximity-dependent biotinylation approaches to explore the dynamic compartmentalized proteome. Front. Mol. Biosci. 9, 852911 (2022).
Miteva, Y. V., Budayeva, H. G. & Cristea, I. M. Proteomics-based methods for discovery, quantification, and validation of protein–protein interactions. Anal. Chem. 85, 749–768 (2013).
Fossati, A. et al. PCprophet: a framework for protein complex prediction and differential analysis using proteomic data. Nat. Methods 18, 520–527 (2021).
Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
Hu, L. Z. et al. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat. Methods 16, 737–742 (2019).
Skinnider, M. A. & Foster, L. J. Meta-analysis defines principles for the design and analysis of co-fractionation mass spectrometry experiments. Nat. Methods 18, 806–815 (2021).
Franken, H. et al. Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nat. Protoc. 10, 1567–1593 (2015).
Mateus, A., Määttä, T. A. & Savitski, M. M. Thermal proteome profiling: unbiased assessment of protein state through heat-induced stability changes. Proteome Sci. 15, 13 (2017).
Savitski, M. M. et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, 1255784 (2014).
Tan, C. S. H. et al. Thermal proximity coaggregation for system-wide profiling of protein complex dynamics in cells. Science 359, 1170–1177 (2018).
Beusch, C. M., Sabatier, P. & Zubarev, R. A. Ion-based proteome-integrated solubility alteration assays for systemwide profiling of protein–molecule interactions. Anal. Chem. 94, 7066–7074 (2022).
Arias, C. et al. KSHV 2.0: a comprehensive annotation of the Kaposi’s sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features. PLoS Pathog. 10, e1003847 (2014).
Davis, Z. H. et al. Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol. Cell 57, 349–360 (2015).
Wen, K. W. & Damania, B. Kaposi sarcoma-associated herpesvirus (KSHV): molecular biology and oncogenesis. Cancer Lett. 289, 140–150 (2010).
Justice, J. L. et al. Systematic profiling of protein complex dynamics reveals DNA-PK phosphorylation of IFI16 en route to herpesvirus immunity. Sci. Adv. 7, eabg6680 (2021).
Hashimoto, Y., Sheng, X., Murray-Nerger, L. A. & Cristea, I. M. Temporal dynamics of protein complex formation and dissociation during human cytomegalovirus infection. Nat. Commun. 11, 806 (2020).
Selkrig, J. et al. SARS-CoV-2 infection remodels the host protein thermal stability landscape. Mol. Syst. Biol. 17, e10188 (2021).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Menon, R. et al. Single cell transcriptomics identifies focal segmental glomerulosclerosis remission endothelial biomarker. JCI Insight 5, e133267 (2020).
Meyer, M. et al. Attenuated activation of pulmonary immune cells in mRNA-1273-vaccinated hamsters after SARS-CoV-2 infection. J. Clin. Invest. 131, e148036 (2021).
Zhou, J. et al. Whole-genome deep learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362.e6 (2021).
Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).
Roussarie, J.-P. et al. Selective neuronal vulnerability in Alzheimer’s disease: a network-based analysis. Neuron 107, 821–835.e12 (2020).
Zhang, Z. et al. Blood RNA alternative splicing events as diagnostic biomarkers for infectious disease. Cell Rep. Methods 3, 100395 (2023).
George, A. L. et al. Comparison of quantitative mass spectrometric methods for drug target identification by thermal proteome profiling. J. Proteome Res. 22, 2629–2640 (2023).
Becher, I. et al. Pervasive protein thermal stability variation during the cell cycle. Cell 173, 1495–1507.e18 (2018).
Skinnider, M. A. et al. An atlas of protein–protein interactions across mouse tissues. Cell 184, 4073–4089.e17 (2021).
Heusel, M. et al. A global screen for assembly state changes of the mitotic proteome by SEC-SWATH-MS. Cell Syst. 10, 133–155.e6 (2020).
Stacey, R. G., Skinnider, M. A., Chik, J. H. L. & Foster, L. J. Context-specific interactions in literature-curated protein interaction databases. BMC Genomics 19, 758 (2018).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Banerjee, A., Lee, A., Campbell, E. & MacKinnon, R. Structure of a pore-blocking toxin in complex with a eukaryotic voltage-dependent K⁺ channel. eLife 2, e00594 (2013).
Luche, S., Santoni, V. & Rabilloud, T. Evaluation of nonionic and zwitterionic detergents as membrane protein solubilizers in two-dimensional electrophoresis. Proteomics 3, 249–253 (2003).
Betsinger, C. N. et al. The human cytomegalovirus protein pUL13 targets mitochondrial cristae architecture to increase cellular respiration during infection. Proc. Natl Acad. Sci. USA 118, e2101675118 (2021).
Federspiel, J. D., Greco, T. M., Lum, K. K. & Cristea, I. M. Hdac4 interactions in Huntington’s disease viewed through the prism of multiomics. Mol. Cell. Proteom. 18, S92–S113 (2019).
Liuzzi, M. et al. A potent peptidomimetic inhibitor of HSV ribonucleotide reductase with antiviral activity in vivo. Nature 372, 695–698 (1994).
Newcomb, W. W. & Brown, J. C. Structure and capsid association of the herpesvirus large tegument protein UL36. J. Virol. 84, 9408–9414 (2010).
Owen, D. J., Crump, C. M. & Graham, S. C. Tegument assembly and secondary envelopment of alphaherpesviruses. Viruses 7, 5084–5114 (2015).
Scrima, N. et al. Insights into herpesvirus tegument organization from structural analyses of the 970 central residues of HSV-1 UL36 protein. J. Biol. Chem. 290, 8820–8833 (2015).
Vittone, V. et al. Determination of interactions between tegument proteins of herpes simplex virus type 1. J. Virol. 79, 9566–9571 (2005).
Draganova, E. B., Valentin, J. & Heldwein, E. E. The ins and outs of herpesviral capsids: divergent structures and assembly mechanisms across the three subfamilies. Viruses 13, 1913 (2021).
Grzesik, P. et al. Incorporation of the Kaposi’s sarcoma-associated herpesvirus capsid vertex-specific component (CVSC) into self-assembled capsids. Virus Res. 236, 9–13 (2017).
Huang, P., Cai, Y., Zhao, B. & Cui, L. Roles of NUCKS1 in diseases: susceptibility, potential biomarker, and regulatory mechanisms. BioMed. Res. Int. 2018, e7969068 (2018).
Østvold, A. C., Grundt, K. & Wiese, C. NUCKS1 is a highly modified, chromatin-associated protein involved in a diverse set of biological and pathophysiological processes. Biochem. J. 479, 1205–1220 (2022).
Kim, H.-Y. et al. NUCKS1, a novel Tat coactivator, plays a crucial role in HIV-1 replication by increasing Tat-mediated viral transcription on the HIV-1 LTR promoter. Retrovirology 11, 67 (2014).
Cannon, J. S., Hamzeh, F., Moore, S., Nicholas, J. & Ambinder, R. F. Human herpesvirus 8-encoded thymidine kinase and phosphotransferase homologues confer sensitivity to ganciclovir. J. Virol. 73, 4786–4793 (1999).
Jordan, A. & Reichard, P. Ribonucleotide reductases. Annu. Rev. Biochem. 67, 71–98 (1998).
Kuang, E., Tang, Q., Maul, G. G. & Zhu, F. Activation of p90 ribosomal S6 kinase by ORF45 of Kaposi’s sarcoma-associated herpesvirus and its role in viral lytic replication. J. Virol. 82, 1838–1850 (2008).
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
Hernández Durán, A., Grünewald, K. & Topf, M. Conserved central intraviral protein interactome of the Herpesviridae family. mSystems 4, e00295-19 (2019).
Jarzab, A. et al. Meltome atlas—thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
Wong, A. K., Krishnan, A. & Troyanskaya, O. G. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res. 46, W65–W70 (2018).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 56–61 (2010); https://doi.org/10.25080/Majora-92bf1922-00a
The Pandas Development Team. pandas-dev/pandas: Pandas (v.2.2.0rc0). Zenodo https://doi.org/10.5281/zenodo.3509134 (2023).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Kennedy, M. A. et al. A TRUSTED targeted mass spectrometry assay for pan-herpesvirus protein detection. Cell Rep. 39, 110810 (2022).
MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
Mateus, A. et al. Thermal proteome profiling for interrogating protein interactions. Mol. Syst. Biol. 16, e9232 (2020).
Diner, B. A., Lum, K. K., Javitt, A. & Cristea, I. M. Interactions of the antiviral factor interferon gamma-inducible protein 16 (IFI16) mediate immune signaling and herpes simplex virus-1 immunosuppression. Mol. Cell. Proteom. 14, 2341–2356 (2015).
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Deutsch, E. W. et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).
Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).
Sharma, V. et al. Panorama Public: a public repository for quantitative data sets processed in Skyline. Mol. Cell. Proteom. 17, 1239–1244 (2018).

Acknowledgements

We thank J. E. Hutton III for mass spectrometry support, J. L. Justice for the creation of the Tapioca logo, and all members of the Cristea laboratory and Troyanskaya laboratory at Princeton University and the Flatiron Institute for helpful discussions. We are grateful for funding from the NIH NIGMS (R01GM114141, I.M.C.; T32GM007388, M.D.T.; R01GM071966, O.G.T.), NIAID (AI174515, I.M.C.), NHGRI (R01HG005998, O.G.T.), Stand Up To Cancer Convergence (3.1416, I.M.C.), Simons Foundation grant (395506) to O.G.T., the CHDI Foundation (I.M.C.) and a Pre-Doctoral Fellowship from the New Jersey Commission on Cancer Research (COCR23PRF019) to M.D.T. This material is based on work supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-2039656 (awarded to T.J.R.). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

These authors jointly supervised this work: Olga G. Troyanskaya, Ileana M. Cristea.

Authors and Affiliations

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ, USA
Tavis J. Reed, Alicja Tadych & Olga G. Troyanskaya
Department of Computer Science, Princeton University, Princeton, NJ, USA
Tavis J. Reed, Alicja Tadych & Olga G. Troyanskaya
Department of Molecular Biology, Princeton University, Princeton, NJ, USA
Tavis J. Reed, Matthew D. Tyl & Ileana M. Cristea
Flatiron Institute, Simons Foundation, New York City, NY, USA
Olga G. Troyanskaya

Authors

Tavis J. Reed
Matthew D. Tyl
Alicja Tadych
Olga G. Troyanskaya
Ileana M. Cristea

Contributions

T.J.R., O.G.T. and I.M.C. designed the research. T.J.R. designed and developed Tapioca. T.J.R. performed TPCA experiments, M.D.T. performed virology assays and T.J.R. and M.D.T. performed IP experiments. T.J.R. performed all data analysis. A.T. developed the Tapioca website. Paper writing was done by T.J.R., M.D.T., O.G.T. and I.M.C.

Corresponding authors

Correspondence to Olga G. Troyanskaya or Ileana M. Cristea.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ben Collins, Mikhail Savitski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of Tapioca, its sub-models, and Euclidean distance (Euc.) based PPI predictions.

a, 5-fold cross-validation-based evaluation by the area under the precision–recall curve (AUPRC), n = 48 biologically independent samples. For boxplots, boxes show the 25th and 75th percentile values, with the line within the box representing the median value, whiskers represent ±1.5 times the interquartile range, and points are outliers. b, 5-fold cross-validation-based evaluation by one minus the false positive rate (1-FPR), n = 48 biologically independent samples. For boxplots, boxes show the 25th and 75th percentile values, with the line within the box representing the median value, whiskers represent ±1.5 times the interquartile range, and points are outliers. c, d, 5-fold cross-validation-based evaluation by one minus the false positive rate (1-FPR) (c) and the area under the precision–recall curve (AUPRC) (d). Logistic regression outperforms other machine learning methods by 1-FPR. Although random forest outperforms other sub-models by AUPRC, Tapioca integration of random forest sub-models gives inconsistent results, showing that this machine learning method fails to generalize. Boxplot definitions can be found in the Methods. For both c and d, n = 48 biologically independent samples. For boxplots, boxes show the 25th and 75th percentile values, with the line within the box representing the median value, whiskers represent ±1.5 times the interquartile range, and points are outliers. e, f, Performance of naïve Bayes (e) and random forest (f) based models. The standard deviation (SD) of the difference in PPI scores between 0 and 15 hours post infection in HSV-1 infection was used to assess dynamics. The random forest models often showed negative correlation with the dynamics data used in Tapioca sub-model integration, resulting in a high SD for the random forest Tapioca model. Given these negative correlations, it is likely that the random forest Tapioca model fails to capture the true system dynamics, instead predicting inappropriate and artificial fluctuations. The 1-FPR values shown are based only on the 0 and 15 hours post infection HSV-1 datasets (that is, not all 48 datasets used in Fig. 1 and Extended Data Fig. 1a–d).
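The two metrics named in this legend, AUPRC and 1-FPR under 5-fold cross-validation, can be illustrated with a small scikit-learn sketch. This is not the Tapioca evaluation code: the synthetic features, the 0.5 decision threshold, the stratified splitting and the function name `cross_validate_ppi_classifier` are all assumptions made for illustration, and average precision is used as a common estimate of the area under the precision–recall curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold


def cross_validate_ppi_classifier(X, y, n_splits=5, threshold=0.5, seed=0):
    """Return per-fold AUPRC and 1-FPR for a logistic regression classifier.

    Hypothetical helper: X holds one row of features per candidate protein
    pair, y marks gold-standard interactions (1) and non-interactions (0).
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    auprc, one_minus_fpr = [], []
    for train_idx, test_idx in cv.split(X, y):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        # Average precision summarizes the precision-recall curve.
        auprc.append(average_precision_score(y[test_idx], scores))
        # 1-FPR equals TN / (TN + FP) at the assumed score threshold.
        predicted = (scores >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(
            y[test_idx], predicted, labels=[0, 1]
        ).ravel()
        one_minus_fpr.append(tn / (tn + fp))
    return np.array(auprc), np.array(one_minus_fpr)


# Synthetic, imbalanced stand-in for a PPI gold standard (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
auprc, one_minus_fpr = cross_validate_ppi_classifier(X, y)
print("mean AUPRC:", auprc.mean(), "mean 1-FPR:", one_minus_fpr.mean())
```

Reporting both values per fold, rather than a single pooled number, mirrors the per-dataset distributions that the boxplots summarize.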

Extended Data Fig. 2 KSHV TPCA data quality, number of predicted PPIs, and CORUM complex assembly.

a, Density plots comparing the log abundance for all temperatures, all time points, and all proteins (observed in both compared replicates) between all possible replicate pairs. b, The number of total predicted protein–protein interactions and the number of protein–protein interactions involving at least one KSHV protein at each time point. c, Depicted are 355 CORUM complexes (of a total of 1250 detected CORUM complexes) identified as assembled during at least one time point during KSHV reactivation from latency. Complexes are included in the heatmap if they were always observed with ≥ 50% subunit detection at all time points and achieved a minimum Tapioca score of 0.4 in at least one time point. The list of complexes within the heatmap can be found in Supplementary Table 8a.
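The inclusion rule for panel c (at least 50% subunit detection at every time point and a Tapioca score of at least 0.4 at one or more time points) can be written as a short filter. The sketch below is illustrative only: the input dictionaries, the complex names and the idea of a single per-complex score per time point are assumptions, since the legend does not specify how a complex-level score is derived from pairwise predictions.

```python
def assembled_complexes(detection, scores, min_detection=0.5, min_score=0.4):
    """Select complexes that satisfy both heatmap inclusion criteria.

    detection: complex name -> fraction of subunits detected per time point.
    scores:    complex name -> complex-level score per time point
               (hypothetical summary of pairwise Tapioca scores).
    """
    kept = []
    for name, fractions in detection.items():
        # Criterion 1: subunit detection never drops below the cutoff.
        detected_everywhere = all(f >= min_detection for f in fractions)
        # Criterion 2: the score reaches the cutoff at least once.
        reaches_score = max(scores[name]) >= min_score
        if detected_everywhere and reaches_score:
            kept.append(name)
    return kept


# Toy values for three made-up complexes across three time points.
detection = {
    "complex_A": [0.8, 0.6, 0.5],
    "complex_B": [0.9, 0.4, 0.7],   # fails detection at the second time point
    "complex_C": [0.6, 0.6, 0.6],   # never reaches the score cutoff
}
scores = {
    "complex_A": [0.1, 0.45, 0.2],
    "complex_B": [0.9, 0.9, 0.9],
    "complex_C": [0.1, 0.2, 0.39],
}
selected = assembled_complexes(detection, scores)
print(selected)
```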

Extended Data Fig. 3 Temporal dynamics of KSHV biological process interactions in KSHV reactivation from latency.

K-means clusters of the temporal dynamics of GO term enrichment of host interactors of KSHV proteins; the solid line represents the mean value, and the shaded region represents the 95% confidence interval.
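As a rough illustration of how temporal profiles can be grouped by K-means and summarized with a mean line and a 95% confidence band, consider the sketch below. The synthetic enrichment profiles, the choice of two clusters and the normal-approximation interval (mean ± 1.96 × standard error) are assumptions; plotting libraries often compute such bands by bootstrapping instead, and the legend does not state which approach was used.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_trajectories(profiles, n_clusters, seed=0):
    """Cluster temporal profiles and summarize each cluster.

    profiles: array of shape (n_terms, n_time_points).
    Returns the labels and, per cluster, (mean, lower, upper) arrays where
    the bounds are a normal-approximation 95% confidence interval.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(profiles)
    summary = {}
    for k in range(n_clusters):
        members = profiles[labels == k]
        mean = members.mean(axis=0)
        sem = members.std(axis=0, ddof=1) / np.sqrt(len(members))
        summary[k] = (mean, mean - 1.96 * sem, mean + 1.96 * sem)
    return labels, summary


# Synthetic enrichment profiles: 20 rising and 20 falling trajectories.
rng = np.random.default_rng(0)
time_points = np.linspace(0.0, 1.0, 5)
rising = time_points + rng.normal(0.0, 0.05, size=(20, 5))
falling = (1.0 - time_points) + rng.normal(0.0, 0.05, size=(20, 5))
profiles = np.vstack([rising, falling])

labels, summary = cluster_trajectories(profiles, n_clusters=2)
for k, (mean, lower, upper) in summary.items():
    print(k, np.round(mean, 2))
```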

Extended Data Fig. 4 NUCKS IP-MS Experiment.

Volcano plots of NUCKS IP-MS experiments performed using different combinations of two antibodies, two lysis buffers, and two strengths of nucleases. Interactors that pass significance thresholds, represented by dotted lines (fold change ≥ 2, p-value ≤ 0.05), are colored red (non-interactors that pass fold change ≤ −2, p-value ≤ 0.05 are also colored red). KSHV proteins are colored blue, regardless of significance. NUCKS is shown as a black ‘X’. All KSHV proteins and the top five interactors (ranked by p-value) are labeled. There was a wide distribution in the number of statistically significant PPIs identified by the different IP conditions, ranging from 1 to 304 proteins. In total, 392 interactors of NUCKS were identified. Significance was determined by two-tailed Student’s t test.
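The significance calls described in this legend combine a fold-change cutoff with a two-tailed Student's t test. A minimal sketch with SciPy follows. The replicate intensities are invented, and the signed fold-change convention (ratios below 1 reported as the negative reciprocal, so that a twofold depletion reads as −2) is an assumption chosen to match the "≤ −2" wording; the authors' exact computation is not given here.

```python
import numpy as np
from scipy import stats


def signed_fold_change(ip_mean, control_mean):
    """Ratio for enrichment, negative reciprocal for depletion (assumed)."""
    ratio = ip_mean / control_mean
    return ratio if ratio >= 1 else -1.0 / ratio


def classify_protein(ip, control, fc_cut=2.0, p_cut=0.05):
    """Label one protein from replicate abundances in IP and control."""
    fold_change = signed_fold_change(np.mean(ip), np.mean(control))
    # equal_var=True gives Student's t test; the test is two-sided by default.
    p_value = stats.ttest_ind(ip, control, equal_var=True).pvalue
    if p_value <= p_cut and fold_change >= fc_cut:
        return "interactor"
    if p_value <= p_cut and fold_change <= -fc_cut:
        return "depleted"
    return "not significant"


# Invented triplicate abundances for three hypothetical proteins.
enriched = classify_protein([100.0, 110.0, 105.0], [10.0, 12.0, 11.0])
depleted = classify_protein([10.0, 12.0, 11.0], [100.0, 110.0, 105.0])
unchanged = classify_protein([10.0, 11.0, 12.0], [10.0, 12.0, 11.0])
print(enriched, depleted, unchanged)
```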

Extended Data Fig. 5 Comparison of IP-MS and Tapioca predicted NUCKS interactors.

a, UpSet plot comparing predicted NUCKS interactors by each IP-MS condition and Tapioca. Between 0% and 31% of interactors were commonly identified between any given pair of IP-MS conditions. The results of IPs are known to be heavily influenced by the lysis conditions and antibodies used, so it is unsurprising that differential interactomes were identified between conditions. Nevertheless, the observed interactomes were largely thematically similar, capturing NUCKS interactors involved in chromosome regulation, RNA transport and processing, and DNA biosynthetic processes. Thus, it would seem that these IP conditions lead to the capture of different subpopulations from the same overall NUCKS interactome. Significance was determined by two-tailed Student’s t test. b, Tapioca score versus the negative base 10 log of the IP-MS p-value for all proteins predicted to interact with NUCKS at 48 HPR that were detected by both Tapioca and IP-MS. We observed little correlation (r = 0.037) between Tapioca score and the IP −log₁₀(p-value) for proteins in both interactomes (for proteins observed in both IP and TPCA buffer conditions, the maximum −log₁₀(p-value) was used). Significance was determined by two-tailed Student’s t test. c, The Tapioca score distributions for Tapioca-predicted NUCKS interactors detected and not detected by IP-MS. There was a statistically significant difference in the distribution of Tapioca scores of NUCKS interactors detected and not detected by IP-MS, with those detected by IP-MS tending to have a higher Tapioca score. Significance was determined by two-tailed Student’s t test. d, Venn diagram of Tapioca and IP-MS NUCKS interactors showing the direct overlap, the overlap of IP-MS unique proteins that are known interactors of Tapioca unique interactors (and vice versa; see Methods), and the proteins with no connection between Tapioca and IP-MS. There was an overlap of 61 Tapioca- and IP-MS-identified NUCKS interactors, and 314 IP-MS unique proteins had known interactions with Tapioca unique proteins. Thus, it would seem that, similar to using different IP conditions between IP-MS experiments, Tapioca and IP-MS likely capture different subpopulations of the NUCKS interactome, here likely due to the major differences between the IP and TPCA (Tapioca’s source of dynamics data in the KSHV experimentation) methodologies.
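The comparison in panel b, a correlation between Tapioca score and the largest −log₁₀(p-value) observed for each shared protein, can be sketched as follows. The protein names and values are made up, the use of Pearson's r is an assumption (the legend reports r without naming the coefficient), and the helper name `score_vs_ip_significance` is hypothetical.

```python
import numpy as np
from scipy import stats


def score_vs_ip_significance(tapioca_scores, ip_pvalues):
    """Correlate Tapioca scores with IP-MS significance for shared proteins.

    tapioca_scores: protein -> Tapioca score.
    ip_pvalues:     protein -> list of p-values, one per IP condition in
                    which the protein was observed.
    """
    shared = sorted(set(tapioca_scores) & set(ip_pvalues))
    x = np.array([tapioca_scores[p] for p in shared])
    # Keep the strongest evidence across conditions for each protein.
    y = np.array([max(-np.log10(pv) for pv in ip_pvalues[p]) for p in shared])
    r, p_value = stats.pearsonr(x, y)
    return shared, r, p_value


# Toy inputs: protein_D and protein_E appear in only one data set.
tapioca_scores = {"protein_A": 0.1, "protein_B": 0.2,
                  "protein_C": 0.3, "protein_D": 0.9}
ip_pvalues = {"protein_A": [0.1], "protein_B": [0.01, 0.5],
              "protein_C": [0.001], "protein_E": [0.5]}

shared, r, p_value = score_vs_ip_significance(tapioca_scores, ip_pvalues)
print(shared, round(r, 3))
```

The toy values are deliberately linear so the result is easy to check by hand; real data, as the legend notes, showed little correlation.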

Extended Data Fig. 6 Temporal dynamics of NUCKS biological process interactions in KSHV reactivation from latency, Tapioca randomization, and Tapioca versus sub-model scores.

a, K-means clusters of the temporal dynamics of GO term enrichment of host interactors of the host protein NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. b, Evaluation by area under a true positive rate versus false positive rate curve (AUC) of Tapioca predictions on datasets with randomized dynamics data, n = 48 biologically independent samples. Due to the use of static interaction data (which was not randomized), many sub-models perform better than random predictions, as intended. This is a byproduct of using a static gold standard. See Fig. 2 for evaluation of system dynamics. For boxplots, boxes show the 25th and 75th percentile values, with the line within the box representing the median value, whiskers represent ±1.5 times the interquartile range, and points are outliers. c, The distribution of score difference for a given PPI between Tapioca and a given sub-model, for PPIs that achieve a score of at least 0.5 by the sub-model, Tapioca or both. Violin plot definitions can be found in the Methods, n = 48 biologically independent samples. d, UpSet plot comparing, between sub-models, the proteins that most often have highly differentially predicted interactomes (99.9th percentile) between the given sub-model and Tapioca. For brevity, only comparisons with an intersection size of ≥ 30 are shown. For violin plots, the white dot represents the median, the thick black bar represents ±1.5 times the interquartile range, and the thin grey line represents the total range, excluding outliers.

Extended Data Fig. 7 GO Term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and one of its sub-models.

a, GO term enrichments from HumanBase (https://hb.flatironinstitute.org/) for the given sub-models. For an explanation of the HumanBase GO term enrichment nodes see Methods. The rest of the associated GO terms for each module (for example, M1) can be found in Supplementary Table 3a–h.

Extended Data Fig. 8 Differences in TPCA and IP-MS sample prep and detected interactomes.

IP-MS and TPCA methodologies take distinct approaches to determining the interactors of a given protein, resulting in the detection of different subsets of true (biologically relevant) and false (biologically irrelevant) interactions. In IP-MS, cells are lysed prior to the detection of PPIs, which can result in the loss of interactors that do not survive the stringency of the lysis conditions. In addition, the loss of cellular compartments can lead to interactions that do not occur in the true biological system, leading to contamination of the MS-determined interactome. Additionally, during the immunoaffinity purification, contaminants can form non-specific interactions with the antibodies and beads used. In contrast, in TPCA, PPIs are ‘captured’ prior to cell lysis, during thermal denaturation, preventing non-specific interactions with contaminants and limiting the loss of interactions. Instead, interactome contaminants and loss of true PPIs are introduced when protein melting curves are interpreted as PPIs. The random overlap of curves can lead to the prediction of false interactions, and true interactors with distant melting curves can be missed.

Extended Data Fig. 9 NUCKS melting curve behavior and 48 HPR interactome.

a, Diagram of how protein–non-protein interactions can influence the observed melting curve of a protein. b, The melting curve of NUCKS at 0 HPR and the melting curves of three histone proteins that NUCKS interacts with (by both Tapioca and IP-MS); the solid line represents the mean value, and the shaded region represents the 95% confidence interval. c, Example of how NUCKS interactors become stabilized upon interaction with NUCKS; the solid line represents the mean value, and the shaded region represents the 95% confidence interval. d, Example of the distinct curve shapes of CORUM complexes with which NUCKS interacts (Tapioca predicted interaction with ≥ 50% of complex subunits and interaction with at least one subunit confirmed by IP-MS). The median melting curve for all proteins is represented by the dotted blue line, and all melting curves (clipped to those with a maximum value of less than 1.3 for visualization) are shown in grey. The median melting curves per replicate are also plotted, with the shaded region representing the 95% confidence interval. e, Leiden community clustering on the NUCKS 48 HPR interactome (only NUCKS interactors are shown, not NUCKS; edges shown are those with scores at or above the 97th percentile amongst NUCKS interactors).

Extended Data Fig. 10 NUCKS1 knockout efficiency.

a, b, c, Western blots showing NUCKS1 knockout efficiency in iSLK.219, HFF, and MRC5 cells. d, e, f, Results of TUNEL assays, used to evaluate cell viability, on iSLK.219, HFF, and MRC5 NUCKS1 knockout cells. Error bars represent the standard error.


Supplementary information

Supplementary Information

Legends for Supplementary Tables 1–11.

Reporting Summary

Peer Review File

Supplementary Table 1

Tapioca evaluation values per dataset used in evaluation, as well as tables describing the features used by Tapioca to predict PPIs.

Supplementary Table 2

Tapioca evaluation of systems dynamic capture.

Supplementary Table 3

Protein–protein interactions with substantially higher scores by submodel than by Tapioca and HumanBase GO term enrichment of proteins with frequently highly differential predicted interactomes between Tapioca and its submodels.

Supplementary Table 4

HumanBase GO term enrichment of proteins with unique PPIs predicted by CF-TPCA or I-PISA-TPCA compared to CF, I-PISA and TPCA datasets alone.

Supplementary Table 5

Evaluation of Euclidean distance-based PPI predictions for TPCA temperature optimization and lysis optimization experiments.

Supplementary Table 6

Spreadsheet containing raw and normalized TPCA data for temperature optimization, lysis optimization and KSHV experiments, as well as an example of calculating ratios from TPCA test mix samples.

Supplementary Table 7

CORUM complex Tapioca scores and percentage of subunits detected throughout KSHV reactivation experiment.

Supplementary Table 8

NUCKS IP–MS and IP–PRM data and HumanBase GO term enrichment of NUCKS IP–MS.

Supplementary Table 9

Interactome similarity scores between HSV-1, HCMV and KSHV proteins.

Supplementary Table 10

HumanBase GO term enrichment of NUCKS1 interactors throughout HSV-1 and HCMV infections.

Supplementary Table 11

UniProt accessions of proteins in NUCKS 48 HPR Leiden communities.

Source data

Source Data Extended Data Fig. 10

Unprocessed western blot.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Reed, T.J., Tyl, M.D., Tadych, A. et al. Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts. Nat Methods 21, 488–500 (2024). https://doi.org/10.1038/s41592-024-02179-9


Received: 06 April 2023
Accepted: 12 January 2024
Published: 15 February 2024
Issue Date: March 2024
DOI: https://doi.org/10.1038/s41592-024-02179-9
