BIONIC: biological network integration using convolutions

Abstract

Biological networks constructed from varied data can be used to map cellular function, but each data type has limitations. Network integration promises to address these limitations by combining and automatically weighting input information to obtain a more accurate and comprehensive representation of the underlying biology. We developed a deep learning-based network integration algorithm that incorporates a graph convolutional network framework. Our method, BIONIC (Biological Network Integration using Convolutions), learns features that contain substantially more functional information compared to existing approaches. BIONIC has unsupervised and semisupervised learning modes, making use of available gene function annotations. BIONIC is scalable in both size and quantity of the input networks, making it feasible to integrate numerous networks on the scale of the human genome. To demonstrate the use of BIONIC in identifying new biology, we predicted and experimentally validated essential gene chemical–genetic interactions from nonessential gene profiles in yeast.

Fig. 1: BIONIC algorithm overview.
Fig. 2: Comparison of BIONIC integration to three input networks.
Fig. 3: Comparison of BIONIC to existing integration approaches.
Fig. 4: Supervised performance of BIONIC compared with an existing supervised integration approach.
Fig. 5: Network quantity and network size performance comparison across integration methods.
Fig. 6: BIONIC essential gene chemical–genetic interaction predictions.

Data availability

All data, standards, BIONIC yeast features and chemical–genetic interaction data are available in the following Figshare repository: https://figshare.com/projects/BIONIC_Biological_Network_Integration_using_Convolutions/122585. There are no restrictions on the data. Source data are provided with this paper.

Code availability

The BIONIC code is available at https://github.com/bowang-lab/BIONIC (ref. 77). Code to reproduce the main figure analyses (Figs. 2–6) is available at https://github.com/duncster94/BIONIC-analyses (ref. 78), and a library implementing the coannotation prediction, module detection and gene function prediction evaluations is available at https://github.com/duncster94/BIONIC-evals (ref. 79). The BIONIC integrated yeast features (PEG features) can be explored at https://bionicviz.com.

References

  1. Fraser, A. G. & Marcotte, E. M. A probabilistic view of gene function. Nat. Genet. 36, 559 (2004).

  2. Malod-Dognin, N. et al. Towards a data-integrated cell. Nat. Commun. 10, 805 (2019).

  3. Wang, P., Gao, L., Hu, Y. & Li, F. Feature related multi-view nonnegative matrix factorization for identifying conserved functional modules in multiple biological networks. BMC Bioinf. 19, 394 (2018).

  4. Argelaguet, R. et al. Multi-omics factor analysis—a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018).

  5. Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C. & Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9, S4 (2008).

  6. Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333 (2014).

  7. Cho, H. et al. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 3, 540–548.e5 (2016).

  8. Huttenhower, C., Hibbs, M., Myers, C. & Troyanskaya, O. G. A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22, 2890–2897 (2006).

  9. von Mering, C. et al. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 31, 258–261 (2003).

  10. Alexeyenko, A. & Sonnhammer, E. L. L. Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res. 19, 1107–1116 (2009).

  11. Gligorijević, V., Barot, M. & Bonneau, R. deepNF: deep network fusion for protein function prediction. Bioinformatics 34, 3873–3881 (2018).

  12. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Macskassy, S. & Perlich, C.) 701–710 (Association for Computing Machinery, 2014).

  13. Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. KDD 2016, 855–864 (2016).

  14. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (2017).

  15. Defferrard, M., Bresson, X. & Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proc. Advances in Neural Information Processing Systems (NIPS 2016) Vol. 29, 3844–3852 (Curran Associates, Inc., 2016).

  16. Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Proc. Advances in Neural Information Processing Systems (NIPS 2017) Vol. 30, 1024–1034 (Curran Associates, Inc., 2017).

  17. Veličković, P. et al. Graph attention networks. In Proc. International Conference on Learning Representations (2018).

  18. Piotrowski, J. S. et al. Functional annotation of chemical libraries across diverse biological processes. Nat. Chem. Biol. 13, 982–993 (2017).

  19. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

  20. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).

  21. Hu, Z., Killion, P. J. & Iyer, V. R. Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 39, 683–687 (2007).

  22. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).

  23. Myers, C. L. et al. Discovery of biological networks from diverse functional genomic data. Genome Biol. 6, R114 (2005).

  24. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).

  25. Vo, T. V. et al. A proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human. Cell 164, 310–323 (2016).

  26. Martín, R. et al. A PP2A-B55-mediated crosstalk between TORC1 and TORC2 regulates the differentiation response in fission yeast. Curr. Biol. 27, 175–188 (2017).

  27. Ryan, C. J. et al. Hierarchical modularity and the evolution of genetic interactomes across species. Mol. Cell 46, 691–704 (2012).

  28. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  29. Orchard, S. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–363 (2014).

  30. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  31. Fernandez, C. F., Pannone, B. K., Chen, X., Fuchs, G. & Wolin, S. L. An Lsm2-Lsm7 complex in Saccharomyces cerevisiae associates with the small nucleolar RNA snR5. Mol. Biol. Cell 15, 2842–2852 (2004).

  32. Chowdhury, A., Mukhopadhyay, J. & Tharun, S. The decapping activator Lsm1p-7p-Pat1p complex has the intrinsic ability to distinguish between oligoadenylated and polyadenylated RNAs. RNA 13, 998–1016 (2007).

  33. Wilson, J. D., Baybay, M., Sankar, R., Stillman, P. & Popa, A. M. Analysis of population functional connectivity data via multilayer network embeddings. Netw. Sci. 9, 99–122 (2021).

  34. Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505 (2017).

  35. Huttlin, E. L. et al. The bioplex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  36. Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).

  37. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).

  38. Roemer, T. & Boone, C. Systems-level antimicrobial drug and drug synergy discovery. Nat. Chem. Biol. 9, 222–231 (2013).

  39. Ayscough, K. R. et al. High rates of actin filament turnover in budding yeast and roles for actin in establishment and maintenance of cell polarity revealed using the actin inhibitor latrunculin-A. J. Cell Biol. 137, 399–416 (1997).

  40. Persaud, R. et al. Clionamines stimulate autophagy, inhibit Mycobacterium tuberculosis survival in macrophages, and target Pik1. Cell Chem. Biol. 29, 870–882 (2021).

  41. Simpkins, S. W. et al. Using BEAN-counter to quantify genetic interactions from multiplexed barcode sequencing experiments. Nat. Protoc. 14, 415–440 (2019).

  42. Kato, N., Takahashi, S., Nogawa, T., Saito, T. & Osada, H. Construction of a microbial natural product library for chemical biology studies. Curr. Opin. Chem. Biol. 16, 101–108 (2012).

  43. Protchenko, O., Rodriguez-Suarez, R., Androphy, R., Bussey, H. & Philpott, C. C. A screen for genes of heme uptake identifies the FLC family required for import of FAD into the endoplasmic reticulum. J. Biol. Chem. 281, 21445–21457 (2006).

  44. Kitagaki, H., Wu, H., Shimoi, H. & Ito, K. Two homologous genes, DCW1 (YKL046c) and DFG5, are essential for cell growth and encode glycosylphosphatidylinositol (GPI)-anchored membrane proteins required for cell wall biogenesis in Saccharomyces cerevisiae. Mol. Microbiol. 46, 1011–1022 (2002).

  45. Ram, A. F. et al. Loss of the plasma membrane-bound protein Gas1p in Saccharomyces cerevisiae results in the release of beta1,3-glucan into the medium and induces a compensation mechanism to ensure cell wall integrity. J. Bacteriol. 180, 1418–1424 (1998).

  46. Tomishige, N. et al. Mutations that are synthetically lethal with a gas1Delta allele cause defects in the cell wall of Saccharomyces cerevisiae. Mol. Genet. Genomics 269, 562–573 (2003).

  47. Ragni, E., Fontaine, T., Gissi, C., Latgè, J. P. & Popolo, L. The Gas family of proteins of Saccharomyces cerevisiae: characterization and evolutionary analysis. Yeast 24, 297–308 (2007).

  48. Neiman, A. M., Mhaiskar, V., Manus, V., Galibert, F. & Dean, N. Saccharomyces cerevisiae HOC1, a suppressor of pkc1, encodes a putative glycosyltransferase. Genetics 145, 637–645 (1997).

  49. Simpkins, S. W. et al. Predicting bioprocess targets of chemical compounds through integration of chemical-genetic and genetic interactions. PLoS Comput. Biol. 14, e1006532 (2018).

  50. Pasikowska, M., Palamarczyk, G. & Lehle, L. The essential endoplasmic reticulum chaperone Rot1 is required for protein N- and O-glycosylation in yeast. Glycobiology 22, 939–947 (2012).

  51. Machi, K. et al. Rot1p of Saccharomyces cerevisiae is a putative membrane protein required for normal levels of the cell wall 1,6-beta-glucan. Microbiology 150, 3163–3173 (2004).

  52. Levinson, J. N., Shahinian, S., Sdicu, A.-M., Tessier, D. C. & Bussey, H. Functional, comparative and cell biological analysis of Saccharomyces cerevisiae Kre5p. Yeast 19, 1243–1259 (2002).

  53. Azuma, M., Levinson, J. N., Pagé, N. & Bussey, H. Saccharomyces cerevisiae Big1p, a putative endoplasmic reticulum membrane protein required for normal levels of cell wall beta-1,6-glucan. Yeast 19, 783–793 (2002).

  54. Roemer, T., Delaney, S. & Bussey, H. SKN1 and KRE6 define a pair of functional homologs encoding putative membrane proteins involved in beta-glucan synthesis. Mol. Cell. Biol. 13, 4039–4048 (1993).

  55. Kubo, K. et al. Jerveratrum-type steroidal alkaloids inhibit β-1,6-glucan biosynthesis in fungal cell walls. Microbiol. Spectr. 10, e0087321 (2022).

  56. Usaj, M. et al. TheCellMap.org: a web-accessible database for visualizing and mining the global yeast genetic interaction network. G3 7, 1539–1549 (2017).

  57. Elnaggar, A. et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2021.3095381 (2021).

  58. Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K., Nielsen, H. & Winther, O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395 (2017).

  59. Mattiazzi Usaj, M. et al. Systematic genetics and single‐cell imaging reveal widespread morphological pleiotropy and cell‐to‐cell variability. Mol. Syst. Biol. 16, 30 (2020).

  60. Paszke, A. et al. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop (2017).

  61. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds (2019).

  62. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (eds Bengio, Y. & LeCun, Y.) (2015).

  63. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).

  64. Hibbs, M. A. et al. Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 23, 2692–2699 (2007).

  65. Myers, C. L., Barrett, D. R., Hibbs, M. A., Huttenhower, C. & Troyanskaya, O. G. Finding function: evaluation methods for functional genomic data. BMC Genomics 7, 187 (2006).

  66. Aggarwal, C. C., Hinneburg, A. & Keim, D. A. On the surprising behavior of distance metrics in high dimensional space. In Database Theory — ICDT 2001 (eds Van den Bussche, J. & Vianu, V.) Lecture Notes in Computer Science Vol. 1973 (Springer, 2001); https://doi.org/10.1007/3-540-44503-X_27

  67. Davis, J. & Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proc. 23rd International Conference on Machine Learning: June 25–29, 2006; Pittsburgh, Pennsylvania (eds Cohen, W. W. & Moore, A.) 233–240 (ACM Press, 2006).

  68. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  69. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).

  70. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  71. Platt, J. C. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers (eds Smola, A. J. et al.) 61–74 (MIT Press, 1999).

  72. Deshpande, R. et al. Efficient strategies for screening large-scale genetic interaction networks. Preprint at bioRxiv https://doi.org/10.1101/159632 (2017).

  73. Beyer, H. Tukey & John, W. Exploratory Data Analysis. Addison-Wesley Publishing Company Reading, Mass.—Menlo Park, cal., London, Amsterdam, Don Mills, Ontario, Sydney 1977, XVI, 688S. Biom. J. 23, 413–414 (1981).

  74. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289–300 (1995).

  75. Kitamura, A., Someya, K., Hata, M., Nakajima, R. & Takemura, M. Discovery of a small-molecule inhibitor of β-1,6-glucan synthesis. Antimicrob. Agents Chemother. 53, 670–677 (2009).

  76. Yamanaka, D. et al. Development of a novel β-1,6-glucan-specific detection system using functionally-modified recombinant endo-β-1,6-glucanase. J. Biol. Chem. 295, 5362–5376 (2020).

  77. Forster, D. Biological Network Integration using Convolutions (BIONIC) v.0.2.4. Zenodo https://doi.org/10.5281/zenodo.6762584 (2022).

  78. Forster, D. BIONIC analyses v.0.1.0. Zenodo https://doi.org/10.5281/zenodo.6762596 (2022).

  79. Forster, D. BIONIC evaluations (BIONIC-evals) v.0.1.0. Zenodo https://doi.org/10.5281/zenodo.6762602 (2022).

Acknowledgements

We thank B. Andrews, M. Costanzo and C. Myers for their insightful comments. We also thank M. Fey for adding important features to the PyTorch Geometric library for us. This work was supported by NRNB (US National Institutes of Health, National Center for Research Resources grant number P41 GM103504). Funding for continued development and maintenance of Cytoscape is provided by the US National Human Genome Research Institute under award number HG009979. This work was also supported by the Canadian Institutes of Health Research Foundation grant number FDN-143264, US National Institutes of Health grant number R01HG005853 and joint funding by Genome Canada (OGI-163) and the Ministry of Economic Development, Job Creation and Trade, under the program Bioinformatics and Computational Biology. This work was supported by the National Research Council of Canada through the AI for Design program. This work was supported by CIFAR AI Chair programs. This work was also supported by JSPS KAKENHI grant numbers JP15H04483 (C.B. and Y.O.), JP17H06411 (C.B. and Y.Y.), JP18K14351 (K.I.-N.), JP19H03205 (Y.O.), JP20K07487 (D.Y.) and a RIKEN Foreign Postdoctoral Fellowship (S.C.L.). This research was enabled in part by support provided by SciNet and the Digital Research Alliance of Canada.

Author information

Contributions

D.T.F. conceived and developed the method and computational experiments. S.C.L. and M.Y. performed the chemical–genetic screens. Z.L. provided resources for the TS mutant collection. L.A.V.I. preprocessed and provided the chemical–genetic data. H.O. provided the chemical matter and information about the screened compounds. S.C.L. and Z.L. constructed the drug-hypersensitive TS mutant collection. K.I.-N., D.Y. and H.O. performed the jervine biochemical validation. D.T.F., S.C.L., Y.Y., Y.O., B.W., G.D.B. and C.B. wrote the manuscript. B.W., G.D.B. and C.B. conceived and supervised the project.

Corresponding authors

Correspondence to Bo Wang, Gary D. Bader or Charles Boone.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Kevin Yuk-Lap Yip and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Detailed view of individual BIONIC network encoder.

A more detailed view of an individual network encoder, including residual connections. A network specific graph convolutional network is used to encode the input network for increasing neighborhood sizes. The first GCN in the sequence learns features for a given node based on the node’s immediate neighborhood (1st order features). The next GCN learns features based on the node’s second order neighborhood (2nd order features), and so on. The node feature matrices learned by each GCN pass are summed together to create the final learned, network-specific features. Summing the outputs of the various GCNs in this way creates residual connections, allowing features from multiple neighborhood sizes to generate the final learned features, rather than just the final neighborhood size. This figure shows three GCN layers, but BIONIC uses the same pattern of connections for any number of GCN layers. Note that the GCN layers for a given encoder share their weights, so in effect, there is a single GCN layer for each encoder.
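The encoder pattern described above can be illustrated with a minimal sketch, not the BIONIC implementation. It assumes a symmetric-normalized graph convolution with self-loops and a ReLU activation, and uses plain Python lists instead of PyTorch; the function names (`normalized_adjacency`, `gcn_pass`, `encode`) and the toy path graph are hypothetical. The same weight matrix is reused on every pass, and the outputs of all passes are summed to form the final network-specific features.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 (assumed normalization, with self-loops)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_pass(a_norm, hidden, weight):
    """One graph-convolution pass: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, hidden), weight)
    return [[max(0.0, v) for v in row] for row in out]

def encode(adj, features, weight, num_passes=3):
    """Apply the same GCN layer num_passes times and sum every pass's output."""
    a_norm = normalized_adjacency(adj)
    hidden = features
    summed = [[0.0] * len(weight[0]) for _ in features]
    for _ in range(num_passes):
        hidden = gcn_pass(a_norm, hidden, weight)  # shared weights on every pass
        summed = [[s + h for s, h in zip(s_row, h_row)]
                  for s_row, h_row in zip(summed, hidden)]
    return summed

# Hypothetical toy input: a 3-node path graph (0 - 1 - 2) with one-hot features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

first_order = encode(adj, identity, identity, num_passes=1)
second_order = encode(adj, identity, identity, num_passes=2)
print(first_order[0][2])   # node 0 carries no signal from node 2 after one pass
print(second_order[0][2])  # the second pass reaches the two-hop neighbor
```

Because the passes are summed rather than chained into a single final output, first-order information is still present in the result after higher-order passes are added, which is the role the caption assigns to the residual connections.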

Extended Data Fig. 2 Comparison of individual network features produced by BIONIC.

A comparison of individual networks (denoted ‘Net’), their corresponding features encoded using the unsupervised BIONIC (denoted ‘BIONIC’), as well as the BIONIC integration of these networks (denoted ‘GI+COEX+PPI BIONIC’). BP = Biological Processes, GI = Genetic Interaction, COEX = Co-expression, PPI = Protein-protein Interaction. These are the same networks and evaluations used in Fig. 2. Data are presented as mean values. Error bars indicate the 95% confidence interval for n = 10 independent samples.

Extended Data Fig. 3 Dynamics of BIONIC feature space through training.

Comparison of pairwise gene similarities (cosine similarity in the case of BIONIC, direct binary adjacency in the case of the network), as defined by IntAct Complexes for known co-complex relationships (positive pairs) and no co-complex relationships (negative pairs), between a yeast PPI network (as used in the Fig. 2 analyses) and the unsupervised BIONIC features produced from this network. The BIONIC similarities are shown throughout the training process (epochs), whereas the input network is constant so its pairwise similarities do not change. ‘Network’ denotes the input PPI network, ‘BIONIC’ denotes the features learned from this network using BIONIC.
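As a rough illustration of the two similarity definitions compared here, the sketch below computes cosine similarity between learned feature vectors and binary adjacency from an edge list. The gene names, feature values and edge set are made up for the example and do not come from the paper's data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def network_similarity(edges, gene_a, gene_b):
    """Direct binary adjacency: 1.0 if the two genes share an edge, else 0.0."""
    return 1.0 if frozenset((gene_a, gene_b)) in edges else 0.0

# Hypothetical features and network; geneA and geneB stand in for a co-complex pair.
features = {
    "geneA": [1.0, 0.0, 1.0],
    "geneB": [2.0, 0.0, 2.0],
    "geneC": [0.0, 3.0, 0.0],
}
edges = {frozenset(("geneA", "geneC"))}

positive_pair = ("geneA", "geneB")
feature_score = cosine_similarity(features["geneA"], features["geneB"])
network_score = network_similarity(edges, *positive_pair)
print(feature_score, network_score)
```

In this toy case the feature vectors place the positive pair close together even though the network has no edge between them, which is the kind of difference the figure tracks across training epochs.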

Extended Data Fig. 4 Coverage of BIONIC and input network captured modules.

Coverage of functional gene modules by individual networks and the unsupervised BIONIC integration of these networks (denoted BIONIC), as determined by a parameter optimized module detection analysis where the clustering parameters were optimized for each module individually. The number of captured modules is reported for a range of overlap scores (Jaccard threshold). Higher threshold indicates greater correspondence between the clusters obtained from the dataset and their respective modules given by the standard. PPI = protein-protein interaction. These are the same networks and BIONIC features as Fig. 2.
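The coverage count can be sketched as follows, assuming that a module counts as captured when its best-matching cluster reaches the Jaccard threshold. That criterion, the helper names and the toy modules and clusters are assumptions for illustration, not details taken from the paper.

```python
def jaccard(a, b):
    """Jaccard score of two gene sets: |intersection| / |union|, from 0 to 1."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_overlap(module, clusters):
    """Highest Jaccard score between one module and any cluster."""
    return max((jaccard(module, cluster) for cluster in clusters), default=0.0)

def captured_count(modules, clusters, threshold):
    """Number of modules whose best-matching cluster meets the threshold."""
    return sum(best_overlap(genes, clusters) >= threshold for genes in modules.values())

# Hypothetical standard (modules) and clustering result.
modules = {
    "complex_1": {"g1", "g2", "g3"},
    "complex_2": {"g4", "g5"},
}
clusters = [{"g1", "g2", "g3", "g6"}, {"g4"}, {"g5", "g7"}]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, captured_count(modules, clusters, threshold))
```

Raising the threshold can only keep or reduce the count, so a dataset whose curve stays higher across thresholds captures more modules at every level of stringency.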

Extended Data Fig. 5 Captured modules comparison for BIONIC and input networks for optimal clustering parameters.

Known protein complexes (as defined by the IntAct standard) captured by individual networks and the unsupervised BIONIC integration of these networks (denoted BIONIC). Hierarchical clustering was performed on the datasets and resulting clusters were compared to known IntAct complexes and scored for set overlap using the Jaccard score (ranging from 0 to 1). The clustering algorithm parameters were optimized for each module individually, unlike the analysis in Fig. 2 where the clustering parameters were optimized for the standard as a whole. Each point is a protein complex, as in Fig. 2c. The dashed line indicates instances where the given datasets achieve the same score for a given module. Histograms indicate the distribution of overlap (Jaccard) scores for the given dataset, and the labelled dashed line indicates the mean of this distribution. The individual modules shown here as well as for the KEGG Pathways and IntAct Complexes module standards can be found in Supplementary Data File 4. The LSM2-7 complex is indicated by the arrows. PPI = protein-protein interaction. This analysis uses the same networks and BIONIC features as Fig. 2.

Extended Data Fig. 6 Interpretability of BIONIC feature space.

Co-annotation evaluations of the unsupervised BIONIC features subset to different clusters of feature dimensions (denoted ‘Cluster’). The number of feature dimensions for each cluster is given in parentheses. The performance of the original BIONIC features (denoted BIONIC (512)) is also displayed. Data are presented as mean values. Bars indicate 95% confidence interval for n = 10 independent samples.

Extended Data Fig. 7 Integration method performance for yeast-two-hybrid network inputs.

Performance comparison of 5 yeast-two-hybrid network integrations across functional standards, evaluation types and unsupervised integration methods. Data are presented as mean values. Bars indicate 95% confidence interval for n = 10 independent samples. BP = Biological Process, multi-n2v = multi-node2vec.

Extended Data Fig. 8 Effects of label poisoning on BIONIC semi-supervised and unsupervised performance.

Semi-supervised BIONIC comparisons. a) A label poisoning experiment, where progressively more permutation noise is added to the label sets the semi-supervised BIONIC is trained on. ‘Noise’ indicates the proportion of permutation noise applied (multiply by 100 for percentages). Data are presented as mean values. Bars indicate 95% confidence interval for n = 10 independent samples. b) UMAP plots comparing the embedding space of the TFIID complex and the 100 nearest neighbors of this complex for unsupervised and semi-supervised BIONIC over a range of label noise values. SS = average silhouette score of TFIID members.
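One way to picture permutation noise is the sketch below, which assumes that a chosen proportion of genes is sampled and their labels are shuffled among themselves. The exact procedure used in the paper may differ; the function name `poison_labels`, the seed handling and the toy label set are hypothetical.

```python
import random

def poison_labels(labels, noise, seed=0):
    """Shuffle the labels of a random `noise` fraction of genes among themselves."""
    rng = random.Random(seed)
    genes = sorted(labels)                    # fixed order for reproducibility
    n_noisy = round(noise * len(genes))       # number of genes to perturb
    chosen = rng.sample(genes, n_noisy)
    shuffled = [labels[gene] for gene in chosen]
    rng.shuffle(shuffled)
    poisoned = dict(labels)                   # leave the input untouched
    for gene, label in zip(chosen, shuffled):
        poisoned[gene] = label
    return poisoned

# Hypothetical gene-to-label assignments.
labels = {
    "g1": "transcription", "g2": "transcription", "g3": "transport",
    "g4": "transport", "g5": "splicing", "g6": "splicing",
}

clean = poison_labels(labels, noise=0.0)
half = poison_labels(labels, noise=0.5)
changed = [gene for gene in labels if labels[gene] != half[gene]]
print(len(changed), "of", len(labels), "labels differ at noise = 0.5")
```

Permuting rather than resampling keeps the overall label frequencies fixed, so any drop in performance reflects broken gene-to-label assignments rather than a shifted class balance.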

Extended Data Fig. 9 Computational scalability of BIONIC.

Graphics processing unit (GPU) memory usage in gigabytes (left) and average wall clock epoch time in minutes (right) for a range of network sizes and number of networks. GB = gigabyte, min = minutes. Gray squares indicate a scenario where BIONIC exceeded the maximum memory of the GPU and failed to complete. The experiments were run on a Titan Xp GPU with a 2.4 GHz Intel Xeon CPU and 32 GB of system memory.

Extended Data Fig. 10 β-1,6-glucan levels in yeast strains.

The amount of glucan per cell was calculated using pustulan as a standard. Data are presented as mean values. Error bars indicate standard deviation for n = 3 biologically independent samples. kre6Δ compared to wild type p-value = 0.01473, jervine compared to wild type p-value = 0.01520. * Significant difference (p-value < 0.05 after Bonferroni correction, Welch’s one-sided t-test).
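The statistics named in this caption can be sketched with the standard library alone. The block computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom for made-up triplicate measurements, then applies a Bonferroni adjustment to the two p-values quoted above. The measurements are hypothetical, and the adjustment assumes the quoted p-values are uncorrected and that exactly two comparisons were made; the caption does not state either point.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical glucan measurements (arbitrary units), n = 3 per strain.
wild_type = [10.0, 11.0, 12.0]
mutant = [4.0, 5.0, 6.0]
t_stat, dof = welch_t(wild_type, mutant)
print(round(t_stat, 3), round(dof, 3))

# p-values quoted in the caption, assumed here to be uncorrected.
adjusted = bonferroni([0.01473, 0.01520])
print(all(p < 0.05 for p in adjusted))
```

Under these assumptions both adjusted values remain below 0.05. Converting the t statistic into a one-sided p-value requires the t distribution's cumulative distribution function, which is left out here because the standard library does not provide it.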

Supplementary information

Supplementary Information

Supplementary Figs. 1–7 and Notes 1–4.

Reporting Summary

Supplementary Data 1

Hyperparameter optimization results. Hyperparameter optimization results across integration methods integrating three S. pombe networks. The chosen (best) hyperparameter combinations for each method are highlighted.

Supplementary Data 2

Integrated network details. Publication, gene count, edge count and experimental type for each yeast network and each human network used in Figs. 2–6. Rows in yellow indicate the three yeast networks used in Figs. 2–4 and 6 integrations.

Supplementary Data 3

Evaluation standards details. Gene count, coannotation count, module count and class count details for each standard used in the Figs. 2–5 evaluations.

Supplementary Data 4

Module detection results. Overlap of standard-optimized clusters obtained from the Fig. 2c module detection analysis for networks as well as integration methods. Module standards are IntAct complexes, KEGG pathways and GO Biological Processes.

Supplementary Data 5

Extended Data Figs. 4–5 Module detection results. Overlap of known per-module-optimized clusters obtained from the Figs. 2 and 3 input networks and integration methods, with IntAct complex, KEGG pathway and GO Biological Process modules.

Supplementary Data 6

50 compound TS allele screen results. Files containing the TS allele chemical–genetic scores and IQR scores, screened against 50 compounds (at multiple concentrations) that were selected by BIONIC.

Supplementary Data 7

Essential gene compound sensitivity predictions. Essential yeast gene compound sensitivity predictions for 50 selected compounds using BIONIC.

Supplementary Data 8

Integrated BIONIC features. Learned BIONIC features from yeast networks (protein–protein interaction, coexpression and genetic interaction) integrated and used in Figs. 2, 3 and 6.

Supplementary Data 9

Evaluation standards. Yeast evaluation standards for coannotation prediction, module detection and gene function prediction used in Figs. 2a, 3a, 4 and 5a as well as the human coannotation standard used in Fig. 5b.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Forster, D.T., Li, S.C., Yashiroda, Y. et al. BIONIC: biological network integration using convolutions. Nat Methods 19, 1250–1261 (2022). https://doi.org/10.1038/s41592-022-01616-x

  • DOI: https://doi.org/10.1038/s41592-022-01616-x
