NicheNet: modeling intercellular communication by linking ligands to target genes

Abstract

Computational methods that model how gene expression of a cell is influenced by interacting cells are lacking. We present NicheNet (https://github.com/saeyslab/nichenetr), a method that predicts ligand–target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks. We applied NicheNet to tumor and immune cell microenvironment data and demonstrate that NicheNet can infer active ligands and their gene regulatory effects on interacting cells.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.

from$8.99

All prices are NET prices.

Fig. 1: General workflow and validation of NicheNet.
Fig. 2: NicheNet prioritizes CAF-ligands based on their potential to regulate p-EMT genes in adjacent malignant cells in HNSCC.

Data availability

No new data were generated for this study. All data used in this study are publicly available: accession codes for the public ligand treatment expression datasets are given in Supplementary Table 2, accession codes for the public case study datasets are given in the Methods sections Collection and processing of the human single-cell transcriptomics data from Puram et al. and Collection and processing of the mouse single-cell transcriptomics data from Medaglia et al. The data source networks, NicheNet’s ligand–target matrix and the processed ligand treatment and case study expression datasets are available on Zenodo (https://doi.org/10.5281/zenodo.3260758)59. Databases used to create the NicheNet model are mentioned and referred to in the Methods section Collection of ligand–receptor, intracellular signaling and gene regulatory interactions, in Supplementary Table 1 and in the Supplementary Notes. Source data underlying Figs. 1 and 2 are accessible from the respective figure legends.

Code availability

An open-source R implementation of NicheNet is available at GitHub (https://github.com/saeyslab/nichenetr). The release includes tutorials and example vignettes for the following analyses: ligand activity analysis on a gene set of interest, single-cell ligand activity analysis; ligand-to-target signaling path visualization and assessment of how well prioritized ligands explain changes in gene expression in the receiver cell. In addition, we provide tutorials on model validation (evaluation of target gene and ligand activity), model construction and parameter optimization such that users can benchmark their methods against NicheNet and construct their own models (for example, with their own data sources to include more cell type specificity). The final networks, ligand–target matrix and processed expression data used in the vignettes are available at Zenodo (https://doi.org/10.5281/zenodo.3260758)59. The scripts used for processing of data sources and to carry out all analyses from this study are also available at Zenodo (https://doi.org/10.5281/zenodo.3462199).

References

  1. 1.

    Yosef, N. & Regev, A. Writ large: genomic dissection of the effect of cellular environment on immune response. Science 354, 64–68 (2016).

  2. 2.

    Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).

  3. 3.

    Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).

  4. 4.

    Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).

  5. 5.

    Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568, 235 (2019).

  6. 6.

    Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019).

  7. 7.

    Medaglia, C. et al. Spatial reconstruction of immune niches by combining photoactivatable reporters and scRNA-seq. Science 358, 1622–1626 (2017).

  8. 8.

    Boisset, J.-C. et al. Mapping the physical network of cellular interactions. Nat. Methods 15, 547–553 (2018).

  9. 9.

    Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624.e24 (2017).

  10. 10.

    Camp, J. G. et al. Multilineage communication regulates human liver bud development from pluripotency. Nature 546, 533 (2017).

  11. 11.

    Pavličev, M. et al. Single-cell transcriptomics of the human placenta: inferring the cell communication network of the maternal-fetal interface. Genome Res. 27, 349–361 (2017).

  12. 12.

    Zepp, J. A. et al. Distinct mesenchymal lineages and niches promote epithelial self-renewal and myofibrogenesis in the lung. Cell 170, 1134–1148.e10 (2017).

  13. 13.

    Cohen, M. et al. Lung single-cell signaling interaction map reveals basophil role in macrophage imprinting. Cell 175, 1031–1044 (2018).

  14. 14.

    Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347 (2018).

  15. 15.

    Choi, H. et al. Transcriptome analysis of individual stromal cell populations identifies stroma-tumor crosstalk in mouse lung cancer model. Cell Rep. 10, 1187–1201 (2015).

  16. 16.

    Komurov, K. Modeling community-wide molecular networks of multicellular systems. Bioinformatics 28, 694–700 (2012).

  17. 17.

    Krämer, A., Green, J., Pollard, J. & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523–530 (2014).

  18. 18.

    Valcourt, U., Kowanetz, M., Niimi, H., Heldin, C.-H. & Moustakas, A. TGF-β and the Smad signaling pathway support transcriptomic reprogramming during epithelial-mesenchymal cell transition. Mol. Biol. Cell 16, 1987–2002 (2005).

  19. 19.

    Bonnardel, J. et al. Stellate cells, hepatocytes, and endothelial cells imprint the kupffer cell identity on monocytes colonizing the liver macrophage niche. Immunity 51, 638–354 (2019).

  20. 20.

    Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2016).

  21. 21.

    Ramilowski, J. A. et al. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat. Commun. 6, 7866 (2015).

  22. 22.

    Pawson, A. J. et al. The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res. 42, D1098–D1106 (2014).

  23. 23.

    Rouillard, A. D. et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, baw100 (2016).

  24. 24.

    Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).

  25. 25.

    Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).

  26. 26.

    Li, T. et al. A scored human protein–protein interaction network to catalyze genomic interpretation. Nat. Methods 14, 61–64 (2017).

  27. 27.

    Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).

  28. 28.

    Vinayagam, A. et al. A directed protein interaction network for investigating intracellular signal transduction. Sci. Signal. 4, rs8–rs8 (2011).

  29. 29.

    Van Landeghem, S. et al. Exploring biomolecular literature with EVEX: connecting genes through events, homology, and indirect associations. Adv. Bioinforma. 2012, e582765 (2012).

  30. 30.

    Lachmann, A. & Ma’ayan, A. KEA: kinase enrichment analysis. Bioinformatics 25, 684–686 (2009).

  31. 31.

    Linding, R. et al. NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res. 36, D695–D699 (2008).

  32. 32.

    Duan, G., Li, X. & Köhn, M. The human DEPhOsphorylation database DEPOD: a 2015 update. Nucleic Acids Res. 43, D531–D535 (2015).

  33. 33.

    Liu, Z.-P., Wu, C., Miao, H. & Wu, H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database 2015, bav095 (2015).

  34. 34.

    Han, H. et al. TRRUST: a reference database of human transcriptional regulatory interactions. Sci. Rep. 5, 11432 (2015).

  35. 35.

    Bovolenta, L. A., Acencio, M. L. & Lemke, N. HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions. BMC Genomics 13, 405 (2012).

  36. 36.

    Griffon, A. et al. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res. 43, e27–e27 (2015).

  37. 37.

    Jojic, V. et al. Identification of transcriptional regulators in the mouse immune system. Nat. Immunol. 14, 633–643 (2013).

  38. 38.

    Heng, T. S. P. et al. The immunological genome project: networks of gene expression in immune cells. Nat. Immunol. 9, 1091–1094 (2008).

  39. 39.

    Lachmann, A. et al. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444 (2010).

  40. 40.

    Consortium, T. E. P. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004).

  41. 41.

    Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 42, D142–D147 (2014).

  42. 42.

    Matys, V. et al. TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).

  43. 43.

    Xie, X., Rigor, P. & Baldi, P. MotifMap: a human genome-wide map of candidate regulatory motif sites. Bioinformatics 25, 167–174 (2009).

  44. 44.

    Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2013).

  45. 45.

    Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  46. 46.

    Page, L., Brin, S., Motwani, R. & Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web (Stanford InfoLab, 1999).

  47. 47.

    Jeh, G. & Widom, J. Scaling personalized web search. in Proc. 12th International Conference on World Wide Web (eds., G. Hencsey & B. White) 271–279 (ACM, 2003).

  48. 48.

    Kolesnikov, N. et al. ArrayExpress update—simplifying data submissions. Nucleic Acids Res. 43, D1113–D1116 (2015).

  49. 49.

    Irizarry, R. A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostat. Oxf. Engl. 4, 249–264 (2003).

  50. 50.

    Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315 (2004).

  51. 51.

    Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  52. 52.

    Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57, 289–300 (1995).

  53. 53.

    Michaud, J. et al. Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9, 363 (2008).

  54. 54.

    Janky, R. et al. iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput. Biol. 10, e1003731 (2014).

  55. 55.

    Bischl, B. et al. mlrMBO: a modular framework for model-based optimization of expensive black-box functions. Preprint at https://arxiv.org/abs/1703.03373 (2017).

  56. 56.

    Horn, D. & Bischl, B. Multi-objective parameter configuration of machine learning algorithms using model-based optimization. in 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (eds., Y. Jin & S. Kollias) 1–8 (2016).

  57. 57.

    Deb, K., Pratap, A., Agarwal, S. & Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002).

  58. 58.

    Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  59. 59.

    Browaeys, R., Saelens, W. & Saeys, Y. Development, Evaluation and Application of NicheNet: Datasets Version 2 (Zenodo, 2019); https://doi.org/10.5281/zenodo.3260758

Download references

Acknowledgements

R.B. and W.S. are supported by a PhD fellowship from The Research Foundation—Flanders (grant nos 1181318N to R.B. and 11Z4518N to W.S.). Y.S. is an ISAC Marylou Ingram scholar. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program. The authors would also like to thank G. Bader (University of Toronto) for his insightful comments on the manuscript, P. De Bleser (Ghent University) for providing us with the processed ChIP-seq data from ReMap, and M. Guilliams, W. T’Jonck, J. Bonnardel and C. Scott (Ghent University) for helpful discussions about this project.

Author information

R.B., W.S. and Y.S. conceived the method. R.B. and W.S. designed the experiments. R.B. performed the experiments and implemented the method. W.S. and Y.S. supervised the work. R.B., W.S. and Y.S. wrote and approved the manuscript.

Correspondence to Yvan Saeys.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nicole Rusk and Lei Tang were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Schematic overview of the methodology proposed to predict ligand-target links through data-integration.

Interactions inferred from ligand-receptor, signaling, and gene regulatory data sources are aggregated in respectively a weighted integrated ligand-signaling and gene regulatory network (line thickness corresponds to edge weight in this visualization). Having these two different integrated networks instead of one network is necessary because the edges represent a different type of interaction: a protein-protein interaction in the ligand-signaling network, and a gene regulatory interaction in the gene regulatory network. Data sources are aggregated in a weighted manner, meaning that a weight is assigned to every data source proportional to how the data source contributes to the final model performance, as automatically determined via model-based optimization via mlrMBO. These networks are then used to first link ligands to downstream transcriptional regulators and secondly to link these regulators to target genes. To link ligands to regulators, the weighted integrated ligand-signaling network is used to calculate for every ligand of interest an importance score for each downstream gene. Here we applied Personalized PageRank to calculate these importance scores which can be considered as a proxy for the probability that a particular ligand signals to a particular regulator. In the second phase, the matrix of ligand-regulator signaling scores is multiplied with the adjacency matrix of the weighted gene regulatory network to obtain ligand-target regulatory potential scores.

Supplementary Figure 2 Clustering of the ligand treatment datasets by their profile of differentially expressed genes.

The blue color indicates whether a gene (in the rows; n = 8986) was differentially expressed in a particular dataset (in the columns; GEO accession and profiled ligand are indicated; n = 111) or not. Dendrograms are result of hierarchical clustering with complete linkage and Pearson correlation as clustering distance.

Supplementary Figure 3 Limitations of using ligand treatment datasets for validation of NicheNet’s prior ligand-target model.

a) Popularity bias in the selection of ligands for which validation ligand treatment datasets were retrieved. All ligands in the NicheNet ligand-receptor data sources were ranked according to the number of PubMed publications in which they are described. Vertical black lines in the barcode plot are drawn when the transcriptional response to that ligand is assessed in one of the ligand treatment datasets used for validation in this study (111 datasets; 51 unique ligands). b) Fraction of differentially expressed (DE) genes that overlap between several ligand treatment datasets that profile the response to the same ligand. For ligands of which the transcriptional response was profiled in multiple expression datasets, we show which fraction of all DE genes of a ligand was observed in n of the ligand treatment datasets. A low overlap between different datasets of the same ligand indicate the influence of context-specificity (for example profiled cell type) on the transcriptional response to a ligand.

Supplementary Figure 4 Graphical overview of the procedure to assess the performance of NicheNet in predicting ligand activity from expression data.

First, we collected gene expression datasets in which the transcriptional response of cells was assessed after treatment with a specific ligand or ligand combination (blue: differentially expressed target genes). Secondly, we determined ligand activity prediction accuracy, by testing the ability of the model in predicting with which ligand the cells were treated based on the expression data. In other words, we evaluated how well NicheNet prioritizes ligands according to their potential to regulate a set of affected genes. This procedure is based on the following assumption: the better a ligand predicts the transcriptional response compared to other ligands, the more likely it is that this ligand is active. Therefore, we used the ligand-target matrix (purple palette of regulatory potential scores) to calculate feature importance measures (red palette, example measures: Pearson correlation coefficient, area under the receiver operating characteristic curve) for all ligands in every expression dataset separately. These feature importance measures are proportional to the ability of a ligand to predict the observed differential expression (for example, B -> A: feature importance measures indicate how well ligand B can predict the transcriptional response to ligand A). Then, we assessed how well each of these feature importance scores can predict whether a ligand was truly added to the cells (the activity state; green: active). The final model performance in ligand prediction is then the performance of the most predictive feature importance measure.

Supplementary Figure 5 Evaluation of the performance of NicheNet in predicting the transcriptional response: comparison to randomized networks, NicheNet models with fewer data sources, and NicheNet without parameter optimization.

For 111 ligand treatment expression datasets, we calculated several classification evaluation metrics to assess how well NicheNet predicts the set of genes that are differentially expressed in cells upon stimulation with a particular ligand. The target gene prediction performance was compared between: 1) 100 models constructed from 100 randomized networks; 2) 280 incomplete unoptimized NicheNet one-vs-one-vs-one models for which ligand-target regulatory potential scores were calculated after using only one comprehensive ligand-receptor, one signaling, and one gene regulatory database; 3) NicheNet containing all data sources, but without parameter optimization; 4) NicheNet containing all data sources and with optimized parameters. For 1) and 2), we calculated the median performance for each ligand treatment dataset over respectively all 100 and 280 models. Each dot indicates the performance for one dataset, and the black line indicates the median performance over all datasets. Several classification evaluation metrics were calculated: AUC-iRegulon (corrected): area under the cumulative recovery curve (considering the top 3% of the ranking), corrected for random prediction; AUPR (corrected): area under the precision-recall curve, corrected for random prediction; AUROC: area under the receiver operating characteristic curve; Mean-rank gene-set enrichment: negative natural logarithm of the mean-rank gene-set enrichment p-value. Red dashed line: performance of random guessing.

Supplementary Figure 6 Evaluation of the performance of NicheNet in predicting ligand activity: comparison to randomized networks, NicheNet models with fewer data sources, NicheNet without parameter optimization, CCCExplorer, and IPA Upstream Regulator Analysis.

For 111 ligand treatment expression datasets, we assessed the ability in predicting with which ligand cells were treated based on the set of genes that are differentially expressed upon ligand treatment. Ligand prediction performance was compared between: 1) 100 models constructed from 100 randomized networks; 2) 280 incomplete unoptimized NicheNet one-vs-one-vs-one models for which ligand-target regulatory potential scores were calculated after using only one comprehensive ligand-receptor, one signaling, and one gene regulatory database; 3) NicheNet containing all data sources, but without parameter optimization; 4) NicheNet containing all data sources and with optimized parameters; 5) Upstream Regulator Analysis of IPA; 6) CCCExplorer (adapted to run on ligand treatment datasets). For 1) and 2), we calculated the median performance for each ligand treatment dataset over respectively all 100 and 280 models. Each dot indicates the performance for one dataset, and the black line indicates the median performance over all datasets. Several classification evaluation metrics were calculated: AUPR (corrected): area under the precision-recall curve, corrected for random prediction; AUROC: area under the receiver operating characteristic curve and Pearson correlation. Red dashed line: performance of random guessing.

Supplementary Figure 7 Performance of several feature importance scores in predicting ligand activity.

For 111 ligand treatment expression datasets, we assessed the ability of NicheNet in predicting with which ligand cells were treated based on the set of genes that are differentially expressed upon ligand treatment. For this procedure, ligand-target regulatory potential scores were used to calculate feature importance measures for all ligands in every expression dataset separately (n = 111 datasets; 51 unique ligands) and these importance measures were in a next step used to predict the ligand activity state of each ligand in each dataset. The performance of each feature importance measure in ligand activity prediction (as evaluated by the AUROC and AUPR) is shown for different feature importance measures: spearman: Spearman’s rank correlation; pearson: Pearson correlation; mean_rank_GST_log_pval : negative natural logarithm of the mean-rank gene-set enrichment p-value; auc_iregulon_corrected: area under the cumulative recovery curve (considering the top 3% of the ranking), corrected for random prediction; aupr_corrected: area under the precision-recall curve, corrected for random prediction; auroc: area under the receiver operating characteristic curve. Boxplot elements: center line: median; box limits: upper and lower quartiles; whiskers: 1.5x interquartile range; points: outliers. The violin plots show the probability density of the data (tails of the violins are trimmed to the range of the data). Red dashed line: performance of random guessing.

Supplementary Figure 8 Comparison between NicheNet, one-vs-one-vs-one models and leave-one-in models in target gene and ligand activity prediction performance.

To investigate the importance of the integration of multiple complementary data sources, we compared the performance of NicheNet to ‘one-vs-one-vs-one’ and ‘leave-one-in’ models. In one-vs-one-vs-one models, ligand-target regulatory potential scores were calculated after using only one comprehensive ligand-receptor, one signaling and one gene regulatory database (n = 280 one-vs-one-vs-one models in total). In ligand-signaling leave-one-in models, one ligand-signaling data source was used with all gene regulatory data sources; in gene regulatory leave-one-in models the opposite (n = 37 ligand-signaling and n = 20 gene regulatory leave-one-in models in total). For each separate one-vs-one-vs-one and leave-one-in model, we calculated the median target gene and ligand activity prediction performances over all ligand treatment datasets (111 datasets for 51 unique ligands) and compared to the median performances of the complete NicheNet model. AUPR (corrected): area under the precision-recall curve, corrected for random prediction; AUROC: area under the receiver operating characteristic curve. Red dashed line: performance of random guessing.

Supplementary Figure 9 Target gene and ligand activity prediction performance of NicheNet of complete and leave-one-in models.

To assess the information content of individual data sources, we constructed ‘leave-one-in’ models of ligand-target regulatory potential scores in which every data source in a specific layer (ligand-signaling or gene regulatory) was left out, except a single data source of interest. Hereby, differences in model performances between leave-one-in models of the same layer can be attributed to differences in information content between the corresponding data sources. It should be remarked that a distinction between ligand-receptor and signaling interaction data sources was in this figure only made for visualization purposes, not for the construction of leave-one-in models. The target gene and ligand activity prediction performances of the individual ligand-signaling (red and dark blue) and gene regulatory (orange) leave-one-in models were compared to the performance of the complete final model (light blue) (n = 111 ligand treatment datasets for 51 unique ligands). AUPR (corrected): area under the precision-recall curve, corrected for random prediction; AUROC: area under the receiver operating characteristic curve. Boxplot elements: center line: median; box limits: upper and lower quartiles; whiskers: 1.5x interquartile range; points: outliers. Red dashed line: performance of random guessing.

Supplementary Figure 10 The contribution of individual data sources to target gene and ligand activity prediction performance can vary over different ligand treatment datasets.

To assess the effect of individual data sources on target gene and ligand activity performance prediction performance on individual ligand treatment datasets, we evaluated the performance of ‘leave-one-in’ models of ligand-target regulatory potential scores in which every data source in a specific layer (ligand-signaling or gene regulatory) was left out, except an individual data source of interest. Hereby, differences in model performances between leave-one-in models of the same layer can be attributed to differences in information content between the corresponding data sources. It should be remarked that a distinction between ligand-receptor and signaling interaction data sources was in this figure only made for visualization purposes, not for the construction of leave-one-in models. For two pairs of ligand treatment datasets, we compared in (a) the target gene prediction performances and in (b) the ligand activity prediction performances of the individual ligand-signaling (red and dark blue) and gene regulatory (orange) leave-one-in models to each other and to the complete final model (light blue). AUROC: area under the receiver operating characteristic curve. Red dashed line: performance of random guessing.

Supplementary Figure 11 Evaluation of cell type bias in target gene and ligand activity prediction performance of NicheNet.

To evaluate the possible presence of a cell type bias, we assessed the predictive performance of NicheNet on ligand treatment datasets that were only selected out of the total set of 111 datasets if they measure the response to a ligand for which the response was profiled in at least two different cell types that were stimulated by at least two different ligands. For the 43 selected ligand treatment expression datasets, we show the target gene (a-b) and ligand activity (c-d) prediction performances in function of the ligand added to the cells and the cell type that was stimulated. For each ligand-cell type combination, we also indicate the number of analyzed datasets. When multiple datasets were analyzed for a specific ligand-cell type combination, the average performance is shown on the heatmap. AUROC: area under the receiver operating characteristic curve; AUPR (corrected): area under the precision-recall curve, corrected for random prediction.

Supplementary Figure 12 Ligand rankings in ligand activity prediction: comparison between NicheNet, CCCExplorer and IPA Upstream Regulator Analysis.

For 111 ligand treatment datasets (profiling the transcriptional response to one or two out of 51 different ligands), the 51 corresponding ligands were ranked by NicheNet according to how well they predict the transcriptional response observed in a particular dataset (as evaluated via the Pearson correlation); by IPA Upstream Regulator Analysis and CCCExplorer according to the negative logarithm of the enrichment p-value given by both methods. We plot here the fraction of datasets for which the true active ligand was present in the top n of the ranking of ligands in function of n.

Supplementary Figure 13 Popularity bias of NicheNet, CCCExplorer and IPA Upstream Regulator Analysis in ligand activity prediction performance.

To analyze popularity bias in ligand activity prediction performance, we assessed performance after iteratively leaving out the ligand treatment datasets profiling the transcriptional response to the top n most popular ligands. Following ligand activity prediction evaluation metrics were calculated: AUPR (corrected): area under the precision-recall curve, corrected for random prediction; AUROC: area under the receiver operating characteristic curve. To analyze popularity bias and compare between NicheNet, CCCExplorer and IPA Upstream Regulator Analysis, we assessed the relation between popularity of ligands and median performance when datasets of these ligands are still included (n = 51 different ligands). The shown smoothing lines are the result of fitting a linear regression model to analyze this relation. The accompanying shaded region indicates the 95% confidence interval. The popularity of ligands is determined by the number of studies in PubMed in which a ligand is described.

Supplementary Figure 14 NicheNet can prioritize interactions between CAF-ligands and malignant cell receptors involved in the regulation of expression of p-EMT genes.

Summary result of NicheNet analysis: results are shown for the 20 (out of 131) CAF-ligands best predicting the p-EMT gene set. Top left: scaled expression of ligands in CAFs, averaged per tumor. Top center: Ligand-receptor interactions between CAF-ligands and receptors strongly expressed in malignant cells. Top right: outcome of NicheNet’s ligand activity prediction on the p-EMT gene set. As the ligand activity metric, we used the Pearson correlation coefficient (PCC) between regulatory potential scores and p-EMT gene set assignments (n = 6072 genes, of which 96 belong to the p-EMT program). Bottom center: scaled expression of receptors in malignant cells, averaged per tumor. Bottom right: each cell was scored for expression of genes belonging to the p-EMT program, and an average p-EMT score was calculated for every tumor.

Supplementary Figure 15 Network showing high-confidence interactions forming putative signaling paths between the ligand TGFB3 and its predicted target genes TGFBI, LAMC2 and TNC.

From NicheNet’s weighted integrated signaling and gene regulatory networks, we inferred the most important signaling mediators and transcriptional regulators involved in the signaling from TGFB3 to TGFBI, LAMC2, and TNC. Shown here is the network of interactions (as documented by NicheNet’s data sources) between ligand, signaling mediators, transcriptional regulators, and target genes. The ligand node is indicated in red, the target gene nodes in blue and nodes of signaling and transcriptional regulators in grey. Edge line thickness is proportional to the weight of the represented interaction in the weighted integrated networks. Edges representing signaling interactions are colored red, gene regulatory interactions blue.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15 and Notes 1–15.

Reporting Summary

Supplementary Table 1

NicheNet’s data sources.

Supplementary Table 2

Ligand treatment datasets for validation of NicheNet’s prior model.

Supplementary Table 3

NicheNet’s performance in target gene prediction for 111 different ligand-treatment datasets.

Supplementary Table 4

Information on parameter optimization and data source characterization.

Supplementary Table 5

Results of NicheNet analysis on malignant cells and CAFs in HNSCC.

Supplementary Table 6

Literature validation of the role of top-ranked CAF-ligands in regulating EMT.

Supplementary Table 7

Ligand-target signaling network between the ligand TGFB3 and the predicted target genes TGFBI, LAMC2 and TNC.

Supplementary Table 8

Results NicheNet analysis to correlate single-cell ligand activities to pEMT scores.

Supplementary Table 9

Results of NicheNet analysis on immune cells in the inguinal lymph node after viral infection.

Source data

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019). https://doi.org/10.1038/s41592-019-0667-5

Download citation