SCENIC: single-cell regulatory network inference and clustering

Aibar, Sara; González-Blas, Carmen Bravo; Moerman, Thomas; Huynh-Thu, Vân Anh; Imrichova, Hana; Hulselmans, Gert; Rambow, Florian; Marine, Jean-Christophe; Geurts, Pierre; Aerts, Jan; van den Oord, Joost; Atak, Zeynep Kalender; Wouters, Jasper; Aerts, Stein

doi:10.1038/nmeth.4463

Brief Communication
Published: 09 October 2017

Nature Methods volume 14, pages 1083–1086 (2017)


Abstract

We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.

**Figure 1: The SCENIC workflow and its application to the mouse brain.**

**Figure 2: Cross-species comparison of neuronal networks and cell types.**

**Figure 3: SCENIC overcomes tumor effects and unravels relevant cell states and GRNs in cancer.**

Accession codes

Primary accessions

Gene Expression Omnibus

GSE99466

Referenced accessions

Gene Expression Omnibus

References

Linnarsson, S. & Teichmann, S.A. Genome Biol. 17, 97 (2016).
Wagner, A., Regev, A. & Yosef, N. Nat. Biotechnol. 34, 1145–1160 (2016).
Stegle, O., Teichmann, S.A. & Marioni, J.C. Nat. Rev. Genet. 16, 133–145 (2015).
Raj, A. & van Oudenaarden, A. Cell 135, 216–226 (2008).
Moignard, V. et al. Nat. Biotechnol. 33, 269–276 (2015).
Pina, C. et al. Cell Rep. 11, 1503–1510 (2015).
Guo, M., Wang, H., Potter, S.S., Whitsett, J.A. & Xu, Y. PLoS Comput. Biol. 11, e1004575 (2015).
Huynh-Thu, V.A., Irrthum, A., Wehenkel, L. & Geurts, P. PLoS One 5, e12776 (2010).
Zeisel, A. et al. Science 347, 1138–1142 (2015).
Kiselev, V.Y. et al. Nat. Methods 14, 483–486 (2017).
Lake, B.B. et al. Science 352, 1586–1590 (2016).
Darmanis, S. et al. Proc. Natl. Acad. Sci. USA 112, 7285–7290 (2015).
Tirosh, I. et al. Nature 539, 309–313 (2016).
Tirosh, I. et al. Science 352, 189–196 (2016).
Alizadeh, A.A. et al. Nat. Med. 21, 846–853 (2015).
Johnson, W.E., Li, C. & Rabinovic, A. Biostatistics 8, 118–127 (2007).
Ritchie, M.E. et al. Nucleic Acids Res. 43, e47 (2015).
Perotti, V. et al. Oncogene 35, 2862–2872 (2016).
Chang, C.-Y. et al. Nature 495, 98–102 (2013).
Denny, S.K. et al. Cell 166, 328–342 (2016).
Müller, M.R. & Rao, A. Nat. Rev. Immunol. 10, 645–656 (2010).
Regev, A. et al. bioRxiv Preprint at: http://www.biorxiv.org/content/early/2017/05/08/121202 (2017).
Zaharia, M. et al. In Proc. of the 9th USENIX Conference on Networked Systems Design and Implementation 2–2 (USENIX Association, 2012).
Marbach, D. et al. Nat. Methods 9, 796–804 (2012).
Islam, S. et al. Nat. Methods 11, 163–166 (2014).
Crow, M., Paul, A., Ballouz, S., Huang, Z.J. & Gillis, J. Genome Biol. 17, 101 (2016).
Lun, A.T.L., McCarthy, D.J. & Marioni, J.C. F1000Res. 5, 2122 (2016).
Friedman, J.H. Ann. Stat. 29, 1189–1232 (2001).
Chen, T. & Guestrin, C. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
Freund, Y. & Schapire, R.E. Jinko Chino Gakkaishi 14, 771–780 (1999).
Sławek, J. & Arodź, T. BMC Syst. Biol. 7, 106 (2013).
Dean, J. & Ghemawat, S. Commun. ACM 51, 107–113 (2008).
Aerts, S. et al. PLoS Biol. 8, e1000435 (2010).
Herrmann, C., Van de Sande, B., Potier, D. & Aerts, S. Nucleic Acids Res. 40, e114 (2012).
Janky, R. et al. PLoS Comput. Biol. 10, e1003731 (2014).
Krijthe, J. Rtsne: t-distributed stochastic neighbor embedding using Barnes-Hut implementation https://github.com/jkrijthe/Rtsne (2015).
Marques, S. et al. Science 352, 1326–1329 (2016).
Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
Durinck, S. et al. Bioinformatics 21, 3439–3440 (2005).
Warde-Farley, D. et al. Nucleic Acids Res. 38, W214–W220 (2010).
Leek, J. sva: Surrogate Variable Analysis. R package version 3.24.4 (2017).
Smyth, G. limma: Linear models for microarray data. (2015).
Forbes, S.A. et al. Nucleic Acids Res. 45, D777–D783 (2017).
Edgar, R., Domrachev, M. & Lash, A.E. Nucleic Acids Res. 30, 207–210 (2002).

Acknowledgements

This work is funded by The Research Foundation - Flanders (FWO; grants G.0640.13 and G.0791.14 to S. Aerts; G092916N to J.-C.M.), Special Research Fund (BOF) KU Leuven (grants PF/10/016 and OT/13/103 to S. Aerts), Foundation Against Cancer (2012-F2, 2016-070 and 2015-143 to S. Aerts) and ERC Consolidator Grant (724226_cis-CONTROL to S. Aerts). S. Aibar is supported by a PDM Postdoctoral Fellowship from the KU Leuven. Z.K.A. and J.W. are supported by postdoctoral fellowships from Kom op Tegen Kanker; V.A.H.-T. is supported by the F.R.S.-FNRS Belgium; and H.I. is supported by a PhD fellowship from the agency for Innovation by Science and Technology (IWT). Funding for T.M. and J.A. is provided by Symbiosys and IMEC HI^2 Data Science. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. T.M. would like to thank J. Simm for helpful comments and suggestions regarding gradient boosting.

Author information

Authors and Affiliations

VIB Center for Brain & Disease Research, Laboratory of Computational Biology, Leuven, Belgium
Sara Aibar, Carmen Bravo González-Blas, Hana Imrichova, Gert Hulselmans, Zeynep Kalender Atak, Jasper Wouters & Stein Aerts
Department of Human Genetics, KU Leuven, Leuven, Belgium
Sara Aibar, Carmen Bravo González-Blas, Hana Imrichova, Gert Hulselmans, Zeynep Kalender Atak, Jasper Wouters & Stein Aerts
KU Leuven ESAT/STADIUS, VDA-lab, Leuven, Belgium
Thomas Moerman & Jan Aerts
IMEC Smart Applications and Innovation Services, Leuven, Belgium
Thomas Moerman & Jan Aerts
Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
Vân Anh Huynh-Thu & Pierre Geurts
VIB Center for Cancer Biology, Laboratory for Molecular Cancer Biology, Leuven, Belgium
Florian Rambow & Jean-Christophe Marine
Department of Oncology, KU Leuven, Leuven, Belgium
Florian Rambow & Jean-Christophe Marine
Department of Imaging and Pathology Translational Cell and Tissue Research, KU Leuven, Leuven, Belgium
Joost van den Oord & Jasper Wouters

Contributions

S. Aerts and S. Aibar conceived the study; S. Aibar implemented SCENIC and related packages with help of V.A.H.-T. and P.G. for GENIE3 and G.H. for RcisTarget; S. Aibar and C.B.G.-B. analyzed the data with the help of Z.K.A. and H.I.; T.M. and J.A. implemented GRNBoost; J.W. performed the IHC and knockdown experiments; F.R., J.-C.M. and J.v.d.O. contributed reagents and helped with the interpretation of the melanoma analyses; S. Aibar, J.W. and S. Aerts wrote the manuscript.

Corresponding author

Correspondence to Stein Aerts.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 The SCENIC workflow.

(a) In the first step, co-expression modules between transcription factors and candidate target genes are inferred with GENIE3 (Random Forest) or GRNBoost (Gradient Boosting). Each module consists of a transcription factor together with its predicted targets, purely based on co-expression. (b) In the second step, each co-expressed module is analyzed with RcisTarget to identify enriched motifs; only modules and targets for which the motif of the TF is enriched are retained. Each TF together with its potential direct targets is a regulon. (c) In the third step, the activity of each regulon in each cell is evaluated using AUCell, which calculates the Area Under the recovery Curve, integrating the expression ranks across all genes in a regulon. The AUCell scores are used to generate the Regulon Activity Matrix. This matrix can be binarized by setting an AUC threshold for each regulon, which will determine in which cells the regulon is “on”. (d) The Regulon Activity Matrix can be used to cluster the cells (e.g. t-SNE) and, thereby, identify cell types and states based on the shared activity of a regulatory subnetwork.
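The scoring and binarization in step (c) can be illustrated with a small sketch. This is a simplified, pure-Python illustration of an area under a recovery curve, not the AUCell implementation; the function names `recovery_auc` and `binarize`, the `max_rank` cut-off, and the toy cells are all assumptions made for this example.

```python
def recovery_auc(expression, gene_set, max_rank):
    """Simplified area under the recovery curve for one cell (illustrative only).

    expression: dict mapping gene -> expression value in this cell
    gene_set:   set of genes in the regulon
    max_rank:   only the top `max_rank` ranked genes are considered
    """
    # Rank genes from highest to lowest expression; ties are broken by gene
    # name purely to keep this sketch deterministic.
    ranking = sorted(expression, key=lambda g: (-expression[g], g))
    hits = 0   # regulon genes recovered so far
    area = 0   # running sum of the step-shaped recovery curve
    for gene in ranking[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits
    # Best case: every regulon gene sits at the very top of the ranking.
    n = min(len(gene_set), max_rank)
    max_area = sum(min(i, n) for i in range(1, max_rank + 1))
    return area / max_area if max_area else 0.0


def binarize(auc_by_cell, threshold):
    """Mark the regulon as "on" (1) in cells whose AUC exceeds the threshold."""
    return {cell: int(auc > threshold) for cell, auc in auc_by_cell.items()}


# Toy data (hypothetical): the regulon genes are highly expressed in cell_a only.
regulon = {"g1", "g2"}
cell_a = {"g1": 9, "g2": 8, "g3": 1, "g4": 0}
cell_b = {"g1": 0, "g2": 1, "g3": 9, "g4": 8}
aucs = {
    "cell_a": recovery_auc(cell_a, regulon, max_rank=3),
    "cell_b": recovery_auc(cell_b, regulon, max_rank=3),
}
activity = binarize(aucs, threshold=0.5)
```

Because the score depends only on within-cell ranks, it does not require the expression values to be normalized across cells, which matches the rank-based description in the legend.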

Supplementary Figure 2 AUCell applied to gene signatures of known cell types.

(a) AUC distributions for multiple gene-sets scored on the mouse brain data set with AUCell. The AUC represents the activity of the regulon or gene signature in each of the cells. The selection of cells with the regulon “active” is based on the distribution of the AUC across all the cells in the dataset. The ideal situation of a regulon or gene signature being active in only a subset of the cells would return a bimodal distribution (e.g. neurons or oligodendrocytes), or a distribution with a long tail (e.g. microglia). In contrast, normal-like distributions are more likely to arise from non-differentially expressed gene sets. This situation is illustrated here through random gene sets (e.g. gene names taken randomly from the dataset) and housekeeping-like gene sets (genes detected in most cells). AUCell automatically explores the distributions of AUC scores and calculates several possible thresholds for each gene-set: (1) the inflection point of the density curve, which is usually a good option for the ideal situation with bimodal distributions (blue), and (2) outliers of the global distribution (grey/green) or of sub-distributions (obtained by fitting a mixture of two or three normal distributions; red or pink). The thresholds associated with these distributions are plotted as dashed lines over the histograms; the selected threshold for each gene-set is highlighted with a thicker continuous line. Note that the threshold selection in the current version is not exhaustive, and we highly recommend checking the AUC histograms and manually adjusting the threshold if needed. We also recommend being cautious about gene-sets with few genes (10-15) and thresholds that are set extremely low. (b) Expression-based t-SNEs (mouse brain dataset by Zeisel et al.) colored according to the AUC of each cell for the given gene-set. Shades of pink/red are used when the cell AUC is greater than the assignment threshold, and shades of blue otherwise. (c) Sensitivity and specificity calculated using the cell types provided in Zeisel et al. as correct labels and the automatic AUCell assignment thresholds, based on the AUC (left) and on the mean of the gene-set expression after normalization with scran²⁶ (right).
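The sensitivity and specificity in panel (c) follow the standard definitions. A minimal sketch of that calculation is given below, assuming per-cell AUC values, a single assignment threshold, and a set of cells labelled with the matching cell type; the function name and the toy values are hypothetical.

```python
def sensitivity_specificity(auc_by_cell, threshold, positive_cells):
    """Compare threshold-based assignments against reference cell-type labels.

    auc_by_cell:    dict mapping cell -> AUC of the gene-set in that cell
    threshold:      cells with AUC above this value are assigned to the type
    positive_cells: set of cells that truly belong to the type (reference)
    """
    tp = fp = tn = fn = 0
    for cell, auc in auc_by_cell.items():
        assigned = auc > threshold
        is_positive = cell in positive_cells
        if assigned and is_positive:
            tp += 1
        elif assigned:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (hypothetical values): c4 is a false positive.
toy_aucs = {"c1": 0.9, "c2": 0.8, "c3": 0.2, "c4": 0.6}
sens, spec = sensitivity_specificity(toy_aucs, 0.5, {"c1", "c2"})
```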

Source of the gene sets: Cahoy et al.⁴⁵ (gene signatures with more than 1000 genes, top row), Lein et al.⁴⁶ (gene signatures with less than 100 genes for astrocytes and neurons), and Lavin et al.⁴⁷ (microglia: microglia versus other tissue-resident macrophages).

Supplementary Figure 3 Validation of the regulon-centric approach.

(a) Comparison of cell clustering resulting from the SCENIC regulon activity and from the TF expression alone. Left: Clustered binary regulon activity matrix. Right: Clustering based on the normalized expression of the 92 TFs (within-cluster size factor normalization with scran, heatmap color: median centered by gene). (b) AUC histograms for a few key regulons. The AUC makes it possible to split the populations of cells with high versus low activity of a regulon. (c) t-SNE on the expression matrix (same input as to SCENIC: UMI counts with no further normalization) and (d) t-SNE on the binary regulon activity matrix. Both t-SNEs are PCA-based and colored according to the number of genes detected (expression over 0) in each cell. The clustering based on SCENIC effectively corrects for the intra-cluster bias, while the true biological difference between neurons (more genes expressed) and glia (fewer genes expressed) is unaffected.^12,48 (e) Comparison of SCENIC with alternative approaches for identification of cell-type associated TFs (see Methods for details). The bar plot on the right shows the number of TFs identified by each method (white) and the number of TFs in the validation set (colored). SCENIC retains more transcription factors than a differential expression analysis. (f) Proportion of TFs that can be detected by SCENIC. Venn diagram comparing the TFs detected in the mouse brain at protein level by Zhou et al.⁴⁹, in the scRNA-seq dataset by Zeisel et al.,⁴⁹ and the TFs available in RcisTarget databases (i.e. known motif).

Supplementary Figure 4 Microglia gene regulatory network and association of brain networks with Alzheimer’s disease.

(a) The regulons associated to microglia, inferred on the mouse brain data, can be summarized based on the binding motif of the associated TF (network built in iRegulon). The genes that are included in a previously published microglia signature (Lavin et al.⁴⁷) are indicated by a larger font size; the color of the node indicates the number of regulators (lighter: fewer, darker: more). The predicted network for microglia contains many well-known regulators of microglial fate and/or microglial activation, including PU.1, Nfkb, Irf, and AP-1/Maf. (b) When we compared the predicted microglial network to previously published gene signatures of microglial “activation” in a mouse Alzheimer's disease model, we found the microglia network to be strongly activated and the neuronal network to be down-regulated during AD progression, indicating that the microglia network captures a relevant regulatory program. The plots shown are results of GSEA analysis of the networks associated to each of the wild type cell types (union of the genes in the regulons) against the gene expression-based ranking in a mouse model for Alzheimer's disease (AD). Dataset by Gjoneska et al.⁵⁰: transcriptional changes in hippocampus of CK-p25 mouse models of AD compared to CK littermate controls, 2 and 6 weeks after p25 induction.

Supplementary Figure 5 SCENIC is robust to down-sampling of cells and sparse expression matrices.

(a) SCENIC run on 100 random cells from the mouse brain dataset (Zeisel et al.) provides similar results to the run on the whole dataset (left column: cell states matching the cell types provided in the publication, similar relevant regulons per state, and significant overlap of targets). The GRN inferred with the 100 cells is then evaluated with AUCell on all cells to confirm that the network is generalizable to cells not included in the GRN inference. Right column: the same approach, but simulating a sparse dataset (UMI count matrix divided by three and truncated, resulting in a median of 1121 detected genes). Many relevant TFs are not detected in the sparse dataset (so the associated regulons will be missed) but SCENIC is still able to identify the main cell types. (b) Evaluation of the stability of the results from SCENIC with 10 runs of SCENIC on the mouse brain dataset (Zeisel et al.). Top: Binary regulon activity matrices and t-SNEs (colored by the authors’ cell-type labels). The aggregation of the 10 binary matrices illustrates the stability of the results across runs, and the large majority of top regulators are found in 10/10 runs. (c) t-SNE on the AUC regulon activity resulting from running SCENIC on 10 random subsets of 100 cells.
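The sparse-dataset simulation in panel (a) (dividing the UMI counts by three and truncating) can be sketched as follows. This is an illustration only: the cells-as-rows layout, the function names, and the toy matrix are assumptions, and the toy numbers are unrelated to the 1121-gene median reported in the legend.

```python
from statistics import median


def simulate_sparse(counts, divisor=3):
    """Divide every UMI count by `divisor` and truncate to an integer.

    counts: list of cells, each a list of non-negative integer UMI counts
            (one entry per gene). Integer division truncates toward zero
            for non-negative values.
    """
    return [[c // divisor for c in cell] for cell in counts]


def median_detected_genes(counts):
    """Median, across cells, of the number of genes with a count above zero."""
    return median(sum(1 for c in cell if c > 0) for cell in counts)


# Toy matrix (hypothetical): 3 cells x 4 genes.
toy_counts = [
    [6, 2, 0, 3],
    [1, 1, 9, 0],
    [3, 3, 3, 3],
]
toy_sparse = simulate_sparse(toy_counts)
before = median_detected_genes(toy_counts)
after = median_detected_genes(toy_sparse)
```

Low counts (1 or 2 UMIs) drop to zero after truncation, which is why lowly expressed genes such as many TFs are the first to become undetected.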

Supplementary Figure 6 SCENIC results for the human brain single-nuclei data set (Lake et al., 2016).

(a) Binary regulon activity matrix. (b) Three GRN-based subpopulations of human interneurons, with the main DNA motifs and transcription factors defining these groups (NFIX for the VIP interneurons, and ESRRG for the PV interneurons), and the top known markers identified in each (note that the TFs themselves are also up-regulated in the respective clusters). Bottom box: Expression of markers for interneuron subtypes.^51,52 In the first plot, interneurons are colored according to the marker with highest Z-score. The remaining plots are colored based on gene expression (grey: no expression, dark red: high expression, yellow: intermediate).

Supplementary Figure 7 Expression-based clustering of mouse and human brain cells.

Hierarchical clustering based on the merged expression matrix, Z-score normalized, of human and mouse brain cells. Clustering groups cells by species, then by cell type. The thumbnail shows Figure 2c, for comparison, where SCENIC yields a primary clustering of the cell types.

Supplementary Figure 8 SCENIC overcomes tumor batch effect and recovers relevant cell types and GRNs in oligodendroglioma.

(a) Comparison of batch-effect removal methods on the oligodendroglioma dataset. t-SNEs and diffusion plots on the raw expression matrix (first row), after correcting by tumor of origin with Combat or Limma (rows 2-3), or on the binary activity matrix from SCENIC (row 4). The cells are colored based on the tumor of origin or GRN activity (red: astrocyte-like regulons, green: oligodendrocyte-like regulons, blue: regulons related to cell cycle or stemness). (b) Simplified binary regulon activity matrix (output of SCENIC) for the oligodendroglioma dataset. Highlighted regulons (colored TF names) are known to be characteristic in oligodendrocytes or astrocytes, respectively.

Supplementary Figure 9 Oligodendrocyte differentiation is driven by discrete changes in gene regulatory networks.

Binary activity matrix highlighting transcription factor groups and their motifs. In the resulting t-SNE, cells are colored based on the average binary activity of three selected groups of regulons (red, green, blue), which correspond to the three main states in the differentiation trajectory: OPC network, driven by Bcl6 and co-regulatory factors; intermediate network driven by Bach2 and other factors; and mature oligodendrocyte network with many transcription factors including Etv1 and Nfe2l2. Sox10 is found as a regulator of all subtypes of oligodendrocytes. Within the set of mature oligodendrocytes (blue cells), two outgroups are detected that fall slightly outside the differentiation trajectory: oligodendrocytes with neuronal properties (B) and oligodendrocytes with AP-1 activation signatures (C). Next to each cluster in the t-SNE, enriched GO terms are shown. In the box the t-SNEs are colored in an alternative scheme, showing other oligodendrocyte networks and states. Note that although these are differentiating cells, the dominant networks are rather discrete. This suggests that transitions between these main states must occur rapidly, since only few cells were found in transition.

Supplementary Figure 10 SCENIC reveals melanoma heterogeneity.

(a) t-SNE on the binary activity matrix after applying SCENIC. (b) Binary regulon activity matrix for the melanoma dataset. The color bar above the heatmap indicates the tumor of origin; regulons associated with the cell cycle (green), MITF^low, invasive (pink) and MITF^high, usually known as proliferative (blue) states are zoomed in. (c) Details for the three most dominant networks. “Confirmed by ChIP-seq”: a tick indicates that the regulon presents enrichment of targets in a ChIP-seq dataset for the same transcription factor. (d) Comparison of TF expression and regulon activity. For four transcription factors: histogram of AUC values, together with the chosen cutoff (orange dashed line). In the second column, the cells with AUC value over the cutoff are shown in blue. These are the cells where the regulon is considered active (i.e. “1” in the binary activity matrix). In the third column, the actual AUC values are used to color the cells. In the fourth column, the expression of the transcription factor itself is shown. The discriminative power of the TFs is much lower than that of the regulons. (e) Expression of known melanoma markers. Note that the MITF^low cluster shows up-regulated WNT5A, LOXL2 and ZEB1 expression (known markers of the invasive state^53,54), and correlates significantly with previously published invasive gene signatures (Figure S19). However, unlike the ‘classical’ invasive cell state, this MITF^low state retains SOX10 expression. (f) Comparison of melanoma bulk signatures with single-cell states. Left: The bulk signatures derived from invasive and proliferative melanoma states (e.g., Hoek and Verfaillie) are significantly enriched in the respective up- or down-regulated side of the gene ranking based on the single-cell states. In the GSEA, the ranking (x-axis) is based on the contrast between MITF-high versus MITF-low states. Right: Similar GSEA analysis, but now only the NES scores are shown. This analysis is the reciprocal to the previous one, whereby the ranking is based on the contrast in bulk samples, and the signatures tested are derived from differential expression between the single-cell MITF-high and MITF-low states.

Note that cells in the MITF^high state also have high activity of STAT and IRF downstream targets. This is difficult to detect in bulk samples because of the complex mixture of malignant cells with tumor infiltrating lymphocytes (TIL) where STAT and IRF also play an important role⁵⁵. Here, we find that the MITF^high cells themselves have higher STAT activity than the MITF^low cells (we excluded all benign cells from the analysis, including immune cells). This has important consequences for the interpretation and prediction of resistance to immune therapy, because these cancer cells with high STAT and IRF activity are likely most sensitive to immunotherapy. Indeed, a recent study identified the JAK-STAT-IRF axis as driver for the expression of two major targets in immune therapy: PD-L1 and PD-L2; which results in an inhibition of the anti-tumor immune response on the one hand, but an increased response to anti-PD(L)1 immune therapies on the other⁵⁵. Note also that the MITF-low “invasive” state is largely shared by two of the 14 tumor biopsies, both of which were resected from axillary lymph nodes. This state, unlike the in vitro invasive state, which is driven by AP-1 and TEAD factors, features distinct transcription factors, including NFATC2 and NFIB, which we confirmed to be expressed in early metastatic melanoma cells (i.e. in the initial, small tumors in the sentinel lymph node, by immunohistochemistry). Using gene expression analysis after NFATC2 knock down (Supplementary Fig. 13), we identified NFATC2 as a transcriptional repressor of the AP-1 and TEAD target genes. Thus, these observations suggest that NFATC2 may act as a transcriptional brake that cells need to overcome to switch to a full-blown invasive cell state. NFATC2 is itself a JUN target⁵⁶, and may constitute a negative feedback mechanism. A similar repressor function of NFATC2 has been previously observed in breast cancer⁵⁷. We believe that this is the (biological) reason why AP-1 and TEAD are not detected as regulons in this data set. Note that for TEAD there is an additional reason why it cannot be detected as a regulon: our SCENIC run selects TFs co-expressed with their targets, while TEADs are regulated at the protein level.

Supplementary Figure 11 Validation of cycling cells and comparison with other methods.

(a) Identification of cell cycle cells based on the Z-score of gene sets related to cell cycle. (b) Heatmap showing the Z-score of cell cycle gene-sets on the oligodendroglioma dataset. The blue/red bars on the top of the heatmap highlight the cells selected as cycling (red) by three approaches: (1) the AUC scores of SCENIC's E2F1 regulon; (2) the G1/S scores and the G2/M scores according to Tirosh et al., the Tirosh et al. approach with a permissive cut-off (cells are classified as cycling if their G1/S and G2/M scores are above twice the mean within the cell population), and the Tirosh et al. approach with a more restrictive cut-off (cells are classified as cycling if their G1/S and G2/M scores are above four times the mean within the cell population); and (3) the GO gene-set-based Z-score. (c,d) Comparison of the capacity of different methods to identify the cycling cells: (c) Number of cells recovered, sorted by True Positive Rate (TPR, also known as sensitivity or recall). Colored bars represent the number of cell cycle cells identified by the method, in white the number of non-cell cycle cells included in the selected cluster (the cluster with the highest number of cell cycle cells). (d) Recall vs Precision (SCENIC_A: High-confidence cells, SCENIC_B: Lower confidence/regulon activity).
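The permissive and restrictive cut-offs described for the Tirosh et al. approach can be sketched as a fold-over-mean rule. The sketch below assumes that a cell must exceed the cut-off for both scores (the legend does not state whether both or either is required) and that scores are positive; the function name and toy scores are hypothetical.

```python
def cycling_cells(g1s_scores, g2m_scores, fold=2.0):
    """Return cells whose G1/S and G2/M scores both exceed `fold` x the mean.

    g1s_scores, g2m_scores: dicts mapping cell -> score (same cells in both).
    fold=2.0 corresponds to the permissive cut-off, fold=4.0 to the
    restrictive one. Requiring BOTH scores to pass is an assumption.
    """
    mean_g1s = sum(g1s_scores.values()) / len(g1s_scores)
    mean_g2m = sum(g2m_scores.values()) / len(g2m_scores)
    return {
        cell
        for cell in g1s_scores
        if g1s_scores[cell] > fold * mean_g1s
        and g2m_scores[cell] > fold * mean_g2m
    }


# Toy scores (hypothetical): only cell "a" is high for both programs.
g1s = {"a": 4.0, "b": 0.5, "c": 0.5, "d": 1.0}   # mean 1.5
g2m = {"a": 6.0, "b": 0.2, "c": 4.0, "d": 0.8}   # mean 2.75
permissive = cycling_cells(g1s, g2m, fold=2.0)
restrictive = cycling_cells(g1s, g2m, fold=4.0)
```

To implement the "either score" reading instead, the `and` in the set comprehension would be replaced by `or`.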

Supplementary Figure 12 Immunohistochemistry of NFATC2, NFIB and ZEB1 on human melanomas.

Complementary to Figure 3i. Here, we show additional biopsies, in different stages of melanoma progression (RGP: radial growth phase, VGP: vertical growth phase, SLN: sentinel lymph node (with small metastases), Metastasis: (full-blown) metastases).

The strongest positive signals for both NFATC2 and NFIB can be seen in the sentinel lymph nodes.

Supplementary Figure 13 Validation of target gene predictions included in the regulons.

(a) Z-score normalized expression of NFIB and NFATC2 across melanoma cell lines from COSMIC. A375 was selected for the knock-down based on the expression of key markers which resemble the MITF-low state. (b) Knock down of NFATC2 in the A375 melanoma cell line. GSEA plot for genes differentially expressed after NFATC2 knock-down: the predicted NFATC2 targets are significantly up-regulated in the NFATC2 knock-down. (c-f) Enrichment of ChIP-seq signal in selected regulons. c-d: Aggregation plots for MITF and STAT1 ChIP-seq signal on the predicted target CRMs of MITF, STAT, and NFATC2 (i.e. regulatory regions for genes in their respective regulons). e-f: Comparison of the ChIP-seq signal on the regulons (predicted CRMs in a window of 10kb around the TSS), with TF motif occurrences in promoter-proximal regions (of randomly selected genes, and of co-expressed genes outside the regulons). The enrichment for the genes in the regulon compared to the control confirms that SCENIC increases the specificity for finding direct targets compared to individual/alternative approaches (e.g. only co-expression or motif analysis).

Supplementary Figure 14 SCENIC analysis of >40000 single cells from the mouse retina.

Drop-seq data on cells from the mouse retina was analyzed with SCENIC by running GENIE3 on a subset of ~11K cells, and evaluating the resulting GRN on all cells. (a) tSNE colored according to the expected cell types as annotated by Macosko et al. (b, c) Main networks according to the activity of regulons, shown based on the red-green-blue coloring scheme. In these tSNEs, the cells shown in grey are not taken into account for the coloring. Logos corresponding to the significant motifs found in iRegulon for the regulons identified in Müller glia and the corresponding GO terms are included. The identified master regulators (such as Sox8/9, Hes1, Rax, Nr1h4, Srf, and Nr2e1 for the Müller glia, which are confirmed in literature^58–61) illustrate that correct networks can be inferred even on sparse data. The sub-sampling approach is especially interesting on sparse datasets, such as the ones resulting from Drop-seq or 10x, because the best-quality cells can be used to infer the GRN, and then this high-quality GRN can be scored on all cells.

Supplementary Figure 15 GRNBoost benchmark.

(a) Comparison of the performance of GRNBoost and GENIE3. Left: As a biological validation for GRNBoost, we devised a gene-set-based precision and recall benchmark. We asked whether GRNBoost and GENIE3 predict similar sets of target genes for a selection of transcription factors (Olig1, Sox10, Rel, Lef1, Neurod1, Dlx1) from the Zeisel et al. mouse brain expression data. Using the network inferred by GENIE3, we constructed for each of the 6 TFs its ranked list of target genes. For each ranked list, we used GOrilla⁶² to query the top enriched GO term for each TF target list. For each of the 6 top GO terms, we consulted QuickGO⁶³ to obtain the set of association protein annotation gene symbols (filter by mouse taxon), keeping only the genes present in the expression matrix, to obtain six "master lists" of gene symbols. These master lists were finally used to calculate the precision and recall scores for the sets of target genes predicted by both GENIE3 and GRNBoost for each of the aforementioned TFs. Right: To benchmark the time performance of GRNBoost, and as proof of concept, we used the Chromium Megacell demonstration dataset, which contains over 1 million cells from embryonic mouse brain. We took random subsets of 3,000, 10,000 and 100,000 cells to infer the GRNs. (b) Evaluation of stability across multiple runs. Each scatter plot compares the ranking for the targets of each TF across two independent runs (with equal or different numbers of cells).
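The precision and recall scores in the gene-set-based benchmark can be sketched as a set comparison between the top-ranked predicted targets of a TF and its "master list". The function name, the `top_k` cut-off, and the toy gene lists below are assumptions for illustration; the legend does not state how many top-ranked targets were scored.

```python
def precision_recall(ranked_targets, master_list, top_k):
    """Precision and recall of the top_k predicted targets against a reference.

    ranked_targets: predicted target genes for one TF, best first
    master_list:    reference gene symbols for that TF (e.g. GO-derived)
    top_k:          number of top-ranked targets to evaluate (hypothetical)
    """
    predicted = set(ranked_targets[:top_k])
    reference = set(master_list)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy lists (hypothetical, not taken from the benchmark).
toy_ranked = ["Mbp", "Plp1", "Actb", "Sox10"]
toy_master = {"Mbp", "Plp1", "Mog", "Mag", "Sox10"}
p_top2, r_top2 = precision_recall(toy_ranked, toy_master, top_k=2)
p_top4, r_top4 = precision_recall(toy_ranked, toy_master, top_k=4)
```

Running the same function on the target lists from two inference methods for the same TF and master list gives directly comparable scores, which is the comparison the benchmark describes.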

About this article

Cite this article

Aibar, S., González-Blas, C., Moerman, T. et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14, 1083–1086 (2017). https://doi.org/10.1038/nmeth.4463

Received: 05 December 2016
Accepted: 07 September 2017
Published: 09 October 2017
Issue Date: 01 November 2017
DOI: https://doi.org/10.1038/nmeth.4463
