

SCENIC: single-cell regulatory network inference and clustering

Abstract

We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.


Figure 1: The SCENIC workflow and its application to the mouse brain.
Figure 2: Cross-species comparison of neuronal networks and cell types.
Figure 3: SCENIC overcomes tumor effects and unravels relevant cell states and GRNs in cancer.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus

References

  1. Linnarsson, S. & Teichmann, S.A. Genome Biol. 17, 97 (2016).
  2. Wagner, A., Regev, A. & Yosef, N. Nat. Biotechnol. 34, 1145–1160 (2016).
  3. Stegle, O., Teichmann, S.A. & Marioni, J.C. Nat. Rev. Genet. 16, 133–145 (2015).
  4. Raj, A. & van Oudenaarden, A. Cell 135, 216–226 (2008).
  5. Moignard, V. et al. Nat. Biotechnol. 33, 269–276 (2015).
  6. Pina, C. et al. Cell Rep. 11, 1503–1510 (2015).
  7. Guo, M., Wang, H., Potter, S.S., Whitsett, J.A. & Xu, Y. PLoS Comput. Biol. 11, e1004575 (2015).
  8. Huynh-Thu, V.A., Irrthum, A., Wehenkel, L. & Geurts, P. PLoS One 5, e12776 (2010).
  9. Zeisel, A. et al. Science 347, 1138–1142 (2015).
  10. Kiselev, V.Y. et al. Nat. Methods 14, 483–486 (2017).
  11. Lake, B.B. et al. Science 352, 1586–1590 (2016).
  12. Darmanis, S. et al. Proc. Natl. Acad. Sci. USA 112, 7285–7290 (2015).
  13. Tirosh, I. et al. Nature 539, 309–313 (2016).
  14. Tirosh, I. et al. Science 352, 189–196 (2016).
  15. Alizadeh, A.A. et al. Nat. Med. 21, 846–853 (2015).
  16. Johnson, W.E., Li, C. & Rabinovic, A. Biostatistics 8, 118–127 (2007).
  17. Ritchie, M.E. et al. Nucleic Acids Res. 43, e47 (2015).
  18. Perotti, V. et al. Oncogene 35, 2862–2872 (2016).
  19. Chang, C.-Y. et al. Nature 495, 98–102 (2013).
  20. Denny, S.K. et al. Cell 166, 328–342 (2016).
  21. Müller, M.R. & Rao, A. Nat. Rev. Immunol. 10, 645–656 (2010).
  22. Regev, A. et al. bioRxiv Preprint at: http://www.biorxiv.org/content/early/2017/05/08/121202 (2017).
  23. Zaharia, M. et al. In Proc. of the 9th USENIX Conference on Networked Systems Design and Implementation 2–2 (USENIX Association, 2012).
  24. Marbach, D. et al. Nat. Methods 9, 796–804 (2012).
  25. Islam, S. et al. Nat. Methods 11, 163–166 (2014).
  26. Crow, M., Paul, A., Ballouz, S., Huang, Z.J. & Gillis, J. Genome Biol. 17, 101 (2016).
  27. Lun, A.T.L., McCarthy, D.J. & Marioni, J.C. F1000Res. 5, 2122 (2016).
  28. Friedman, J.H. Ann. Stat. 29, 1189–1232 (2001).
  29. Chen, T. & Guestrin, C. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
  30. Freund, Y. & Schapire, R.E. Jinko Chino Gakkaishi 14, 771–780 (1999).
  31. Sławek, J. & Arodź, T. BMC Syst. Biol. 7, 106 (2013).
  32. Dean, J. & Ghemawat, S. Commun. ACM 51, 107–113 (2008).
  33. Aerts, S. et al. PLoS Biol. 8, e1000435 (2010).
  34. Herrmann, C., Van de Sande, B., Potier, D. & Aerts, S. Nucleic Acids Res. 40, e114 (2012).
  35. Janky, R. et al. PLoS Comput. Biol. 10, e1003731 (2014).
  36. Krijthe, J. Rtsne: t-distributed stochastic neighbor embedding using Barnes-Hut implementation https://github.com/jkrijthe/Rtsne (2015).
  37. Marques, S. et al. Science 352, 1326–1329 (2016).
  38. Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
  39. Durinck, S. et al. Bioinformatics 21, 3439–3440 (2005).
  40. Warde-Farley, D. et al. Nucleic Acids Res. 38, W214–W220 (2010).
  41. Leek, J. sva: Surrogate Variable Analysis. R package version 3.24.4 (2017).
  42. Smyth, G. limma: Linear models for microarray data. (2015).
  43. Forbes, S.A. et al. Nucleic Acids Res. 45, D777–D783 (2017).
  44. Edgar, R., Domrachev, M. & Lash, A.E. Nucleic Acids Res. 30, 207–210 (2002).


Acknowledgements

This work is funded by The Research Foundation - Flanders (FWO; grants G.0640.13 and G.0791.14 to S. Aerts; G092916N to J.-C.M.), Special Research Fund (BOF) KU Leuven (grants PF/10/016 and OT/13/103 to S. Aerts), Foundation Against Cancer (2012-F2, 2016-070 and 2015-143 to S. Aerts) and ERC Consolidator Grant (724226_cis-CONTROL to S. Aerts). S. Aibar is supported by a PDM Postdoctoral Fellowship from the KU Leuven. Z.K.A. and J.W. are supported by postdoctoral fellowships from Kom op Tegen Kanker; V.A.H.-T. is supported by the F.R.S.-FNRS Belgium; and H.I. is supported by a PhD fellowship from the agency for Innovation by Science and Technology (IWT). Funding for T.M. and J.A. is provided by Symbiosys and IMEC HI^2 Data Science. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. T.M. would like to thank J. Simm for helpful comments and suggestions regarding gradient boosting.

Author information

Contributions

S. Aerts and S. Aibar conceived the study; S. Aibar implemented SCENIC and related packages with the help of V.A.H.-T. and P.G. for GENIE3 and G.H. for RcisTarget; S. Aibar and C.B.G.-B. analyzed the data with the help of Z.K.A. and H.I.; T.M. and J.A. implemented GRNBoost; J.W. performed the IHC and knockdown experiments; F.R., J.-C.M. and J.v.d.O. contributed reagents and helped with the interpretation of the melanoma analyses; S. Aibar, J.W. and S. Aerts wrote the manuscript.

Corresponding author

Correspondence to Stein Aerts.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 The SCENIC workflow.

(a) In the first step, co-expression modules between transcription factors and candidate target genes are inferred with GENIE3 (Random Forest) or GRNBoost (Gradient Boosting). Each module consists of a transcription factor together with its predicted targets, purely based on co-expression. (b) In the second step, each co-expressed module is analyzed with RcisTarget to identify enriched motifs; only modules and targets for which the motif of the TF is enriched are retained. Each TF together with its potential direct targets is a regulon. (c) In the third step, the activity of each regulon in each cell is evaluated using AUCell, which calculates the Area Under the recovery Curve, integrating the expression ranks across all genes in a regulon. The AUCell scores are used to generate the Regulon Activity Matrix. This matrix can be binarized by setting an AUC threshold for each regulon, which will determine in which cells the regulon is “on”. (d) The Regulon Activity Matrix can be used to cluster the cells (e.g. t-SNE) and, thereby, identify cell types and states based on the shared activity of a regulatory subnetwork.
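To make the AUCell step in (c) concrete, the sketch below scores one gene set per cell as the area under a recovery curve built over each cell's top-ranked genes. It is a simplified illustration in Python/NumPy, not the AUCell package: the function name `aucell_scores`, the 5% rank cut-off (`max_rank_frac`), the normalization by the upper bound `k * min(|S|, k)` and the tie-breaking by gene order are all assumptions of this sketch.

```python
import numpy as np


def aucell_scores(expr, gene_names, gene_set, max_rank_frac=0.05):
    """Hypothetical AUCell-like scoring (sketch, not the AUCell package).

    expr: array of shape (cells, genes); gene_names: one name per column.
    For each cell, genes are ranked by decreasing expression; the recovery
    curve counts how many gene-set genes appear among the top r genes,
    for r = 1..k, and its area is divided by an upper bound.
    """
    expr = np.asarray(expr, dtype=float)
    n_cells, n_genes = expr.shape
    k = max(1, int(np.ceil(max_rank_frac * n_genes)))  # only the top-k ranks count
    in_set = np.isin(np.asarray(gene_names), list(gene_set))
    n_set = int(in_set.sum())
    if n_set == 0:
        raise ValueError("none of the gene-set genes are present in the matrix")
    max_auc = k * min(n_set, k)  # assumed normalization: keeps scores in [0, 1]
    scores = np.empty(n_cells)
    for i in range(n_cells):
        # Stable sort on negated values: ties are broken by column order (assumption)
        top = np.argsort(-expr[i], kind="stable")[:k]
        recovery = np.cumsum(in_set[top])  # recovery curve over the top-k ranks
        scores[i] = recovery.sum() / max_auc
    return scores
```

For example, with 20 genes and `max_rank_frac=0.25` (so `k = 5`), a cell whose three highest-expressed genes are exactly the three gene-set genes scores 0.8, whereas a cell with none of them among its top five genes scores 0.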

Supplementary Figure 2 AUCell applied to gene signatures of known cell types.

(a) AUC distributions for multiple gene sets scored on the mouse brain data set with AUCell. The AUC represents the activity of the regulon or gene signature in each cell. Cells are classified as having the regulon “active” based on the distribution of the AUC across all cells in the dataset. In the ideal situation, where a regulon or gene signature is active in only a subset of the cells, the AUC follows a bimodal distribution (e.g. neurons or oligodendrocytes) or a distribution with a long tail (e.g. microglia). By contrast, normal-like distributions are more likely to arise from gene sets that are not differentially expressed. This is illustrated here with random gene sets (i.e. gene names taken randomly from the dataset) and housekeeping-like gene sets (genes detected in most cells). AUCell automatically explores the distribution of AUC scores and calculates several possible thresholds for each gene set: (1) the inflection point of the density curve, which is usually a good option for the ideal situation with bimodal distributions (blue), and (2) outliers of the global distribution (grey/green) or of its sub-distributions (fitting a mixture of two or three normal distributions; red or pink). The thresholds associated with these distributions are plotted as dashed lines over the histograms; the selected threshold for each gene set is highlighted with a thicker continuous line. Note that the threshold selection in the current version is not exhaustive, and we highly recommend checking the AUC histograms and manually adjusting the threshold if needed. We also recommend caution with gene sets containing few genes (10-15) and with thresholds that are set extremely low. (b) Expression-based t-SNEs (mouse brain dataset by Zeisel et al.) colored according to the AUC of each cell for the given gene set. Cells are shown in shades of pink/red when their AUC is greater than the assignment threshold, and in shades of blue otherwise.
(c) Sensitivity and specificity, calculated using the cell types provided in Zeisel et al. as the correct labels and the automatic AUCell assignment thresholds, based on the AUC (left) or on the mean expression of the gene set after normalization with Scran26 (right).

Source of the gene sets: Cahoy et al.45 (gene signatures with more than 1000 genes, top row), Lein et al.46 (gene signatures with fewer than 100 genes for astrocytes and neurons), and Lavin et al.47 (microglia: microglia versus other tissue-resident macrophages).
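The threshold search in (a) and the sensitivity/specificity evaluation in (c) can be illustrated with a minimal sketch. It fits a two-component normal mixture to the AUC values by expectation-maximization and places the threshold where the upper component begins to dominate. This is a generic stand-in for one kind of candidate threshold, not AUCell's actual procedure, and `mixture_threshold` and `sensitivity_specificity` are hypothetical names.

```python
import numpy as np


def _normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def mixture_threshold(auc, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM (sketch) and return the
    AUC value at which the upper ('active') component starts to dominate."""
    x = np.asarray(auc, dtype=float)
    mu = np.percentile(x, [25, 75])   # initial means: lower and upper quartiles
    sd = np.full(2, x.std() + 1e-6)   # initial spreads: global standard deviation
    w = np.array([0.5, 0.5])          # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell
        dens = np.vstack([w[j] * _normal_pdf(x, mu[j], sd[j]) for j in range(2)])
        resp = dens / (dens.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and standard deviations
        nk = resp.sum(axis=1) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x).sum(axis=1) / nk
        sd = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
    lo, hi = np.argsort(mu)
    grid = np.linspace(mu[lo], mu[hi], 1000)
    upper_wins = (w[hi] * _normal_pdf(grid, mu[hi], sd[hi])
                  > w[lo] * _normal_pdf(grid, mu[lo], sd[lo]))
    # First grid point between the two means where the upper component dominates
    return grid[np.argmax(upper_wins)] if upper_wins.any() else grid[-1]


def sensitivity_specificity(active, is_type):
    """Compare a binary 'regulon active' call with reference cell-type labels."""
    active = np.asarray(active, dtype=bool)
    is_type = np.asarray(is_type, dtype=bool)
    tp = np.sum(active & is_type)
    fn = np.sum(~active & is_type)
    tn = np.sum(~active & ~is_type)
    fp = np.sum(active & ~is_type)
    return tp / (tp + fn), tn / (tn + fp)
```

A single mixture fit is only meaningful for clearly bimodal AUC histograms; as the caption notes, normal-like or long-tailed distributions call for inspecting the histogram and adjusting the threshold manually.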

Supplementary Figure 3 Validation of the regulon-centric approach.

(a) Comparison of the cell clustering resulting from the SCENIC regulon activity and from the TF expression alone. Left: clustered binary regulon activity matrix. Right: clustering based on the normalized expression of the 92 TFs (within-cluster size factor normalization with scran; heatmap color: median centered by gene). (b) AUC histograms for a few key regulons. The AUC separates the populations of cells with high versus low activity of a regulon. (c) t-SNE on the expression matrix (same input as to SCENIC: UMI counts with no further normalization) and (d) t-SNE on the binary regulon activity matrix. Both t-SNEs are PCA-based and colored according to the number of genes detected (expression over 0) in each cell. The clustering based on SCENIC effectively corrects for the intra-cluster bias, while the true biological difference between neurons (more genes expressed) and glia (fewer genes expressed) is unaffected.12,48 (e) Comparison of SCENIC with alternative approaches for identification of cell-type associated TFs (see Methods for details). The bar plot on the right shows the number of TFs identified by each method (white) and the number of TFs in the validation set (colored). SCENIC retains more transcription factors than a differential expression analysis. (f) Proportion of TFs that can be detected by SCENIC. Venn diagram comparing the TFs detected in the mouse brain at the protein level by Zhou et al.49, in the scRNA-seq dataset by Zeisel et al.,49 and the TFs available in the RcisTarget databases (i.e. with a known motif).
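Clustering cells on the binary regulon activity matrix, as in panel (a, left), could be sketched with hierarchical clustering on a Jaccard distance between cells. The Jaccard distance, the average linkage and the function name `cluster_binary_activity` are assumptions for illustration; the caption does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_binary_activity(binary, n_clusters):
    """Cluster cells (rows) by their binary regulon activity (sketch).

    Two cells are close when they share most of their active regulons,
    regardless of how many regulons are inactive in both.
    """
    binary = np.asarray(binary, dtype=bool)
    dist = pdist(binary, metric="jaccard")      # condensed cell-by-cell distances
    tree = linkage(dist, method="average")      # agglomerative clustering
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Jaccard ignores shared zeros, which suits sparse "on/off" matrices where most regulons are off in any given cell; rows with no active regulon should be removed first, since their Jaccard distance is not well defined.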

Supplementary Figure 4 Microglia gene regulatory network and association of brain networks with Alzheimer’s disease.

(a) The regulons associated with microglia, inferred on the mouse brain data, can be summarized based on the binding motif of the associated TF (network built in iRegulon). The genes that are included in a previously published microglia signature (Lavin et al.47) are indicated by a larger font size; the color of the node indicates the number of regulators (lighter: fewer, darker: more). The predicted network for microglia contains many well-known regulators of microglial fate and/or microglial activation, including PU.1, Nfkb, Irf, and AP-1/Maf. (b) When we compared the predicted microglial network to previously published gene signatures of microglial “activation” in a mouse Alzheimer's disease model, we found the microglia network to be strongly activated and the neuronal network to be down-regulated during AD progression, indicating that the microglia network captures a relevant regulatory program. The plots show the results of GSEA of the networks associated with each of the wild-type cell types (union of the genes in the regulons) against the gene expression-based ranking in a mouse model for Alzheimer's disease (AD). Dataset by Gjoneska et al.50: transcriptional changes in the hippocampus of CK-p25 mouse models of AD compared to CK littermate controls, 2 and 6 weeks after p25 induction.
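The GSEA in (b) rests on a running-sum enrichment score. A minimal sketch of the weighted, Kolmogorov-Smirnov-like statistic (weight exponent `p = 1`, as in the standard GSEA formulation) is given below; normalization into NES and permutation-based significance are omitted, and `enrichment_score` is a hypothetical helper.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score of a gene set in a ranked list (sketch).

    ranked_genes: genes sorted from most up- to most down-regulated;
    scores: the ranking statistic for each gene, in the same order.
    Returns the signed maximum deviation of the running sum from zero.
    """
    genes = np.asarray(ranked_genes)
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit = np.isin(genes, list(gene_set))
    n_miss = genes.size - int(hit.sum())
    if hit.sum() == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    hit_w = np.where(hit, weights, 0.0)
    p_hit = np.cumsum(hit_w) / hit_w.sum()   # weighted fraction of set genes seen
    p_miss = np.cumsum(~hit) / n_miss        # fraction of non-set genes seen
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))]
```

A positive score means the gene set concentrates at the top of the ranking (e.g. up-regulated during AD progression), a negative score that it concentrates at the bottom.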

Supplementary Figure 5 SCENIC is robust to down-sampling of cells and sparse expression matrices.

(a) SCENIC run on 100 random cells from the mouse brain dataset (Zeisel et al.) provides results similar to the run on the whole dataset (left column: cell states matching the cell types provided in the publication, similar relevant regulons per state, and significant overlap of targets). The GRN inferred with the 100 cells is then evaluated with AUCell on all cells to confirm that the network generalizes to cells not included in the GRN inference. The right column shows the same approach on a simulated sparse dataset (UMI count matrix divided by three and truncated, resulting in a median of 1121 detected genes). Many relevant TFs are not detected in the sparse dataset (so the associated regulons will be missed), but SCENIC is still able to identify the main cell types. (b) Evaluation of the stability of the results from SCENIC with 10 runs of SCENIC on the mouse brain dataset (Zeisel et al.). Top: binary regulon activity matrices and t-SNEs (colored by the authors' cell-type labels). The aggregation of the 10 binary matrices illustrates the stability of the results across runs, and the large majority of top regulators are found in 10/10 runs. (c) t-SNE on the AUC regulon activity resulting from running SCENIC on 10 random subsets of 100 cells.
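The “significant overlap of targets” and the count of regulators found in 10/10 runs can each be expressed in a few lines. The sketch below assumes a one-sided hypergeometric test against a background of `n_genes` genes for the overlap (the caption does not name the test used) and simply counts regulon occurrences across runs; both function names are hypothetical.

```python
from collections import Counter

from scipy.stats import hypergeom


def overlap_pvalue(targets_a, targets_b, n_genes):
    """P(overlap >= observed) for two target sets drawn from n_genes genes."""
    a, b = set(targets_a), set(targets_b)
    k = len(a & b)
    # hypergeom(M=total genes, n=|a| "successes", N=|b| draws); sf(k - 1) = P(X >= k)
    return float(hypergeom.sf(k - 1, n_genes, len(a), len(b)))


def regulon_recurrence(runs):
    """Count in how many runs (each an iterable of regulon names) a regulon appears."""
    return Counter(reg for run in runs for reg in set(run))
```

With 10 runs, a regulon whose count equals 10 corresponds to the “found in 10/10 runs” category above.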

Supplementary Figure 6 SCENIC results for the human brain single-nuclei data set (Lake et al., 2016).

(a) Binary regulon activity matrix. (b) Three GRN-based subpopulations of human interneurons, with the main DNA motifs and transcription factors defining these groups (NFIX for the VIP interneurons, and ESRRG for the PV interneurons), and the top known markers identified in each (note that the TFs themselves are also up-regulated in the respective clusters). Bottom box: Expression of markers for interneuron subtypes.51,52 In the first plot, interneurons are colored according to the marker with highest Z-score. The remaining plots are colored based on gene expression (grey: no expression, dark red: high expression, yellow: intermediate).

Supplementary Figure 7 Expression-based clustering of mouse and human brain cells.

Hierarchical clustering based on the merged expression matrix, Z-score normalized, of human and mouse brain cells. Clustering groups cells by species, then by cell type. The thumbnail shows Figure 2c, for comparison, where SCENIC yields a primary clustering of the cell types.

Supplementary Figure 8 SCENIC overcomes tumor batch effect and recovers relevant cell types and GRNs in oligodendroglioma.

(a) Comparison of batch-effect removal methods on the oligodendroglioma dataset. t-SNEs and diffusion plots on the raw expression matrix (first row), after correcting by tumor of origin with ComBat or limma (rows 2-3), or on the binary activity matrix from SCENIC (row 4). The cells are colored based on the tumor of origin or GRN activity (red: astrocyte-like regulons, green: oligodendrocyte-like regulons, blue: regulons related to cell cycle or stemness). (b) Simplified binary regulon activity matrix (output of SCENIC) for the oligodendroglioma dataset. Highlighted regulons (colored TF names) are known to be characteristic of oligodendrocytes or astrocytes, respectively.

Supplementary Figure 9 Oligodendrocyte differentiation is driven by discrete changes in gene regulatory networks.

Binary activity matrix highlighting groups of transcription factors and their motifs. In the resulting t-SNE, cells are colored based on the average binary activity of three selected groups of regulons (red, green, blue), which correspond to the three main states in the differentiation trajectory: the OPC network, driven by Bcl6 and co-regulatory factors; the intermediate network, driven by Bach2 and other factors; and the mature oligodendrocyte network, with many transcription factors including Etv1 and Nfe2l2. Sox10 is found as a regulator of all subtypes of oligodendrocytes. Within the set of mature oligodendrocytes (blue cells), two outgroups are detected that fall slightly outside the differentiation trajectory: oligodendrocytes with neuronal properties (B) and oligodendrocytes with AP-1 activation signatures (C). Next to each cluster in the t-SNE, enriched GO terms are shown. In the box, the t-SNEs are colored in an alternative scheme, showing other oligodendrocyte networks and states. Note that although these cells are differentiating, the dominant networks are rather discrete. This suggests that transitions between these main states must occur rapidly, since only a few cells were found in transition.
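The red-green-blue coloring described above can be sketched as the per-cell mean of the binary activity within each of three regulon groups, used directly as an RGB triple in [0, 1]. The function name and the example group memberships used for testing are illustrative assumptions, not the published regulon groups.

```python
import numpy as np


def rgb_from_regulon_groups(binary, regulon_names, red, green, blue):
    """Per-cell RGB color from three groups of regulons (sketch).

    binary: (cells, regulons) 0/1 activity matrix;
    red/green/blue: lists of regulon names assigned to each channel.
    Each channel is the fraction of that group's regulons active in the cell.
    """
    binary = np.asarray(binary, dtype=float)
    names = list(regulon_names)
    channels = []
    for group in (red, green, blue):
        idx = [names.index(r) for r in group]        # column positions of the group
        channels.append(binary[:, idx].mean(axis=1))
    return np.column_stack(channels)                 # shape (cells, 3)
```

A cell with all regulons of one group active gets a pure color, while cells sharing activity across groups get mixed colors, which is why the few cells in transition stand out.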

Supplementary Figure 10 SCENIC reveals melanoma heterogeneity.

(a) t-SNE on the binary activity matrix after applying SCENIC. (b) Binary regulon activity matrix for the melanoma dataset. The color bar above the heatmap indicates the tumor of origin; regulons associated with the cell cycle (green), the MITF-low, invasive state (pink) and the MITF-high state, usually known as proliferative (blue), are zoomed in. (c) Details for the three most dominant networks. “Confirmed by ChIP-seq”: a tick indicates that the regulon presents enrichment of targets in a ChIP-seq dataset for the same transcription factor. (d) Comparison of TF expression and regulon activity for four transcription factors. First column: histogram of AUC values, together with the chosen cutoff (orange dashed line). Second column: the cells with an AUC value over the cutoff are shown in blue; these are the cells where the regulon is considered active (i.e. “1” in the binary activity matrix). Third column: the actual AUC values are used to color the cells. Fourth column: the expression of the transcription factor itself. The discriminative power of the TFs is much lower than that of the regulons. (e) Expression of known melanoma markers. Note that the MITF-low cluster shows up-regulated WNT5A, LOXL2 and ZEB1 expression (all known markers of the invasive state53,54) and correlates significantly with previously published invasive gene signatures (Figure S19). However, unlike the ‘classical’ invasive cell state, this MITF-low state retains SOX10 expression. (f) Comparison of melanoma bulk signatures with single-cell states. Left: the bulk signatures derived from invasive and proliferative melanoma states (e.g., Hoek and Verfaillie) are significantly enriched in the respective up- or down-regulated side of the gene ranking based on the single-cell states. In the GSEA, the ranking (x-axis) is based on the contrast between MITF-high versus MITF-low states. Right: similar GSEA, but now only the NES scores are shown.
This analysis is the reciprocal to the previous one, whereby the ranking is based on the contrast in bulk samples, and the signatures tested are derived from differential expression between the single-cell MITF-high and MITF-low states.

Note that cells in the MITF-high state also have high activity of STAT and IRF downstream targets. This is difficult to detect in bulk samples because of the complex mixture of malignant cells with tumor infiltrating lymphocytes (TIL), where STAT and IRF also play an important role55. Here, we find that the MITF-high cells themselves have higher STAT activity than the MITF-low cells (we excluded all benign cells from the analysis, including immune cells). This has important consequences for the interpretation and prediction of resistance to immune therapy, because these cancer cells with high STAT and IRF activity are likely to be the most sensitive to immunotherapy. Indeed, a recent study identified the JAK-STAT-IRF axis as a driver for the expression of two major targets in immune therapy, PD-L1 and PD-L2, which results in an inhibition of the anti-tumor immune response on the one hand, but an increased response to anti-PD(L)1 immune therapies on the other55. Note also that the MITF-low “invasive” state is largely shared by two of the 14 tumor biopsies, both of which were resected from axillary lymph nodes. This state, unlike the in vitro invasive state, which is driven by AP-1 and TEAD factors, features distinct transcription factors, including NFATC2 and NFIB, which we confirmed by immunohistochemistry to be expressed in early metastatic melanoma cells (i.e. in the initial, small tumors in the sentinel lymph node). Using gene expression analysis after NFATC2 knockdown (Supplementary Fig.13), we identified NFATC2 as a transcriptional repressor of the AP-1 and TEAD target genes. Thus, these observations suggest that NFATC2 may act as a transcriptional brake that cells need to overcome to switch to a full-blown invasive cell state. NFATC2 is itself a JUN target56, and may constitute a negative feedback mechanism. A similar repressor function of NFATC2 has been previously observed in breast cancer57.
We believe that this is the (biological) reason why AP-1 and TEAD are not detected as regulons in this data set. For TEAD there is an additional reason why it cannot be detected as a regulon: our SCENIC run selects TFs that are co-expressed with their targets, whereas TEADs are regulated at the protein level.

Supplementary Figure 11 Validation of cycling cells and comparison with other methods.

(a) Identification of cycling cells based on the Z-score of gene sets related to the cell cycle. (b) Heatmap showing the Z-score of cell cycle gene sets on the oligodendroglioma dataset. The blue/red bars at the top of the heatmap highlight the cells selected as cycling (red) by three approaches: (1) the AUC scores of SCENIC's E2F1 regulon; (2) the G1/S and G2/M scores according to Tirosh et al., with either a permissive cut-off (cells are classified as cycling if their G1/S and G2/M scores are above twice the mean within the cell population) or a more restrictive cut-off (cells are classified as cycling if their G1/S and G2/M scores are above four times the mean within the cell population); and (3) the Z-score based on GO gene sets. (c,d) Comparison of the capacity of different methods to identify the cycling cells: (c) Number of cells recovered, sorted by true positive rate (TPR, also known as sensitivity or recall). Colored bars represent the number of cell cycle cells identified by the method; white bars show the number of non-cell-cycle cells included in the selected cluster (the cluster with the highest number of cell cycle cells). (d) Recall vs. precision (SCENIC-A: high-confidence cells; SCENIC-B: lower confidence/regulon activity).
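The mean-multiple cut-offs in (b) and the recall/precision comparison in (c, d) can be sketched as follows. This reads “G1/S and G2/M scores are above twice the mean” as requiring both scores to exceed their threshold, and assumes non-negative scores (a multiple of the mean is not meaningful for mean-centered scores); the function names are hypothetical.

```python
import numpy as np


def cycling_by_mean_multiple(g1s, g2m, factor=2.0):
    """Flag cells whose G1/S and G2/M scores both exceed factor x the population mean."""
    g1s = np.asarray(g1s, dtype=float)
    g2m = np.asarray(g2m, dtype=float)
    return (g1s > factor * g1s.mean()) & (g2m > factor * g2m.mean())


def recall_precision(predicted, reference):
    """Recall (TPR) and precision of a predicted cycling set against a reference set."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(predicted & reference)
    recall = tp / reference.sum()
    precision = tp / predicted.sum() if predicted.any() else float("nan")
    return recall, precision
```

Raising `factor` from 2 to 4 mirrors the permissive versus restrictive cut-offs: fewer cells are called cycling, which typically trades recall for precision.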

Supplementary Figure 12 Immunohistochemistry of NFATC2, NFIB and ZEB1 on human melanomas.

Complementary to Figure 3i. Here, we show additional biopsies in different stages of melanoma progression (RGP: radial growth phase, VGP: vertical growth phase, SLN: sentinel lymph node (with small metastases), Metastasis: (full-blown) metastases).

The strongest positive signals for both NFATC2 and NFIB can be seen in the sentinel lymph nodes.

Supplementary Figure 13 Validation of target gene predictions included in the regulons.

(a) Z-score normalized expression of NFIB and NFATC2 across melanoma cell lines from COSMIC. A375 was selected for the knock-down based on the expression of key markers which resemble the MITF-low state. (b) Knock down of NFATC2 in the A375 melanoma cell line. GSEA plot for genes differentially expressed after NFATC2 knock-down: the predicted NFATC2 targets are significantly up-regulated in the NFATC2 knock-down. (c-f) Enrichment of ChIP-seq signal in selected regulons. c-d: Aggregation plots for MITF and STAT1 ChIP-seq signal on the predicted target CRMs of MITF, STAT, and NFATC2 (i.e. regulatory regions for genes in their respective regulons). e-f: Comparison of the ChIP-seq signal on the regulons (predicted CRMs in a window of 10kb around the TSS), with TF motif occurrences in promoter-proximal regions (of randomly selected genes, and of co-expressed genes outside the regulons). The enrichment for the genes in the regulon compared to the control confirms that SCENIC increases the specificity for finding direct targets compared to individual/alternative approaches (e.g. only co-expression or motif analysis).

Supplementary Figure 14 SCENIC analysis of >40000 single cells from the mouse retina.

Drop-seq data on cells from the mouse retina were analyzed with SCENIC by running GENIE3 on a subset of ~11K cells and evaluating the resulting GRN on all cells. (a) t-SNE colored according to the expected cell types as annotated by Macosko et al. (b, c) Main networks according to the activity of regulons, shown using the red-green-blue coloring scheme. In these t-SNEs, the cells shown in grey are not taken into account for the coloring. Logos corresponding to the significant motifs found in iRegulon for the regulons identified in Müller glia, and the corresponding GO terms, are included. The identified master regulators (such as Sox8/9, Hes1, Rax, Nr1h4, Srf, and Nr2e1 for the Müller glia, which are confirmed in the literature58–61) illustrate that correct networks can be inferred even on sparse data. The sub-sampling approach is especially useful on sparse datasets, such as those resulting from Drop-seq or 10x, because the best-quality cells can be used to infer the GRN, and this high-quality GRN can then be scored on all cells.

Supplementary Figure 15 GRNBoost benchmark.

(a) Comparison of the performance of GRNBoost and GENIE3. Left: As a biological validation for GRNBoost, we devised a gene-set-based precision and recall benchmark. We asked whether GRNBoost and GENIE3 predict similar sets of target genes for a selection of transcription factors (Olig1, Sox10, Rel, Lef1, Neurod1, Dlx1) from the Zeisel et al. mouse brain expression data. Using the network inferred by GENIE3, we constructed for each of the 6 TFs its ranked list of target genes. For each ranked list, we used GOrilla62 to query the top enriched GO term for each TF target list. For each of the 6 top GO terms, we consulted QuickGO63 to obtain the set of associated protein annotation gene symbols (filtered by mouse taxon), keeping only the genes present in the expression matrix, to obtain six "master lists" of gene symbols. These master lists were finally used to calculate the precision and recall scores for the sets of target genes predicted by both GENIE3 and GRNBoost for each of the aforementioned TFs. Right: To benchmark the time performance of GRNBoost, and as a proof of concept, we used the Chromium Megacell demonstration dataset, which contains over 1 million cells from embryonic mouse brain. We took random subsets of 3000, 10,000 and 100,000 cells to infer the GRNs. (b) Evaluation of stability across multiple runs. Each scatter plot compares the ranking for the targets of each TF across two independent runs (with equal or different numbers of cells).
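The gene-set-based benchmark in (a, left) reduces to precision and recall of a predicted target list against a "master list". A minimal sketch, assuming targets are compared as the top `k` of each TF's ranked list (the caption does not state how the predicted sets were cut), with hypothetical function names:

```python
def precision_recall_at_k(ranked_targets, master_list, k):
    """Precision and recall of the top-k predicted targets against a master list."""
    top = set(ranked_targets[:k])
    master = set(master_list)
    tp = len(top & master)  # predicted targets that are in the master list
    precision = tp / len(top) if top else float("nan")
    recall = tp / len(master) if master else float("nan")
    return precision, recall


def precision_recall_curve(ranked_targets, master_list):
    """(precision, recall) for every cut-off k = 1..len(ranked_targets)."""
    return [precision_recall_at_k(ranked_targets, master_list, k)
            for k in range(1, len(ranked_targets) + 1)]
```

Computing the same curve for the GENIE3 and GRNBoost rankings of one TF allows the two methods to be compared at matched cut-offs.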

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Note 1

Life Sciences Reporting Summary

Supplementary Table 1

Results of NFATC2 knock-down versus control

Supplementary Software

R packages: SCENIC, GENIE3, RcisTarget, and AUCell, as available at the time of publication. Links to the most recent versions are available at http://scenic.aertslab.org.

Cite this article

Aibar, S., González-Blas, C., Moerman, T. et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14, 1083–1086 (2017). https://doi.org/10.1038/nmeth.4463
