A scalable SCENIC workflow for single-cell gene regulatory network analysis

Abstract

This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells, and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix containing the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a per-target regression approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon’s target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
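The third stage, scoring regulon activity per cell, can be illustrated with a minimal sketch. It is not the pySCENIC or AUCell implementation: the function name `aucell_like_score`, the `top_fraction` parameter, and the toy matrix are assumptions made for illustration only. The idea shown is to rank genes within each cell by expression and measure how early the regulon's target genes are recovered in that ranking, as a normalized area under the recovery curve.

```python
import numpy as np


def aucell_like_score(expr, gene_names, regulon, top_fraction=0.05):
    """Simplified, AUCell-style enrichment score for one regulon.

    expr        : array of shape (n_cells, n_genes) with expression values
    gene_names  : array of n_genes gene identifiers (column order of expr)
    regulon     : iterable of target gene identifiers
    top_fraction: fraction of the per-cell ranking used for the AUC
                  (hypothetical parameter name, chosen for this sketch)
    """
    expr = np.asarray(expr, dtype=float)
    n_cells, n_genes = expr.shape

    # Number of top-ranked genes considered in every cell.
    cutoff = max(1, int(round(top_fraction * n_genes)))

    in_regulon = np.isin(np.asarray(gene_names), list(regulon))
    n_targets = int(in_regulon.sum())
    if n_targets == 0:
        raise ValueError("No regulon gene is present in gene_names")

    # Rank genes per cell from highest to lowest expression.
    # Assumption: ties are broken by column order (stable sort).
    order = np.argsort(-expr, axis=1, kind="stable")

    # hits[c, r] is True when the gene at rank r in cell c is a target.
    hits = in_regulon[order][:, :cutoff]

    # Recovery curve: number of targets found up to each rank.
    recovery = np.cumsum(hits, axis=1)
    auc = recovery.sum(axis=1)

    # Largest possible AUC: all targets occupy the very top ranks.
    k = min(n_targets, cutoff)
    max_auc = k * (k + 1) / 2 + k * (cutoff - k)
    return auc / max_auc


if __name__ == "__main__":
    genes = np.array(["A", "B", "C", "D"])
    toy = np.array([[9, 8, 1, 0],   # cell 0: regulon genes highly expressed
                    [0, 1, 8, 9]])  # cell 1: regulon genes lowly expressed
    print(aucell_like_score(toy, genes, {"A", "B"}, top_fraction=0.5))
```

A score near 1 means the regulon's targets sit at the top of a cell's expression ranking, and a score near 0 means they are absent from the top-ranked genes. Because the score depends only on within-cell ranks, it does not depend on how the count matrix was normalized across cells.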

Fig. 1: Schematic overview of the pipeline.
Fig. 2: Speed comparison of complete SCENIC workflow.
Fig. 3: Summary statistics for the unfiltered counts matrix for the PBMC study case.
Fig. 4: Summary statistics for the counts matrix after filtering for the PBMC study case.
Fig. 5: Table of enriched motifs from cisTarget for a selected set of regulons related to B cells, generated within the SCENIC workflow.
Fig. 6: AUC distribution across cells for three sample PBMC regulons.
Fig. 7: Dimensionality reduction plots for the PBMC study case.
Fig. 8: The SCope tool enables interactive comparison of multiple visualizations for the PBMC study case.
Fig. 9: The SCope tool allows exploration of regulons.
Fig. 10: Regulon specificity score for each PBMC subtype.
Fig. 11: Extended analysis of the EBF1 regulon performed in iRegulon.
Fig. 12: Overview of cancer single cell transcriptomics experiments.
Fig. 13: Binary heat map for the skin cutaneous melanoma (SKCM) data set.

Data availability

All data analyzed within this protocol are publicly available. The PBMC 10k data set can be downloaded directly from the 10x Genomics website: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3. The following data sets are available from the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) under these GEO Series accession numbers: GSE60361 (mouse brain data set), GSE115978 (human cutaneous melanoma), and GSE103322 (human HNSC). The non-small cell lung carcinoma data set can be downloaded from ArrayExpress (experiments E-MTAB-6149 and E-MTAB-6653). Additional metadata are available as supplementary information files from the original publications that generated these data sets. The online version of the case studies used in this protocol is available on GitHub (https://github.com/aertslab/SCENICprotocol), including Jupyter notebooks and the Nextflow project code, along with associated installation and usage instructions.

Code availability

SCENIC is available as a Python package at https://pypi.org/project/pyscenic/, and its source code is available on GitHub (https://github.com/aertslab/pySCENIC). The code in this manuscript has been peer reviewed.

Acknowledgements

This work was funded by VLAIO (no. HBC.2017.1003 to J.R., Y.S., and S. Aerts); by an ERC Consolidator Grant (no. 724226_cis-CONTROL to S. Aerts); and by the KU Leuven (grant no. C14/18/092 to S. Aerts). Computing was performed at the Vlaams Supercomputer Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

Conceptualization: B.V.d.S., C.F., J.R., Y.S., and S. Aerts; methodology: B.V.d.S., C.F., K.D., M.D.W., G.H., S. Aibar, R.S., W.S., R.C., Q.R., T.V., D.D.M., J.R., Y.S., and S. Aerts; software: B.V.d.S., C.F., K.D., M.D.W., G.H., S. Aibar, R.S., W.S., R.C., Q.R., T.V., and D.D.M.; validation, resources, and data curation: B.V.d.S. and C.F.; writing (original draft): B.V.d.S., C.F., and S. Aerts; writing (review and editing): B.V.d.S., C.F., and S. Aerts; visualization: B.V.d.S., C.F., and S. Aerts; supervision: S. Aerts, Y.S., and J.R.

Corresponding author

Correspondence to Stein Aerts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Davie, K. et al. Cell 174, 982–998 (2018): https://doi.org/10.1016/j.cell.2018.05.057

Lambrechts, D. et al. Nat. Med. 24, 1277–1289 (2018): https://doi.org/10.1038/s41591-018-0096-5

Wouters, J. et al. Preprint at bioRxiv (2019): https://www.biorxiv.org/content/10.1101/715995v2

Aibar, S. et al. Nat. Methods 14, 1083–1086 (2017): https://doi.org/10.1038/nmeth.4463

Cite this article

Van de Sande, B., Flerin, C., Davie, K. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 15, 2247–2276 (2020). https://doi.org/10.1038/s41596-020-0336-2
