To the Editor: As single-cell RNA sequencing (scRNA-seq) becomes widespread, accessible and scalable computational pipelines for data analysis are needed. We introduce an interactive computational environment for single-cell studies based on Galaxy [1], with functions from established workflows. Single Cell Interactive Application (SCiAp) provides easy access to data from the Human Cell Atlas (HCA) and EMBL-EBI’s Single Cell Expression Atlas (SCEA) [2] projects and can be deployed on different computing platforms, making single-cell data analysis of large-scale projects accessible to the scientific community.
Consortia such as the HCA, the Fly Cell Atlas and others are generating large numbers of scRNA-seq datasets that will be available for researchers to reuse alongside the analysis of their own datasets. For instance, the SCEA provides scRNA-seq datasets comprising over 3 million cells from 14 species, including a wide variety of cell types and tissues. This large collection of scRNA-seq data demands adequate computational infrastructure, analysis tools and workflows to help researchers make the most of it.
The Galaxy framework has enabled flexible and scalable deployment across multiple clouds through the Galaxy–Kubernetes integration [3], thereby supporting analysis of large datasets. Galaxy offers a user-friendly framework for building and sharing workflows. It is supported by a vibrant community of bioinformaticians who continually enrich the tool repository with analysis methods for applications such as scRNA-seq [4]. Built on Galaxy, SCiAp facilitates data access (HCA, SCEA and one’s own data), downstream analysis and visualization of scRNA-seq datasets. We share tools and workflows (including those used in the SCEA) in SCiAp that can be run through the web interface or from the command line. An instance, known as the HCA Galaxy instance, is available at https://humancellatlas.usegalaxy.eu/ (Fig. 1). Technical details, usability and other topics are covered in the Supplementary Methods.
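As a rough illustration of non-interactive access, the sketch below prepares a request that would list the workflows visible to a user on the HCA Galaxy instance. It assumes the instance exposes the standard Galaxy REST API (the `api/workflows` endpoint and the `x-api-key` header) and that the user has generated an API key; the helper name `build_workflow_list_request` and the placeholder key are hypothetical, and this is not necessarily the command-line route the authors describe in the Supplementary Methods.

```python
from urllib.parse import urljoin
from urllib.request import Request

# URL of the HCA Galaxy instance given in the text.
HCA_GALAXY_URL = "https://humancellatlas.usegalaxy.eu/"


def build_workflow_list_request(base_url: str, api_key: str) -> Request:
    """Prepare (but do not send) a GET request for the user's workflows.

    Assumption: the instance serves the standard Galaxy API, where
    `api/workflows` lists workflows and `x-api-key` carries the user's key.
    """
    url = urljoin(base_url, "api/workflows")
    return Request(url, headers={"x-api-key": api_key}, method="GET")


# "YOUR_API_KEY" is a placeholder, not a working credential.
request = build_workflow_list_request(HCA_GALAXY_URL, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

With a real key, passing `request` to `urllib.request.urlopen` would be expected to return a JSON list of workflows; the request is only constructed here so that the sketch runs without network access.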
A key feature of SCiAp is the ability to integrate tools from different workflows, written in different languages. We break monolithic tools into analysis modules, enabling users to try different competing tool sets and, where possible, integrate them into the same workflows. For example, we produced more than 20 modules for Scanpy [5], covering data input, filtering, normalization, variable genes, clustering, dimensionality reduction and trajectory methods, among others. Supplementary Table 1 shows all the tools integrated and the functional modules into which they were broken; Supplementary Note 1 shows the integration of modules from different tools in analysis workflows. SCiAp provides functionality from Scanpy, Seurat [6], Monocle3 [7], SC3 [8], SCmap [9], Scater [10], SCCAF [11], SCPred [12], SCEasy and UCSC CellBrowser. Supplementary Figure 1 maps the scRNA-seq data analysis functionality covered by tool wrappers, distinguishing those contributed as part of this work from external contributions that were incorporated.
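The modular design described above can be pictured with a small, self-contained sketch. It is not SCiAp code and does not call Scanpy: the module names (`filter_cells`, `normalize_total`, `log_transform`), the `run_workflow` helper and the toy count table are all hypothetical stand-ins. The point it illustrates is that when every module shares one input and output type, steps can be chained, swapped or appended without touching the others.

```python
from math import log1p
from typing import Callable, Dict, List

Counts = Dict[str, Dict[str, float]]  # cell id -> gene -> expression value
Module = Callable[[Counts], Counts]   # every module shares this signature


def filter_cells(min_genes: int) -> Module:
    """Hypothetical module: keep cells with at least `min_genes` detected genes."""
    def run(counts: Counts) -> Counts:
        return {
            cell: genes
            for cell, genes in counts.items()
            if sum(1 for value in genes.values() if value > 0) >= min_genes
        }
    return run


def normalize_total(target_sum: float) -> Module:
    """Hypothetical module: scale each cell so its values sum to `target_sum`."""
    def run(counts: Counts) -> Counts:
        out: Counts = {}
        for cell, genes in counts.items():
            total = sum(genes.values())
            # Leave all-zero cells untouched to avoid dividing by zero.
            scale = target_sum / total if total > 0 else 1.0
            out[cell] = {gene: value * scale for gene, value in genes.items()}
        return out
    return run


def log_transform() -> Module:
    """Hypothetical module: apply log(1 + x) to every value."""
    def run(counts: Counts) -> Counts:
        return {
            cell: {gene: log1p(value) for gene, value in genes.items()}
            for cell, genes in counts.items()
        }
    return run


def run_workflow(counts: Counts, modules: List[Module]) -> Counts:
    """Apply the modules in order, feeding each output to the next module."""
    for module in modules:
        counts = module(counts)
    return counts


# Toy data: three cells, three genes (illustrative values only).
toy = {
    "cell_a": {"GeneX": 3.0, "GeneY": 1.0, "GeneZ": 0.0},
    "cell_b": {"GeneX": 0.0, "GeneY": 0.0, "GeneZ": 5.0},
    "cell_c": {"GeneX": 2.0, "GeneY": 2.0, "GeneZ": 4.0},
}

# Two workflows that reuse the same modules; the second appends one more step.
workflow_a = [filter_cells(min_genes=2), normalize_total(target_sum=100.0)]
workflow_b = workflow_a + [log_transform()]

result_a = run_workflow(toy, workflow_a)
result_b = run_workflow(toy, workflow_b)
print(sorted(result_a))
```

In Galaxy the equivalent of `run_workflow` is the workflow engine itself, and the shared type corresponds to a common exchange format between tool wrappers; this sketch only mirrors that idea in miniature.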
In summary, SCiAp is a suite of components derived from commonly used tools in scRNA-seq analysis. Being based on Galaxy, it can be deployed on large computational infrastructures or on existing Galaxy instances, reducing software engineering complexities for the biological research community. Supplementary Table 2 gives a comparative overview of SCiAp and similar services; in that comparison, SCiAp outperforms the other services in accessibility and in the breadth of tool sets provided. We also provide the underlying tools, whose software dependencies are resolved via Bioconda [13] and Biocontainers [14], both commonly used frameworks in bioinformatics. Lab-based scientists with a deep understanding of a cellular system can use this computational framework to interrogate scRNA-seq data, propose further hypotheses and guide their experiments to explore the translational potential of large-scale, single-cell studies using the friendly Galaxy environment.
Data availability
Example input data, in the form of Galaxy histories, are available at http://usegalaxy.eu, with direct links available in Supplementary Note 1. Single Cell Expression Atlas data are directly available from https://www.ebi.ac.uk/gxa/sc and from its FTP site at ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/sc_experiments/. The Human Cell Atlas data are available from https://data.humancellatlas.org/. In both cases, the appropriate Galaxy modules retrieve data directly from Single Cell Expression Atlas and the Human Cell Atlas.
References
1. Afgan, E. et al. Nucleic Acids Res. 46, W537–W544 (2018).
2. Papatheodorou, I. et al. Nucleic Acids Res. 48, D77–D83 (2020).
3. Moreno, P. et al. Preprint at bioRxiv https://doi.org/10.1101/488643 (2018).
4. Tekman, M. et al. Gigascience https://doi.org/10.1093/gigascience/giaa102 (2020).
5. Wolf, F. A., Angerer, P. & Theis, F. J. Genome Biol. 19, 15 (2018).
6. Stuart, T. et al. Cell 177, 1888–1902.e21 (2019).
7. Cao, J. et al. Nature 566, 496–502 (2019).
8. Kiselev, V. Y. et al. Nat. Methods 14, 483–486 (2017).
9. Kiselev, V. Y., Yiu, A. & Hemberg, M. Nat. Methods 15, 359–362 (2018).
10. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Bioinformatics 33, 1179–1186 (2017).
11. Miao, Z. et al. Nat. Methods 17, 621–628 (2020).
12. Alquicira-Hernandez, J., Sathe, A., Ji, H. P., Nguyen, Q. & Powell, J. E. Genome Biol. 20, 264 (2019).
13. Grüning, B. et al. Nat. Methods 15, 475–476 (2018).
14. da Veiga Leprevost, F. et al. Bioinformatics 33, 2580–2582 (2017).
Acknowledgements
The authors acknowledge the invaluable support from the Bioconda, Biocontainers and Galaxy communities. P.M., J.R.M., N.H., K.B.M. and I.P. were supported by Silicon Valley Community Foundation grant 2018-183498. M.H. was supported by Silicon Valley Community Foundation grant 2018-182809 and NHGRI grant 5U41HG002371-19.
Author information
Contributions
P.M. designed architecture; P.M. and J.R.M. were lead technical contributors; P.M., N.H., J.R.M., S.M., A.S., K.P., R.C. and G.M. implemented CLIs and tools; N.H., C.T.-L., A.B. and S.T. advised on methods; W.B., P.M., J.R.M., C.T.-L., N.G., S.K.F., Z.M. and M.H. ran training; N.H., P.M. and J.R.M. worked on tool interoperability; M.A.D., Z.M. and M.H. implemented tools; B.G., H.R. and P.M. set up and managed the Human Cell Atlas Galaxy instance; W.B. designed training; M.A. and P.M. set up cloud infrastructure for training; W.B., N.G., S.K.F., J.R.M., A.S. and P.M. tested Galaxy tools; Y.P.-R. and B.G. designed and advised on architecture; A.B. and S.T. helped conceive the study; I.P. and P.M. designed the study; I.P. and K.B.M. conceived the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Methods, Fig. 1, Notes 1–5 and Tables 1 and 2.
Cite this article
Moreno, P., Huang, N., Manning, J.R. et al. User-friendly, scalable tools and workflows for single-cell RNA-seq analysis. Nat Methods 18, 327–328 (2021). https://doi.org/10.1038/s41592-021-01102-w