Abstract
Recent advances in single-cell technologies have enabled the characterization of epigenomic heterogeneity at the cellular level. Computational methods for automatic cell type annotation are urgently needed given the exponential growth in the number of cells. In particular, annotation of single-cell chromatin accessibility sequencing (scCAS) data, which can capture the chromatin regulatory landscape that governs transcription in each cell type, has not been fully investigated. Here we propose EpiAnno, a probabilistic generative model integrated with a Bayesian neural network, to annotate scCAS data automatically in a supervised manner. We systematically validate the superior performance of EpiAnno for both intra- and inter-dataset annotation on various datasets. We further demonstrate the advantages of EpiAnno for interpretable embedding and biological implications via expression enrichment analysis, partitioned heritability analysis, enhancer identification, cis-coaccessibility analysis and pathway enrichment analysis. In addition, we show that EpiAnno has the potential to reveal cell type-specific motifs and facilitate scCAS data simulation.
Data availability
The CLP_LMPP_MPP and CLP_CMP_MPP datasets were collected from NCBI Gene Expression Omnibus (GEO) under accession no. GSE96772. The forebrain dataset can be accessed from GEO under accession number GSE100033. The InSilico dataset was collected from GEO with accession no. GSE65360. The leukaemia dataset can be accessed from GEO with accession no. GSE74310. The mouse brain datasets are available at http://atlas.gs.washington.edu/mouse-atac/data/. The PBMC5k and PBMC10k datasets are available at https://support.10xgenomics.com/single-cell-atac/datasets.
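As a convenience, the accession numbers listed above can be turned into GEO landing-page links. The following is a minimal sketch, not part of EpiAnno: the dataset keys and accessions are copied from the statement above, while the helper name `geo_url` and the constant names are hypothetical, and the URL pattern is assumed to be the standard GEO accession query page.

```python
# Dataset-to-accession mapping copied from the Data availability statement.
GEO_ACCESSIONS = {
    "CLP_LMPP_MPP": "GSE96772",
    "CLP_CMP_MPP": "GSE96772",
    "forebrain": "GSE100033",
    "InSilico": "GSE65360",
    "leukaemia": "GSE74310",
}

# Assumed URL pattern for a GEO accession landing page.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"


def geo_url(dataset: str) -> str:
    """Return the GEO landing-page URL for a named dataset (hypothetical helper)."""
    try:
        acc = GEO_ACCESSIONS[dataset]
    except KeyError:
        # Datasets hosted elsewhere (mouse brain, PBMC5k, PBMC10k) have no GEO entry here.
        raise KeyError(f"no GEO accession listed for dataset {dataset!r}") from None
    return GEO_QUERY_URL.format(acc=acc)


if __name__ == "__main__":
    # Print one tab-separated line per dataset: name, then URL.
    for name in GEO_ACCESSIONS:
        print(f"{name}\t{geo_url(name)}")
```

The mouse brain and PBMC datasets are deliberately left out of the mapping because the statement gives direct URLs for them rather than GEO accessions.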
Code availability
The EpiAnno software, including detailed documentation and a tutorial, is freely available on GitHub (https://github.com/xy-chen16/EpiAnno) and Zenodo (https://doi.org/10.5281/zenodo.5716525) (ref. 61).
References
The Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).
Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).
Xie, P. et al. SuperCT: a supervised-learning framework for enhanced characterization of single-cell transcriptomic profiles. Nucleic Acids Res. 47, e48 (2019).
Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).
Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).
Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 432–439 (2018).
Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
Navidi, Z., Zhang, L. & Wang, B. simATAC: a single-cell ATAC-seq simulation framework. Genome Biol. 22, 74 (2021).
Sun, T., Song, D., Li, W. V. & Li, J. J. scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biol. 22, 163 (2021).
Zamanighomi, M. et al. Unsupervised clustering and epigenetic classification of single cells. Nat. Commun. 9, 2410 (2018).
Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).
Chen, S. et al. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat. Commun. 12, 2177 (2021).
Cusanovich, D. A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).
Ma, W., Su, K. & Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection and reference construction. Genome Biol. 22, 264 (2021).
Ma, F. & Pellegrini, M. ACTINN: automated identification of cell types in single cell RNA sequencing. Bioinformatics 36, 533–538 (2020).
Sun, S., Zhu, J., Ma, Y. & Zhou, X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol. 20, 269 (2019).
Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58–D64 (2020).
Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).
Gao, T. et al. scEnhancer: a single-cell enhancer resource with annotation across hundreds of tissue/cell types in three species. Nucleic Acids Res. https://doi.org/10.1093/nar/gkab1032 (2021).
Zeng, W. et al. SilencerDB: a comprehensive database of silencers. Nucleic Acids Res. 49, D221–D228 (2021).
Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Allen, N. J. Astrocyte regulation of synaptic behavior. Annu. Rev. Cell Dev. Biol. 30, 439–463 (2014).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
Bannwarth, S. et al. Organization of the human tarbp2 gene reveals two promoters that are repressed in an astrocytic cell line. J. Biol. Chem. 276, 48803–48813 (2001).
Fujiyama, T. et al. Inhibitory and excitatory subtypes of cochlear nucleus neurons are defined by distinct bHLH transcription factors, Ptf1a and Atoh1. Development 136, 2049–2058 (2009).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Nathanson, J. L. et al. Short promoters in viral vectors drive selective expression in mammalian inhibitory neurons, but do not restrict activity to specific inhibitory cell-types. Front. Neural Circuits 3, 19 (2009).
Wang, P., Zhao, D., Lachman, H. M. & Zheng, D. Enriched expression of genes associated with autism spectrum disorders in human inhibitory neurons. Transl. Psychiatry 8, 13 (2018).
Matcovitch-Natan, O. et al. Microglia development follows a stepwise program to regulate brain homeostasis. Science 353, aad8670 (2016).
Zusso, M. et al. Regulation of postnatal forebrain amoeboid microglial cell proliferation and development by the transcription factor Runx1. J. Neurosci. 32, 11285–11298 (2012).
Wittstatt, J., Reiprich, S. & Küspert, M. Crazy little thing called Sox—new insights in oligodendroglial Sox protein function. Int. J. Mol. Sci. 20, 2713 (2019).
Romano, S., Vinh, N. X., Bailey, J. & Verspoor, K. Adjusting for chance clustering comparison measures. J. Mach. Learn. Res. 17, 4635–4666 (2016).
Nataf, S., Guillen, M. & Pays, L. TGFB1-mediated gliosis in multiple sclerosis spinal cords is favored by the regionalized expression of HOXA5 and the age-dependent decline in androgen receptor ligands. Int. J. Mol. Sci. 20, 5934 (2019).
Lananna, B. V. et al. Cell-autonomous regulation of astrocyte activation by the circadian clock protein BMAL1. Cell Rep. 25, 1–9 (2018).
Liu, Q., Xu, J., Jiang, R. & Wong, W. H. Density estimation using deep generative neural networks. Proc. Natl Acad. Sci. USA 118, e2101344118 (2021).
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Tian, T., Wan, J., Song, Q. & Wei, Z. Clustering single-cell RNA-seq data with a model-based deep learning approach. Nat. Mach. Intell. 1, 191–198 (2019).
Liu, Q., Chen, S., Jiang, R. & Wong, W. H. Simultaneous deep generative modeling and clustering of single cell genomic data. Nat. Mach. Intell. 3, 536–544 (2021).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Chen, X., Chen, S. & Jiang, R. EnClaSC: a novel ensemble approach for accurate and robust cell-type classification of single-cell transcriptomes. BMC Bioinformatics 21, 392 (2020).
Li, Y. & Luo, Y. Performance-weighted-voting model: an ensemble machine learning method for cancer type classification using whole-exome sequencing mutation. Quant. Biol. 8, 347–358 (2020).
Wold, S., Esbensen, K. & Geladi, P. Principal component analysis. Chemometr. Intell. Lab. Syst. 2, 37–52 (1987).
Vierstra, J. et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007–1012 (2014).
Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Chen, S. Q., Zhang, B. H., Chen, X. Y., Zhang, X. G. & Jiang, R. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 37, i299–i307 (2021).
Chen, X. et al. xy-chen16/EpiAnno: EpiAnno. Zenodo https://doi.org/10.5281/zenodo.5716525 (2021).
Acknowledgements
This work was supported by the National Key Research and Development Program of China grant no. 2021YFF1200902 (R.J.), the National Natural Science Foundation of China grant nos. 61873141 (R.J.), 61721003 (X.Z.), 61573207 (R.J.) and U1736210 (H.L.), a grant from the Guoqiang Institute, Tsinghua University (R.J.), and the Tsinghua-Fuzhou Institute for Data Technology. We thank S. Lei for helpful suggestions and L. Xiong for cell type labels of the forebrain dataset.
Author information
Contributions
R.J. conceived the study and supervised the project. X.C. and S.C. designed, implemented and validated EpiAnno. S.S., Z.G. and L.H. helped with analysing the results. X.C., S.C., R.J., H.L. and X.Z. wrote the manuscript, with input from all the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Wei Lin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–5, Figs. 1–10 and Tables 1–3.
About this article
Cite this article
Chen, X., Chen, S., Song, S. et al. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat Mach Intell 4, 116–126 (2022). https://doi.org/10.1038/s42256-021-00432-w
DOI: https://doi.org/10.1038/s42256-021-00432-w