A test metric for assessing single-cell RNA-seq batch correction

Büttner, Maren; Miao, Zhichao; Wolf, F. Alexander; Teichmann, Sarah A.; Theis, Fabian J.

doi:10.1038/s41592-018-0254-1

Analysis
Published: 20 December 2018

Nature Methods volume 16, pages 43–49 (2019)

Abstract

Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET; https://github.com/theislab/kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. We also demonstrate the application of kBET to data from peripheral blood mononuclear cells (PBMCs) from healthy donors to distinguish cell-type-specific inter-individual variability from changes in relative proportions of cell populations. This has important implications for future data-integration efforts, central to projects such as the Human Cell Atlas.

**Fig. 1: Batch types and the concept of kBET.**

**Fig. 2: kBET is more responsive than other batch tests on simulated data.**

**Fig. 3: ComBat provides the best correction on mESC inDrop technical replicates.**

**Fig. 4: kBET assesses data-integration quality and inter-individual variability.**

Data availability

We applied the batch estimates to several scRNA-seq datasets. In the inDrop publication, the droplet-based sequencing was demonstrated on mESCs growing on LIF⁺ medium and two additional technical replicates¹². In our analysis, we used two replicates that consisted of 5,952 cells from two batches and 11,308 genes with at least 2 cells having more than 4 unique molecular identifier (UMI) reads per cell. Data were downloaded as UMI-filtered read count matrices from accession GSE65525.
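The gene filter described above (at least 2 cells with more than 4 UMI counts) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_genes`, the genes-by-cells matrix orientation and the strict "greater than 4" reading of the threshold are assumptions.

```python
import numpy as np

def filter_genes(counts, min_cells=2, min_umi=4):
    """Keep genes for which at least `min_cells` cells have more than `min_umi` UMI counts.

    counts: array of shape (n_genes, n_cells); this orientation is an assumption.
    Returns the filtered matrix and the boolean mask of retained genes.
    """
    counts = np.asarray(counts)
    # For each gene, count the cells whose UMI count strictly exceeds the threshold.
    cells_above = (counts > min_umi).sum(axis=1)
    keep = cells_above >= min_cells
    return counts[keep], keep

# Toy matrix: 4 genes x 4 cells.
toy = np.array([
    [5, 6, 0, 0],  # two cells above 4 UMIs: kept
    [5, 0, 0, 0],  # only one cell above 4 UMIs: dropped
    [4, 4, 4, 4],  # no cell strictly above 4 UMIs: dropped
    [0, 9, 7, 8],  # three cells above 4 UMIs: kept
])
filtered, keep = filter_genes(toy)
print(keep, filtered.shape)
```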

Kolodziejczyk et al.¹⁴ explored heterogeneity in mESCs cultured with three different media (2i, a2i and LIF⁺) on full-length sequenced transcripts (Smart-seq). The three conditions included 219, 123 and 207 cells in 4, 2 and 3 batches, respectively. The mESC data sequenced with full-length Smart-seq¹⁴ were downloaded from ENA (project ID PRJEB6455) as FASTQ files and mapped to an Ensembl⁵² mouse transcriptome (GRCm38.p5.87, equivalent to UCSC mm10) with Salmon²⁴. Cells were quality-controlled according to data derived from the Espresso database (http://www.ebi.ac.uk/teichmann-srv/espresso/).

Further, scRNA-seq has been widely applied in explorations of mouse embryonic development. To test the performance of batch correction for data integration, we collected single-cell RNA-seq data of mouse early embryonic development from eight different studies^{16,17,18,19,20,21,22,23}, consisting of 56, 49, 124, 65, 15, 294, 17 and 15 cells, respectively. The early embryonic development data used have the following accession IDs: E-GEOD-57249, E-GEOD-70605, E-MTAB-3321, GSE53386, E-MTAB-2958, E-GEOD-45719, E-GEOD-44183 and E-GEOD-66582. All studies applied Smart-seq-based protocols for scRNA-seq. All FASTQ files were mapped to an Ensembl⁵² mouse transcriptome (version GRCm38.p5.87) with Salmon²⁴ (version 0.8.2; k-mer = 21 to tolerate different read lengths). Here we considered the studies as batches while omitting the flowcell batches. We continued our analysis without further gene filtering or quality control.

Kang et al.²⁶ studied genetic variation among PBMCs from eight individuals as a replacement for cell barcoding in droplet-based sequencing (10x Genomics). From that study, we used three experimental runs: 3,514 and 4,106 cells from four healthy donors each, and 5,832 cells from these eight healthy donors. Human PBMC data²⁶ can be provided by the authors upon request. Count matrices are available under accession number GSE96583.

References

1. Tung, P.-Y. et al. Batch effects and the effective design of single-cell gene expression studies. Sci. Rep. 7, 39921 (2017).
2. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
3. Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).
4. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
5. Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).
6. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
7. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
8. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
9. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
10. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
11. Cressie, N. & Timothy, R. C. Pearson’s χ² and the loglikelihood ratio statistic G²: a comparative review. Int. Stat. Rev. 57, 19–43 (1989).
12. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
13. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
14. Kolodziejczyk, A. A. et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17, 471–485 (2015).
15. Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85–91 (2017).
16. Biase, F. H., Cao, X. & Zhong, S. Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing. Genome Res. 24, 1787–1796 (2014).
17. Liu, W. et al. Identification of key factors conquering developmental arrest of somatic cell cloned embryos by combining embryo biopsy and single-cell sequencing. Cell Discov. 2, 16010 (2016).
18. Goolam, M. et al. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos. Cell 165, 61–74 (2016).
19. Fan, X. et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16, 148 (2015).
20. Boroviak, T. et al. Lineage-specific profiling delineates the emergence and progression of naive pluripotency in mammalian embryogenesis. Dev. Cell 35, 366–382 (2015).
21. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
22. Xue, Z. et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013).
23. Wu, J. et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature 534, 652–657 (2016).
24. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
25. Teng, M. et al. A benchmark for RNA-seq quantification pipelines. Genome Biol. 17, 74 (2016).
26. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
27. Liu, Q. et al. Quantitative assessment of cell population diversity in single-cell landscapes. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/05/30/333393 (2018).
28. Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
29. Bacher, R. et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14, 584–586 (2017).
30. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
31. Buettner, F., Pratanwanich, N., McCarthy, D. J., Marioni, J. C. & Stegle, O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 18, 212 (2017).
32. Li, W. V. & Li, J. J. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat. Commun. 9, 997 (2018).
33. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single cell RNA-seq denoising using a deep count autoencoder. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/04/13/300681 (2018).
34. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).
35. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).
36. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
37. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods: towards more accurate and robust tools. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/03/05/276907 (2018).
38. Bhaduri, A., Nowakowski, T. J., Pollen, A. A. & Kriegstein, A. R. Saturating single-cell datasets. bioRxiv Preprint at https://www.biorxiv.org/content/early/2017/11/12/218370 (2017).
39. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
40. Tabula Muris Consortium. Single-cell transcriptomic characterization of 20 organs and tissues from individual mice creates a Tabula Muris. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/03/29/237446 (2018).
41. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
42. Baik, J. & Silverstein, J. W. Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97, 1382–1408 (2006).
43. Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
44. Andrews, T. S. & Hemberg, M. Dropout-based feature selection for scRNASeq. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/05/17/065094 (2018).
45. Vallejos, C. A., Risso, D., Scialdone, A., Dudoit, S. & Marioni, J. C. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat. Methods 14, 565–571 (2017).
46. Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
47. Paulson, J. N. et al. Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. BMC Bioinformatics 18, 437 (2017).
48. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
49. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
50. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
51. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
52. Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).

Acknowledgements

We thank A. Böttcher for motivating this study, and T. Illicic for carrying out pilot analyses. We thank in particular M. Subramaniam and J. Ye (UCSF) for the PBMC data. We are grateful to the members of the Teichmann and Theis labs for valuable discussions and comments on the manuscript. M.B. is supported by a DFG Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM). Z.M. is supported by a Single Cell Gene Expression Atlas grant from the Wellcome Trust (nr. 108437/Z/15/Z). F.A.W. acknowledges support by the Helmholtz Postdoc Programme, Initiative and Networking Fund of the Helmholtz Association. F.J.T. acknowledges financial support by the German Science Foundation (SFB 1243 and Graduate School QBM) and by the Bavarian government (BioSysNet). This collaboration was supported by a Helmholtz International Fellow Award to S.A.T.

Author information

These authors contributed equally: Maren Büttner, Zhichao Miao.

Authors and Affiliations

Helmholtz Zentrum München–German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
Maren Büttner, F. Alexander Wolf & Fabian J. Theis
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
Zhichao Miao & Sarah A. Teichmann
Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Zhichao Miao & Sarah A. Teichmann
Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
Sarah A. Teichmann
Department of Mathematics, Technische Universität München, Munich, Germany
Fabian J. Theis

Contributions

M.B. developed, tested and validated the method; prepared and analyzed the data; and wrote the paper. Z.M. prepared and analyzed the data and wrote the paper. F.A.W. assisted with method development and manuscript writing. S.A.T. and F.J.T. oversaw the research, designed the method validation and wrote the paper.

Corresponding authors

Correspondence to Sarah A. Teichmann or Fabian J. Theis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Assessing neighborhood size effect with different flavors of kBET.

Neighborhood size effect for two simulated datasets (1,000 genes, 500 cells, 2 batches of equal size). The dashed vertical line marks the optimal neighborhood size for batch-effect detection, that is, the size at which the rejection rate is maximal. Shaded areas represent the 95th percentile of 100 repeated kBET runs. In each run, the number of tested neighborhoods is 10% of the sample size. (a) For 1% of genes, the mean expression levels are varied across the batches. The observed rejection rate is low overall, and decreases with increasing neighborhood size. (b) For 20% of genes, the mean expression levels are varied across the batches. The observed rejection rate is almost 100% and decreases for neighborhood sizes larger than 75% of the sample size. All flavors of kBET return almost identical results. The exact test has slightly lower rejection rates than Pearson’s χ²-test for small neighborhoods (<10% of the sample size).
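The rejection rate reported in this legend can be illustrated with a minimal sketch of a k-nearest-neighbor batch test. It is not the kBET R package: the function name `kbet_rejection_rate`, the Euclidean distance, the default `k`, the significance level and the restriction to exactly two batches (which lets the Pearson χ² p-value with one degree of freedom be computed with `math.erfc`) are all assumptions made for illustration.

```python
import math
import numpy as np

def kbet_rejection_rate(X, batch, k=25, frac_tested=0.1, alpha=0.05, seed=0):
    """Fraction of tested neighborhoods whose batch composition differs
    from the global batch frequencies (Pearson chi-squared test, 2 batches).

    X: (n_cells, n_features) data matrix; batch: length n_cells labels.
    """
    X = np.asarray(X, dtype=float)
    labels, batch_idx = np.unique(np.asarray(batch), return_inverse=True)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two batches")
    n = X.shape[0]
    global_freq = np.bincount(batch_idx, minlength=2) / n
    expected = global_freq * k  # expected label counts in a well-mixed neighborhood

    rng = np.random.default_rng(seed)
    n_tested = max(1, int(round(frac_tested * n)))  # e.g. 10% of the sample size
    tested = rng.choice(n, size=n_tested, replace=False)

    rejected = 0
    for i in tested:
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(dist)[:k]  # the k closest cells, including cell i
        observed = np.bincount(batch_idx[neighbors], minlength=2)
        stat = float(((observed - expected) ** 2 / expected).sum())
        # Survival function of the chi-squared distribution with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2.0))
        if p_value < alpha:
            rejected += 1
    return rejected / n_tested

# Toy data: 500 cells in two equally sized batches.
rng = np.random.default_rng(1)
n_cells, n_dims = 500, 20
batch = np.repeat([0, 1], n_cells // 2)
mixed = rng.normal(size=(n_cells, n_dims))   # no batch effect
separated = mixed + 10.0 * batch[:, None]    # strong shift in batch 1
rate_mixed = kbet_rejection_rate(mixed, batch)
rate_separated = kbet_rejection_rate(separated, batch)
print(rate_mixed, rate_separated)
```

In this toy setting the well-mixed data should give a rejection rate near the significance level, whereas the fully separated batches are rejected in every tested neighborhood.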

Supplementary Figure 2 Simulation of different dropout effects in single-cell RNA-seq data.

Dropout is controlled by the shape parameter k, where batch 1 has k₁ = –1 and k₂ in batch 2 ranges from –0.9 to –3. Batches are always equally sized; sample sizes refer to the total number of samples in the dataset. (a) kBET rejection rates, variance explained and silhouette coefficient for each simulation (from top to bottom). (b) PCA plots for a large difference (left, Δk = 2) and a small difference (right, Δk = 0.04). (c-d) Mean relation (top), dropout relation (center), and cellular detection rate (CDR) effect (bottom) for a large difference (c) and a small difference (d). Blue lines indicate the linear fit with parameters depicted in the top left corner. R² values denote the variance explained by the fit.

Supplementary Figure 3 Simulation of equally sized batches with one having additional noise.

Additional noise is a factor that is multiplied onto the final gene expression means and is drawn from a log-normal distribution parameterized by a batch factor and a batch scale. We simulated 1,000 samples and drew subsets from this dataset. Batches were always equally sized. (a-b) Batch-effect analysis for several batch factors (a) and batch scales (b) using kBET, PC regression (variance explained by batch effect) and silhouette coefficients (from top to bottom). kBET rejection rates were computed for both the original data space and the default 50-dimensional PC space (top two plots). (c-d) Mean relation (top), dropout relation (center), PCA plot for batch factor = 0.1 (c) and a batch scale = 0.5 (d). Blue lines indicate the linear fit with parameters depicted in the top left corner. R² values denote the variance explained by the fit and correspond to high correlation of mean and dropout of both batches.
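A rough sketch of this kind of simulation is given below, under stated assumptions. It does not reproduce the simulation framework used in the paper: the gamma-distributed base means, the Poisson sampling, the function name `simulate_two_batches`, and the mapping of batch factor and batch scale onto the location and scale of the log-normal distribution are all illustrative choices.

```python
import numpy as np

def simulate_two_batches(n_genes=1000, n_cells=500,
                         batch_factor=0.1, batch_scale=0.5, seed=0):
    """Simulate counts for two equally sized batches; batch 2 has
    gene-wise multiplicative log-normal noise on its expression means."""
    rng = np.random.default_rng(seed)
    # Hypothetical base expression means (gamma distribution is an assumption).
    base_means = rng.gamma(shape=0.6, scale=3.0, size=n_genes)
    # One multiplicative factor per gene, drawn from a log-normal distribution.
    factors = rng.lognormal(mean=batch_factor, sigma=batch_scale, size=n_genes)

    half = n_cells // 2
    counts_1 = rng.poisson(lam=base_means[:, None], size=(n_genes, half))
    counts_2 = rng.poisson(lam=(base_means * factors)[:, None],
                           size=(n_genes, n_cells - half))
    counts = np.hstack([counts_1, counts_2])  # genes x cells
    batch = np.repeat([0, 1], [half, n_cells - half])
    return counts, batch, factors

counts, batch, factors = simulate_two_batches()
print(counts.shape, batch.sum(), factors.mean())
```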

Supplementary Figure 4 Highly variable genes and kBET results after batch regression (Klein et al.).

(a) Number of retained highly variable genes before and after batch correction. Reference: intersect of highly variable genes per batch with log(counts + 1) normalization. (b) Total number of highly variable genes after batch correction. (c) False positive rates on highly variable genes for all combinations of normalization and batch-correction methods. (d) Comparison of silhouette coefficient and kBET mean ‘acceptance rate’ (1 – rejection rate) from 100 kBET runs. (e) Comparison of PC regression and kBET mean ‘acceptance rate’ (1 – rejection rate) from 100 kBET runs.

Supplementary Figure 5 Deeply sequenced SMART-seq2/C1 mESC data have similar characteristics for batch correction (Kolodziejczyk et al.).

(a) Illustration of two full-length read datasets with replicates in 2i, LIF and a2i culture (219, 207 and 123 cells, respectively). (b) PCA plots for log(CPM + 1) ComBat-corrected data. (c) Percentage of retained highly variable genes versus kBET acceptance rate (equals 1 – rejection rate) for all combinations of normalization and batch-correction approaches. Best-performing normalization-regression strategies cluster in the top right corner, such as ComBat on log(CPM + 1) data. Isolated cells do not have mutual nearest neighbors and appear in some correction models. Seurat’s CCA alignment batch-corrects data only in a latent space as done in manifold learning, and we therefore could not compute highly variable genes and show only kBET values.

Supplementary Figure 6 Sequencing depth in mouse early development data varies by study rather than cell type.

Sequencing depth (library size) per developmental stage (shape) in eight different studies (color-coded) of mouse embryonic development.

Supplementary Figure 7 kBET detects inter-individual variability in PBMC data (Kang et al.).

PBMC data from eight unrelated individuals processed in three experiments (batches) (Chromium 10X Genomics device), with donor cell identity assigned with demuxlet. Note that with pooling of cells from multiple donors, between-donor processing batch effects are effectively excluded. We applied kBET as a sensitive measure of inter-individual variability. (a) t-SNE plot of all data. Cell types are annotated as in the original publication. (b) Cell-type frequencies per individual and batch. Cells were prepared once and processed in two separate runs. Inter-individual variability is stronger than preparation bias. (c) kBET acceptance rates (1 – rejection rate) for several subsets of the complete dataset. Subsample sizes were chosen from 10% to 100% of the data sample size. Subsampling was repeated threefold and kBET rejection rates were averaged across these replicates to reduce bias from subsampling. With decreasing sample size, we find decreasing rejection rates. This is because each tested neighborhood carries less certainty at smaller sample sizes, so the test more often fails to reject the null hypothesis.

Supplementary Figure 8 Normalization affects the CDR–library size relation in SMART-seq data.

(a-d) Comparison of cellular detection rate (CDR) effect and library size for inDrop UMI data (Klein et al.) (a-b) and C1 SMART-seq data (Kolodziejczyk et al.) (c-d). (a,c) CDR for count data. (b,d) CDR for CPM-normalized data where any gene was counted as detected if CPM > 1. Blue line shows a linear fit of CDR (y) to library size (x) and batch variable (z) and R² denotes the variance explained by the model. Gray shaded areas indicate s.e. of the fit. In UMI data, the library size explains largely the CDR effect independent of normalization (a,b). In SMART-seq data, we observe a significant contribution of library size to CDR effect (c) that is lost in CPM-normalized data (d).
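The CDR computation and the linear fit described in this legend can be sketched as follows. The function names, the genes-by-cells orientation, the "count > 0" detection rule for raw counts and the 0/1 coding of the batch variable are assumptions; only the CPM > 1 rule and the fit of CDR to library size and batch come from the legend.

```python
import numpy as np

def cellular_detection_rate(counts, use_cpm=False):
    """Fraction of detected genes per cell, plus the library size per cell.

    counts: (n_genes, n_cells) matrix; every cell is assumed to have
    at least one count so that CPM is well defined.
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=0)
    if use_cpm:
        cpm = counts / library_size * 1e6
        detected = cpm > 1       # detection rule stated in the legend
    else:
        detected = counts > 0    # assumed rule for raw counts
    return detected.mean(axis=0), library_size

def fit_cdr_model(cdr, library_size, batch):
    """Least-squares fit of CDR (y) to library size (x) and batch (z); returns coefficients and R^2."""
    design = np.column_stack([np.ones(len(cdr)), library_size, batch])
    coef, *_ = np.linalg.lstsq(design, cdr, rcond=None)
    residuals = cdr - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((cdr - cdr.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot

# Toy data: 200 genes, 60 cells in two batches with varying sequencing depth.
rng = np.random.default_rng(0)
n_genes, n_cells = 200, 60
batch = np.repeat([0, 1], n_cells // 2)
depth = rng.uniform(0.5, 2.0, size=n_cells)
gene_means = rng.gamma(shape=0.5, scale=1.0, size=n_genes)
counts = rng.poisson(gene_means[:, None] * depth[None, :])

cdr, library_size = cellular_detection_rate(counts)
coef, r_squared = fit_cdr_model(cdr, library_size, batch)
print(coef, r_squared)
```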

About this article

Cite this article

Büttner, M., Miao, Z., Wolf, F.A. et al. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods 16, 43–49 (2019). https://doi.org/10.1038/s41592-018-0254-1

Received: 06 October 2017
Accepted: 31 October 2018
Published: 20 December 2018
Issue Date: January 2019
DOI: https://doi.org/10.1038/s41592-018-0254-1
