  • Resource

scPerturb: harmonized single-cell perturbation data

Abstract

Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation–response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation–response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
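To illustrate the E-statistics named in the abstract, the following is a minimal sketch of an E-distance and a permutation-based E-test. It assumes the classical energy-distance form of Székely and Rizzo (ref. 48) with plain Euclidean distance, and within-group means taken over all ordered pairs. It uses only the Python standard library. The function names (mean_pairwise_distance, e_distance, e_test), the distance metric, the permutation count and the toy data are assumptions made for illustration; they are not taken from the scperturb package, whose implementation may differ.

```python
import math
import random
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Mean Euclidean distance over all pairs (x, y) with x in xs and y in ys."""
    total = sum(math.dist(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def e_distance(xs, ys):
    """Energy distance 2*delta_XY - sigma_X - sigma_Y between two sets of profiles."""
    delta = mean_pairwise_distance(xs, ys)    # between-group term
    sigma_x = mean_pairwise_distance(xs, xs)  # within-group term for xs
    sigma_y = mean_pairwise_distance(ys, ys)  # within-group term for ys
    return 2 * delta - sigma_x - sigma_y


def e_test(xs, ys, n_permutations=200, seed=0):
    """Permutation p-value: share of label shufflings with E-distance >= observed."""
    rng = random.Random(seed)
    observed = e_distance(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if e_distance(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy two-dimensional "profiles" standing in for cells in a reduced space.
    control = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4)]
    perturbed = [(5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.0), (4.9, 5.2)]
    print("E-distance:", e_distance(control, perturbed))
    print("E-test p-value:", e_test(control, perturbed))
```

In this sketch a large E-distance with a small permutation p-value indicates that the perturbed cells are distinguishable from control cells. For real data, cells would typically first be embedded in a lower-dimensional space, and a vectorized implementation would be needed for realistic cell counts.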

Fig. 1: Perturbation–response profiling for single cells.
Fig. 2: Single-cell perturbation–response datasets are diverse in type, size and quality.
Fig. 3: E-statistics describe distinctiveness of perturbations in single-cell data.
Fig. 4: E-distance dissects perturbation hierarchy in data from Papalexi et al. (ref. 9).
Fig. 5: Effect of subsampling UMI counts per cell and number of cells per perturbation on E-statistics.

Data availability

The website scperturb.org stores the harmonized datasets in the following formats: scRNA-seq and antibody-based protein datasets as .h5ad files; scATAC-seq datasets as multiple feature matrix definitions, offered as separate download options. RNA data are available at https://doi.org/10.5281/zenodo.7041848 and ATAC data at https://doi.org/10.5281/zenodo.7058381. Access details for the source datasets:

  • AdamsonWeissman2016 (ref. 7): GSE90546 on GEO (ref. 55)
  • AissaBenevolenskaya2021 (ref. 68): GSE149383 on GEO
  • ChangYe2021 (ref. 69): E-MTAB-10698 on ArrayExpress (ref. 70)
  • DatlingerBock2017 (ref. 1): GSE92872 on GEO
  • DatlingerBock2021 (ref. 71): GSE168620 on GEO
  • DixitRegev2016 (ref. 2): GSE90063 on GEO
  • FrangiehIzar2021 (ref. 6): SCP1064 on the Broad Single Cell Portal, https://singlecell.broadinstitute.org/single_cell/study/SCP1064/multi-modal-pooled-perturb-cite-seq-screens-in-patient-models-define-novel-mechanisms-of-cancer-immune-evasion
  • GasperiniShendure2019 (ref. 54): GSE120861 on GEO
  • GehringPachter2019 (ref. 19): https://doi.org/10.22002/D1.1311 on CaltechDATA
  • Liscovitch-BrauerSanjana2021 (ref. 59): GSE161002 on GEO
  • McFarlandTsherniak2020 (ref. 72): https://doi.org/10.6084/m9.figshare.5863776.v1 on figshare
  • MimitouSmibert2021 (ref. 73): GSE156476 on GEO
  • NormanWeissman2019 (ref. 49): GSE133344 on GEO
  • PapalexiSatija2021 (ref. 9): GSE153056 on GEO
  • PierceGreenleaf2021 (ref. 42): data deposited on AWS; URIs are listed at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137922/bin/41467_2021_23213_MOESM9_ESM.xlsx
  • ReplogleWeissman2022 (ref. 20): processed single-cell data from gwps.wi.mit.edu
  • SchiebingerLander2019 (ref. 23): GSE106340 and GSE115943 on GEO
  • SchraivogelSteinmetz2020 (ref. 74): GSE135497 on GEO
  • ShifrutMarson2018 (ref. 75): GSE119450 on GEO
  • SrivatsanTrapnell2020 (ref. 52): GSE139944 on GEO
  • TianKampmann2019 (ref. 76): GSE152988 on GEO, with mappings from kampmannlab.ucsf.edu/crop-seq
  • TianKampmann2021 (ref. 21): GSE124703 on GEO
  • WeinrebKlein2020 (ref. 77): GSE140802 on GEO
  • XieHon2017 (ref. 78): GSE81884 on GEO
  • ZhaoSims2021 (ref. 79): GSE148842 on GEO

Code availability

Source code is openly available at https://github.com/sanderlab/scPerturb/. We provide a corresponding Python package, scperturb, for computing E-statistics (E-distance and E-testing) on single-cell data; it is published on PyPI at https://pypi.org/project/scperturb/. Access details for the original publication of each dataset are available in the 'dataset_processing' subfolder of the same GitHub repository.

References

  1. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).

  2. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).

  3. Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896 (2016).

  4. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).

  5. Wessels, H.-H. et al. Efficient combinatorial targeting of RNA transcripts in single cells with Cas13 RNA Perturb-seq. Nat. Methods 20, 86–94 (2023).

  6. Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).

  7. Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).

  8. Rubin, A. J. et al. Coupled single-cell CRISPR screening and epigenomic profiling reveals causal gene regulatory networks. Cell 176, 361–376 (2019).

  9. Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).

  10. Gross, T., Wongchenko, M. J., Yan, Y. & Blüthgen, N. Robust network inference using response logic. Bioinformatics 35, i634–i642 (2019).

  11. Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).

  12. Gross, T. & Blüthgen, N. Identifiability and experimental design in perturbation studies. Bioinformatics 36, i482–i489 (2020).

  13. Bertin, P. et al. RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro. Preprint at https://doi.org/10.48550/arXiv.2202.04202 (2022).

  14. Franz, A. et al. Molecular response to PARP1 inhibition in ovarian cancer cells as determined by mass spectrometry based proteomics. J. Ovarian Res. 14, 140 (2021).

  15. Preuer, K. et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 34, 1538–1546 (2018).

  16. Kharchenko, P. V. The triumphs and limitations of computational methods for scRNA-seq. Nat. Methods 18, 723–732 (2021).

  17. Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).

  18. Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).

  19. Gehring, J., Park, J. H., Chen, S., Thomson, M. & Pachter, L. Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins. Nat. Biotechnol. 38, 35–38 (2020).

  20. Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 (2022).

  21. Tian, R. et al. Genome-wide CRISPRi/a screens in human neurons link lysosomal failure to ferroptosis. Nat. Neurosci. 24, 1020–1034 (2021).

  22. Chen, W. S. et al. Uncovering axes of variation among single-cell cancer specimens. Nat. Methods 17, 302–310 (2020).

  23. Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).

  24. Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A. Conditional out-of-distribution generation for unpaired data using transfer VAE. Bioinformatics 36, i610–i617 (2020).

  25. Przybyla, L. & Gilbert, L. A. A new era in functional genomics screens. Nat. Rev. Genet. 23, 89–103 (2022).

  26. Forcato, M., Romano, O. & Bicciato, S. Computational methods for the integrative analysis of single-cell data. Brief. Bioinform. 22, 20–29 (2021).

  27. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  28. Duan, B. et al. Model-based understanding of single-cell CRISPR screening. Nat. Commun. 10, 2233 (2019).

  29. Jin, K. et al. CellDrift: inferring perturbation responses in temporally-sampled single cell data. Brief. Bioinform. 23, bbac324 (2022).

  30. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  31. Stathias, V. et al. LINCS Data Portal 2.0: next generation access point for perturbation–response signatures. Nucleic Acids Res. 48, D431–D439 (2020).

  32. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).

  33. Lance, C. et al. Multimodal single cell data integration challenge: results and lessons learned. Preprint at https://doi.org/10.1101/2022.04.11.487796 (2022).

  34. Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database 2020, baaa073 (2020).

  35. Broad Institute. Single Cell Portal. https://singlecell.broadinstitute.org/single_cell (2022).

  36. Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J. Machine learning for perturbational single-cell omics. Cell Syst. 12, 522–537 (2021).

  37. Fischer, D. S. et al. Sfaira accelerates data and model reuse in single cell genomics. Genome Biol. 22, 248 (2021).

  38. Chan Zuckerberg CELLxGENE Discover. Cellxgene Data Portal. https://cellxgene.cziscience.com/

  39. Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).

  40. Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).

  41. Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).

  42. Pierce, S. E., Granja, J. M. & Greenleaf, W. J. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat. Commun. 12, 2969 (2021).

  43. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  44. Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).

  45. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).

  46. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).

  47. Haque, A., Engel, J., Teichmann, S. A. & Lönnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 9, 75 (2017).

  48. Székely, G. J. & Rizzo, M. L. Energy statistics: a class of statistics based on distances. J. Stat. Plan. Inference 143, 1249–1272 (2013).

  49. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  50. Schroder, K., Hertzog, P. J., Ravasi, T. & Hume, D. A. Interferon-γ: an overview of signals, mechanisms and functions. J. Leukoc. Biol. 75, 163–189 (2004).

  51. Jung, S. & Marron, J. S. PCA consistency in high dimension, low sample size context. Ann. Stat. 37, 4104–4130 (2009).

  52. Srivatsan, S. R. et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science 367, 45–51 (2020).

  53. Yao, D. et al. Scalable genetic screening for regulatory circuits using compressed Perturb-seq. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01964-9 (2023).

  54. Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).

  55. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets – update. Nucleic Acids Res. 41, D991–D995 (2013).

  56. Gatto, L. et al. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat. Methods 20, 375–386 (2023).

  57. Tian, L., Chen, F. & Macosko, E. Z. The expanding vistas of spatial transcriptomics. Nat. Biotechnol. 41, 773–782 (2023).

  58. Bredikhin, D., Kats, I. & Stegle, O. MUON: multimodal omics analysis framework. Genome Biol. 23, 42 (2022).

  59. Liscovitch-Brauer, N. et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nat. Biotechnol. 39, 1270–1277 (2021).

  60. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

  61. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  62. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  63. Bairoch, A. The Cellosaurus, a cell-line knowledge resource. J. Biomol. Tech. 29, 25–38 (2018).

  64. Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Res. 10, 33 (2021).

  65. Rizzo, M. L. & Székely, G. J. Energy distance. WIREs Comput. Stat. 8, 27–38 (2016).

  66. Dhapola, P. et al. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data. Nat. Commun. 13, 4616 (2022).

  67. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  68. Aissa, A. F. et al. Single-cell transcriptional changes associated with drug tolerance and response to combination therapies in cancer. Nat. Commun. 12, 1628 (2021).

  69. Chang, M. T. et al. Identifying transcriptional programs underlying cancer drug response with TraCe-seq. Nat. Biotechnol. 40, 86–93 (2022).

  70. Parkinson, H. et al. ArrayExpress: a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007).

  71. Datlinger, P. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat. Methods 18, 635–642 (2021).

  72. McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).

  73. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).

  74. Schraivogel, D. et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods 17, 629–635 (2020).

  75. Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators of immune function. Cell 175, 1958–1971 (2018).

  76. Tian, R. et al. CRISPR interference-based platform for multimodal genetic screens in human iPSC-derived neurons. Neuron 104, 239–255 (2019).

  77. Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, eaaw3381 (2020).

  78. Xie, S., Duan, J., Li, B., Zhou, P. & Hon, G. C. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol. Cell 66, 285–299 (2017).

  79. Zhao, W. et al. Deconvolution of cell type-specific drug responses in human tumor tissue with single-cell RNA-seq. Genome Med. 13, 82 (2021).

Acknowledgements

The authors appreciate informative conversations with Y. Ji of the Fabian Theis laboratory, helpful code suggestions from G. Wong, and computational support from A. Kollasch of the Debora Marks laboratory. The authors also appreciate preprint review comments from Arcadia Science’s preprint review initiative (G. P. Way, N. Davidson, E. Serrano, P. Hicks, J. Tomkinson, D. Bunten). This work was supported by the National Resource for Network Biology (NRNB, P41GM103504 to C.Sa.), the Wellcome Leap ∆Tissue Program (to C.Sa., L.J.S., D.S.M.), the Deutsche Forschungsgemeinschaft (DFG, RTG2424 CompCancer to N.B.), Einstein Stiftung Berlin (Einstein Visiting Fellow program, to C.Sa., N.B.), and the Intramural Research Program of the National Library of Medicine, National Institutes of Health (to A.L.). Computation was in part performed on the HPC for Research cluster of the Berlin Institute of Health. Figures 1 and 4b were created with BioRender.com.

Author information

Contributions

The project was conceptualized by C.Sa., N.B., A.L. and B.Y. Data were curated by T.D.G., S.P., C.Sh., T.G. and S.G. Formal analysis and methodology development were carried out by S.P., T.D.G. and C.Sa. Funding acquisition was done by N.B., D.S.M., L.J.S. and C.Sa. Software development was carried out by J.M., S.P. and T.D.G. Supervision was provided by N.B., A.L., J.P.T.-K., C.Sa., D.S.M. and L.J.S. The original draft was written by T.D.G., S.P., C.Sh., T.G. and J.P.T.-K. Writing review and editing were done by L.J.S., C.Sa., N.B. and A.L.

Corresponding authors

Correspondence to Stefan Peidli, Augustin Luna, Nils Blüthgen or Chris Sander.

Ethics declarations

Competing interests

J.P.T.-K. and T.G. are employees of Relation Therapeutics. C.Sa. is on the science advisory board of Cytoreason Ltd. D.S.M. serves as an advisor for Dyno Therapeutics, Octant, Jura Bio, Tectonic Therapeutic, and Genentech, and is a co-founder of Seismic Therapeutic. All other authors have no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Number of cells per dataset by submission date.

There is a rapid increase in published single-cell perturbation datasets around 2019. We speculate that the slight decrease of dataset numbers after 2021 suggested by the plot is due to the ongoing impact of reduced research in the earlier phases of the COVID-19 pandemic.

Extended Data Fig. 2 Harmonization and analysis workflow.

Perturbation datasets of various modality types, each containing single-cell molecular profiles for at least two perturbations and one control condition (for example, unperturbed cells), were identified in a literature search. Data were obtained from public repositories, and metadata (such as guide identity) from paper supplements. Datasets were reprocessed to standardize annotations and analyzed in parallel. All datasets are available for download from scperturb.org, along with visualizations and summary information.

Extended Data Fig. 3 Pairwise E-distances for NormanWeissman2019 dataset.

E-distances between all pairs of perturbations in the dataset NormanWeissman2019. The color scale is clipped at the 5th and 95th percentiles. Clusters of similar perturbations are visible, for example a cluster of strongly acting perturbations targeting CEBPA at the top.
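The pairwise comparison and color-scale clipping described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the energy distance uses plain Euclidean distance in the classical Székely and Rizzo form (ref. 48), clipping is read as limiting values to the range between the 5th and 95th percentiles, and the function names (e_distance, pairwise_e_distances, clip_for_color_scale) and the toy groups are hypothetical. It is not the code used to produce the figure.

```python
import math
import statistics
from itertools import combinations, product


def e_distance(xs, ys):
    """Energy distance 2*delta_XY - sigma_X - sigma_Y with Euclidean distance."""
    def mean_dist(a, b):
        return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))
    return 2 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)


def pairwise_e_distances(groups):
    """Return {(name_a, name_b): E-distance} for every unordered pair of groups."""
    return {
        (a, b): e_distance(groups[a], groups[b])
        for a, b in combinations(sorted(groups), 2)
    }


def clip_for_color_scale(values, lower_pct=5, upper_pct=95):
    """Clip values to the [lower_pct, upper_pct] percentile range before plotting."""
    # With n=100, cuts[k - 1] is the k-th percentile; "inclusive" keeps the
    # cut points inside the observed data range.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    # Toy one-dimensional profiles for a control and two hypothetical perturbations.
    groups = {
        "control": [(0.0,), (1.0,)],
        "pert_A": [(5.0,), (6.0,)],
        "pert_B": [(0.5,), (1.5,)],
    }
    distances = pairwise_e_distances(groups)
    for pair, value in distances.items():
        print(pair, value)
    print(clip_for_color_scale(list(distances.values())))
```

Clipping affects only the displayed color range, so a few extreme pairs do not wash out the structure among the remaining perturbations; the underlying distances are unchanged.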

Supplementary information

Supplementary Information

Supplementary Fig. 1, Supplemental Note including figures and sections 1–9.

Reporting Summary

Peer Review File

Supplementary Tables 1–5

Supplementary Table 1: Dataset metadata and description of source data papers. Supplementary Table 2: Description of scperturb-formatted gene and cell metadata. Supplementary Table 3: E-statistic results for filtered perturbations across datasets in database. Supplementary Table 4: Drug perturbations appearing in multiple datasets. Supplementary Table 5: Gene perturbations appearing in multiple datasets.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peidli, S., Green, T.D., Shen, C. et al. scPerturb: harmonized single-cell perturbation data. Nat Methods 21, 531–540 (2024). https://doi.org/10.1038/s41592-023-02144-y

  • DOI: https://doi.org/10.1038/s41592-023-02144-y
