  • Brief Communication

The SpliZ generalizes ‘percent spliced in’ to reveal regulated splicing at single-cell resolution

A Publisher Correction to this article was published on 25 April 2022

This article has been updated

Abstract

Detecting single-cell-regulated splicing from droplet-based technologies is challenging. Here, we introduce the splicing Z score (SpliZ), an annotation-free statistical method to detect regulated splicing in single-cell RNA sequencing. We applied the SpliZ to human lung cells, discovering hundreds of genes with cell-type-specific splicing patterns including ones with potential implications for basic and translational biology.

Fig. 1: The SpliZ outperforms PSI in simulation.
Fig. 2: The SpliZ detects cell-type-specific splicing in HLCA dataset.

Data availability

HLCA data was downloaded from the European Genome-Phenome Archive at accession number EGAS00001004344 (ref. 22). We refer to patient 2 in HLCA as individual 1 and patient 3 as individual 2 in our analysis. Our cell-type definition is based on concatenating the ‘compartment’ and ‘free annotation’ columns from the HLCA metadata and only considering lung cells (not blood). SpliZ scores and Leafcutter results, as well as the original data needed to reproduce these results, are available at FigShare: https://doi.org/10.6084/m9.figshare.14378819.v1. Human RefSeq hg38 annotation file was downloaded from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz. The UCSC Pfam database for the hg38 genome assembly was downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz.
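The cell-type definition described above can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the ‘compartment’ and ‘free annotation’ column names come from the text, while the `tissue` column, the `blood` label, the `_` separator and the toy rows are assumptions made for the example.

```python
import pandas as pd

def add_cell_type(meta: pd.DataFrame, blood_label: str = "blood") -> pd.DataFrame:
    """Return lung-only metadata with a combined cell-type label."""
    # 'tissue' is a hypothetical column used here to tell lung from blood cells.
    lung = meta[meta["tissue"] != blood_label].copy()
    # Concatenate the two annotation columns named in the text;
    # the "_" separator is an assumption.
    lung["cell_type"] = lung["compartment"] + "_" + lung["free annotation"]
    return lung

# Toy metadata, invented for illustration only.
meta = pd.DataFrame({
    "tissue": ["lung", "lung", "blood"],
    "compartment": ["immune", "epithelial", "immune"],
    "free annotation": ["macrophage", "club", "monocyte"],
})
lung_meta = add_cell_type(meta)
```

Joining the two columns into one label keeps cell types with the same free annotation in different compartments distinct.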

Code availability

The SpliZ code, along with the code used for data analysis and to create the figures, is available in a GitHub repository (https://github.com/juliaolivieri/SpliZ_pipeline/). This repository is archived on Zenodo at https://doi.org/10.5281/zenodo.5781783 (ref. 23). The pipeline was written in Python (v.3.6.7), and the installed package versions are the following (also available in an environment.yml file on GitHub): matplotlib (v.2.2.3); numpy (v.1.18.4); pandas (v.1.0.4); pyarrow (v.0.15.1); scipy (v.1.4.1); snakemake-minimal (v.5.4.5); statsmodels (v.0.11.1) and tqdm (v.4.46.0). We used Leafcutter (https://github.com/davidaknowles/leafcutter) and regtools (https://github.com/griffithlab/regtools).

References

  1. Shao, Y. et al. Alternative splicing-derived intersectin1-l and intersectin1-s exert opposite function in glioma progression. Cell Death Dis. 10, 431 (2019).

  2. Nakka, K., Kovac, R., Wong, M. M.-K. & Dilworth, F. J. Intron retained, transcript detained: intron retention as a hallmark of the quiescent satellite cell state. Dev. Cell 53, 623–625 (2020).

  3. Oleynikov, Y. & Singer, R. H. RNA localization: different zipcodes, same postman? Trends Cell Biol. 8, 381–383 (1998).

  4. Yang, Y. & Carstens, R. P. Alternative splicing regulates distinct subcellular localization of epithelial splicing regulatory protein 1 (esrp1) isoforms. Sci. Rep. 7, 3848 (2017).

  5. Arzalluz-Luque, Á. & Conesa, A. Single-cell RNAseq for the study of isoforms—how is that possible? Genome Biol. 19, 110 (2018).

  6. Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).

  7. Salzman, J., Jiang, H. & Wong, W. H. Statistical modeling of RNA-seq data. Statistical Sci. https://doi.org/10.1214/10-STS343 (2011).

  8. Li, J. J., Jiang, C.-R., Brown, J. B., Huang, H. & Bickel, P. J. Sparse linear modeling of next-generation mRNA sequencing (RNA-seq) data for isoform discovery and abundance estimation. Proc. Natl Acad. Sci. USA 108, 19867–19872 (2011).

  9. Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).

  10. Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).

  11. Najar, C. F. B. A., Yosef, N. & Lareau, L. F. Coverage-dependent bias creates the appearance of binary splicing in single cells. eLife 9, e54603 (2020).

  12. Westoby, J., Artemov, P., Hemberg, M. & Ferguson-Smith, A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol. 21, 74 (2020).

  13. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  14. Dehghannasiri, R., Olivieri, J. E., Damljanovic, A. & Salzman, J. Specific splice junction detection in single cells with SICILIAN. Genome Biol. 22, 219 (2021).

  15. Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619–625 (2020).

  16. Hayat, S. M. G. et al. CD47: role in the immune system and application to cancer therapy. Cell. Oncol. 43, 19–30 (2020).

  17. Chao, M. P. et al. Therapeutic targeting of the macrophage immune checkpoint CD47 in myeloid malignancies. Front. Oncol. 9, 1380 (2020).

  18. Li, Y. I. et al. Annotation-free quantification of RNA splicing using leafcutter. Nat. Genet. 50, 151–158 (2018).

  19. Olivieri, J. E. et al. RNA splicing programs define tissue compartments and cell types at single cell resolution. eLife 10, e70692 (2021).

  20. Chung, E. & Romano, J. P. Exact and asymptotically robust permutation tests. Ann. Stat. 41, 484–507 (2013).

  21. Li, Y. I. et al. Annotation-free quantification of RNA splicing using leafcutter. Nat. Genet. 50, 151–158 (2018).

  22. Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619–625 (2020).

  23. Olivieri, J. E. juliaolivieri/SpliZ_pipeline: v1.0. Zenodo https://doi.org/10.5281/zenodo.5781783 (2021).

Acknowledgements

We thank P. Wang and S. Quake for insightful and instrumental comments during the development of the method, E. Meyer and R. Bierman for comments on the manuscript, J. Klein for creating parts of Fig. 1, and K. Travaglini and M. Krasnow for providing advance access to the HLCA data before its publication. J.O. is supported by the National Science Foundation Graduate Research Fellowship under grant no. DGE-1656518, a Stanford Graduate Fellowship and a Lieberman Fellowship. R.D. is supported by the Cancer Systems Biology Scholars Program grant no. R25 CA180993 and Clinical Data Science Fellowship grant no. T15 LM7033-36. J.S. is supported by the National Institute of General Medical Sciences grant nos. R01 GM116847 and R35 GM139517 and the National Science Foundation Faculty Early Career Development Program Award no. MCB1552196.

Author information

Contributions

J.O. and R.D. developed the software and analyzed the data. J.S. conceived and supervised the project. J.O., R.D. and J.S. wrote the paper.

Corresponding author

Correspondence to Julia Salzman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ángeles Arzalluz-Luque, Yang I. Li and the other, anonymous, reviewer for their contribution to the peer review of this work. Lin Tang, in collaboration with the Nature Methods team, was the Primary Handling Editor. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The SpliZ is not correlated with gene expression.

a. There is no consistent correlation between either the SpliZ or SpliZVD and gene expression. b. Plots of number of spliced reads vs SpliZ and SpliZVD show that there is no significant correlation between gene expression and SpliZ or SpliZVD. c. LMO7 shows no evidence of correlation between number of spliced reads and SpliZ or SpliZVD.

Extended Data Fig. 2 SpliZ toy example.

The cell on the left has shorter average introns than the cell on the right, giving it a lower SpliZ.

Extended Data Fig. 3 Simulations of the SpliZ, SpliZVD, and PSI.

In both simulations, two cell types of 20 cells each are simulated, each type having a different proportion of isoforms (1000 trials for a, 100 trials for b). At each read depth n, Poisson(n) reads are sampled in proportion to isoform abundances. Null values are calculated from cell populations with identical isoform expression. SpliZsites based on the SVD described in the methods are marked with asterisks, and coincide with all splice sites with differential partner exon usage between the two cell types. a. The SpliZ, SpliZVD, and PSI all have the same power in the case of exon skipping. b. The simulation from Fig. 1d was modified, changing the fractions of isoform abundance as described in the figure such that when exon 2 is present, exon 3 is included 99% of the time. While this splice site was identified as a SpliZsite in Fig. 1d, here it is not identified (red X), as would be expected for a splice junction with 99% constitutive splicing. This simulation setting shows that the SpliZ can correctly identify only alternative exons as SpliZsites.
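The read-sampling step of such a simulation might look like the sketch below. It assumes that "sampled in proportion to isoform abundances" means a multinomial split of each cell's reads across isoforms. The function name, the isoform proportions and the seed are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_cell_type(rng, isoform_props, n_cells=20, depth=20):
    """Sample per-cell isoform read counts for one simulated cell type."""
    props = np.asarray(isoform_props, dtype=float)
    props = props / props.sum()  # normalize to a probability vector
    # Each cell receives a Poisson(depth) number of reads.
    n_reads = rng.poisson(depth, size=n_cells)
    # Assumption: reads are split across isoforms by a multinomial draw.
    return np.array([rng.multinomial(int(n), props) for n in n_reads])

rng = np.random.default_rng(0)
# Two cell types with different (made-up) isoform proportions.
type_a = simulate_cell_type(rng, [0.75, 0.25])
type_b = simulate_cell_type(rng, [0.25, 0.75])
# A null population would reuse the same proportions for both types.
```

Each returned array has one row per cell and one column per isoform; a statistic such as the SpliZ or PSI would then be computed per cell and compared between the two types.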

Extended Data Fig. 4 SpliZVD calculation example.

The SpliZVD is the projection of the matrix of splicing residuals onto its first eigenvector. The residual matrix’s SVD is used to identify the most variably alternatively spliced sites. The top left shows a gene structure of TPM2, with reads aligning to different junctions above. Each read is assigned a SpliZ residual and SpliZVD residual, the latter of which is based on the first eigenvector (shown below the gene annotations). Ovals representing different cells are colored based on the sign of their SpliZ or SpliZVD values, showing that in this case the SpliZVD is able to distinguish differences in splicing where the SpliZ cannot.
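The projection described above can be illustrated with NumPy. This sketch assumes the residual matrix is arranged as cells by splice sites and uses the first right singular vector as the "first eigenvector". The toy matrix is invented, and the code does not reproduce how the pipeline builds or normalizes residuals.

```python
import numpy as np

def project_onto_first_component(residuals):
    """Project each row (cell) onto the first right singular vector."""
    # residuals: hypothetical cells x splice-sites matrix of splicing residuals.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    first_vec = vt[0]  # one loading per splice site
    return residuals @ first_vec, first_vec

# Toy residuals: sites 0 and 1 vary in opposite directions, site 2 is constant.
residuals = np.array([
    [ 1.0, -1.0, 0.0],
    [ 2.0, -2.0, 0.0],
    [-1.0,  1.0, 0.0],
    [-2.0,  2.0, 0.0],
])
scores, loadings = project_onto_first_component(residuals)
```

The overall sign of a singular vector is arbitrary, so only the relative signs of the scores are meaningful. In this toy case the first two cells share one sign and the last two the other, and the constant site receives a loading of zero.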

Extended Data Fig. 5 Splicing residual matrices.

SVDs of the splicing residual matrix based on the simulations in Fig. 1d,f, and Extended Data Fig. 3 at average read depth 20. Rows correspond to splice sites and columns correspond to eigenvectors. For each splice site, the value from the SVD that causes the splice site to be picked as a SpliZsite is boxed in green. a,b,c. As expected, all non-constitutive splice sites are selected as SpliZsites. d. As expected, the three most variable splice sites are chosen as SpliZsites, while the 99% constitutive splice site is not chosen as a SpliZsite.

Extended Data Fig. 6 Cell type summary in HLCA dataset.

a. Box plot of the number of reads per cell for each cell type in both individuals’ 10x data. Numbers to the right of the plot indicate the number of cells plotted for each cell type over the given experiment. The middle line of each box is the median, and each box extends from the first to third quartile. Whiskers extend to 1.5 times the interquartile range. All points outside of this range are plotted individually. b. Bar plots of the number of cells per cell type for each individual’s 10x data.
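The box plot convention in panel a can be made concrete with a short sketch. It assumes the common rule that whiskers end at the most extreme data points within 1.5 times the interquartile range of the box, and that quartiles use NumPy's default linear interpolation. The input values are invented.

```python
import numpy as np

def box_stats(values):
    """Return median, quartiles, whisker ends and outliers for a box plot."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the last data point inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Points beyond the fences are drawn individually.
        "outliers": values[(values < lo_fence) | (values > hi_fence)],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this toy input the box spans 3 to 7 with median 5, the whiskers end at 1 and 8, and 100 is plotted as an individual point.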

Extended Data Fig. 7 Called gene intersections.

a. Intersection of genes called by the SpliZ between 10X and subsetted Smart-seq2 data. b. Intersection of genes called by the SpliZVD between 10X and subsetted Smart-seq2 data. c. Intersection between genes called by the SpliZ and SpliZVD in 10x data.

Extended Data Fig. 8 SpliZ correlations.

Correlation between median SpliZ values per matched gene and cell type. a. Between individuals for 10X data. b. Between technologies for each HLCA individual (datasets subsetted to shared junctions and cell types per individual before running).
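One way to compute such a correlation is sketched below with pandas. The column names `gene`, `cell_type` and `spliz`, the toy values and the choice of Pearson correlation are assumptions for illustration; the caption does not state which correlation coefficient was used.

```python
import pandas as pd

def median_spliz_correlation(df_a: pd.DataFrame, df_b: pd.DataFrame) -> float:
    """Correlate per-(gene, cell type) median scores between two datasets."""
    keys = ["gene", "cell_type"]
    med_a = df_a.groupby(keys)["spliz"].median().rename("median_a")
    med_b = df_b.groupby(keys)["spliz"].median().rename("median_b")
    # Keep only gene/cell-type pairs present in both datasets.
    matched = pd.concat([med_a, med_b], axis=1, join="inner")
    return matched["median_a"].corr(matched["median_b"])  # Pearson by default

# Toy per-cell scores for two datasets, invented for illustration.
df_a = pd.DataFrame({
    "gene": ["G1", "G1", "G2", "G3"],
    "cell_type": ["T1", "T1", "T1", "T2"],
    "spliz": [1.0, 3.0, 5.0, 7.0],
})
df_b = pd.DataFrame({
    "gene": ["G1", "G2", "G3", "G4"],
    "cell_type": ["T1", "T1", "T2", "T1"],
    "spliz": [4.0, 10.0, 14.0, 0.0],
})
r = median_spliz_correlation(df_a, df_b)
```

The inner join drops G4, which appears in only one dataset, so the correlation is computed over the three matched gene and cell-type pairs.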

Extended Data Fig. 9 Compartment-specific splicing.

Compartment-specific regulated alternative splicing of (a) ATP5F1C, (b) MYL6 and (c) PPP1R12A. For ATP5F1C, endothelial and stromal cells have a higher fraction of junctional reads for the exon exclusion event compared to other compartments. For MYL6, immune cells have a lower fraction of junctional reads for the exon inclusion event compared to other compartments. For PPP1R12A, immune cells have a higher fraction of junctional reads for the exon inclusion event. The plots include the cell types in which the splice site has at least 20 junctional reads in at least 10 cells in at least one of the four datasets (two individuals and two technologies: 10X and Smart-seq2 (SS2)). Dots represent the fraction of junctional reads for each alternative splice site in the cell type. For each dot, the outer ring (in white) shows the upper CI and the inner ring (color-coded) shows the lower CI. Box plots show the average alternative splice site for each cell across all technologies and individuals. Each box spans the 25–75% quantiles of average splice site per cell. Arcs represent the average fractions at the compartment level. There is no dot when the alternative splice site has 0 junctional reads.

Extended Data Fig. 10 10X vs Smart-seq2 comparison.

a. Upset plot showing the number of cell types sequenced in each HLCA individual and technology. b. Bar plot showing the number of cells sequenced for each HLCA individual and technology. c. Upset plot showing the number of alternative junctions (junctions for which at least one splice site has at least one other partner in the dataset) for each HLCA individual and technology. d. The SpliZ is calculated independently for Smart-seq2 data restricted to junctions detected by 10X to measure technology-dependence of results.
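The definition of an alternative junction in panel c can be expressed as a small function. This is a simplified sketch that represents each junction as a hypothetical (donor, acceptor) coordinate pair and ignores chromosome and strand, which a real implementation would need to track.

```python
from collections import defaultdict

def alternative_junctions(junctions):
    """Return junctions where the donor or the acceptor has more than one partner."""
    junctions = list(junctions)
    donor_partners = defaultdict(set)
    acceptor_partners = defaultdict(set)
    for donor, acceptor in junctions:
        donor_partners[donor].add(acceptor)
        acceptor_partners[acceptor].add(donor)
    return {
        (donor, acceptor)
        for donor, acceptor in junctions
        if len(donor_partners[donor]) > 1 or len(acceptor_partners[acceptor]) > 1
    }

# Toy coordinates: donor 100 pairs with two acceptors; 400-500 has no alternative.
observed = [(100, 200), (100, 300), (400, 500)]
alt = alternative_junctions(observed)
```

Here the two junctions sharing donor 100 are alternative, while the junction from 400 to 500 is not, because neither of its splice sites has another partner in the toy dataset.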

About this article

Cite this article

Olivieri, J.E., Dehghannasiri, R. & Salzman, J. The SpliZ generalizes ‘percent spliced in’ to reveal regulated splicing at single-cell resolution. Nat Methods 19, 307–310 (2022). https://doi.org/10.1038/s41592-022-01400-x

  • DOI: https://doi.org/10.1038/s41592-022-01400-x
