The SpliZ generalizes ‘percent spliced in’ to reveal regulated splicing at single-cell resolution

Olivieri, Julia Eve; Dehghannasiri, Roozbeh; Salzman, Julia

doi:10.1038/s41592-022-01400-x

Brief Communication
Published: 03 March 2022

Nature Methods volume 19, pages 307–310 (2022)

A Publisher Correction to this article was published on 25 April 2022

This article has been updated

Abstract

Detecting single-cell-regulated splicing from droplet-based technologies is challenging. Here, we introduce the splicing Z score (SpliZ), an annotation-free statistical method to detect regulated splicing in single-cell RNA sequencing. We applied the SpliZ to human lung cells, discovering hundreds of genes with cell-type-specific splicing patterns including ones with potential implications for basic and translational biology.

**Fig. 1: The SpliZ outperforms PSI in simulation.**

**Fig. 2: The SpliZ detects cell-type-specific splicing in HLCA dataset.**

Data availability

HLCA data was downloaded from the European Genome-Phenome Archive at accession number EGAS00001004344 (ref. 22). We refer to patient 2 in HLCA as individual 1 and patient 3 as individual 2 in our analysis. Our cell-type definition is based on concatenating the ‘compartment’ and ‘free annotation’ columns from the HLCA metadata and only considering lung cells (not blood). SpliZ scores and Leafcutter results, as well as the original data needed to reproduce these results, are available at FigShare: https://doi.org/10.6084/m9.figshare.14378819.v1. The human RefSeq hg38 annotation file was downloaded from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz. The UCSC Pfam database for the hg38 genome assembly was downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz.
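The cell-type definition described above can be illustrated with a minimal sketch. It assumes the metadata is loaded as a pandas DataFrame; the column names `tissue`, `compartment` and `free_annotation`, the `"lung"` label and the underscore separator are all assumptions made for illustration and may not match the actual HLCA metadata or the authors' pipeline.

```python
import pandas as pd


def build_cell_types(meta: pd.DataFrame) -> pd.DataFrame:
    """Keep lung cells and build a cell-type label from two metadata columns."""
    # Assumed column names: 'tissue', 'compartment', 'free_annotation'.
    lung = meta[meta["tissue"] == "lung"].copy()
    # Assumed separator: an underscore between the two annotation fields.
    lung["cell_type"] = lung["compartment"] + "_" + lung["free_annotation"]
    return lung


# Toy metadata (hypothetical values, not taken from the HLCA).
meta = pd.DataFrame(
    {
        "cell": ["c1", "c2", "c3"],
        "tissue": ["lung", "blood", "lung"],
        "compartment": ["immune", "immune", "epithelial"],
        "free_annotation": ["macrophage", "monocyte", "club"],
    }
)
lung_cells = build_cell_types(meta)
```

In this toy table the blood cell is dropped and the two remaining cells receive the labels `immune_macrophage` and `epithelial_club`.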

Code availability

The SpliZ code, along with the code used for data analysis and to create the figures, is available through a GitHub repository: https://github.com/juliaolivieri/SpliZ_pipeline/. This repository is archived with Zenodo under the following DOI: https://doi.org/10.5281/zenodo.5781783 (ref. 23). The pipeline was written in Python (v.3.6.7), and installed package versions are the following (also available in an environment.yml file on GitHub): matplotlib (v.2.2.3); numpy (v.1.18.4); pandas (v.1.0.4); pyarrow (v.0.15.1); scipy (v.1.4.1); snakemake-minimal (v.5.4.5); statsmodels (v.0.11.1) and tqdm (v.4.46.0). We used Leafcutter (https://github.com/davidaknowles/leafcutter) and regtools (https://github.com/griffithlab/regtools).

Change history

25 April 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41592-022-01500-8

References

1. Shao, Y. et al. Alternative splicing-derived intersectin1-l and intersectin1-s exert opposite function in glioma progression. Cell Death Dis. 10, 431 (2019).
2. Nakka, K., Kovac, R., Wong, M. M.-K. & Dilworth, F. J. Intron retained, transcript detained: intron retention as a hallmark of the quiescent satellite cell state. Dev. Cell 53, 623–625 (2020).
3. Oleynikov, Y. & Singer, R. H. RNA localization: different zipcodes, same postman? Trends Cell Biol. 8, 381–383 (1998).
4. Yang, Y. & Carstens, R. P. Alternative splicing regulates distinct subcellular localization of epithelial splicing regulatory protein 1 (ESRP1) isoforms. Sci. Rep. 7, 3848 (2017).
5. Arzalluz-Luque, Á. & Conesa, A. Single-cell RNAseq for the study of isoforms—how is that possible? Genome Biol. 19, 110 (2018).
6. Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
7. Salzman, J., Jiang, H. & Wong, W. H. Statistical modeling of RNA-seq data. Statistical Sci. https://doi.org/10.1214/10-STS343 (2011).
8. Li, J. J., Jiang, C.-R., Brown, J. B., Huang, H. & Bickel, P. J. Sparse linear modeling of next-generation mRNA sequencing (RNA-seq) data for isoform discovery and abundance estimation. Proc. Natl Acad. Sci. USA 108, 19867–19872 (2011).
9. Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).
10. Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).
11. Najar, C. F. B. A., Yosef, N. & Lareau, L. F. Coverage-dependent bias creates the appearance of binary splicing in single cells. eLife 9, e54603 (2020).
12. Westoby, J., Artemov, P., Hemberg, M. & Ferguson-Smith, A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol. 21, 74 (2020).
13. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
14. Dehghannasiri, R., Olivieri, J. E., Damljanovic, A. & Salzman, J. Specific splice junction detection in single cells with SICILIAN. Genome Biol. 22, 219 (2021).
15. Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619–625 (2020).
16. Hayat, S. M. G. et al. CD47: role in the immune system and application to cancer therapy. Cell. Oncol. 43, 19–30 (2020).
17. Chao, M. P. et al. Therapeutic targeting of the macrophage immune checkpoint CD47 in myeloid malignancies. Front. Oncol. 9, 1380 (2020).
18. Li, Y. I. et al. Annotation-free quantification of RNA splicing using leafcutter. Nat. Genet. 50, 151–158 (2018).
19. Olivieri, J. E. et al. RNA splicing programs define tissue compartments and cell types at single cell resolution. eLife 10, e70692 (2021).
20. Chung, E. & Romano, J. P. Exact and asymptotically robust permutation tests. Ann. Stat. 41, 484–507 (2013).
21. Li, Y. I. et al. Annotation-free quantification of RNA splicing using leafcutter. Nat. Genet. 50, 151–158 (2018).
22. Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619–625 (2020).
23. Olivieri, J. E. juliaolivieri/SpliZ_pipeline: v1.0. Zenodo https://doi.org/10.5281/zenodo.5781783 (2021).

Acknowledgements

We thank P. Wang and S. Quake for insightful and instrumental comments during the development of the method, E. Meyer and R. Bierman for comments on the manuscript, J. Klein for creating parts of Fig. 1, and K. Travaglini and M. Krasnow for providing advanced access to the HLCA data before its publication. J.O. is supported by the National Science Foundation Graduate Research Fellowship under grant no. DGE-1656518, a Stanford Graduate Fellowship and a Lieberman Fellowship. R.D. is supported by the Cancer Systems Biology Scholars Program grant no. R25 CA180993 and Clinical Data Science Fellowship grant no. T15 LM7033-36. J.S. is supported by the National Institute of General Medical Sciences grant nos. R01 GM116847 and R35 GM139517 and the National Science Foundation Faculty Early Career Development Program Award no. MCB1552196.

Author information

These authors contributed equally: Julia Eve Olivieri, Roozbeh Dehghannasiri.

Authors and Affiliations

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
Julia Eve Olivieri, Roozbeh Dehghannasiri & Julia Salzman
Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA
Julia Eve Olivieri
Department of Biochemistry, Stanford University, Stanford, CA, USA
Roozbeh Dehghannasiri & Julia Salzman

Authors

Julia Eve Olivieri
Roozbeh Dehghannasiri
Julia Salzman

Contributions

J.O. developed the software and analyzed the data. R.D. developed the software and analyzed the data. J.S. conceived and supervised the project. J.O., R.D. and J.S. wrote the paper.

Corresponding author

Correspondence to Julia Salzman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ángeles Arzalluz-Luque, Yang I. Li and the other, anonymous, reviewer for their contribution to the peer review of this work. Lin Tang, in collaboration with the Nature Methods team, was the Primary Handling Editor. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The SpliZ is not correlated with gene expression.

a. There is no consistent correlation between either the SpliZ or SpliZVD and gene expression. b. Plots of number of spliced reads vs SpliZ and SpliZVD show that there is no significant correlation between gene expression and SpliZ or SpliZVD. c. LMO7 shows no evidence of correlation between number of spliced reads and SpliZ or SpliZVD.

Extended Data Fig. 2 SpliZ toy example.

The cell on the left has shorter introns on average than the cell on the right, giving it a lower SpliZ.

Extended Data Fig. 3 Simulations of the SpliZ, SpliZVD, and PSI.

In both simulations, two cell types of 20 cells each are simulated, each type having a different proportion of isoforms (1000 trials for a, 100 trials for b). At each read depth n, Poisson(n) reads are sampled in proportion to isoform abundances. Null values are calculated from cell populations with identical isoform expression. SpliZsites based on the SVD described in the methods are marked with asterisks and coincide with all splice sites with differential partner exon usage between the two cell types. a. The SpliZ, SpliZVD, and PSI all have the same power in the case of exon skipping. b. The simulation from Fig. 1d was modified, changing the fractions of isoform abundance as described in the figure such that when exon 2 is present, exon 3 is included 99% of the time. While this splice site was identified as a SpliZsite in Fig. 1d, here it is not identified (red X), as would be expected for a splice junction with 99% constitutive splicing. This simulation setting shows that the SpliZ can correctly identify only alternative exons as SpliZsites.
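The sampling scheme in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' simulation code: the function name `simulate_cell_type`, the two-isoform proportions and the fixed seed are assumptions, and the downstream SpliZ, SpliZVD and PSI calculations are not reproduced here.

```python
import numpy as np


def simulate_cell_type(isoform_props, n_cells, depth, rng):
    """Return an (n_cells, n_isoforms) array of simulated read counts."""
    props = np.asarray(isoform_props, dtype=float)
    props = props / props.sum()  # normalize to a probability vector
    # Each cell gets a Poisson(depth) number of reads.
    totals = rng.poisson(depth, size=n_cells)
    # Reads are assigned to isoforms in proportion to isoform abundance.
    return np.vstack([rng.multinomial(int(t), props) for t in totals])


rng = np.random.default_rng(0)
# Two cell types of 20 cells each with different (hypothetical) isoform proportions.
type_a = simulate_cell_type([0.7, 0.3], n_cells=20, depth=20, rng=rng)
type_b = simulate_cell_type([0.3, 0.7], n_cells=20, depth=20, rng=rng)
# A null population shares the isoform proportions of the cell type it is compared with.
null_a = simulate_cell_type([0.7, 0.3], n_cells=20, depth=20, rng=rng)
```

Repeating such draws over many trials and read depths, and comparing a test statistic between the differing populations against the null populations, gives a power estimate in the spirit of the figure.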

Extended Data Fig. 4 SpliZVD calculation example.

The SpliZVD is the projection of the matrix of splicing residuals onto its first eigenvector. The residual matrix’s SVD is used to identify the most variably alternatively spliced sites. The top left shows a gene structure of TPM2, with reads aligning to different junctions above. Each read is assigned a SpliZ residual and SpliZVD residual, the latter of which is based on the first eigenvector (shown below the gene annotations). Ovals representing different cells are colored based on the sign of their SpliZ or SpliZVD values, showing that in this case the SpliZVD is able to distinguish differences in splicing where the SpliZ cannot.
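The projection described in this caption can be illustrated with a small numerical sketch. It assumes a residual matrix with cells as rows and splice sites as columns and uses the first right singular vector as the "first eigenvector"; the matrix orientation, the absence of any normalization and the toy values are assumptions, and the published method may differ in these details.

```python
import numpy as np

# Toy residual matrix (hypothetical): 3 cells x 3 splice sites,
# with all variation concentrated at the middle splice site.
residuals = np.array(
    [
        [0.0, 2.0, 0.0],
        [0.0, -2.0, 0.0],
        [0.0, 1.0, 0.0],
    ]
)

# Thin SVD: rows of vt are the right singular vectors (one weight per splice site).
u, s, vt = np.linalg.svd(residuals, full_matrices=False)
first_vec = vt[0]

# One score per cell: projection of that cell's residuals onto the first vector.
scores = residuals @ first_vec

# Splice sites with the largest absolute loading vary the most along this direction.
most_variable_site = int(np.argmax(np.abs(first_vec)))
```

The sign of a singular vector is arbitrary, so the scores are defined only up to a global sign flip; in this toy case the cells with positive and negative residuals at the variable site end up with scores of opposite sign.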

Extended Data Fig. 5 Splicing residual matrices.

SVDs of the splicing residual matrix based on the simulations in Fig. 1d,f, and Extended Data Fig. 3 at average read depth 20. Rows correspond to splice sites and columns correspond to eigenvectors. For each splice site, the value from the SVD that causes the splice site to be picked as a SpliZsite is boxed in green. a,b,c. As expected, all non-constitutive splice sites are selected as SpliZsites. d. As expected, the three most variable splice sites are chosen as SpliZsites, while the 99% constitutive splice site is not chosen as a SpliZsite.

Extended Data Fig. 6 Cell type summary in HLCA dataset.

a. Box plot of the number of reads per cell for each cell type in both individuals’ 10x data. Numbers to the right of the plot indicate the number of cells plotted for each cell type over the given experiment. The middle line of each box is the median, and each box extends from the first to third quartile. Whiskers extend to 1.5 times the interquartile range. All points outside of this range are plotted individually. b. Bar plots of the number of cells per cell type for each individual’s 10x data.

Extended Data Fig. 7 Called gene intersections.

a. Intersection of genes called by the SpliZ between 10X and subsetted Smart-seq2 data. b. Intersection of genes called by the SpliZVD between 10X and subsetted Smart-seq2 data. c. Intersection between genes called by the SpliZ and SpliZVD in 10x data.

Extended Data Fig. 8 SpliZ correlations.

Correlation between median SpliZ values per matched gene and cell type. a. Between individuals for 10X data. b. Between technologies for each HLCA individual (datasets subsetted to shared junctions and cell types per individual before running).

Extended Data Fig. 9 Compartment-specific splicing.

The compartment-specific regulated alternative splicing of (a) ATP5F1C, (b) MYL6 and (c) PPP1R12A. For ATP5F1C, endothelial and stromal cells have a higher fraction of junctional reads for the exon exclusion event compared to other compartments. For MYL6, immune cells have a lower fraction of junctional reads for the exon inclusion event compared to other compartments. For PPP1R12A, immune cells have a higher fraction of junctional reads for the exon inclusion event. The plots include the cell types in which the splice site has at least 20 junctional reads in at least 10 cells in at least one of the 4 datasets (two individuals and two technologies: 10X and SS2). Dots represent the fraction of junctional reads for each alternative splice site in the cell type. For each dot, the outer ring (in white) shows the upper CI and the inner ring (color-coded) shows the lower CI. Box plots show the average alternative splice site for each cell across all technologies and individuals. Each box shows the 25–75% quantiles of the average splice site per cell. Arcs represent the average fractions at the compartment level. There is no dot when the alternative splice site has 0 junctional reads.

Extended Data Fig. 10 10X vs Smart-seq2 comparison.

a. Upset plot showing the number of cell types sequenced in each HLCA individual and technology. b. Bar plot showing the number of cells sequenced for each HLCA individual and technology. c. Upset plot showing the number of alternative junctions (junctions for which at least one splice site has at least one other partner in the dataset) for each HLCA individual and technology. d. The SpliZ is calculated independently for Smart-seq2 data restricted to junctions detected by 10X to measure technology-dependence of results.
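The definition of an alternative junction in panel c can be made concrete with a short sketch. It assumes each junction is represented as a `(donor, acceptor)` pair of splice-site coordinates within a single gene; this representation and the function name `alternative_junctions` are illustrative assumptions, not the data model of the SpliZ pipeline.

```python
from collections import defaultdict


def alternative_junctions(junctions):
    """Return junctions where at least one splice site has more than one partner."""
    acceptors_of = defaultdict(set)  # donor site -> set of acceptor partners
    donors_of = defaultdict(set)     # acceptor site -> set of donor partners
    for donor, acceptor in junctions:
        acceptors_of[donor].add(acceptor)
        donors_of[acceptor].add(donor)
    return {
        (donor, acceptor)
        for donor, acceptor in junctions
        if len(acceptors_of[donor]) > 1 or len(donors_of[acceptor]) > 1
    }


# Toy junctions (hypothetical coordinates): donor 100 pairs with two acceptors,
# while the junction (400, 500) has no alternative partner at either end.
observed = [(100, 200), (100, 300), (400, 500)]
alt = alternative_junctions(observed)
```

Here `alt` contains `(100, 200)` and `(100, 300)` but not `(400, 500)`. A real implementation would presumably also key splice sites by chromosome and strand.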

Supplementary information

Supplementary Information

Supplementary discussion.

Reporting Summary

Peer Review File

Supplementary Tables 1–3.

About this article

Cite this article

Olivieri, J.E., Dehghannasiri, R. & Salzman, J. The SpliZ generalizes ‘percent spliced in’ to reveal regulated splicing at single-cell resolution. Nat Methods 19, 307–310 (2022). https://doi.org/10.1038/s41592-022-01400-x

Received: 16 April 2021
Accepted: 18 January 2022
Published: 03 March 2022
Issue Date: March 2022
DOI: https://doi.org/10.1038/s41592-022-01400-x
