Annotation-free quantification of RNA splicing using LeafCutter

Li, Yang I.; Knowles, David A.; Humphrey, Jack; Barbeira, Alvaro N.; Dickinson, Scott P.; Im, Hae Kyung; Pritchard, Jonathan K.

doi:10.1038/s41588-017-0004-9

Technical Report
Published: 11 December 2017

Nature Genetics volume 50, pages 151–158 (2018)

Abstract

The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4–2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
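As a rough illustration of annotation-free intron quantification, the sketch below groups introns that share a splice site into clusters and expresses each intron's read count as a proportion of its cluster total. It is a minimal sketch under stated assumptions, not the LeafCutter implementation: the function names `cluster_introns` and `excision_proportions`, the `(chrom, start, end)` tuple layout, and the toy counts are all hypothetical, and strand and low-count filtering are ignored.

```python
from collections import defaultdict


def cluster_introns(introns):
    """Group introns into clusters connected by shared splice sites.

    `introns` is a list of (chrom, start, end) tuples (hypothetical layout).
    Two introns join the same cluster if they share a start or an end
    coordinate on the same chromosome, directly or through other introns.
    """
    parent = list(range(len(introns)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # splice site -> index of the first intron using it
    for idx, (chrom, start, end) in enumerate(introns):
        for site in ((chrom, "start", start), (chrom, "end", end)):
            if site in first_seen:
                union(first_seen[site], idx)
            else:
                first_seen[site] = idx

    clusters = defaultdict(list)
    for idx, intron in enumerate(introns):
        clusters[find(idx)].append(intron)
    return sorted(sorted(members) for members in clusters.values())


def excision_proportions(cluster, counts):
    """Return each intron's share of the split reads in its cluster."""
    total = sum(counts[intron] for intron in cluster)
    return {i: (counts[i] / total if total else 0.0) for i in cluster}


# Toy data (made up): three introns linked by shared sites, one isolated.
introns = [("chr1", 100, 200), ("chr1", 100, 300),
           ("chr1", 250, 300), ("chr1", 500, 600)]
counts = {introns[0]: 30, introns[1]: 10, introns[2]: 10, introns[3]: 5}
clusters = cluster_introns(introns)
proportions = excision_proportions(clusters[0], counts)
```

No transcript annotation enters the computation: only the intron coordinates and their split-read counts are used.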

**Fig. 2: LeafCutter discovers reproducible unannotated introns.**

**Fig. 3: A comparison of methods for detecting differential splicing.**

**Fig. 4: LeafCutter sQTLs augment the interpretation of GWAS hits.**

**Fig. 5: LeafCutter sQTLs enable interpretation of disease variants.**

References

Han, H. et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature 498, 241–245 (2013).
Calarco, J. A. et al. Regulation of vertebrate nervous system alternative splicing and development by an SR-related protein. Cell 138, 898–910 (2009).
Brett, D., Pospisil, H., Valcárcel, J., Reich, J. & Bork, P. Alternative splicing and genome complexity. Nat. Genet. 30, 29–30 (2002).
Pai, A. A. et al. Widespread shortening of 3′ untranslated regions and increased exon inclusion are evolutionarily conserved features of innate immune responses to infection. PLoS Genet. 12, e1006338 (2016).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
Leng, N. et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035–1043 (2013).
Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat. Biotechnol. 32, 462–464 (2014).
Bray, N., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal RNA-Seq quantification. Preprint available at https://arxiv.org/abs/1505.02710 (2015).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Lacroix, V., Sammeth, M., Guigo, R. & Bergeron, A. Exact transcriptome reconstruction from short sequence reads. In Algorithms in Bioinformatics (eds. Crandall, K.A. & Lagergren, J.) 50–63 (Springer, Berlin, Heidelberg, 2008).
Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
Stein, S., Lu, Z. X., Bahrami-Samani, E., Park, J. W. & Xing, Y. Discover hidden splicing variations by mapping personal transcriptomes to personal genomes. Nucleic Acids Res. 43, 10612–10622 (2015).
Zhao, K., Lu, Z. X., Park, J. W., Zhou, Q. & Xing, Y. GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data. Genome Biol. 14, R74 (2013).
Monlong, J., Calvo, M., Ferreira, P. G. & Guigó, R. Identification of genetic variants associated with alternative splicing using sQTLseekeR. Nat. Commun. 5, 4698 (2014).
Ongen, H. & Dermitzakis, E. T. Alternative splicing QTLs in European and African populations. Am. J. Hum. Genet. 97, 567–575 (2015).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625 (2012).
Wu, J., Anczuków, O., Krainer, A. R., Zhang, M. Q. & Zhang, C. OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Res. 41, 5149–5163 (2013).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Soumillon, M. et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 3, 2179–2190 (2013).
Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
Nellore, A. et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 17, 266 (2016).
Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. USA 111, E5593–E5601 (2014).
Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).
Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).
Reyes, A. et al. Drift and conservation of differential exon usage across tissues in primate species. Proc. Natl. Acad. Sci. USA 110, 15377–15382 (2013).
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Hsiao, Y. H. et al. Alternative splicing modulated by genetic variants demonstrates accelerated evolution regulated by highly conserved proteins. Genome Res. 26, 440–450 (2016).
Barbeira, A.N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Preprint available at https://www.biorxiv.org/content/early/2017/10/03/045260 (2017).
Orozco, G. et al. Association of CD40 with rheumatoid arthritis confirmed in a large UK case-control study. Ann. Rheum. Dis. 69, 813–816 (2010).
van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Wheeler, H. E. et al. Survey of the heritability and sparse architecture of gene expression traits across human tissues. PLoS Genet. 12, e1006423 (2016).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Ellis, S.E., Collado Torres, L. & Leek, J. Improving the value of public RNA-seq expression data by phenotype prediction. Preprint available at http://www.biorxiv.org/content/early/2017/06/03/145656.full.pdf (2017).

Acknowledgements

We thank X. Lan and other members of the Pritchard lab for helpful discussions and comments. This work was supported by a CEHG fellowship (Y.I.L.), the Howard Hughes Medical Institute (J.K.P.), and the US National Institutes of Health (NIH grants HG007036, HG008140, and HG009431 to J.K.P., and MH107666 to H.K.I.).

Author information

Yang I. Li
Present address: Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
Yang I. Li and David A. Knowles contributed equally to this work.

Authors and Affiliations

Department of Genetics, Stanford University, Stanford, CA, USA
Yang I. Li, David A. Knowles & Jonathan K. Pritchard
Department of Computer Science, Stanford University, Stanford, CA, USA
David A. Knowles
Department of Radiology, Stanford University, Stanford, CA, USA
David A. Knowles
UCL Genetics Institute, Gower Street, London, UK
Jack Humphrey
Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
Jack Humphrey
Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
Alvaro N. Barbeira, Scott P. Dickinson & Hae Kyung Im
Department of Biology, Stanford University, Stanford, CA, USA
Jonathan K. Pritchard
Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
Jonathan K. Pritchard

Contributions

Y.I.L., D.A.K., and J.K.P. conceived of the project. Y.I.L. and D.A.K. performed the analyses and implemented the software. D.A.K. developed and performed the statistical tests and modeling. J.H. implemented the visualization application. A.N.B., S.P.D., and H.K.I. performed the S-PrediXcan analyses. Y.I.L. and J.K.P. wrote the manuscript.

Corresponding authors

Correspondence to Yang I. Li, David A. Knowles or Jonathan K. Pritchard.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated Supplementary Information

Supplementary Figure 1

Several types of common alternative splicing events are captured by the alternative excision of introns.

Supplementary Figure 2

Bar plots showing the number of alternatively used junctions annotated from our GTEx analyses that were found in Intropolis⁶. phenopredict⁸ was used to predict the tissue type corresponding to the SRA samples analyzed in Intropolis. For each set of junctions, the proportion of junctions that were found (at least 1 read) in any SRA sample (Any), or found in samples which were predicted to be from testis (Testis) are highlighted. The predicted tissues with the highest number of supported junctions are colored in purple. Eighty-six percent of all novel alternatively used testis junctions from our LeafCutter analysis could be found in testis samples within SRA (not including GTEx).

Supplementary Figure 3

Junctions in GTEx tissues. (a) Distribution of the number of different GTEx tissues in which junctions predicted to be absent, or present in three commonly used annotation databases, could be detected. (b) Relative junction usage in multiple GTEx organs of annotated and unannotated junctions identified in four GTEx organs. (c) Distribution of LeafCutter clusters from GTEx samples in terms of their splicing types. Clusters with only annotated junctions and clusters with unannotated junctions were further separated.

Supplementary Figure 4

PhastCons score distribution of splice sites of novel introns. While ∼60% of annotated splice sites have a local phastCons score >0.6, only 15–25% of unannotated splice sites do. Thus ∼80% of novel splice sites may represent noisy intron excision events.

Supplementary Figure 5

Comparison between beta-binomial and Dirichlet-multinomial models for differential splicing analyses, performed on 10 male brain vs. heart samples from GTEx. Two approaches for combining per-intron p-values into cluster-level p-values are compared: Bonferroni correction and Fisher's combined test. Bonferroni is very conservative, as expected. Fisher's combined test has considerably lower power than the multinomial approaches. However, only v2 of the Dirichlet-multinomial (which uses a per-intron concentration/overdispersion parameter) is well calibrated under permutations.
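The two combination rules named in this legend can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the figure: the function names and the example p-values are hypothetical, the input list is assumed to be non-empty, and Fisher's method is applied in its textbook form, which assumes independent tests (introns within a cluster are unlikely to be independent). The chi-square tail probability uses the closed form available for even degrees of freedom, so only the standard library is needed.

```python
import math


def bonferroni_combined(pvalues):
    """Cluster-level p-value: smallest per-intron p-value times the
    number of introns tested, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


def fisher_combined(pvalues):
    """Cluster-level p-value from Fisher's combined test.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null (independent tests).
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    # Floor p-values at a tiny constant so that log(0) cannot occur.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-intron p-values for one cluster.
intron_pvalues = [0.01, 0.5, 0.9]
p_bonferroni = bonferroni_combined(intron_pvalues)
p_fisher = fisher_combined(intron_pvalues)
```

Bonferroni reacts only to the single smallest p-value, whereas Fisher's test pools evidence from every intron in the cluster.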

Supplementary Figure 6

Memory usage (RAM) of four differential splicing methods applied to comparisons between 3, 5, 10, and 15 YRI versus CEU LCL RNA-seq samples. We omitted the 15 vs. 15 MAJIQ run because of its high resource usage (in both time and RAM). The right panel shows usage on a log scale.

Supplementary Figure 7

Cumulative distributions of differential splicing test P values (1-posterior for MAJIQ) for the comparison of all YRI versus CEU LCLs (red). The distribution of test P values for the permuted comparisons are also shown (black). *Cufflinks2 reports 19 significantly differentially spliced genes in the 3 vs. 3 comparison, but none in the other comparisons.

Supplementary Figure 8

Receiver operating characteristic (ROC) curves of LeafCutter, Cufflinks2, rMATS and MAJIQ for evaluation of differential splicing of genes with transcripts simulated to have varying levels of differential expression. The top panel shows ROC curves when excluding genes that were not tested by each respective method. The bottom panel includes genes that were not tested in the calculation of the true positive rate.

Supplementary Figure 9

LeafCutter is effective even with as few as 8 samples. Here we performed differential splicing analysis of 4 male brain vs. 4 male muscle samples, and compared to results using 220 samples. (a) P values under permutations are well calibrated. (b–c) P values and effect sizes are highly correlated between the two sample size datasets. (d) Significant disparity in effect sizes between the two sample sizes is primarily driven by an intron being unique to a tissue when N = 8.

About this article

Cite this article

Li, Y.I., Knowles, D.A., Humphrey, J. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet 50, 151–158 (2018). https://doi.org/10.1038/s41588-017-0004-9

Received: 04 April 2017
Accepted: 08 November 2017
Published: 11 December 2017
Issue Date: January 2018
DOI: https://doi.org/10.1038/s41588-017-0004-9
