The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4–2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
Han, H. et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature 498, 241–245 (2013).
Calarco, J. A. et al. Regulation of vertebrate nervous system alternative splicing and development by an SR-related protein. Cell 138, 898–910 (2009).
Brett, D., Pospisil, H., Valcárcel, J., Reich, J. & Bork, P. Alternative splicing and genome complexity. Nat. Genet. 30, 29–30 (2002).
Pai, A. A. et al. Widespread shortening of 3′ untranslated regions and increased exon inclusion are evolutionarily conserved features of innate immune responses to infection. PLoS Genet. 12, e1006338 (2016).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
Leng, N. et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035–1043 (2013).
Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat. Biotechnol. 32, 462–464 (2014).
Bray, N., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal RNA-Seq quantification. Preprint available at https://arxiv.org/abs/1505.02710 (2015).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Lacroix, V., Sammeth, M., Guigo, R. & Bergeron, A. Exact transcriptome reconstruction from short sequence reads. In Algorithms in Bioinformatics (eds. Crandall, K. A. & Lagergren, J.) 50–63 (Springer, Berlin, Heidelberg, 2008).
Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
Stein, S., Lu, Z. X., Bahrami-Samani, E., Park, J. W. & Xing, Y. Discover hidden splicing variations by mapping personal transcriptomes to personal genomes. Nucleic Acids Res. 43, 10612–10622 (2015).
Zhao, K., Lu, Z. X., Park, J. W., Zhou, Q. & Xing, Y. GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data. Genome Biol. 14, R74 (2013).
Monlong, J., Calvo, M., Ferreira, P. G. & Guigó, R. Identification of genetic variants associated with alternative splicing using sQTLseekeR. Nat. Commun. 5, 4698 (2014).
Ongen, H. & Dermitzakis, E. T. Alternative splicing QTLs in European and African populations. Am. J. Hum. Genet. 97, 567–575 (2015).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625 (2012).
Wu, J., Anczuków, O., Krainer, A. R., Zhang, M. Q. & Zhang, C. OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Res. 41, 5149–5163 (2013).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Soumillon, M. et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 3, 2179–2190 (2013).
Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
Nellore, A. et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 17, 266 (2016).
Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. USA 111, E5593–E5601 (2014).
Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).
Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).
Reyes, A. et al. Drift and conservation of differential exon usage across tissues in primate species. Proc. Natl. Acad. Sci. USA 110, 15377–15382 (2013).
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Hsiao, Y. H. et al. Alternative splicing modulated by genetic variants demonstrates accelerated evolution regulated by highly conserved proteins. Genome Res. 26, 440–450 (2016).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Preprint available at https://www.biorxiv.org/content/early/2017/10/03/045260 (2017).
Orozco, G. et al. Association of CD40 with rheumatoid arthritis confirmed in a large UK case-control study. Ann. Rheum. Dis. 69, 813–816 (2010).
van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Wheeler, H. E. et al. Survey of the heritability and sparse architecture of gene expression traits across human tissues. PLoS Genet. 12, e1006423 (2016).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Ellis, S. E., Collado Torres, L. & Leek, J. Improving the value of public RNA-seq expression data by phenotype prediction. Preprint available at http://www.biorxiv.org/content/early/2017/06/03/145656.full.pdf (2017).
We thank X. Lan and other members of the Pritchard lab for helpful discussions and comments. This work was supported by a CEHG fellowship (Y.I.L.), the Howard Hughes Medical Institute (J.K.P.), and the US National Institutes of Health (NIH grants HG007036, HG008140, and HG009431 to J.K.P., and MH107666 to H.K.I.).
The authors declare no competing financial interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated Supplementary Information
Several types of common alternative splicing events are captured by the alternative excision of introns.
Bar plots showing the number of alternatively used junctions annotated from our GTEx analyses that were found in Intropolis⁶. phenopredict⁸ was used to predict the tissue type corresponding to the SRA samples analyzed in Intropolis. For each set of junctions, the proportions of junctions that were found (at least 1 read) in any SRA sample (Any) or in samples that were predicted to be from testis (Testis) are highlighted. The predicted tissues with the highest number of supported junctions are colored in purple. Eighty-six percent of all novel alternatively used testis junctions from our LeafCutter analysis could be found in testis samples within SRA (not including GTEx).
Junctions in GTEx tissues. (a) Distribution of the number of different GTEx tissues in which junctions predicted to be absent, or present in three commonly used annotation databases, could be detected. (b) Relative junction usage in multiple GTEx organs of annotated and unannotated junctions identified in four GTEx organs. (c) Distribution of LeafCutter clusters from GTEx samples in terms of their splicing types. Clusters with only annotated junctions and clusters with unannotated junctions were further separated.
PhastCons score distribution of splice sites of novel introns. While ∼60% of annotated splice sites have a local phastCons score >0.6, only 15–25% of unannotated splice sites do. Thus ∼80% of novel splice sites may represent noisy intron excision events.
Comparison between beta-binomial and Dirichlet-multinomial models for differential splicing analyses, performed on 10 male brain vs. heart samples from GTEx. Two approaches for combining per-intron p-values into cluster-level p-values are compared: Bonferroni correction and Fisher's combined test. Bonferroni is very conservative, as expected. Fisher's combined test has considerably lower power than the multinomial approaches. However, only v2 of the Dirichlet-multinomial (which uses a per-intron concentration/overdispersion parameter) is well calibrated under permutations.
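The two ways of combining per-intron p-values named in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, all p-values are assumed to be strictly greater than zero, and Fisher's method additionally assumes the per-intron tests are independent, which introns within a cluster generally are not.

```python
import math


def bonferroni_cluster_p(pvals):
    """Cluster-level p-value: smallest per-intron p-value times the
    number of introns in the cluster, capped at 1."""
    return min(1.0, min(pvals) * len(pvals))


def fisher_cluster_p(pvals):
    """Cluster-level p-value from Fisher's combined test.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null (k = number of introns).
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    Assumes every p-value is > 0.
    """
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # this is X / 2
    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical per-intron p-values for one three-intron cluster.
    cluster = [0.01, 0.20, 0.50]
    print("Bonferroni:", bonferroni_cluster_p(cluster))
    print("Fisher:    ", fisher_cluster_p(cluster))
```

With a single intron both functions return the input p-value unchanged, which is a quick sanity check on the formulas.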
Memory usage (RAM) of four differential splicing methods applied to comparisons between 3, 5, 10, and 15 YRI versus CEU LCL RNA-seq samples. We omitted the 15v15 MAJIQ run due to its expensive resource usage (both in terms of time and RAM). Right panel shows usage on a log scale.
Cumulative distributions of differential splicing test P values (1 − posterior for MAJIQ) for the comparison of all YRI versus CEU LCLs (red). The distribution of test P values for the permuted comparisons is also shown (black). *Cufflinks2 reports 19 significantly differentially spliced genes in the 3 vs. 3 comparison, but none in the other comparisons.
Receiver operating characteristic (ROC) curves of LeafCutter, Cufflinks2, rMATS and MAJIQ for evaluation of differential splicing of genes with transcripts simulated to have varying levels of differential expression. Top panel shows ROC curves when excluding genes that were not tested by each respective method. The bottom panel includes genes that were not tested in the calculation of true positive rate.
LeafCutter is effective even with as few as 8 samples. Here we performed differential splicing analysis of 4 male brain vs 4 male muscle samples, and compared to results using 220 samples. a) p-values under permutations are well-calibrated. b-c) p-values and effect sizes are highly correlated between the two sample size datasets. d) Significant disparity in effect sizes between the two sample sizes is primarily driven by an intron being unique to a tissue when N = 8.
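The difference between the two ROC panels comes down to which genes enter the true positive rate denominator. The sketch below is one possible reading of the caption, under assumptions that are not in the original: genes a method did not test carry a score of `None`, untested genes can never be called, ties between scores are ignored, and the false positive rate is always computed over tested genes only. The function and variable names are hypothetical.

```python
def roc_points(scores, truth, count_untested=False):
    """Return a list of (FPR, TPR) points.

    scores: dict gene -> score (higher means more significant),
            or None when the method did not test the gene.
    truth:  dict gene -> True if the gene was simulated as differentially spliced.
    count_untested: if True, untested true positives are added to the TPR
            denominator, so a method is penalised for genes it skipped.
    """
    tested = sorted(
        ((s, truth[g]) for g, s in scores.items() if s is not None),
        key=lambda pair: -pair[0],
    )
    n_pos = sum(1 for _, is_true in tested if is_true)
    n_neg = len(tested) - n_pos
    if count_untested:
        n_pos += sum(1 for g, s in scores.items() if s is None and truth[g])
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")

    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the calling threshold one gene at a time.
    for _, is_true in tested:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Hypothetical toy data: g4 is a true positive the method never tested.
    scores = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": None}
    truth = {"g1": True, "g2": False, "g3": True, "g4": True}
    print(roc_points(scores, truth))                       # untested genes excluded
    print(roc_points(scores, truth, count_untested=True))  # curve cannot reach TPR = 1
```

When untested genes are counted, the curve for a method that skips many genes plateaus below a true positive rate of 1.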
Hierarchical clustering on all 1,258 introns that had no missing values in any of the samples.
We restricted to introns that were found to be differentially excised between human tissues (P value < 10⁻¹⁰ and effect size > 1.0).
Sharing of sQTL discoveries between Cufflinks2, Altrans, and LeafCutter estimated using Storey's π₀ method.
Meta-cluster representation of position of all 4,543 sQTLs identified at 1% FDR.
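As a rough illustration of a π₀-based sharing estimate, the sketch below takes the p-values that sQTLs discovered by one method receive under a second method, estimates the fraction of true nulls π₀ at a single threshold λ, and reports sharing as π₁ = 1 − π₀. The fixed λ = 0.5 and the function names are assumptions made here for simplicity; Storey's full procedure chooses or smooths over λ, and the caption does not say which settings were used.

```python
def storey_pi0(pvals, lam=0.5):
    """Estimate the proportion of true null hypotheses.

    Under the null, p-values are uniform, so the count of p-values above
    lam is expected to be about pi0 * m * (1 - lam). Solving for pi0 gives
    the estimator below, capped at 1.
    """
    if not pvals:
        raise ValueError("pvals must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def sharing_pi1(pvals, lam=0.5):
    """Estimated fraction of discoveries that replicate (pi1 = 1 - pi0)."""
    return 1.0 - storey_pi0(pvals, lam)


if __name__ == "__main__":
    # Hypothetical p-values: sQTLs found by method A, tested with method B.
    replication_pvals = [0.001, 0.002, 0.01, 0.03, 0.2, 0.6, 0.7, 0.9]
    print("pi0:", storey_pi0(replication_pvals))
    print("pi1:", sharing_pi1(replication_pvals))
```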
Functional enrichment of 4,543 sQTLs identified at 1% FDR from CEU GEUVADIS data. Bars represent the 95% confidence interval from 500 bootstraps.
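A 95% interval from 500 bootstraps, as in this caption, can be obtained with a percentile bootstrap. The sketch below is a generic version under stated assumptions: the statistic here is a simple fraction of sQTLs carrying a hypothetical annotation flag rather than the enrichment statistic used in the study, and the function name, seed, and toy data are illustrative only.

```python
import random


def bootstrap_ci(values, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values).

    Resamples `values` with replacement n_boot times, recomputes the
    statistic on each resample, and returns the alpha/2 and 1 - alpha/2
    empirical quantiles of those statistics.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    boot_stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return boot_stats[lo_idx], boot_stats[hi_idx]


def fraction_true(flags):
    """Fraction of items flagged 1 (for example, sQTLs in some annotation)."""
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical flags: 1 if an sQTL overlaps the annotation, else 0.
    flags = [1] * 50 + [0] * 50
    low, high = bootstrap_ci(flags, fraction_true)
    print("observed fraction:", fraction_true(flags))
    print("95% bootstrap interval:", (low, high))
```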
Example of a shared sQTL.
Example of a tissue-specific sQTL.
Cite this article
Li, Y.I., Knowles, D.A., Humphrey, J. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet 50, 151–158 (2018). https://doi.org/10.1038/s41588-017-0004-9