Annotation-free quantification of RNA splicing using LeafCutter

Abstract

The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4–2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.

Fig. 1: Overview of LeafCutter.
Fig. 2: LeafCutter discovers reproducible unannotated introns.
Fig. 3: A comparison of methods for detecting differential splicing.
Fig. 4: LeafCutter sQTLs augment the interpretation of GWAS hits.
Fig. 5: LeafCutter sQTLs enable interpretation of disease variants.

Acknowledgements

We thank X. Lan and other members of the Pritchard lab for helpful discussions and comments. This work was supported by a CEHG fellowship (Y.I.L.), the Howard Hughes Medical Institute (J.K.P.), and the US National Institutes of Health (NIH grants HG007036, HG008140, and HG009431 to J.K.P., and MH107666 to H.K.I.).

Author information

Contributions

Y.I.L., D.A.K., and J.K.P. conceived of the project. Y.I.L. and D.A.K. performed the analyses and implemented the software. D.A.K. developed and performed the statistical tests and modeling. J.H. implemented the visualization application. A.N.B., S.P.D., and H.K.I. performed the S-PrediXcan analyses. Y.I.L. and J.K.P. wrote the manuscript.

Corresponding authors

Correspondence to Yang I. Li, David A. Knowles, or Jonathan K. Pritchard.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated Supplementary Information

Supplementary Figure 1

Several types of common alternative splicing events are captured by the alternative excision of introns.

Supplementary Figure 2

Bar plots showing the number of alternatively used junctions annotated from our GTEx analyses that were found in Intropolis (ref. 6). phenopredict (ref. 8) was used to predict the tissue type corresponding to the SRA samples analyzed in Intropolis. For each set of junctions, the proportion of junctions that were found (at least 1 read) in any SRA sample (Any), or found in samples that were predicted to be from testis (Testis), is highlighted. The predicted tissues with the highest number of supported junctions are colored in purple. Eighty-six percent of all novel alternatively used testis junctions from our LeafCutter analysis could be found in testis samples within SRA (not including GTEx).

Supplementary Figure 3

Junctions in GTEx tissues. (a) Distribution of the number of different GTEx tissues in which junctions predicted to be absent, or present in three commonly used annotation databases, could be detected. (b) Relative junction usage in multiple GTEx organs of annotated and unannotated junctions identified in four GTEx organs. (c) Distribution of LeafCutter clusters from GTEx samples in terms of their splicing types. Clusters with only annotated junctions and clusters with unannotated junctions were further separated.

Supplementary Figure 4

PhastCons score distribution of splice sites of novel introns. While 60% of annotated splice sites have local phastCons score >0.6, only 15–25% of unannotated splice sites do. Thus 80% of novel splice sites may represent noisy intron excision events.

Supplementary Figure 5

Comparison between beta-binomial and Dirichlet-multinomial models for differential splicing analyses, performed on 10 male brain vs. heart samples from GTEx. Two approaches for combining per-intron p-values into cluster-level p-values are compared: Bonferroni correction and Fisher's combined test. Bonferroni is very conservative, as expected. Fisher's combined test has considerably lower power than the multinomial approaches. However, only v2 of the Dirichlet-multinomial (which uses a per-intron concentration/overdispersion parameter) is well calibrated under permutations.
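
The two combination rules named in this caption can be sketched as follows. This is a generic illustration of Bonferroni correction and Fisher's combined test, not the LeafCutter implementation; the function names and the example p-values are hypothetical, and Fisher's test assumes independent, strictly positive p-values, which may not hold for introns in the same cluster.

```python
import math


def bonferroni_cluster_p(p_values):
    """Bonferroni: smallest per-intron p-value times the number of introns, capped at 1."""
    return min(1.0, min(p_values) * len(p_values))


def fisher_combined_p(p_values):
    """Fisher's combined test: X = -2 * sum(ln p) follows chi-square with 2k d.f. under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2; requires every p > 0
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-intron p-values for one cluster (illustration only).
intron_p = [0.01, 0.5, 0.2]
cluster_p_bonferroni = bonferroni_cluster_p(intron_p)
cluster_p_fisher = fisher_combined_p(intron_p)
```

The closed-form survival function keeps the sketch free of external dependencies; it is valid here because the chi-square degrees of freedom, 2k, are always even.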

Supplementary Figure 6

Memory usage (RAM) of four differential splicing methods applied to comparisons between 3, 5, 10, and 15 YRI versus CEU LCL RNA-seq samples. We omitted the 15v15 MAJIQ run due to its expensive resource usage (both in terms of time and RAM). Right panel shows usage on a log scale.

Supplementary Figure 7

Cumulative distributions of differential splicing test P values (1-posterior for MAJIQ) for the comparison of all YRI versus CEU LCLs (red). The distribution of test P values for the permuted comparisons is also shown (black). *Cufflinks2 reports 19 significantly differentially spliced genes in the 3 vs. 3 comparison, but none in the other comparisons.

Supplementary Figure 8

Receiver operating characteristic (ROC) curves of LeafCutter, Cufflinks2, rMATS and MAJIQ for evaluation of differential splicing of genes with transcripts simulated to have varying levels of differential expression. Top panel shows ROC curves when excluding genes that were not tested by each respective method. The bottom panel includes genes that were not tested in the calculation of true positive rate.

Supplementary Figure 9

LeafCutter is effective even with as few as 8 samples. Here we performed differential splicing analysis of 4 male brain vs 4 male muscle samples, and compared to results using 220 samples. a) p-values under permutations are well-calibrated. b-c) p-values and effect sizes are highly correlated between the two sample size datasets. d) Significant disparity in effect sizes between the two sample sizes is primarily driven by an intron being unique to a tissue when N = 8.

Supplementary Figure 10

Hierarchical clustering on all 1,258 introns that had no missing values in any of the samples.

Supplementary Figure 11

We restricted to introns that were found to be differentially excised between human tissues (P value < 10⁻¹⁰ and effect size > 1.0).

Supplementary Figure 12

Sharing of sQTL discoveries between Cufflinks2, Altrans, and LeafCutter estimated using Storey's π₀ method.
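
As a rough illustration of how a π₀-based sharing estimate can be computed, the sketch below uses the fixed-threshold form of Storey's estimator. It assumes that sharing is summarized as π₁ = 1 − π₀, computed from the p-values that one method's sQTLs obtain under another method; that reading, the function names, the threshold `lam`, and the example p-values are all assumptions for illustration and are not taken from the article.

```python
def storey_pi0(p_values, lam=0.5):
    """Fixed-lambda Storey estimate of the null proportion:
    pi0 = #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def sharing_fraction(p_values, lam=0.5):
    """Assumed sharing summary: pi1 = 1 - pi0, the estimated proportion of non-null tests."""
    return 1.0 - storey_pi0(p_values, lam)


# Hypothetical replication p-values (illustration only).
replication_p = [0.01, 0.02, 0.03, 0.7]
pi1 = sharing_fraction(replication_p)
```

A single fixed threshold is the simplest variant; commonly used implementations instead fit a smoother across a grid of thresholds, which this sketch does not attempt.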

Supplementary Figure 13

Meta-cluster representation of position of all 4,543 sQTLs identified at 1% FDR.

Supplementary Figure 14

Functional enrichment of 4,543 sQTLs identified at 1% FDR from CEU GEUVADIS data. Bars represent the 95% confidence interval from 500 bootstraps.
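
A bootstrap confidence interval of the kind mentioned in this caption can be sketched with the percentile method. The caption does not say which bootstrap variant was used or what was resampled, so the percentile method, the function name, the seed, and the toy data below are assumptions for illustration only.

```python
import random


def bootstrap_ci(values, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement n_boot times and
    return the (alpha/2, 1 - alpha/2) quantiles of the resampled statistic."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


# Toy indicator data: 1 if a variant falls in an annotation, else 0 (illustration only).
in_annotation = [1] * 50 + [0] * 50
ci_low, ci_high = bootstrap_ci(in_annotation, mean)
```

With 500 resamples and a 95% interval, the bounds are read from near the 13th and 488th sorted bootstrap values.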

Supplementary Figure 15

Example of a shared sQTL.

Supplementary Figure 16

Example of a tissue-specific sQTL.

Supplementary information

Supplementary Figures and Supplementary Note

Supplementary Figures 1–16 and Supplementary Note 1

Life Sciences Reporting Summary

Supplementary Dataset 1

List of genes associated through RNA expression or splicing with S-PrediXcan

About this article

Cite this article

Li, Y.I., Knowles, D.A., Humphrey, J. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet 50, 151–158 (2018). https://doi.org/10.1038/s41588-017-0004-9
