The U1 spliceosomal RNA is recurrently mutated in multiple cancers

Abstract

Cancers are caused by genomic alterations known as drivers. Hundreds of drivers in coding genes are known but, to date, only a handful of noncoding drivers have been discovered, despite intensive searching [1,2]. Attention has recently shifted to the role of altered RNA splicing in cancer; driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple types of cancer, although these mutations have only been found in protein-coding splicing factors such as splicing factor 3b subunit 1 (SF3B1) [3,4,5,6]. By contrast, cancer-related alterations in the noncoding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing noncoding cancer drivers and the repetitive nature of snRNA genes [1,7,8]. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several types of tumour. The primary function of U1 snRNA is to recognize the 5′ splice site via base-pairing. This mutation changes the preferential A–U base-pairing between U1 snRNA and the 5′ splice site to C–G base-pairing, and thus creates novel splice junctions and alters the splicing pattern of multiple genes, including known drivers of cancer. Clinically, the A>C mutation is associated with heavy alcohol use in patients with hepatocellular carcinoma, and with the aggressive subtype of chronic lymphocytic leukaemia with unmutated immunoglobulin heavy-chain variable regions. The mutation in U1 snRNA also independently confers an adverse prognosis to patients with chronic lymphocytic leukaemia. Our study demonstrates a noncoding driver in spliceosomal RNAs, reveals a mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
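
The base-pairing change described in the abstract can be illustrated with a small sketch. It assumes that the first 11 nucleotides of human U1 snRNA are 5′-AUACUUACCUG-3′ and that bases 3 to 11 pair antiparallel with 5′ splice-site positions −3 to +6, so that the third base faces position +6. Neither the sequence nor the function names come from this page, and only strict Watson–Crick pairs are counted (wobble pairs are ignored).

```python
# Illustrative sketch only: the sequence and helper names are assumptions,
# not the authors' code.
U1_WT = "AUACUUACCUG"  # assumed 5' end of human U1 snRNA (bases 1..11, 5'->3')
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mutate_g3_a_to_c(u1):
    """Return the U1 5' end carrying the g.3A>C substitution."""
    if u1[2] != "A":
        raise ValueError("expected A at the third base")
    return u1[:2] + "C" + u1[3:]


def count_pairs(u1, splice_site):
    """Count Watson-Crick pairs between U1 bases 3..11 and a 9-nt 5' splice site.

    splice_site is RNA, written 5'->3', covering positions -3..+6
    (for example CAG|GUAAGU).
    """
    if len(splice_site) != 9:
        raise ValueError("splice_site must cover positions -3..+6 (9 nt)")
    u1_region = u1[2:11]               # bases 3..11 of U1
    site_reversed = splice_site[::-1]  # positions +6 ... -3, antiparallel to U1
    return sum((a, b) in WATSON_CRICK for a, b in zip(u1_region, site_reversed))


U1_MUT = mutate_g3_a_to_c(U1_WT)
for site in ("CAGGUAAGU", "CAGGUAAGG"):  # U versus G at position +6
    print(site, "WT pairs:", count_pairs(U1_WT, site),
          "MUT pairs:", count_pairs(U1_MUT, site))
```

In this toy model, the wild-type sequence pairs fully with a site that has U at position +6, whereas the mutant pairs fully with a site that has G at position +6.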

Fig. 1: Overview of somatic mutations in U1.
Fig. 2: Global gene splicing and expression changes associated with the g.3A>C mutation.
Fig. 3: Cancer-related genes that are mis-spliced in CLL with U1 mutations.
Fig. 4: Driver alterations and clinical features related to the g.3A>C mutation.

Code availability

All published computational programs used in this study are indicated in corresponding sections. Scripts used to perform U1 snRNA mutational calling are available at https://github.com/smshuai/U1-snRNA.

Data availability

PCAWG data are available at ICGC DCC (https://docs.icgc.org/pcawg/data/; donor identifiers in Supplementary Table 1). Additional CLL data (donor identifiers in Supplementary Table 3) are available at ICGC DCC (https://dcc.icgc.org/releases/release_27/Projects/CLLE-ES) and EGA (raw data under accession numbers EGAS00001000374 and EGAS00001001306). Additional HCC data are available at GDC Data Portal (raw and processed data under project code TCGA-LIHC; donor identifiers in Supplementary Table 3). CLL cell line RNA-seq data are available at GSE134197.

References

1. Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016).
2. Shuai, S., Gallinger, S. & Stein, L. D. DriverPower: combined burden and functional impact tests for cancer driver discovery. Preprint at https://www.biorxiv.org/content/10.1101/215244v1 (2017).
3. Yoshida, K. et al. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 478, 64–69 (2011).
4. Wang, L. et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N. Engl. J. Med. 365, 2497–2506 (2011).
5. Seiler, M. et al. Somatic mutational landscape of splicing factor genes and their functional consequences across 33 cancer types. Cell Reports 23, 282–296.e4 (2018).
6. Quesada, V. et al. Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat. Genet. 44, 47–52 (2011).
7. Denison, R. A., Van Arsdell, S. W., Bernstein, L. B. & Weiner, A. M. Abundant pseudogenes for small nuclear RNAs are dispersed in the human genome. Proc. Natl Acad. Sci. USA 78, 810–814 (1981).
8. Manser, T. & Gesteland, R. F. Human U1 loci: genes for human U1 RNA have dramatically similar genomic environments. Cell 29, 257–264 (1982).
9. Campbell, P. J., Getz, G., Stuart, J. M., Korbel, J. O. & Stein, L. D. Pan-cancer analysis of whole genomes. Preprint at https://www.bioRxiv.org/content/10.1101/162784v1 (2019).
10. Rheinbay, E. et al. Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes. Preprint at https://www.bioRxiv.org/content/10.1101/237313v1 (2017).
11. Kondo, Y., Oubridge, C., van Roon, A.-M. M. & Nagai, K. Crystal structure of human U1 snRNP, a small nuclear ribonucleoprotein particle, reveals the mechanism of 5′ splice site recognition. eLife 4, e04986 (2015).
12. Suzuki, H. et al. Recurrent noncoding U1-snRNA mutations drive cryptic splicing in SHH medulloblastoma. Nature https://doi.org/10.1038/s41586-019-1650-0 (2019).
13. Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).
14. Wang, L. et al. Transcriptomic characterization of SF3B1 mutation reveals its pleiotropic effects in chronic lymphocytic leukemia. Cancer Cell 30, 750–763 (2016).
15. Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
16. Herishanu, Y., Gibellini, F., Njuguna, N., Keyvanfar, K. & Wiestner, A. CD44 signaling via PI3K/AKT and MAPK/ERK pathways protects CLL cells from spontaneous and drug induced apoptosis. Blood 112, 541 (2008).
17. Fedorchenko, O. et al. CD44 regulates the apoptotic response and promotes disease development in chronic lymphocytic leukemia. Blood 121, 4126–4136 (2013).
18. Popp, M. W. & Maquat, L. E. Leveraging rules of nonsense-mediated mRNA decay for genome engineering and personalized medicine. Cell 165, 1319–1322 (2016).
19. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
20. Darman, R. B. et al. Cancer-associated SF3B1 hotspot mutations induce cryptic 3′ splice site selection through use of a different branch point. Cell Reports 13, 1033–1045 (2015).
21. Kipps, T. J. et al. Chronic lymphocytic leukaemia. Nat. Rev. Dis. Primers 3, 16096 (2017).
22. Nadeu, F. et al. Clinical impact of clonal and subclonal TP53, SF3B1, BIRC3, NOTCH1, and ATM mutations in chronic lymphocytic leukemia. Blood 127, 2122–2130 (2016).
23. Llovet, J. M. et al. Hepatocellular carcinoma. Nat. Rev. Dis. Primers 2, 16018 (2016).
24. Zhang, J. et al. Disease-associated mutation in SRSF2 misregulates splicing by altering RNA-binding affinities. Proc. Natl Acad. Sci. USA 112, E4726–E4734 (2015).
25. Dvinge, H., Kim, E., Abdel-Wahab, O. & Bradley, R. K. RNA splicing factors as oncoproteins and tumour suppressors. Nat. Rev. Cancer 16, 413–430 (2016).
26. Lee, S. C.-W. & Abdel-Wahab, O. Therapeutic targeting of splicing in cancer. Nat. Med. 22, 976–986 (2016).
27. Seiler, M. et al. H3B-8800, an orally available small-molecule splicing modulator, induces lethality in spliceosome-mutant cancers. Nat. Med. 24, 497–504 (2018).
28. Zhang, S. et al. Targeting chronic lymphocytic leukemia cells with a humanized monoclonal antibody specific for CD44. Proc. Natl Acad. Sci. USA 110, 6127–6132 (2013).
29. Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
30. Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 22, 1760–1774 (2012).
31. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
32. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
33. Shiraishi, Y. et al. An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 41, e89 (2013).
34. Robinson, J. T., Thorvaldsdóttir, H., Wenger, A. M., Zehir, A. & Mesirov, J. P. Variant review with the Integrative Genomics Viewer. Cancer Res. 77, e31–e34 (2017).
35. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
36. PCAWG Transcriptome Core Group et al. Genomic basis for RNA alterations revealed by whole-genome analyses of 27 cancer types. Preprint at https://www.biorxiv.org/content/10.1101/183889v2 (2018).
37. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
38. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
39. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
40. Wang, L., Wang, S. & Li, W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185 (2012).
41. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
42. Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).
43. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
44. Reimand, J. et al. g:Profiler–a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44, W83–W89 (2016).
45. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
46. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
47. Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2011).
48. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
49. Zhang, Y. et al. Whole genome and RNA sequencing of 1,220 cancers reveals hundreds of genes deregulated by rearrangement of cis-regulatory elements. Preprint at https://www.biorxiv.org/content/10.1101/099861v3 (2017).
50. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
51. The Cancer Genome Atlas Research Network. Comprehensive and integrative genomic characterization of hepatocellular carcinoma. Cell 169, 1327–1341.e23 (2017).
52. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).

Acknowledgements

The authors acknowledge the use of pre-embargo whole-genome sequencing alignment data from the PCAWG project, approved by the PCAWG Steering Committee (with L.D.S. recused). This work was supported by the Government of Ontario (S.S. and L.D.S.), the Instituto de Salud Carlos III (project PMP15/00007; to F.N., M.P., J.D., X.S.P., C.L.-O. and E.C.), the ‘la Caixa’ Foundation Grant No HR17-00221 (Health Research 2017 Program; to F.N., M.P., J.D., X.S.P., C.L.-O. and E.C.) and the Ministerio de Economía y Competitividad (MINECO) SAF2013-45836-R (to X.S.P., A.D.-N. and A.G.-F.). A.D.-N. is supported by the Department of Education of the Basque Government (grant number PRE_2017_1_0100). E.C. is supported by ICREA under the ICREA Academia programme. F.N. is supported by a pre-doctoral fellowship of the Ministerio de Economía y Competitividad (MINECO, BES-2016-076372). H.S. is a recipient of a Research Fellowship (Astellas Foundation for Research on Metabolic Disorders).

Author information

S.S. and L.D.S. designed the experiments, interpreted results and prepared the manuscript with inputs from all authors. S.S. and H.S. performed primary tumour whole-genome sequencing and RNA-seq analysis (Figs. 1, 2, Extended Data Figs. 1, 2). S.A.K. and F.N. conducted rhAmp (Extended Data Fig. 1d) and CLL RT–qPCR experiments (Extended Data Fig. 4). S.S. and L.D.S. performed pathway and gene-set analysis (Fig. 3, Extended Data Figs. 3, 5), comparison with SF3B1 (Extended Data Fig. 8), and clinical and driver analysis in the HCC cohort (Fig. 4b–d, f, g, Extended Data Figs. 9, 10d–g). F.N. and J.D. performed clinical and driver analysis in the CLL cohort (Fig. 4a, e, Extended Data Figs. 7, 10a–c). X.S.P., A.G.-F. and A.D.-N. conducted cell line experiments (Extended Data Fig. 6; data analysis of Extended Data Fig. 6b–h was performed by S.S.). E.C. and C.L.-O. assembled the cohorts and co-directed earlier studies that produced the CLL genomic and transcriptomic data used in Figs. 1–4, Extended Data Figs. 2–8. E.C., M.P., J.D. and C.L.-O. provided CLL tissue samples and the corresponding donor clinical data used in Fig. 4a, e, Extended Data Figs. 7, 10a–c. L.D.S., E.C. and M.D.T. supervised the project. All authors read, had the opportunity to comment on and have approved the manuscript.

Correspondence to Lincoln D. Stein.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peer review information Nature thanks Rotem Karni, Brandon Wainwright and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Fig. 1 Overview of genotyping methods and results.

a, Computational pipeline used to identify somatic mutations in U1 genes from whole-genome sequencing data. b, Computational pipeline used to infer the U1 g.3A>C status on the basis of splicing patterns. c, Computational pipeline used to infer the U1 g.3A>C status on the basis of expression patterns. d, e, U1 g.3A>C status for 318 CLL (d) and 613 HCC samples (e). Consensus genotypes are used in downstream analysis. One hundred and three undetermined HCC samples are excluded from downstream analysis. Grey indicates data not available.

Extended Data Fig. 2 Global transcriptomic changes in CLL and HCC.

a, Category of mis-spliced introns. Black boxes indicate exons and lines indicate introns; red lines indicate actual splice junctions. A junction is described as ‘annotated’ if the junction matches any annotated introns. A junction is described as ‘novel annotated’ if both splice sites are annotated but not paired. A cryptic 5′ splice site junction exists if only the 3′ splice site is annotated. A cryptic 3′ splice site junction exists if only the 5′ splice site is annotated. b, Euler plots of mis-spliced intron clusters in CLL and HCC. Increased excision and decreased excision represent intron clusters that have significantly mis-spliced introns with ΔPSI > 0 and ΔPSI < 0, respectively. c, 5′ splice site for introns with increased (n = 239) or decreased excision (n = 294) in HCC with U1 mutation. Top, bar plot shows Benjamini–Hochberg adjusted P values from χ2 tests; red line indicates q = 0.1. d, Category of mis-splicing events in HCC with U1 mutation. e, P value quantile–quantile plots for differential expression analysis. Global gene-expression changes cannot be detected in the randomized dataset. P values are from limma. Biologically independent patient samples are used for CLL (11 cases with U1 mutation versus 254 cases with wild-type U1) and HCC (20 cases with U1 mutation versus 367 cases with wild-type U1). f, g, t-SNE plots showing the accuracy of transcriptome-based inferences. f, t-SNE coordinates are computed from differentially spliced introns, and predictions are made with the splicing-based classifier. g, t-SNE coordinates are computed from differentially expressed genes, and predictions are made with the expression-based classifier.
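
The junction categories defined in panel a can be expressed as a short decision rule. The sketch below is a hypothetical illustration, not the authors' pipeline: the function name, the integer coordinates and the fallback label `"unannotated"` (for junctions with neither splice site annotated, a case the legend does not define) are assumptions, and a single chromosome and strand are assumed.

```python
# Hypothetical sketch of the junction categories described in the legend.
def classify_junction(donor, acceptor, annotated_introns):
    """Classify a splice junction against a set of annotated introns.

    donor is the 5' splice site coordinate, acceptor the 3' splice site
    coordinate, and annotated_introns is a set of (donor, acceptor) tuples.
    """
    annotated_donors = {d for d, _ in annotated_introns}
    annotated_acceptors = {a for _, a in annotated_introns}

    if (donor, acceptor) in annotated_introns:
        return "annotated"
    donor_known = donor in annotated_donors
    acceptor_known = acceptor in annotated_acceptors
    if donor_known and acceptor_known:
        return "novel annotated"         # both sites annotated, but not as a pair
    if acceptor_known:
        return "cryptic 5' splice site"  # only the 3' splice site is annotated
    if donor_known:
        return "cryptic 3' splice site"  # only the 5' splice site is annotated
    return "unannotated"                 # assumption: not defined in the legend


introns = {(100, 200), (300, 400)}
for junction in [(100, 200), (100, 400), (150, 200), (100, 250), (150, 250)]:
    print(junction, classify_junction(*junction, introns))
```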

Extended Data Fig. 3 Premature termination codon-bearing mis-spliced isoforms for MSI2, POLD1 and ABCD3 in CLL with U1 mutation.

a, Coding sequence of MSI2 (ENST00000284073.2), POLD1 (ENST00000440232.2) and ABCD3 (ENST00000370214.4) in CLL. Annotated exons are numbered. Positions 1 to 3 indicate the start codon. The first stop codon in each cryptic exon is labelled as the premature termination codon (PTC). The last exon junction complex (EJC) is also labelled. The distance between premature termination codon and the last exon junction complex is >55 nt in all isoforms with cryptic exons. b, Expression box plots for three of the transcripts that bear premature termination codon, shown in a. P values are from two-sided Wilcoxon rank-sum tests. MUT (n = 6) and WT (n = 61) represent independent CLL samples with or without the g.3A>C mutation, respectively. In the box plot, centre line, box limits, whiskers and points represent the median, 25th and 75th percentiles, 1.5× interquartile range and individual samples, respectively. c, Sashimi plot for mis-splicing of ABCD3. This sashimi plot is the same as those in Fig. 3. d, Splice score of ABCD3 across all CLL samples. The ABCD3 splice score is calculated from RNA-seq data (Methods). e, Using the ABCD3 splice score as a marker for U1 mutation in CLL. The ABCD3 splice score can be used to predict the U1 g.3A>C status with 100% accuracy in CLL.
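
The distance criterion in panel a (a premature termination codon more than 55 nt upstream of the last exon junction complex) can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the function names, the toy sequence and the choice to measure from the end of the stop codon are illustrative, and the position of the last exon junction complex is supplied by the caller rather than derived from an annotation.

```python
# Minimal sketch; names, coordinates and the toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def ptc_upstream_of_last_ejc(cds, last_ejc_pos, min_distance=55):
    """True if the first stop codon ends more than min_distance nt before last_ejc_pos.

    last_ejc_pos is a 0-based coordinate in the same transcript as cds.
    """
    stop = first_in_frame_stop(cds)
    if stop is None:
        return False
    return last_ejc_pos - (stop + 3) > min_distance


toy_cds = "ATG" + "GCT" * 2 + "TAA" + "GCT" * 30  # stop codon at index 9
print(first_in_frame_stop(toy_cds))
print(ptc_upstream_of_last_ejc(toy_cds, last_ejc_pos=80))
print(ptc_upstream_of_last_ejc(toy_cds, last_ejc_pos=40))
```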

Extended Data Fig. 4 Validation of mis-splicing in MSI2, POLD1, CD44 and ABCD3.

a, Junctions used in RT–qPCR validations. For MSI2, POLD1 and ABCD3, one annotated junction and one cryptic junction are used. For CD44, one CD44s junction and one CD44v junction are used. Cryptic and CD44v junctions should be over-excised in samples with U1 mutation. b, RT–PCR result for junctions shown in a. MUT (n = 6) and WT (n = 8) represent independent CLL samples with and without the g.3A>C mutation, respectively. GAPDH is used as control. The length of all PCR products is shown on the right. The experiment was conducted once. For gel source data, see Supplementary Fig. 1. c, RT–qPCR result for junctions shown in a. Expression of cryptic and CD44v junctions is significantly higher in CLL samples with U1 mutation, as expected. P values are from two-sided Wilcoxon rank-sum tests. MUT (n = 5) and WT (n = 8) represent biologically independent CLL samples with and without the g.3A>C mutation, respectively.

Extended Data Fig. 5 Genes and pathways altered in CLL with U1 mutation.

a, c, Representative upregulated (a) and downregulated (c) gene sets in CLL with U1 mutation (n = 11 independent patients). For a, c, enrichment score (ES), normalized enrichment score (NES) and false-discovery rate (FDR) q are from the permutation test of GSEA. Genes in the x axis are sorted from the most significantly upregulated gene to the most significantly downregulated gene. The dashed line indicates fold change = 1. b, Expression box plots for TERF2 and TERT in the telomere maintenance pathway. q values are from limma. MUT (n = 11) and WT (n = 254) represent biologically independent CLL samples with and without the g.3A>C mutation, respectively. d, e, Expression heat map of ribosomal genes and BCR genes. d, Only the top 10 differentially expressed ribosomal genes are shown. e, Right, bar charts show the log2-transformed fold change and −log10(q value) from limma (n = 11 independent samples of CLL with U1 mutation; n = 254 independent samples of CLL with wild-type U1). f, Expression box plot for TP53. TP53 MUT/DEL (n = 15 independent patients) represents CLL samples with TP53 mutation and/or deletion; U1 MUT (n = 10 independent patients) represents samples with the g.3A>C mutation; WT (n = 273 independent patients) represents the rest of the CLL samples. One sample with both TP53 and g.3A>C mutation is excluded. P values are from two-sided Wilcoxon rank-sum tests. g, Expression box plots for genes in the NF-κB (CHUK and IKBKB), PI3K/AKT (PIK3CA and AKT1) and MAPK/ERK (JUN) pathways. These pathways are downstream effectors of the BCR pathways. q values are from limma. MUT (n = 11) and WT (n = 254) are the same as in b. In all box plots, centre line, box limits, whiskers and points represent median, 25th and 75th percentiles, 1.5× interquartile range and individual samples, respectively.

Extended Data Fig. 6 Exogenous expression of the g.3A>C mutation in cell lines induced transcriptome-wide changes.

a, 5′ RACE confirming the expression of the U1 g.3A>C mutation in three CLL cell lines. CLL cell lines (JVM3, HG3 and MEC1) are infected with lentiviral particles that contain U1 locus with or without the g.3A>C mutation (g.3A>C or WT, respectively). The electropherograms correspond to the sequence of the PCR product (reverse-strand). The arrowheads indicate the location of the 3rd base of U1. The location of the 5′ RACE primer is also indicated. The experiment was conducted once. b, 5′ splice site for introns with increased or decreased excision in CLL cell lines with U1 mutation (n = 3 biologically independent cell lines). Top, bar chart shows q values from χ2 tests for base composition difference; red line indicates the q = 0.1 cutoff. c, Category of mis-splicing events in CLL cell lines. Extended Data Fig. 2a provides the definition of each category. The number of introns is coloured by the sixth base of 5′ splice site. d, Volcano plot for differential expression analysis of CLL cell lines. q values are from limma. n = 3 biologically independent cell lines; each cell line has a form with (MUT) and without (WT) U1 mutation. e–g, Euler plots comparing differentially spliced introns (e) and differentially expressed genes (f, g) between primary CLL from patients and cell lines. e, Over-excised introns with the G6 5′ splice site that are direct targets of the U1 mutation are compared. P values are from one-tailed Fisher’s exact test. The expected number of overlaps is also shown. h, Asymmetric changes in gene expression for cell lines. In all three cell lines, more genes are upregulated (log2-transformed fold change > 1) than downregulated (log2-transformed fold change < −1).
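
The overlap comparison in the Euler plots (a one-tailed Fisher's exact test together with an expected number of overlaps) can be sketched with the hypergeometric distribution. The function names and the toy counts below are assumptions for illustration; the sizes of the sets and of the background used in the study are not given on this page.

```python
# Illustrative sketch; names and toy numbers are not from the study.
from math import comb


def expected_overlap(n_a, n_b, n_total):
    """Expected overlap of two random subsets of sizes n_a and n_b."""
    return n_a * n_b / n_total


def overlap_p_value(k, n_a, n_b, n_total):
    """One-tailed P(overlap >= k) under the hypergeometric null.

    This is the upper tail of Fisher's exact test for the 2x2 table
    formed by membership in set A and set B within a common background.
    """
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_b)


print(expected_overlap(3, 3, 10))
print(overlap_p_value(3, 3, 3, 10))
```

With a background of 10 items and two sets of 3, a complete overlap has probability 1/120 under the null, which is what the second call returns.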

Extended Data Fig. 7 Driver events and clinical features of CLL.

Left, oncoprint shows 36 CLL driver alterations across 313 patients with CLL. Clinical classification (CLL or monoclonal B cell lymphocytosis), gender, IGHV status and U1 g.3A>C status are shown on the top. Right, bar plot shows q values from two-sided Fisher’s exact test that compares each alteration with the U1 g.3A>C status.

Extended Data Fig. 8 Consequence of SF3B1 mutation and its relation to the U1 g.3A>C mutation in CLL.

a, Category of mis-splicing events in CLL with SF3B1 mutation. Unlike the g.3A>C mutation, SF3B1 mutation induces more cryptic 3′ splice site changes. b, c, Sequence motifs of 5′ splice site (b) and 3′ splice site (c) for mis-spliced introns in CLL with SF3B1 mutation (n = 6). In b, c, the bar plots on the top show Benjamini–Hochberg adjusted P values from χ2 tests. Red lines indicate the q = 0.1 cutoff. d, e, Intron retention (d) and exon skipping (e) events in CLL with U1 mutation or SF3B1 mutation. In d, e, pairwise comparisons are made between three groups of U-CLL: with U1 g.3A>C mutation (U1-MUT, n = 6); with SF3B1 mutation (SF3B1-MUT, n = 6); without U1 or SF3B1 mutation (WT; n = 30). The number of events is counted with respect to corresponding controls (ctrl). f, 5′ splice site motifs for intron retention and exon skipping events in samples with U1 mutation, as compared to wild-type samples. The sequence motif for exon skipping events is from 5′ splice site of the red exon in e.

Extended Data Fig. 9 Major driver events and clinical features of HCC.

Left, this oncoprint shows 15 major driver alterations across all whole-genome-sequenced samples from two HCC projects: LIHC-US (n = 53) and LIRI-JP (n = 129). Project code, gender, risk factor status (heavy alcohol use, or infection with hepatitis B or C virus) and U1 g.3A>C status are shown on the top. White colour in top annotations indicates that data are not available. Right, bar plot shows q values from Cochran–Mantel–Haenszel χ2 tests that compare each alteration with the U1 mutation. Three types of alterations are used in the plot: simple somatic mutations (SSM), structural variations (SV) and somatic copy-number alterations (SCNA).

Extended Data Fig. 10 Clinical analysis of CLL and HCC.

a, Kaplan–Meier plot of overall survival for CLL. P value is from the Cox model. b, Independent prognostic value of the U1 g.3A>C mutation in CLL. Multivariate Cox model results for time to first treatment in 301 patients with CLL (number of events = 153). HR, hazard ratio; CI, confidence interval. c, U1 mutation frequency by disease stage in CLL. A and B/C, Binet A and B/C stages. d, Kaplan–Meier plot of overall survival for HCC samples in the LIRI-JP project. MUT indicates samples with the g.3A>C mutation and WT indicates samples without the g.3A>C mutation; 13 MUT and 116 WT samples are used. e, f, Kaplan–Meier plots of overall survival (e) and progression-free interval (f) for HCC samples in the LIHC-US project. MUT indicates samples with the g.3A>C mutation and WT indicates samples without the g.3A>C mutation; 15 MUT and 335 WT samples are used. For d–f, P values are from two-sided log-rank tests. g, U1 mutation frequency by disease stage in HCC. For LIRI-JP, staging uses the Liver Cancer Study Group of Japan system. For LIHC-US, staging uses the American Joint Committee on Cancer system.
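
For readers unfamiliar with the Kaplan–Meier curves referred to in this legend, the estimator can be written in a few lines. This is a generic textbook sketch with a hypothetical function name and toy data, not the software used in the study.

```python
# Generic Kaplan-Meier product-limit estimator; toy data only.
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times are follow-up times; events are 1 for an observed event and
    0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


for t, s in kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]):
    print(t, round(s, 3))
```

Censored observations reduce the number at risk without lowering the curve, which is why the estimate only steps down at times with observed events.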

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-4

Reporting Summary

Supplementary Figure 1

Uncropped gel images for Extended Data Fig. 4b

Supplementary Table 1

Summary of patient characteristics. This table provides ICGC donor ID, tumour cohort, project code, sex, age, diagnosis (ICD-10) and histology information for patients used in this study

Supplementary Table 2

U1 snRNA mutations identified in seven canonical U1 genes. This table provides 277 somatic mutations identified in any of the seven canonical U1 genes for PCAWG patients via WGS. Related to Fig. 1

Supplementary Table 3

Consensus U1 g.3A>C status for CLL and HCC. This table provides data availability, WGS-based genotyping, transcriptome-based inference, rhAmp results and consensus genotyping information for each of the CLL (n = 318) and HCC (n = 613) donors. Related to Extended Data Fig. 1

Supplementary Table 4

Differentially spliced introns and expressed genes in CLL and HCC. This table provides differentially spliced introns identified by LeafCutter (Methods) and differentially expressed genes identified by limma (Methods) for CLL (11 U1-MUT vs 254 WT; biologically independent patients), HCC (20 U1-MUT vs 367 WT; biologically independent patients) and CLL cell lines (n = 3 biologically independent cell lines; each cell line has a MUT and WT form). P denotes nominal P values; q denotes Benjamini–Hochberg (BH)-adjusted P values. Related to Fig. 2, Extended Data Figs. 2 and 6
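
The BH-adjusted q values reported in this table follow the standard Benjamini–Hochberg procedure, which can be sketched as follows. The function name and the example P values are illustrative assumptions; in practice the adjustment would be done by the statistical package used for the analysis.

```python
# Standard Benjamini-Hochberg adjustment; example values are made up.
def benjamini_hochberg(p_values):
    """Return BH-adjusted q values in the same order as the input P values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A feature is then called significant if its q value falls below the chosen cutoff, such as the q = 0.1 threshold marked in several of the figure panels.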

Supplementary Table 5

Gene sets enriched in U1 mutated CLL and HCC. This table provides significant gene sets identified by the Gene Set Enrichment Analysis (GSEA) and g:Profiler for primary CLL, HCC and CLL cell lines. Both nominal p-values and multiple comparisons corrected p-values from GSEA are provided. Related to Extended Data Fig. 5

Supplementary Table 6

Primers used in rhAmp and RT-PCR experiments. This table provides all primer sequences used in the experimental validation of CLL

About this article

Cite this article

Shuai, S., Suzuki, H., Diaz-Navarro, A. et al. The U1 spliceosomal RNA is recurrently mutated in multiple cancers. Nature 574, 712–716 (2019) doi:10.1038/s41586-019-1651-z
