This is an unedited manuscript that has been accepted for publication. Nature Research are providing this early version of the manuscript as a service to our customers. The manuscript will undergo copyediting, typesetting and a proof review before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers apply.

The U1 spliceosomal RNA is recurrently mutated in multiple cancers

Cancers are caused by genomic alterations known as drivers. While hundreds of drivers in coding genes are known, only a handful of non-coding drivers have been discovered to date despite intensive searching1,2. Attention has recently shifted to the role of altered RNA splicing in cancer: driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple cancer types, although so far only in protein-coding splicing factors such as SF3B1 (splicing factor 3b subunit 1)3–6. In contrast, cancer-related alterations in the non-coding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing non-coding cancer drivers and the repetitive nature of snRNA genes1,7,8. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several tumour types. The primary function of U1 is to recognize the 5′ splice site (5′SS) through base-pairing. This mutation changes the preferential A-U base-pairing between U1 and the 5′SS to C-G base-pairing, thereby creating novel splice junctions and altering the splicing pattern of multiple genes, including known cancer drivers. Clinically, the A>C mutation is associated with alcohol abuse in hepatocellular carcinoma (HCC) and with the aggressive IGHV-unmutated subtype of chronic lymphocytic leukaemia (CLL). The U1 mutation also independently confers an adverse prognosis in CLL. Our study identifies one of the first non-coding drivers in spliceosomal RNAs, reveals a novel mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
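The base-pairing switch described above can be illustrated with a toy Python sketch. It assumes the widely cited human U1 5′-terminal sequence AUACUUACCUG (nucleotides 1 to 11) and the standard antiparallel alignment in which U1 nucleotides 3 to 11 pair with 5′SS positions +6 to −3, so U1 nucleotide 3 pairs with intron position +6. The 5′SS strings (a consensus CAG|GUAAGU and a variant with G at +6) and the helper names `pairs_with_5ss` and `count_wc` are hypothetical illustrations. The sketch counts only Watson-Crick pairs and ignores wobble pairs and any energy model, so it is not the authors' analysis.

```python
# Toy model of U1 snRNA : 5' splice site (5'SS) base-pairing.
# ASSUMPTION: human U1 nt 1-11 = AUACUUACCUG; U1 nt 3-11 pair antiparallel
# with 5'SS positions +6..-3. Only Watson-Crick pairs are counted.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

U1_WT_5PRIME = "AUACUUACCUG"  # assumed U1 nucleotides 1-11 (5'->3')
# g.3A>C: replace the third nucleotide (index 2) with C
U1_MUT_5PRIME = U1_WT_5PRIME[:2] + "C" + U1_WT_5PRIME[3:]


def pairs_with_5ss(u1_5prime: str, ss9: str) -> list:
    """Return (U1 nt, 5'SS nt) pairs for U1 positions 3..11.

    ss9 is the 9-nt 5'SS from -3 to +6 written 5'->3' (DNA T is read as U).
    """
    ss9 = ss9.upper().replace("T", "U")
    if len(ss9) != 9 or len(u1_5prime) < 11:
        raise ValueError("need a 9-nt 5'SS and at least 11 nt of U1")
    pairs = []
    for p in range(3, 12):  # U1 positions 3..11 (1-based)
        # Antiparallel: U1 position 3 -> ss9[8] (+6), position 11 -> ss9[0] (-3)
        pairs.append((u1_5prime[p - 1], ss9[11 - p]))
    return pairs


def count_wc(u1_5prime: str, ss9: str) -> int:
    """Number of Watson-Crick pairs in the U1:5'SS duplex."""
    return sum(pair in WATSON_CRICK for pair in pairs_with_5ss(u1_5prime, ss9))


if __name__ == "__main__":
    for label, u1 in (("WT", U1_WT_5PRIME), ("A>C", U1_MUT_5PRIME)):
        for site in ("CAGGUAAGU", "CAGGUAAGG"):  # +6 U vs +6 G
            print(label, site, count_wc(u1, site))
```

In this toy model, wild-type U1 pairs fully (9 of 9) with the +6 U site and loses the A-U pair at a +6 G site, whereas the A>C mutant does the reverse, gaining a C-G pair at +6 G. This mirrors the shift in preference described in the abstract.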

Author information

Correspondence to Lincoln D. Stein.

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1–4.

Reporting Summary

Supplementary Figure 1

Uncropped gel images for Extended Data Fig. 4b.

Supplementary Table 1

Summary of patient characteristics. This table provides ICGC donor ID, tumour cohort, project code, sex, age, diagnosis (ICD-10) and histology information for patients used in this study.

Supplementary Table 2

U1 snRNA mutations identified in seven canonical U1 genes. This table lists the 277 somatic mutations identified by WGS in any of the seven canonical U1 genes in PCAWG patients. Related to Fig. 1.

Supplementary Table 3

Consensus U1 g.3A>C status for CLL and HCC. This table provides data availability, WGS-based genotyping, transcriptome-based inference, rhAMP results and consensus genotyping information for each of the CLL (n = 318) and HCC (n = 613) donors. Related to Extended Data Fig. 1.

Supplementary Table 4

Differentially spliced introns and expressed genes in CLL and HCC. This table provides differentially spliced introns identified by LeafCutter (Methods) and differentially expressed genes identified by limma (Methods) for CLL (11 U1-MUT vs 254 WT; biologically independent patients), HCC (20 U1-MUT vs 367 WT; biologically independent patients) and CLL cell lines (n = 3 biologically independent cell lines; each cell line has a MUT and a WT form). P denotes nominal P values; q denotes Benjamini–Hochberg (BH)-adjusted P values. Related to Fig. 2 and Extended Data Figs. 2 and 6.
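The q values above refer to the Benjamini–Hochberg procedure. With m tests and P values sorted in ascending order, q_(i) = min over j ≥ i of m·p_(j)/j, capped at 1. A minimal textbook sketch of that formula follows. The function name `bh_adjust` is hypothetical, and the caption does not state which implementation produced the table, so this is not presented as the code used by LeafCutter, limma or the authors.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (q values), in input order."""
    n = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # caps q at 1 and enforces monotonicity
    # Walk from the largest rank down, taking a cumulative minimum of p*n/rank
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```

Taking the cumulative minimum from the largest rank downward keeps the adjusted values monotone in the original P values, which is why the second and third inputs here share the same q value.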

Supplementary Table 5

Gene sets enriched in U1-mutated CLL and HCC. This table provides significant gene sets identified by Gene Set Enrichment Analysis (GSEA) and g:Profiler for primary CLL, HCC and CLL cell lines. Both nominal and multiple-comparison-corrected P values from GSEA are provided. Related to Extended Data Fig. 5.

Supplementary Table 6

Primers used in rhAMP and RT-PCR experiments. This table provides all primer sequences used in the experimental validation in CLL.
