Cancers are caused by genomic alterations known as drivers. While hundreds of drivers in coding genes are known, only a handful of non-coding drivers have been discovered to date despite intensive searching1,2. Attention has recently shifted to the role of altered RNA splicing in cancer. Driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple cancer types, although so far only in protein-coding splicing factors such as SF3B1 (splicing factor 3b subunit 1)3–6. In contrast, cancer-related alterations in the non-coding component of the spliceosome, a set of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing non-coding cancer drivers and the repetitive nature of snRNA genes1,7,8. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several tumour types. The primary function of U1 is to recognize the 5′ splice site (5′SS) via base-pairing. This mutation changes the preferential A-U base-pairing between U1 and the 5′SS to C-G base-pairing, thereby creating novel splice junctions and altering the splicing pattern of multiple genes, including known cancer drivers. Clinically, the A>C mutation is associated with alcohol abuse in hepatocellular carcinoma (HCC) and with the aggressive IGHV-unmutated subtype of chronic lymphocytic leukaemia (CLL). The U1 mutation is also an independent adverse prognostic factor in CLL. Our study identifies one of the first non-coding drivers in spliceosomal RNAs, reveals a novel mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
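The base-pairing shift described above can be illustrated with a toy complementarity count. This is a minimal sketch, not the authors' method, and it rests on background assumptions not stated in the text: the human U1 snRNA 5′ end is taken as 5′-AUACUUACCUG-3′, U1 positions 3 to 11 are assumed to pair antiparallel with 5′SS positions +6 to -3 (so U1 position 3 faces intron position +6), only Watson-Crick pairs are counted (no G-U wobble, pseudouridines treated as U), and the names `U1_WT`, `U1_MUT` and `complementarity` are hypothetical.

```python
# Toy model (assumption): score a 5' splice site by counting Watson-Crick
# pairs with the 5' end of U1 snRNA. Real 5'SS recognition involves much more.
U1_WT = "AUACUUACCUG"   # assumed 5' end of human U1 snRNA, positions 1-11
U1_MUT = "AUCCUUACCUG"  # same sequence carrying the A>C change at position 3

# Watson-Crick pairs only; G-U wobble is deliberately ignored in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementarity(u1: str, ss: str) -> int:
    """Count Watson-Crick pairs between U1 positions 3-11 and a 9-nt 5'SS.

    `ss` is written 5'->3' from position -3 to +6 (exon|intron), e.g.
    "CAGGUAAGU". Pairing is antiparallel, so U1 position 3 faces +6,
    position 4 faces +5, ..., position 11 faces -3.
    """
    if len(ss) != 9:
        raise ValueError("expected a 9-nt 5' splice site (-3 to +6)")
    u1_segment = u1[2:11]     # U1 positions 3..11 (0-based slice)
    site_reversed = ss[::-1]  # 5'SS read from +6 back to -3
    return sum((a, b) in PAIRS for a, b in zip(u1_segment, site_reversed))


if __name__ == "__main__":
    # Consensus-like site with U at +6, and the same site with G at +6.
    for site in ("CAGGUAAGU", "CAGGUAAGG"):
        print(site, complementarity(U1_WT, site), complementarity(U1_MUT, site))
```

Under these assumptions the wild-type U1 pairs fully (9 of 9) with the U+6 site but loses one pair at the G+6 site, while the mutant shows the reverse pattern, which is the shift in +6 preference that the A-U to C-G change implies.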
This file contains Supplementary Notes 1-4.
Uncropped gel images for Extended Data Fig. 4b.
Summary of patient characteristics. This table provides ICGC donor ID, tumour cohort, project code, sex, age, diagnosis (ICD-10) and histology information for patients used in this study.
U1 snRNA mutations identified in seven canonical U1 genes. This table lists the 277 somatic mutations identified by WGS in any of the seven canonical U1 genes in PCAWG patients. Related to Fig. 1.
Consensus U1 g.3A>C status for CLL and HCC. This table provides data availability, WGS-based genotyping, transcriptome-based inference, rhAMP results and consensus genotyping information for each of the CLL (n = 318) and HCC (n = 613) donors. Related to Extended Data Fig. 1.
Differentially spliced introns and expressed genes in CLL and HCC. This table provides differentially spliced introns identified by LeafCutter (Methods) and differentially expressed genes identified by limma (Methods) for CLL (11 U1-MUT vs 254 WT; biologically independent patients), HCC (20 U1-MUT vs 367 WT; biologically independent patients) and CLL cell lines (n = 3 biologically independent cell lines; each cell line has a MUT and a WT form). P denotes nominal P values; q denotes Benjamini-Hochberg (BH)-adjusted P values. Related to Fig. 2 and Extended Data Figs. 2 and 6.
Gene sets enriched in U1-mutated CLL and HCC. This table provides significant gene sets identified by Gene Set Enrichment Analysis (GSEA) and g:Profiler for primary CLL, HCC and CLL cell lines. Both nominal and multiple-comparison-corrected P values from GSEA are provided. Related to Extended Data Fig. 5.
Primers used in rhAMP and RT-PCR experiments. This table provides all primer sequences used for experimental validation in CLL.
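The q values in the differential splicing and expression table are described as BH-adjusted P values. The text does not say which software produced them, so the following is only a minimal sketch of the standard Benjamini-Hochberg step-up adjustment; it should agree with R's `p.adjust(method = "BH")`, and the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjustment.

    Returns q-values in the same order as the input P values. Each q is
    min over k >= rank of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # also enforces the cap at 1
    # Walk from the largest P value down, keeping q values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```

For the four P values in the usage line, the adjusted values are approximately 0.04, 0.0533, 0.0533 and 0.2; the second and third share a q value because the step-up procedure takes a running minimum from the largest P value downwards.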