Letter

Widespread intronic polyadenylation inactivates tumour suppressor genes in leukaemia

Nature volume 561, pages 127–131 (2018)

Abstract

DNA mutations are known cancer drivers. Here we investigated whether mRNA events that are upregulated in cancer can functionally mimic the outcome of genetic alterations. RNA sequencing or 3′-end sequencing techniques were applied to normal and malignant B cells from 59 patients with chronic lymphocytic leukaemia (CLL)1,2,3. We discovered widespread upregulation of truncated mRNAs and proteins in primary CLL cells that were not generated by genetic alterations but instead occurred by intronic polyadenylation. Truncated mRNAs caused by intronic polyadenylation were recurrent (n = 330) and predominantly affected genes with tumour-suppressive functions. The truncated proteins generated by intronic polyadenylation often lack the tumour-suppressive functions of the corresponding full-length proteins (such as DICER and FOXN3), and several even act in an oncogenic manner (such as CARD11, MGA and CHST11). In CLL, the inactivation of tumour-suppressor genes by aberrant mRNA processing is substantially more prevalent than the functional loss of such genes through genetic events. We further identified new candidate tumour-suppressor genes that are inactivated by intronic polyadenylation in leukaemia and by truncating DNA mutations in solid tumours4,5. These genes are understudied in cancer, as their overall mutation rates are lower than those of well-known tumour-suppressor genes. Our findings show the need to go beyond genomic analyses in cancer diagnostics, as mRNA events that are silent at the DNA level are widespread contributors to cancer pathogenesis through the inactivation of tumour-suppressor genes.


Data availability

All 3′-seq and RNA-seq data generated and analysed for this study have been deposited in the Gene Expression Omnibus (GEO) database under accession numbers GSE111310 and GSE111793. The code to analyse the data is available at https://bitbucket.org/leslielab/apa_2018/ and the processed data are available in Supplementary Table 1 (for Figs. 1b–d, 2a, 4a, Extended Data Figs. 3 and 4), Supplementary Table 2 (for Extended Data Fig. 8a) and the Source Data files (for Figs. 1e, 2c, e, 3a, c, 4b–d, g, Extended Data Figs. 2c, 6j, 7c and 8a). Data on DNA mutations from patients with CLL were provided by D. A. Landau and must be requested from him. The mutation data on solid cancers were obtained through the MSK cBioPortal, which can be accessed at http://www.cbioportal.org.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Lianoglou, S., Garg, V., Yang, J. L., Leslie, C. S. & Mayr, C. Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 27, 2380–2396 (2013).

  2. Singh, I. et al. Widespread intronic polyadenylation diversifies immune cell transcriptomes. Nat. Commun. 9, 1716 (2018).

  3. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).

  4. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

  5. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).

  6. Dvinge, H., Kim, E., Abdel-Wahab, O. & Bradley, R. K. RNA splicing factors as oncoproteins and tumour suppressors. Nat. Rev. Cancer 16, 413–430 (2016).

  7. Puente, X. S. et al. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101–105 (2011).

  8. Quesada, V. et al. Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat. Genet. 44, 47–52 (2011).

  9. Stilgenbauer, S., Bullinger, L., Lichter, P., Döhner, H. & the German CLL Study Group (GCLLSG). Genetics of chronic lymphocytic leukemia: genomic aberrations and V(H) gene mutation status in pathogenesis and clinical course. Leukemia 16, 993–1007 (2002).

  10. Gruber, A. J. et al. A comprehensive analysis of 3′ end sequencing datasets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation. Genome Res. 26, 1145–1159 (2016).

  11. Vorlová, S. et al. Induction of antagonistic soluble decoy receptor tyrosine kinases by intronic polyA activation. Mol. Cell 43, 927–939 (2011).

  12. Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013).

  13. Lenz, G. et al. Oncogenic CARD11 mutations in human diffuse large B cell lymphoma. Science 319, 1676–1679 (2008).

  14. Bertin, J. et al. CARD11 and CARD14 are novel caspase recruitment domain (CARD)/membrane-associated guanylate kinase (MAGUK) family members that interact with BCL10 and activate NF-κB. J. Biol. Chem. 276, 11877–11882 (2001).

  15. Mansouri, L., Papakonstantinou, N., Ntoufa, S., Stamatopoulos, K. & Rosenquist, R. NF-κB activation in chronic lymphocytic leukemia: a point of convergence of external triggers and intrinsic lesions. Semin. Cancer Biol. 39, 40–48 (2016).

  16. Rakheja, D. et al. Somatic mutations in DROSHA and DICER1 impair microRNA biogenesis through distinct mechanisms in Wilms tumours. Nat. Commun. 5, 4802 (2014).

  17. De Paoli, L. et al. MGA, a suppressor of MYC, is recurrently inactivated in high risk chronic lymphocytic leukemia. Leuk. Lymphoma 54, 1087–1090 (2013).

  18. Hurlin, P. J., Steingrímsson, E., Copeland, N. G., Jenkins, N. A. & Eisenman, R. N. Mga, a dual-specificity transcription factor that interacts with Max and contains a T-domain DNA-binding motif. EMBO J. 18, 7019–7028 (1999).

  19. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D. M. & Nakatani, Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296, 1132–1136 (2002).

  20. Huot, G. et al. CHES1/FOXN3 regulates cell proliferation by repressing PIM2 and protein biosynthesis. Mol. Biol. Cell 25, 554–565 (2014).

  21. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

  22. Nadanaka, S., Kinouchi, H., Taniguchi-Morita, K., Tamura, J. & Kitagawa, H. Down-regulation of chondroitin 4-O-sulfotransferase-1 by Wnt signaling triggers diffusion of Wnt-3a. J. Biol. Chem. 286, 4199–4208 (2011).

  23. Schmidt, H. H. et al. Deregulation of the carbohydrate (chondroitin 4) sulfotransferase 11 (CHST11) gene in a B-cell chronic lymphocytic leukemia with a t(12;14)(q23;q32). Oncogene 23, 6991–6996 (2004).

  24. Milland, J., Russell, S. M., Dodson, H. C., McKenzie, I. F. & Sandrin, M. S. The cytoplasmic tail of α1,3-galactosyltransferase inhibits Golgi localization of the full-length enzyme. J. Biol. Chem. 277, 10374–10378 (2002).

  25. Kessenbrock, K. et al. A role for matrix metalloproteinases in regulating mammary stem cell function via the Wnt signaling pathway. Cell Stem Cell 13, 300–313 (2013).

  26. Wang, L. et al. Somatic mutation as a mechanism of Wnt/β-catenin pathway activation in CLL. Blood 124, 1089–1098 (2014).

  27. Ni, T. K. & Kuperwasser, C. Premature polyadenylation of MAGI3 produces a dominantly-acting oncogene in human breast cancer. eLife 5, e14730 (2016).

  28. Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).

  29. Béguelin, W. et al. EZH2 is required for germinal center formation and somatic EZH2 mutations promote lymphoid transformation. Cancer Cell 23, 677–692 (2013).

  30. Ranzani, V. et al. The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Nat. Immunol. 16, 318–325 (2015).

  31. Hoek, K. L. et al. A cell-based systems biology assessment of human blood to monitor immune responses after influenza vaccination. PLoS ONE 10, e0118528 (2015).

  32. Trimarchi, T. et al. Genome-wide mapping and characterization of Notch-regulated long noncoding RNAs in acute leukemia. Cell 158, 593–606 (2014).

  33. Tian, B., Hu, J., Zhang, H. & Lutz, C. S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005).

  34. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).

  35. Kim, Y. K., Kim, B. & Kim, V. N. Re-evaluation of the roles of DROSHA, Exportin 5, and DICER in microRNA biogenesis. Proc. Natl Acad. Sci. USA 113, E1881–E1889 (2016).

  36. Berkovits, B. D. & Mayr, C. Alternative 3′ UTRs act as scaffolds to regulate membrane protein localization. Nature 522, 363–367 (2015).

  37. Karanth, S., Zinkhan, E. K., Hill, J. T., Yost, H. J. & Schlegel, A. FOXN3 regulates hepatic glucose utilization. Cell Rep. 15, 2745–2755 (2016).

  38. Li, Z. et al. A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc. Natl Acad. Sci. USA 100, 8164–8169 (2003).

  39. Zeller, K. I. et al. Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc. Natl Acad. Sci. USA 103, 17834–17839 (2006).

  40. Ren, B. et al. E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes Dev. 16, 245–256 (2002).

  41. Taubert, S. et al. E2F-dependent histone acetylation and recruitment of the Tip60 acetyltransferase complex to chromatin in late G1. Mol. Cell. Biol. 24, 4546–4556 (2004).

  42. Jenner, R. G. et al. The transcription factors T-bet and GATA-3 control alternative pathways of T-cell differentiation through a shared set of target genes. Proc. Natl Acad. Sci. USA 106, 17876–17881 (2009).

  43. Jezkova, J. et al. Brachyury regulates proliferation of cancer cells via a p27Kip1-dependent pathway. Oncotarget 5, 3813–3822 (2014).

  44. Veeman, M. T., Slusarski, D. C., Kaykas, A., Louie, S. H. & Moon, R. T. Zebrafish prickle, a modulator of noncanonical Wnt/Fz signaling, regulates gastrulation movements. Curr. Biol. 13, 680–685 (2003).

  45. Mayr, C. & Bartel, D. P. Widespread shortening of 3′UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673–684 (2009).

  46. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).


Acknowledgements

This work was funded by the NCI grant U01-CA164190 (to C.M. and C.S.L.), a Starr Cancer Foundation grant (to C.M. and C.S.L.), the Innovator Award of the Damon Runyon-Rachleff Cancer Foundation and the Island Outreach Foundation (DRR-24-13; to C.M.), the NIH Director’s Pioneer Award (DP1-GM123454, to C.M.), the Pershing Square Sohn Cancer Research Alliance (to C.M.) and the MSK Core Grant (P30 CA008748). We are grateful to V. K. Modi for access to lymphatic tissue, to D. A. Landau for providing CLL RNA-seq data and sample identities to validate our findings, and to C. Wu and D. Neuberg for clinical outcome analyses. We thank J. Mendell and V. Narry Kim for providing the V5-DICER construct and the DICER knockout cells, J. Chaudhuri for critical reading of the manuscript, and the members of the Mayr laboratory for discussions.

Reviewer information

Nature thanks M. Muschen and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Shih-Han Lee, Irtisha Singh

Affiliations

  1. Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA

    Shih-Han Lee, Sarah Tisdale & Christine Mayr
  2. Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA

    Irtisha Singh & Christina S. Leslie
  3. Tri-I Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, USA

    Irtisha Singh
  4. Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA

    Omar Abdel-Wahab

Authors

Shih-Han Lee, Irtisha Singh, Sarah Tisdale, Omar Abdel-Wahab, Christina S. Leslie & Christine Mayr

Contributions

S.-H.L. organized and collected the samples and generated the libraries. S.-H.L. and S.T. performed and analysed all validation and functional experiments and contributed to study design. I.S. performed all of the computational analyses with respect to identification of IPA isoforms and their integration with published CLL datasets with input from C.S.L. and C.M. O.A.-W. provided the CLL samples and some of the CLL RNA-seq data. C.M. conceived the study and integrated CLL-IPAs with mutation analysis of solid cancers. C.M., S.-H.L. and S.T. wrote the manuscript with input from all authors.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Christine Mayr.

Extended data figures and tables

  1. Extended Data Fig. 1 Validation of IPA isoforms by independent methods and identification of CLL-IPAs used for further analysis.

    a, RNA-seq data were used to validate the presence of IPA isoforms using a GLM. RNA-seq reads were counted within two 100-nucleotide windows (green bars) that are separated by 51 nucleotides and located upstream and downstream of the IPA peak. The IPA peak was considered validated if adjusted P < 0.1 (see Methods). Out of n = 5,587 tested IPA isoforms, n = 1,662 were validated by this method. Shown is MGA as a representative example. b, As only a fraction of IPA isoforms were validated by the method in a, additional methods were used to obtain independent evidence for the presence of the IPA isoforms. Independent evidence was obtained using untemplated adenosines from RNA-seq data or through the presence of the IPA isoform in other 3′-seq protocols10. As most of the immune cell types used in this study have not been investigated using other 3′-seq protocols and IPA isoform expression is cell type-specific2, highly expressed IPA isoforms (>10 TPM) were retained for further analysis even if no read evidence was found by other protocols. c, Hierarchical clustering based on IPA site usage separates the 3′-seq dataset into four groups: CD5+ B samples separate from CLL samples, and the CLL samples fall into three different groups. Shown is the usage difference of the 20% most variable IPA isoforms across the dataset (n = 342). Four out of thirteen CLL samples cluster away from the rest of the samples and are characterized by a high number of IPA isoforms (CLL high). d, The GLM (FDR-adjusted P < 0.1, IPA usage difference ≥ 0.05, IPA isoform expressed in CD5+ B < 8 TPM) identified 477 recurrent IPAs (significantly upregulated in at least 2 out of 13 CLL samples by 3′-seq) and 454 non-recurrent IPAs (significantly upregulated in 1 out of 13 CLL samples by 3′-seq). IPAs were validated in an independent RNA-seq dataset containing 46 new CLL samples. Among the recurrent IPAs, 71% of testable IPAs were verified using another GLM (see a). Among the non-recurrent IPAs, 64% of testable IPAs were verified. e, Plotting the number of CLL-IPAs per sample separates the CLL samples investigated by 3′-seq into two groups: 4 out of 13 samples generate a high number of CLL-IPAs (CLL high; median CLL-IPAs per sample, n = 100; range, 42–274), whereas the remaining samples generate lower numbers (CLL low; median, n = 9; range, 5–28). Centre bar denotes the median; error bars denote the interquartile range. **P = 0.003, two-sided Mann–Whitney U-test.

  2. Extended Data Fig. 2 The normal B cell counterparts of CLL cells are CD5+ B cells derived from lymphoid tissue.

    a, Hierarchical clustering of normal human B cells (naive B (NB), memory B (MemB) and CD5+ B) derived from lymphoid tissues or peripheral blood based on mRNA expression obtained from RNA-seq. The heat map shows the 20% most variable genes across the dataset (n = 1,887). The gene expression profiles of B cell subsets derived from peripheral blood or lymphoid tissue differ substantially, although the same markers were used for purification. b, As in a, but RNA-seq data from CLL samples were added to the analysis. The heat map shows the 20% most variable genes across the dataset (n = 2,078). CLL samples cluster with tissue-derived and not with blood-derived normal immune cells. c, Number of all differentially expressed genes from the analysis shown in b. Source data

  3. Extended Data Fig. 3 The 3′-seq and RNA-seq tracks of functionally validated CLL-IPAs.

    Five CLL-IPAs were functionally validated. Their 3′-seq and RNA-seq tracks are shown here and in Fig. 2a. Data are shown as in Fig. 1b. The corresponding RT–PCRs are shown in Extended Data Fig. 5a.

  4. Extended Data Fig. 4 CLL-IPAs generate truncated mRNAs and proteins.

    Gene models and western blots of 10 candidates depicted as in Figs. 1b and 2a show that CLL B cells generate full-length and IPA-generated truncated proteins. BLCL were used as control B cells and were included in the 3′-seq tracks. Actin was used as loading control on the same blots. For gel source data see Supplementary Fig. 1.

  5. Extended Data Fig. 5 Validation of the IPA-generated truncated mRNAs and validation of their stable expression over time.

    a, Detection of full-length and IPA-generated truncated mRNAs by RT–PCR in normal B cells (CD5+ B, BLCL) and CLL cells used in the western blot validations shown in Fig. 2a and Extended Data Fig. 4. All experiments were performed twice with similar results. Primers to amplify the mRNA isoforms are located in the first and last exons shown in the gene models and are listed in Supplementary Table 3. HPRT was used as loading control. b, Induction of truncated mRNAs and proteins through shRNA-mediated knockdown of splicing factors. All experiments were performed twice with similar results. U2AF1 was knocked down in HeLa cells, U2AF2 was knocked down in HEK293 cells and hnRNPC was knocked down in A549 cells. Shown as in a, except for NUP96, which is shown as in Extended Data Fig. 4. NUP96 is derived from NUP98 precursor. Induction of DICER1 IPA by transfection of increasing amounts of anti-sense morpholinos (MO) directed against the 5′ splice site of intron 23 of DICER1 in HeLa cells. Shown are RT–PCRs. c, RT–PCRs, performed once, on expression of full-length and IPA isoforms for eight CLL-IPAs in samples from two patients with CLL and control B cells (CD5+ B, BLCL). The samples were collected over a time interval of over 6 years. CLL11: T1, 17 months after diagnosis, T2, 24 months, T3, 44 months; CLL6: T1, 16 months, T2, 49 months, T3, 91 months (42 months after treatment). Samples from all time points (except CLL6, T3) were obtained from untreated patients. The primers for amplifications of the products were located in the first and last exons shown in the gene models and are listed in Supplementary Table 3. Expression of HPRT serves as loading control. The same gel picture of HPRT is shown in Fig. 3b for CLL samples and in a, far right panel, for BLCL and CD5+ control samples. All tested CLL-IPA isoforms were detectable at several time points during the course of the disease. 
Compared with CD5+ B cells, expression of FCHSD2 IPA was not significantly upregulated in CLL. d, Western blots of full-length and IPA-generated truncated proteins from CARD11, DICER and SCAF4. All experiments were performed twice with similar results. Actin was used as loading control. Shown are samples from normal B cells (BLCL) and two patients with CLL, both at two different time points 0.5–10 months apart. For gel source data, see Supplementary Fig. 1.

  6. Extended Data Fig. 6 IPA-generated truncated proteins resemble the protein products of truncating DNA mutations and have cancer-promoting properties.

    a, CARD11 IPA results in translation of intronic nucleotides (grey) until an in-frame stop codon is encountered. This generates 16 new amino acids (grey) downstream of exon 10. In the case of MGA IPA, three new amino acids are generated downstream of exon 9. b, Western blot showing that TMD8 cells express amounts of CARD11 IPA similar to those in CLL samples. The western blot is shown as in Fig. 2a and was performed twice. Actin was used as loading control. c, Western blot (as in b) showing full-length CARD11 and CARD11 IPA in TMD8 cells expressing a control shRNA (Co), an shRNA that exclusively knocks down the full-length protein, or one of two different shRNAs that exclusively knock down the CARD11 IPA isoform. The experiment was performed twice with similar results. GAPDH was used as loading control. d, Endogenous phospho-NF-κB p65 levels were measured by FACS in TMD8 cells expressing the indicated shRNAs from c. Mean fluorescence intensity values are shown in parentheses in FACS plots of a representative experiment out of three. e, Immunoprecipitation of V5-DICER or V5-DICER IPA from HEK293T cells using an anti-V5 antibody. The experiment was performed twice with similar results. 2.5% of input was loaded. f, The extent of miRNA processing depends on the expression level of full-length DICER, but not on that of DICER IPA. Shown are wild-type (WT) and DICER knockout (KO) HCT116 cells. Re-expression of different amounts of full-length DICER1 protein in the knockout cells (measured by western blot of DICER1 in the top panel) results in different levels of endogenous let-7 expression (measured by northern blot in the bottom panel; compare lanes 3 and 4). Expression of DICER IPA has no influence on miRNA processing (compare lanes 4 and 5). Actin and U6 were used as loading controls. The experiment was performed twice with similar results. g, Western blot of MGA. MGA and MGA IPA were cloned and expressed in HEK293T cells to confirm the predicted protein size. The experiment was performed twice with similar results. Also shown is endogenous MGA expression in Raji cells. Actin was used as loading control on the same blot. Asterisk denotes a nonspecific band. h, Protein models of full-length FOXN3 and FOXN3 IPA are shown as in Fig. 2b. The IPA-generated protein has a truncated forkhead domain and is predicted to lose the repressive activity. i, As in a, but for FOXN3. FOXN3 IPA generates 32 new amino acids downstream of exon 2. j, FOXN3 IPA significantly derepresses expression of the oncogenic targets MYC and PIM2. Shown is the fold change in mRNA level of endogenous genes in MEC1 B cells after transfection of GFP–FOXN3 IPA compared with transfection of full-length GFP–FOXN3. HPRT-normalized values are shown as box plots (as in Fig. 1e) from n = 5 biologically independent experiments, each performed in technical triplicates. **P = 0.002, two-sided t-test for independent samples. For gel source data, see Supplementary Fig. 1. Source data

  7. Extended Data Fig. 7 Inactivation of TSGs by CLL-IPAs independently of DNA mutations.

    a, Distribution of full-length protein size (in amino acids) for genes that generate CLL-IPAs (n = 306) and B-IPAs (n = 2,690). Box plots are as in Fig. 1e. P = 0.87, two-sided Mann–Whitney U-test. b, TR rate (ratio of TR mutations to total mutations) is shown for known TSGs obtained previously5. Box plots are as in Fig. 1e. P = 1 × 10^−155, two-sided Mann–Whitney U-test. c, Known TSGs obtained previously5 that are targeted by CLL-IPAs (n = 21) are shown. Dark green bars indicate the fraction of retained CDRs for each IPA-generated protein. Black dots indicate the hotspot positions of TR mutations obtained from the MSK cBioPortal. CLL-IPAs mostly occur upstream or within 10% (of overall amino acid length) of the mutations. P = 0.04, two-sided Wilcoxon rank-sum test. d, Contingency table for enrichment of TSGs among genes that generate CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. TSGs were obtained previously5. e, TSGs and genes that generate CLL-IPA isoforms have longer CDRs than genes that do not generate IPA isoforms. Box plots are as in Fig. 1e. P = 1 × 10^−80, two-sided Kruskal–Wallis test. f, Five control gene lists (n = 306 each), expressed in CLL and with a size distribution similar to that of CLL-IPAs, were tested for enrichment of TSGs. Shown is the number of TSGs found. A χ² test did not show a significant enrichment of TSGs among the control genes. g, Contingency table for enrichment of TR mutation genes in CLL among genes that generate CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. h, ZMYM5 is truncated by a TR mutation and an IPA isoform in the same patient, but the aberrations are predicted to result in different truncated proteins. A 10-bp deletion in exon 3 results in a frameshift leading to the generation of a truncated ZMYM5 protein, whereas ZMYM5 IPA (not yet annotated) produces a truncated protein containing 352 more amino acids. The genes shown in h and i are the only genes with simultaneous presence of a TR mutation and a CLL-IPA out of n = 268 tested. The position of the TR mutation is indicated in green. CLL7 and CLL11 3′-seq and RNA-seq tracks are shown for comparison. i, MGA is truncated by a TR mutation and an IPA isoform in the same patient. The TR mutation affects the 5′ splice site of intron 7, thus generating two additional amino acids downstream of exon 7, whereas the IPA isoform encodes a truncated MGA protein containing three more amino acids downstream of exon 9. Mutation and 3′-seq analysis were performed once. CLL7 and CLL11 are shown for comparison. j, Shown are additional recurrent (n > 1) DNA mutations found by exome sequencing of CLL patient samples stratified by a high or low number of CLL-IPAs per patient. Only the top and bottom 16 samples with high or low numbers of CLL-IPAs are shown to normalize the number of samples analysed. This analysis is only descriptive and no test was performed. k, Significant enrichment of SF3B1 mutations in the group of CLL samples with abundant CLL-IPA isoforms. Two-sided Mann–Whitney U-test was performed. l, Abundance of CLL-IPAs is not associated with IGVH mutational status. Shown is the number of CLL-IPAs per sample for patients with mutated (MUT, n = 30) or unmutated (UN, n = 21) IGVH genes. Box plots are as in Fig. 1e. P = 0.4, two-sided Mann–Whitney U-test. Source data

  8. Extended Data Fig. 8 Novel TSG candidates and validation of CHST11 IPA as cancer-promoting isoform.

    a, As in Fig. 3c, but showing known (red gene names) and novel TSG candidates (black gene names) among the abundant CLL-IPAs. CLL-IPAs seem to inactivate these genes, as they mostly occur upstream or within 10% (of overall amino acid length) of the mutations. P = 1 × 10^−8, two-sided Wilcoxon rank-sum test performed on all 136 TSGs; P = 1 × 10^−8, two-sided Wilcoxon rank-sum test performed on the novel TSGs (n = 119). The position of the TR mutation was determined using data obtained from the MSK cBioPortal and indicates the hotspot mutation. Right, the fraction of CLL samples (out of 59) with significant expression of the IPA isoform. Genes were included if they were affected in at least 20% of samples investigated by either 3′-seq or RNA-seq. b, Contingency table for enrichment of novel TSGs among highly recurrent CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. c, TSGs have larger protein sizes. Box plots are as in Fig. 1e. **P = 0.005, two-sided Mann–Whitney U-test. The increased overall mutation rate of known TSGs correlates with larger protein size (Spearman’s correlation coefficient, r = 0.74; P = 1 × 10^−6). d, CHST11 IPA generates 18 new amino acids (grey) downstream of exon 1. e, Experimental set-up to measure paracrine WNT activity produced by MEC1 B cells expressing GFP, GFP–CHST11 or GFP–CHST11 IPA, using a WNT reporter expressed in HEK293T cells. Primary CLL cells and the CLL cell line MEC1 express several WNTs, including WNT5B. In the presence of CHST11, WNT (red dots) binds to sulfated proteins on the surface of WNT-producing cells, whereas in the presence of CHST11 IPA, WNT is secreted into the medium. WNT-conditioned medium activates a WNT reporter in HEK293T cells. This set-up refers to Fig. 4f, g. f, Western blot, performed once, for WNT5 shown as in Fig. 4f, but including HeLa cells as positive control for WNT5 expression. Actin was used as loading control on the same blot. Source data

  9. Extended Data Fig. 9 Cancer-upregulated IPA isoforms are also detected in breast cancer and T-ALL.

    a, MAGI3 is a TSG that is preferentially targeted by IPA in breast cancer27. Shown is the mutation profile obtained from the MSK cBioPortal. b, Expression of IPA isoforms in T-ALL detected by RNA-seq. Shown are 3′-seq and RNA-seq tracks of a representative mRNA (out of n = 101) from CLL samples, T-ALL samples and normal thymus. The T-ALL RNA-seq data were obtained previously32. We detected n = 381 IPA isoforms in at least one T-ALL sample, n = 133 in at least one thymus sample, n = 104 in at least one T-ALL and one thymus sample, and n = 101 in at least two T-ALL samples, but not in any of the thymus samples.

  10. Extended Data Table 1 Samples investigated by 3′-seq and RNA-seq
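The calling criteria in Extended Data Fig. 1d can be made concrete with a short sketch. It assumes that the GLM's FDR-adjusted P value is already available for each CLL sample, and it assumes that "IPA usage" means the fraction of a gene's 3′-seq reads that fall at the IPA peak. The record layout and the names `SampleCall`, `is_upregulated` and `classify_ipa` are hypothetical and are not taken from the authors' repository; only the thresholds (FDR < 0.1, usage difference ≥ 0.05, CD5+ B expression < 8 TPM) and the recurrence rule (at least 2 of 13 samples) come from the caption.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds quoted in Extended Data Fig. 1d.
FDR_CUTOFF = 0.1        # FDR-adjusted P must be below this
MIN_USAGE_DIFF = 0.05   # minimum gain in IPA usage over CD5+ B cells
MAX_CD5B_TPM = 8.0      # IPA isoform must be lowly expressed in CD5+ B cells


@dataclass
class SampleCall:
    """Per-sample evidence for one IPA isoform (hypothetical layout)."""
    fdr: float        # FDR-adjusted P from the GLM, assumed precomputed
    ipa_reads: int    # 3'-seq reads at the IPA peak in this CLL sample
    gene_reads: int   # all 3'-seq reads assigned to the gene in this sample


def ipa_usage(ipa_reads: int, gene_reads: int) -> float:
    """Assumed definition: fraction of the gene's 3'-end reads at the IPA site."""
    return ipa_reads / gene_reads if gene_reads else 0.0


def is_upregulated(call: SampleCall, cd5b_usage: float, cd5b_tpm: float) -> bool:
    """Apply the three filters from Extended Data Fig. 1d to one sample."""
    usage_diff = ipa_usage(call.ipa_reads, call.gene_reads) - cd5b_usage
    return (
        call.fdr < FDR_CUTOFF
        and usage_diff >= MIN_USAGE_DIFF
        and cd5b_tpm < MAX_CD5B_TPM
    )


def classify_ipa(calls: Sequence[SampleCall], cd5b_usage: float, cd5b_tpm: float) -> str:
    """Recurrent: significant in >= 2 CLL samples; non-recurrent: in exactly 1."""
    hits = sum(is_upregulated(c, cd5b_usage, cd5b_tpm) for c in calls)
    if hits >= 2:
        return "recurrent"
    if hits == 1:
        return "non-recurrent"
    return "not called"


# Toy example (invented numbers): two of three samples pass every filter.
calls = [SampleCall(0.01, 30, 100), SampleCall(0.05, 20, 100), SampleCall(0.5, 50, 100)]
print(classify_ipa(calls, cd5b_usage=0.05, cd5b_tpm=2.0))  # recurrent
```

Keeping the per-sample test separate from the recurrence count mirrors the caption's two-step wording: first a per-sample significance call, then a count across the 13 CLL samples.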
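The RNA-seq validation in Extended Data Fig. 1a starts from two read counts per IPA peak, one in a 100-nt window upstream of the peak and one in a 100-nt window downstream, with 51 nt between them. A minimal sketch of that counting step follows. It assumes the 51-nt gap is centred on the peak and that each read is reduced to a single genomic position; both are illustrative choices rather than details from the Methods, and the GLM that is subsequently fitted to these counts is not reproduced. The helper names are hypothetical.

```python
from typing import Iterable, Tuple

WINDOW = 100  # window length in nucleotides (Extended Data Fig. 1a)
GAP = 51      # nucleotides separating the two windows (Extended Data Fig. 1a)

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def flanking_windows(peak: int, strand: str) -> Tuple[Interval, Interval]:
    """Return (upstream, downstream) windows in transcript orientation.

    Assumes the 51-nt gap is centred on the peak, i.e. 25 nt on each side.
    """
    half = GAP // 2
    left = (peak - half - WINDOW, peak - half)
    right = (peak + half + 1, peak + half + 1 + WINDOW)
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return (left, right) if strand == "+" else (right, left)


def count_in(positions: Iterable[int], window: Interval) -> int:
    """Count reads whose (assumed single) position falls inside the window."""
    start, end = window
    return sum(start <= p < end for p in positions)


def window_counts(positions: Iterable[int], peak: int, strand: str) -> Tuple[int, int]:
    """Upstream and downstream read counts around one IPA peak."""
    positions = list(positions)
    up, down = flanking_windows(peak, strand)
    return count_in(positions, up), count_in(positions, down)


# Toy reads (invented): coverage drops downstream of a + strand peak at 1,000,
# the pattern expected when transcripts end at the intronic polyA site.
reads = [900] * 10 + [1050] * 2
print(window_counts(reads, peak=1000, strand="+"))  # (10, 2)
```

A drop from the upstream to the downstream count is what distinguishes a genuine intronic polyA site from uniform intron retention; the statistical decision itself rests on the GLM described in the Methods.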
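Extended Data Figs. 6a, 6i and 8d count the amino acids added when translation runs from the last retained exon into the intron until the first in-frame stop codon. The sketch below performs that read-through with the standard genetic code. The function name and the `exon_tail` convention (the 0–2 nucleotides of an incomplete final codon left at the end of the exon) are assumptions, the codon spanning the exon–intron junction is counted as new here as one possible convention, and the example sequences are toy sequences, not the CARD11, MGA, FOXN3 or CHST11 introns.

```python
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def readthrough_peptide(exon_tail: str, intron: str) -> Tuple[str, bool]:
    """Translate from the exon-intron junction into the intron.

    exon_tail: 0-2 nucleotides of an incomplete codon at the end of the last
        retained exon (assumed convention; pass "" when the exon ends in phase).
    intron: intronic DNA sequence (A/C/G/T only) following the exon.
    Returns the new residues and whether an in-frame stop codon was reached.
    """
    seq = (exon_tail + intron).upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False


# Toy sequences (invented): GTA AGT TAG -> Val, Ser, stop.
print(readthrough_peptide("", "GTAAGTTAG"))    # ('VS', True)
print(readthrough_peptide("C", "GTGAATAAAG"))  # ('RE', True)
```

The length of the returned peptide corresponds to the "new amino acids" counted in those panels, under the stated junction convention.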
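Extended Data Figs. 7d, 7g and 8b report P values from two-sided Fisher's exact tests on 2 × 2 contingency tables. For readers who want to check such a table by hand, the sketch below computes the test directly from the hypergeometric distribution using only the Python standard library. It sums the probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided definition. The table layout in the docstring is an assumed example, and the counts in the usage line are invented.

```python
from math import comb
from typing import Sequence


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]].

    Example layout (assumed): rows = gene generates a CLL-IPA (yes / no),
    columns = gene is a TSG (yes / no).
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Numerator of the hypergeometric probability of the table whose
        # top-left cell is x; the shared denominator is comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Exact integer comparison avoids floating-point tie problems.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Toy counts (invented): Fisher's classic tea-tasting table.
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))  # 0.4857
```

Working with integer numerators keeps the "no more likely than observed" comparison exact; statistical packages typically use a small floating-point tolerance for the same purpose.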
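Extended Data Figs. 7c and 8a summarise each IPA-generated protein by the fraction of the coding region it retains and ask whether the IPA truncates the protein upstream of, or within 10% of the overall amino acid length of, the hotspot truncating mutation. That rule can be written as a small predicate. The function names are hypothetical, positions are assumed to be amino acid coordinates on the full-length protein, and `ipa_end_aa` is assumed to be the last full-length residue kept by the IPA protein (the new intronic residues are not counted).

```python
def retained_fraction(ipa_end_aa: int, full_length_aa: int) -> float:
    """Fraction of the full-length coding region kept by the IPA protein."""
    if not 0 <= ipa_end_aa <= full_length_aa:
        raise ValueError("IPA end must lie within the full-length protein")
    return ipa_end_aa / full_length_aa


def consistent_with_inactivation(
    ipa_end_aa: int, mutation_aa: int, full_length_aa: int, tolerance: float = 0.10
) -> bool:
    """True if the IPA truncates upstream of the hotspot TR mutation, or no more
    than tolerance x (protein length) residues downstream of it."""
    return ipa_end_aa - mutation_aa <= tolerance * full_length_aa


# Toy protein (invented): 1,000 aa with a hotspot truncating mutation at residue 400.
for end in (350, 480, 520):
    print(end, retained_fraction(end, 1000), consistent_with_inactivation(end, 400, 1000))
```

With these toy numbers, IPA ends at residues 350 and 480 satisfy the rule, whereas an end at residue 520 lies more than 100 residues past the mutation and does not.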

Supplementary information

  1. Supplementary Figures

    This file contains gel source data for Fig. 2a, 2d, 3c, 4f, Extended Data Fig. 5, 6a-d, 7b, c, e-g, and the individual P values for Fig. 2e.

  2. Reporting Summary

  3. Supplementary Table

    This file contains Supplementary Table S1: CLL-IPAs and CD5+B-IPAs.

  4. Supplementary Table

    This file contains Supplementary Table S2: Novel TSG candidates.

  5. Supplementary Table

    This file contains Supplementary Table S3: Primer sequences.

Source data

About this article


DOI

https://doi.org/10.1038/s41586-018-0465-8
