TP53 isoform junction reads based analysis in malignant and normal contexts

TP53 is one of the most frequently altered genes in cancer; it can be inactivated by a number of different mechanisms. NM_000546.6 (ENST00000269305.9) is by far the predominant TP53 isoform; however, a few alternative isoforms have been described as being expressed at much lower levels. To better understand patterns of TP53 alternative isoform expression in cancer and normal samples, we performed an exon-exon junction read-based analysis of TP53 isoforms using RNA-seq data from The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE), and the Genotype-Tissue Expression (GTEx) project. TP53 C-terminal alternative isoforms have abolished or severely decreased tumor suppressor activity; therefore, an increased fraction of TP53 C-terminal alternative isoforms might be expected in tumors with wild-type TP53. Contrary to this expectation, we observed no substantial increase in the fraction of TP53 C-terminal alternative isoforms in TCGA tumors and CCLE cancer cell lines with wild-type TP53, likely indicating that expression of TP53 C-terminal alternative isoforms cannot be reliably selected for during tumor progression.


Suleyman Vural 1,3 , Lun-Ching Chang 2 , Laura M. Yee 1 & Dmitriy Sonkin 1*
The tumor suppressor gene TP53 plays an important role in tumor biology and has been extensively studied since its discovery about 40 years ago 1. TP53 can be inactivated by a variety of mechanisms, such as missense loss-of-function mutations, frame shift and nonsense mutations, splice site mutations, deletions, rearrangements, and loss of expression [2][3][4]. To fulfill its biological function, four TP53 polypeptides must form a tetramer, which functions as a transcription factor 5,6. Therefore, even if only one of the four polypeptides has an inactivating mutation, it may lead to a dominant negative phenotype of variable degree due to effects on the tetramer 2,7.
NM_000546.6 (ENST00000269305.9), also known as p53α, is by far the predominant TP53 isoform; however, a few alternative isoforms have been described as being expressed at much lower levels [8][9][10]. To better understand patterns of TP53 alternative isoform expression in cancer and normal samples, we performed an exon-exon junction read-based analysis of TP53 isoforms using RNA-seq data from The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE), and the Genotype-Tissue Expression (GTEx) projects. Exon-exon junction read-based analysis allows us to look for unambiguous evidence of junctions between two exons using short-read RNA sequencing data. To differentiate between the predominant TP53 isoform NM_000546.6 and the alternative isoforms, this analysis focused on exon-exon junctions that are unique to alternative isoforms and not present in NM_000546.6. Figure 1 illustrates such exon junctions. Unfortunately, TP53 N-terminal alternative isoforms do not have unique exon-exon junctions, and therefore the analysis is limited to TP53 C-terminal alternative isoforms.

Materials and methods
RNA-seq data (101 base pair read length) from 834 CCLE cancer cell lines, the corresponding TP53 mutation calls based on DNA and/or RNA sequencing, and TP53 CN-ratios based on AFFY SNP 6.0 arrays were used in the analysis 11. (The CN-ratio derived from AFFY SNP 6.0 arrays is the ratio of signal intensity in a tumor sample versus normal reference samples, normalized to total DNA quantity; thus, a CN-ratio of 1 corresponds to a diploid locus.) TCGA RNA-seq data (48 to 76 base pair read length) from 8795 samples, with the corresponding TP53 mutation calls and array-based TP53 CN-ratios, were used in this analysis 12,13. GTEx RNA-seq data (76 base pair read length) from 9512 normal tissue samples from 549 donors were used in the analysis 14. RNA-seq data (BAM files) from GTEx, TCGA, and CCLE were used as provided by the corresponding sources.
A read is considered an exon-exon junction read if the aligned read has no mismatches (no more than half of the read may be soft clipped), has a breakpoint that exactly matches the expected exon-exon boundary, and has at least ten nucleotides on each side of the breakpoint, as illustrated in Fig. 2.
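The junction read criteria above can be sketched in Python. This is a minimal illustration, not the pipeline used in this study: the function name, the CIGAR representation as (operation, length) pairs, the 0-based coordinate convention, and the use of a mismatch count taken from the aligner (for example the SAM NM tag, which also counts indels) are all assumptions made for the example. Counting all aligned bases on each side of the breakpoint as the flanking length is likewise one possible reading of the ten-nucleotide rule.

```python
from typing import List, Tuple

Cigar = List[Tuple[str, int]]  # e.g. [("M", 40), ("N", 1000), ("M", 61)]


def is_junction_read(ref_start: int, cigar: Cigar, n_mismatches: int,
                     exon_end: int, next_exon_start: int,
                     min_overhang: int = 10) -> bool:
    """Return True if a read supports the expected exon-exon junction.

    ref_start        0-based leftmost aligned reference position (assumed).
    exon_end         0-based position just past the upstream exon (assumed).
    next_exon_start  0-based first position of the downstream exon (assumed).
    """
    # Criterion 1: the aligned read has no mismatches.
    if n_mismatches != 0:
        return False

    # Criterion 2: no more than half of the read may be soft clipped.
    read_len = sum(n for op, n in cigar if op in "MIS=X")
    soft_clipped = sum(n for op, n in cigar if op == "S")
    if 2 * soft_clipped > read_len:
        return False

    total_aligned = sum(n for op, n in cigar if op in "M=X")
    pos = ref_start   # current reference position
    left = 0          # aligned bases seen before the current operation
    for op, n in cigar:
        if op == "N":
            # Criterion 3: the breakpoint exactly matches the exon boundary.
            if pos == exon_end and pos + n == next_exon_start:
                # Criterion 4: enough aligned bases on both sides.
                right = total_aligned - left
                return left >= min_overhang and right >= min_overhang
            pos += n
        elif op in "M=X":
            left += n
            pos += n
        elif op == "D":
            pos += n
        # I, S, H and P do not consume reference positions.
    return False


# Hypothetical coordinates, chosen only to exercise the function.
supporting = is_junction_read(100, [("M", 40), ("N", 1000), ("M", 61)], 0, 140, 1140)
short_flank = is_junction_read(135, [("M", 5), ("N", 1000), ("M", 96)], 0, 140, 1140)
```

With these hypothetical inputs the first read satisfies all four criteria, while the second is rejected because only five nucleotides lie upstream of the breakpoint.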
The exonic fraction of junctions distinct from the TP53 main isoform is calculated for each sample selected for analysis as (number of distinct junction reads between relevant exons) / (number of TP53 main isoform junction reads between relevant exons + number of distinct junction reads between relevant exons). In isoform p53β (NM_001126114.2), the junctions between exon 9 and exon 10 and between exon 10 and exon 11 are both distinct from the junction between exon 9 and exon 10 of the TP53 main isoform NM_000546.6; for better comparison, the numbers of junction reads between exon 9 and exon 10 and between exon 10 and exon 11 in isoform NM_001126114.2 are averaged for exonic fraction calculations. This particular exon-exon junction is also present in the ∆40p53β/∆133p53β/∆160p53β isoforms and is used to calculate the exonic fraction for C-terminal isoforms.
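As a worked illustration of the exonic fraction formula, the following Python sketch computes the fraction from read counts and averages the two p53β junction counts first. The function names and the read counts are hypothetical, and returning NaN for a sample with no junction reads at all is an assumption of this sketch rather than something stated in the text.

```python
import math


def exonic_fraction(distinct_reads: float, main_reads: float) -> float:
    """distinct / (main + distinct), as defined in the text."""
    total = distinct_reads + main_reads
    if total == 0:
        return math.nan  # assumption: fraction is undefined without reads
    return distinct_reads / total


def p53beta_distinct_reads(e9_e10_reads: int, e10_e11_reads: int) -> float:
    """Average the two NM_001126114.2-specific junction read counts."""
    return (e9_e10_reads + e10_e11_reads) / 2


# Hypothetical counts: 3 and 5 reads on the two p53-beta junctions,
# 196 reads on the main isoform exon 9 to exon 10 junction.
distinct = p53beta_distinct_reads(3, 5)
fraction = exonic_fraction(distinct, 196)
print(f"{fraction:.2%}")
```

Here the averaged distinct count is 4 reads, giving 4 / (196 + 4), or 2%.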
Statistical analysis is exploratory in nature and is meant to describe the datasets at hand. Mean and median isoform fractions are calculated, as appropriate. P values are assessed at an adjusted significance level, where the adjustment accounts for multiple comparisons using a Bonferroni correction. The number of t tests performed in the primary analysis of the TCGA data in Tables 2 and 3 is 79, and therefore the Bonferroni-adjusted significance level for Tables 2 and 3 is ~ 6.33E-4. The number of t tests performed in the secondary analysis of the TCGA data in Supplemental Table 4 and Supplemental Table 5 is 76, and therefore the Bonferroni-adjusted significance level for Supplemental Table 4 and Supplemental Table 5 is ~ 6.58E-4. The number of t tests performed in the analysis of CCLE data in Supplemental Table 7 is 10, and therefore the Bonferroni-adjusted significance level for Supplemental Table 7 is 0.005. P values are considered statistically significant only if they meet these adjusted significance levels.
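The adjusted significance levels quoted above can be reproduced with a few lines of Python. The unadjusted level of 0.05 is not stated explicitly in the text; it is assumed here because it is the value implied by an adjusted level of 0.005 for 10 tests. The function and dictionary names are illustrative only.

```python
def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Bonferroni-adjusted significance level: alpha divided by test count."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


FAMILY_ALPHA = 0.05  # assumption: implied by 0.005 for 10 tests

n_tests_by_analysis = {
    "Tables 2 and 3": 79,
    "Supplemental Tables 4 and 5": 76,
    "Supplemental Table 7": 10,
}
levels = {name: bonferroni_level(FAMILY_ALPHA, n)
          for name, n in n_tests_by_analysis.items()}

for name, level in levels.items():
    print(f"{name}: {level:.2E}")
```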

Results
As a first step in our analysis, we used GTEx RNA-seq data to measure the fraction of C-terminal and NM_001126112.2 alternative isoform mRNA expression in normal tissues using the exon-exon junction read-based approach. Details on calculating the exonic fraction of exon-exon junctions are described in Materials and methods. Table 1 lists the average and median exonic fraction for alternative C-terminal isoforms across GTEx tissue types. (Supplemental Table 1 provides exonic fractions for C-terminal and NM_001126112.2 alternative isoforms for each GTEx sample used in the analysis.) As can be seen from Table 1, the highest median percentage for C-terminal alternative isoform expression is ~ 4%, in skin and spleen. The average percentage for C-terminal alternative isoform expression across all GTEx tissue types is ~ 2%. NM_001126112.2 expression is a few times lower than that of C-terminal alternative isoforms, with the highest expression also observed in skin. Similar patterns for C-terminal and NM_001126112.2 alternative isoform expression are observed in gtexportal.org.
TCGA has a subset of samples in which, in addition to the tumor sample, there is a matching adjacent normal tissue sample (Supplemental Table 2). Such paired samples present a valuable opportunity to compare C-terminal and NM_001126112.2 alternative isoform expression in wild-type (WT) TP53 tumors and matching adjacent normal tissue. We used data from Supplemental Table 2 to select paired samples whose tumor samples had no TP53 mutations; we considered such tumors to be TP53 WT if they also exhibited a log2(CN-ratio) > −0.9 and TP53 mRNA RNA-Seq V2 RSEM normalized > 300 12. The average C-terminal alternative isoform fraction is ~ 0.62% in paired TP53 WT tumors and ~ 0.66% in paired adjacent normal tissues; there is no statistically significant increase in C-terminal isoform presence in TP53 WT tumors in comparison to paired adjacent normal tissues (paired t test p value 0.689). The average NM_001126112.2 alternative isoform fraction is ~ 1.29% in paired TP53 WT tumors and ~ 1.27% in paired adjacent normal tissues; there is no statistically significant difference in NM_001126112.2 isoform presence between paired TP53 WT tumors and adjacent normal tissues (paired t test p value 0.94).
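The TP53 WT criteria used for TCGA tumors (no TP53 mutation, log2(CN-ratio) > −0.9, and RSEM normalized TP53 expression > 300) can be written as a simple predicate. The sketch below is illustrative: the function and argument names are hypothetical, and treating a non-positive CN-ratio as failing the filter is an assumption made so that the logarithm is always defined.

```python
import math


def is_tp53_wt(has_tp53_mutation: bool, cn_ratio: float,
               rsem_normalized: float) -> bool:
    """Apply the three TP53 WT criteria described in the text."""
    if has_tp53_mutation:
        return False
    if cn_ratio <= 0:
        return False  # assumption: log2 is undefined, treat as not WT
    return math.log2(cn_ratio) > -0.9 and rsem_normalized > 300


# Hypothetical samples: (has mutation, CN-ratio, RSEM normalized expression)
diploid_expressed = is_tp53_wt(False, 1.0, 1200.0)  # log2(1.0) = 0
one_copy_lost = is_tp53_wt(False, 0.5, 1200.0)      # log2(0.5) = -1
low_expression = is_tp53_wt(False, 1.0, 150.0)
```

A CN-ratio of 0.5, corresponding to loss of one of two copies, gives log2(CN-ratio) = −1 and therefore fails the copy number criterion.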
We also used data from Supplemental Table 2 to compare paired tumors with TP53 frame shift, nonsense, or splice site mutations and adjacent normal tissues. The average C-terminal alternative isoform fraction is ~ 2.4% in paired tumors with TP53 frame shift, nonsense, or splice site mutations and ~ 0.9% in paired adjacent normal tissues, which corresponds to a statistically significant difference in C-terminal isoform presence between these tumors and adjacent normal tissues (paired t test p value 0.0053). This difference is likely driven by some of the frame shift, nonsense, and splice site mutations causing aberrant C-terminal splicing. The average NM_001126112.2 alternative isoform fraction is ~ 0.85% in paired tumors with TP53 frame shift, nonsense, or splice site mutations and ~ 1.66% in paired adjacent normal tissues; there is no statistically significant difference in NM_001126112.2 isoform presence between these tumors and adjacent normal tissues (paired t test p value 0.056). This is likely because the vast majority of frame shift, nonsense, and splice site mutations are located after the NM_001126112.2 isoform-specific exon-exon junction in the 5' UTR.
TCGA samples come from 33 different tumor types. Keeping in mind the differences in C-terminal and NM_001126112.2 alternative isoform expression across tissue types, we performed comparisons between TP53 WT tumors (no TP53 mutations, log2(CN-ratio) > −0.9, and TP53 mRNA RNA-Seq V2 RSEM normalized > 300) and tumors with TP53 missense mutations in each tumor type with at least 5 samples in each group. Supplemental Table 3 provides exonic fractions for C-terminal and NM_001126112.2 alternative isoforms for each TCGA sample used in this analysis. Table 2 provides comparison results for C-terminal alternative isoforms for the 22 TCGA tumor types with a sufficient number of samples. As can be seen from Table 2, across all 22 tumor types there is no statistically significant increase in C-terminal alternative isoforms in TP53 WT tumors in comparison to tumors with TP53 missense mutations. As can be seen from Supplemental Table 4, across all 22 tumor types there is no statistically significant difference in the NM_001126112.2 alternative isoform between TP53 WT tumors and tumors with TP53 missense mutations.
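The requirement of at least 5 samples in each group per tumor type can be illustrated with a short Python sketch. The record layout (dictionaries with "tumor_type" and "group" keys), the group labels, and the tumor type names are all hypothetical and are not taken from the supplemental tables.

```python
from collections import Counter
from typing import Dict, Iterable, List


def eligible_tumor_types(samples: Iterable[Dict[str, str]],
                         group_a: str, group_b: str,
                         min_per_group: int = 5) -> List[str]:
    """Return tumor types with at least min_per_group samples in both groups."""
    counts = Counter((s["tumor_type"], s["group"]) for s in samples)
    tumor_types = {tumor_type for tumor_type, _ in counts}
    return sorted(
        t for t in tumor_types
        if counts[(t, group_a)] >= min_per_group
        and counts[(t, group_b)] >= min_per_group
    )


# Hypothetical cohort: TYPE_A has 6 WT and 5 missense samples,
# TYPE_B has 9 WT but only 4 missense samples.
cohort = (
    [{"tumor_type": "TYPE_A", "group": "WT"}] * 6
    + [{"tumor_type": "TYPE_A", "group": "missense"}] * 5
    + [{"tumor_type": "TYPE_B", "group": "WT"}] * 9
    + [{"tumor_type": "TYPE_B", "group": "missense"}] * 4
)
kept = eligible_tumor_types(cohort, "WT", "missense")
```

In this toy cohort only TYPE_A is retained, because TYPE_B has fewer than 5 samples with missense mutations.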
We also performed comparisons between TP53 WT tumors (no TP53 mutations, log2(CN-ratio) > −0.9, and TP53 mRNA RNA-Seq V2 RSEM normalized > 300) and tumors with TP53 frame shift, nonsense, or splice site mutations across 19 TCGA tumor types with a sufficient number of samples. As can be seen from Table 3, in many tumor types there is a statistically significant difference in C-terminal isoform presence between tumors with TP53 frame shift, nonsense, or splice site mutations and TP53 WT tumors. This difference is likely driven by some of the frame shift, nonsense, and splice site mutations causing aberrant C-terminal splicing. As can be seen from Supplemental Table 5, in all but one of the 18 tumor types there is no statistically significant difference (after correction for multiple hypothesis testing using the Bonferroni method, as described in the Materials and methods section) in the NM_001126112.2 alternative isoform between TP53 WT tumors and tumors with TP53 frame shift, nonsense, or splice site mutations. (The statistically significant difference in one tumor type is potentially an artifact due to small sample size.) This is likely because the vast majority of frame shift, nonsense, and splice site mutations are located after the NM_001126112.2 isoform-specific exon-exon junction in the 5' UTR.
CCLE is a well-characterized collection of cancer cell lines with comprehensive genomic data, which allows us to investigate patterns of TP53 C-terminal and NM_001126112.2 alternative isoform expression. Supplemental Table 6 provides exonic fractions for C-terminal and NM_001126112.2 alternative isoforms, TP53 status, and other relevant detailed data for each cell line. Among CCLE cell lines with RNA-seq data, we performed comparisons between TP53 WT cell lines and cell lines with TP53 missense mutations. The average C-terminal alternative isoform fraction is ~ 3.63% in TP53 WT cell lines and ~ 3.64% in cell lines with TP53 missense mutations; there is no statistically significant increase in C-terminal isoform presence in TP53 WT cell lines in comparison to cell lines with TP53 missense mutations (t test p value 0.988; 2 tails, unequal variance). The average NM_001126112.2 alternative isoform fraction is ~ 0.82% in TP53 WT cell lines and ~ 0.88% in cell lines with TP53 missense mutations; there is no statistically significant difference in NM_001126112.2 isoform presence between TP53 WT cell lines and cell lines with TP53 missense mutations (t test p value 0.287; 2 tails, unequal variance).
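A two-sample t test with unequal variance is commonly known as Welch's t test. The following standard-library Python sketch shows how its test statistic and approximate degrees of freedom (Welch-Satterthwaite) are obtained from two groups of isoform fractions. It is an illustration of the test type named above, not the software used for the reported p values; the function name and the input values are hypothetical, and converting the statistic to a two-tailed p value would additionally require the Student t distribution, which is not part of the Python standard library.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for an unequal-variance t test."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error contributed by each group.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical isoform fractions (in percent) for two groups of cell lines.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the degrees of freedom are estimated from the two sample variances, they are generally not an integer.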
We also performed comparisons between TP53 WT cell lines and cell lines with TP53 frame shift, nonsense, or splice site mutations. The average C-terminal alternative isoform fraction is ~ 23.95% in cell lines with TP53 frame shift, nonsense, or splice site mutations; there is a statistically significant difference in C-terminal isoform presence between these cell lines and TP53 WT cell lines (t test p value 6.62E-27; 2 tails, unequal variance). The average NM_001126112.2 alternative isoform fraction is ~ 1.2% in cell lines with TP53 frame shift, nonsense, or splice site mutations; there is no statistically significant difference in NM_001126112.2 isoform presence between TP53 WT cell lines and cell lines with TP53 frame shift, nonsense, or splice site mutations (t test p value 0.12; 2 tails, unequal variance). Supplemental Table 7 summarizes patterns of TP53 C-terminal and NM_001126112.2 alternative isoform expression.

Conclusions
TP53 C-terminal alternative isoforms have abolished or severely decreased tumor suppressor activity; therefore, an increased fraction of TP53 C-terminal alternative isoforms might be expected in tumors with wild-type TP53. However, as described in the Results section, we observed no substantial increase in the fraction of TP53 C-terminal alternative isoforms in TCGA tumors and CCLE cancer cell lines with wild-type TP53, likely indicating that expression of TP53 C-terminal alternative isoforms cannot be reliably selected for during tumor progression. The small but noticeable differences in C-terminal alternative isoform expression across GTEx tissue types, coupled with this observation, hint at the possibility that the function of TP53 C-terminal alternative isoforms may lie in fine-tuning TP53 activity. It is also interesting to note that the presence of exon-exon junctions specific to TP53 C-terminal alternative isoforms in TCGA tumors and CCLE cancer cell lines is driven in part by tumors with frame shift, nonsense, or splice site mutations, which in some cases cause aberrant C-terminal splicing.