Delayed processing of blood samples impairs the accuracy of mRNA-based biomarkers

The transcriptome of peripheral white blood cells (PWBCs) are indicators of an organism’s physiological state, thus making them a prime biological sample for mRNA-based biomarker discovery. Here, we designed an experiment to evaluate the impact of delayed processing of whole blood samples on gene transcript abundance in PWBCs. We hypothesized that storing blood samples for 24 h at 4 °C would cause RNA degradation resulting in altered transcriptome profiles. There were no statistical differences in RNA quality parameters among samples processed after one, three, six, or eight hours post collection. Additionally, no significant differences were noted in RNA quality parameters or gene transcript abundance between samples collected from the jugular and coccygeal veins. However, samples processed after 24 h of storage had a lower RNA integrity number value (P = 0.03) in comparison to those processed after one hour of storage. Using RNA-sequencing, we identified four and 515 genes with differential transcript abundance in samples processed after storage for eight and 24 h, respectively, relative to samples processed after one hour. Sequencing coverage of transcripts was similar between samples from the 24-h and one-hour groups, thus showing no indication of RNA degradation. This alteration in transcriptome profiles can impair the accuracy of mRNA-based biomarkers, therefore, blood samples collected for mRNA-based biomarker discovery should be refrigerated immediately and processed within six hours post-sampling.

Parameters of total RNA based on sampling location and processing delay. We extracted total RNA from 30 samples in one batch, with an average yield of 11.8 µg ± 4.5. There was no difference (P > 0.05) of the parameters from the samples obtained from the coccygeal versus jugular vein (Table 1, Supplementary  Fig. S1). We then compared the effect of delayed processing on the parameters from samples obtained from the jugular vein. There was no difference (P > 0.05) for values of absorbance (A 260 and A 280 ) and the ratio (A 260 /A 280 ). However, there was an effect (P = 0.03) of the time for delayed processing on the RNA integrity number (RIN). The samples processed 24 h post-collection presented lower RIN relative to the samples processed 1 h postcollection ( x RIN.1hr =8.52 ± 0.37, x RIN.24hr =8 ± 0.37, P = 0.03, Z-test, Table 1, Supplementary Fig. S2).
Quality of libraries produced based on sampling location and processing delay. Because the lowest value for RIN was 7.4, which is suitable for transcriptome analysis 27 , we proceeded with RNA-sequencing and produced genome-wide transcriptome data for all 30 samples. On average, we produced 29,871,716 ± 3,365,045 pairs of reads per sample (ranging from 21,139,000 to 34,856,707, median 29,941,861, Table 2).
There was no difference (P > 0.05) on library parameters of 3'/5' bias, efficiency of reads assigned to the annotation, number of genes in relationship to the location of blood sampling, nor the delayed processing of the samples ( Table 2, Supplementary Fig. S3 and S4). We noted, that among the samples with delayed processing, the library with the lowest efficiency of reads assigned to the annotation (40%) did not originate from the sample with  www.nature.com/scientificreports/ lowest RIN (7.4), but both samples were processed after 24 h of preservation at 4 °C. Further interrogation of the relationship between RIN and library percentage of reads assigned to the annotation showed only a moderate correlation between these two metrics (Pearson's r = 0.3799, P = 0.0610). There was no difference (P > 0.05) for number of genes detected pre-or post-filtering in relationship to sampling location or delayed processing ( Table 2, Supplementary Fig. S3 and S4). After filtering for lowly expressed genes (FPKM > 1 and CPM > 1 in five or more samples), we quantified transcript abundance for 12,414 proteincoding genes, followed by 287 long non-coding RNAs and 109 pseudogenes. Differential transcript abundance based on sampling location and processing delay. First, we tested whether transcript abundance would be distinguishable based on the location of sampling. The results show no difference (FDR > 0.05) between transcripts from samples obtained from coccygeal or jugular veins. Next, we assessed the consistency of transcript abundance within each animal by calculating the correlation of the transcript abundance between the two sources of sampling. The Pearson's correlation coefficients were greater than 0.99 for all subjects. Both results convergently show no variation in transcript abundance within subject based on sampling source (Fig. 2). Thus, mRNA quantitation data collected from liquid biopsies are consistent regardless of which vein is used for sampling.
Second, we asked if the transcript abundance in PWBCs would change if blood samples remained stored at 4 °C for different periods of time, relative to the processing of the blood samples within one hour of collection. There was no differential transcript abundance between the samples stored for three or six hours at 4 °C relative to the samples processed within one hour of collection (FDR > 0.05, Fig. 3A).
By comparison, we identified four and 515 genes with differential transcript abundance between samples stored for eight and 24 h, respectively, at 4 °C relative to the samples processed within one hour of collection (Fig. 3A). Notably, the four genes detected in the '8 h vs 1 h' contrast were also detected in the '24 h vs 1 h' contrast with higher transcript abundance in the preserved samples relative to those processed within one hour of collection (Fig. 3B). Furthermore, 291 and 224 genes presented greater and lower abundance, respectively, for the '24 h vs 1 h' contrast (please see Supplementary Table S1 for the lists of genes, and Supplementary Fig. S5 for the individual graphs of transcript abundance for all genes for the contrast '24 h vs 1 h'). These results show that storage of blood samples for ≥ 8 h prior to cryopreservation of PWBCs causes significant changes in the transcriptome profile.
Analysis of the relationship between the decline in transcript abundance and mRNA coverage. Considering the results of differential transcript abundance, we asked if lower values of FPKM for transcripts in the samples processed after 24 h of storage at 4 °C were caused by reduced transcript coverage, which www.nature.com/scientificreports/ can be indicative of RNA degradation 28 . First, we inspected whether there was a global trend of transcripts to have prominent reduced coverage in one of the extremities (3' or 5'). Coverage plots for the 224 genes with lower transcript abundance at 24 h of delayed processing showed a relative nucleotide sequencing depth similar to the 12,264 genes that were not differentially abundant (Fig. 4A, B). We further tested whether the nucleotide coverage of the 224 genes with lower transcript abundance at 24 h of delayed processing was statistically different from distribution of the same genes observed at 1 h of processing (Fig. 4B). First, we calculated the Kolmogorov-Smirnov D-statistic 29 for the relative nucleotide coverage of genes from samples obtained from the jugular and coccygeal veins (both processed within 1 h of blood collection), which we referred to as ( D j,c,1hr ). We also calculated the D-statistic for the relative nucleotide coverage of genes from samples obtained from the jugular vein processed at 24 h and 1 h post-sampling, which we referred to as ( D j,24h,1hr ). Next, we calculated the difference between the two statistics ( Delta D = (D (j,24h,1hr) −D (j,c,1hr) ) ). We reasoned that, for a given gene, Delta D would approximate to zero if the variation in the sequencing coverage was similar between the samples processed at different times (24 h vs 1 h) and the samples processed at the same time (1 h). Indeed, only seven out of the 224 genes (24 h < 1 h) had Delta D within the range of -0.25 and 0.25 (Fig. 4C, left plot). Furthermore, the range of Delta D calculated for the genes with lower transcript abundance at 24 h of storage (24 h < 1 h) was within the range of Delta D calculated for the genes with no transcript variation with the passing of 24 h post-collection (Fig. 4C, center plots). Altogether, these results provided strong evidence that the overall sequencing coverage of transcripts was similar between samples processed after 24 h storage at 4 °C and within one hour of sampling.
Gene ontology enrichment analysis of differentially expressed genes. Because we did not observe a systematic reduction in transcript coverage, we reasoned that the differential transcript abundance was a cellular regulatory response to the preservation of blood samples ex vivo. It was noteworthy that three out of four genes with greater transcript abundance at '8 h vs 1 h' (H1-4, H2AC10 and H4C3) were involved in chromatin configuration, specifically annotated with the gene ontology terms 'nucleosome assembly' (H1-4 and H4C3), and 'chromatin silencing' (H2AC10).
Further interrogation of the 291 genes that had greater abundance at '24 h vs 1 h' also revealed an enrichment of the category 'nucleosome assembly' with a series of histone related genes (H1-4, H2BC12, H2BC13, H2BC14, H2BC18, H2BC4, H2BU1, H4C14, H4C3, H4C4 and H4C8, fold enrichment = 9.78, Fig. 5A, please see Supplementary Table S2 for a complete list of categories and gene annotation). All these genes overlapped with the molecular function 'DNA binding' (BSX, DNMT3B, H1-4, H2AC21, H2AC6, H2AW, H2BC12, H2BC13, Gray shapes indicate genes whose transcript abundance are not significantly altered following the preservation of blood samples (FDR > 0.05). Squares indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on the DESeq2 algorithm. Triangles indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on the edgeR algorithm. Circles indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on both DESeq2 and edgeR algorithms. (B) Transcript abundance for genes with differential abundance in both the ' We also asked if there was enrichment of gene ontology categories within the 224 genes that had less transcript abundance after 24 h of storage at 4 °C, and there were several categories significantly enriched (FWER < 0.05, Fig. 6A, please see Supplementary Table S4 for a complete list of categories and gene annotation). Notably, there were a series of signaling related categories such as 'positive regulation of interferon-gamma production' , 'positive regulation of interleukin-8 production' , 'negative regulation of interferon-gamma production' , 'positive regulation of ERK1 and ERK2 cascade' , 'positive regulation of MAPK cascade' 'positive regulation of interleukin-1 beta production' , 'positive regulation of interleukin-12 production' , 'positive regulation of interleukin-6 production' , 'positive regulation of NF-kappaB transcription factor activity' , 'positive regulation of NIK/NF-kappaB signaling' , 'positive regulation of peptidyl-tyrosine phosphorylation' , and 'positive regulation of phosphatidylinositol 3-kinase signaling' .  In parallel with the identification of the significant enrichment of the categories involved in regulation of signaling and gene expression, the test for enrichment of molecular functions identified that many of those 224 genes were associated with functions that involve interaction with DNA to regulate gene expression, such as 'RNA polymerase II cis-regulatory region sequence-specific DNA binding' , 'DNA-binding transcription factor activity, RNA polymerase II-specific' , and 'DNA-binding transcription factor activity' (Fig. 6B , Supplementary Table S5).

Discussion
The main purpose of our study was to understand the dynamics of RNA degradation and the consequences of this RNA degradation on the quantification of transcript abundance in PWBCs from samples stored in the fridge (4 °C). We collected multiple samples from the same subject and proceeded with a strategic delay in the processing of samples, followed by immediate cryopreservation of PWBCs. Our methodical interrogation of the RNA quality and systematic analysis of transcriptome data lead us to identify critical factors related to the short-term preservation of blood samples for RNA analysis: (i) the vein used for sampling blood is not a source of significant and systematic changes in the transcriptome profiling of PWBCs; (ii) storing blood samples under refrigeration for 24 h does reduce their RIN values by approximately one unit, however the drop in RIN values does not interfere with the quantification of transcripts from protein-coding genes or long non-coding RNAs produced in PWBCs; (iii) even if blood samples are refrigerated, the abundance of gene transcripts produced in PWBCs starts to drop irregularly as early as three hours past blood sampling, but changes are consistent across   23 . However, the results observed from the RNA-sequencing did not show indication of reduced RNA quality. The values of the 3'/5' bias for all libraries ranged from 0.47 to 0.56, with no effect of the processing time on the averages. In samples with degraded RNA, there is a bias towards transcript coverage on the 3' end, whereas samples with 3'/5' bias values close to 0.5 have balanced coverage of RNA extremities and are only observed in samples with high RNA quality 30 . Thus, there was no systematic coverage bias towards the 3' end of polyadenylated transcripts in our samples.
Also based on our hypothesis, we anticipated that transcripts with significantly lower quantification would be a consequence of RNA degradation following a period of storage of blood samples at 4 °C. Here we reasoned that coverage plots for the 224 genes with lower abundance at 24 h in the contrast '24 h vs 1 h' would be distinct between the libraries produced from the 24 h group versus the 1 h group. Contrary to our expectation, the coverage charts (Fig. 4A) showed a virtually identical coverage of transcripts with significantly lower quantification whether on samples processed within one hour or 24 h of collection. Furthermore, a comparison of the distributions using the Kolmogorov-Smirnov test confirmed no significant changes in transcript coverage based on the amount of time that samples were preserved.
A possible explanation for the discrepancy between the significantly lower values of RIN for samples preserved for 24 h and the consistent sequencing coverage across transcripts is on the source of data. The RIN values are computed based on data collected from a series of features of an electropherogram, most of which involve information from ribosomal RNAs (5S, 5.8S, 18S and 28S) 31 . On the other hand, RNA-sequencing libraries were produced with enrichment of polyadenylated transcripts, and thus, the results from 3'/5' bias nor coverage plots do not account for ribosomal RNA. Our results indicate that, although a correlation between transcript coverage and RIN values have been identified 28,31 , this relationship may be prominent in samples with RIN values less than 8.
Our results show that there is a prominent systematic alteration of transcript abundance in PWBCs when blood samples are preserved for 24 h at 4 °C, which is aligned with previous reports 23,26 . Interestingly, we determined that this alteration of transcript abundance across individuals starts as early as eight hours post-collection.
Because we could not find indication that RNA degradation was a cause of these alterations, we reasoned that the alteration in transcript abundance was a consequence of the PWBCs responding to the cold temperature (4 °C) and lack of oxygen. The consequences of long-term exposure of mammalian cells at 4 °C have not been well studied, but Al-Fageeh and Smales 32 proposed that the active transcription of a selected group of genes would cause a wide-spread reduction in transcription activity. Well-aligned with this possible mechanism, three out of four genes up regulated in PWBCs after the storage of blood samples for eight hours at 4 °C have a role in chromatin organization, including nucleosome assembly, which can be related to a compaction of the chromatin and reduction in transcriptional activity.
The alteration of transcript abundance in PWBCs after storing blood samples at 4 °C for 24 h has been observed before 26 . However, our genome-wide transcriptome analysis shows that the changes are more prominent after 24 h of storage of blood samples at 4 °C. The genes with greater transcript abundance after 24 h of storage of blood samples at 4 °C relative to those processed within one hour of collection seem to be enriched for few biological processes and again with a high enrichment for genes involved in nucleosome assembly. It is possible that the cells increase the transcription of histone related genes to increase the genome-wide compaction of chromatin. The greater number of biological processes enriched for genes with lower transcript abundance after 24 h of storage of blood samples at 4 °C relative to those processed within one hour of collection corroborate the notion of a global silencing in transcription.
Considering the results of significant differential transcript abundance observed in the present study, we reasoned that the prolonged storage of blood samples at 4 °C would be relevant for investigations searching for mRNA markers in PWBCs. The overlap of our results with transcript abundance of genes also expressed in PWBCs and previously associated with fertility in heifers 20,21 identified two genes (NKG2A, PPP1R3B, Supplementary Fig. S6) whose analysis of differential transcript abundance would have been compromised by storage of blood samples at 4 °C for eight hours or longer. These results strongly indicate that blood samples collected for studies of mRNA biomarkers should: (i) be preserved on ice as soon as they are collected and processed as early as possible, preferably within six hours of collection, for the proper cryopreservation of PWBCs, or (ii) if possible, collected in tubes that allow for the immediate preservation of RNA transcript abundance in the whole blood. However, we note that the chemical or cryopreservation of whole blood for RNA extraction requires further depletion of hemoglobin transcripts if the samples will be used for RNA-sequencing 33,34 .
The transcriptome of PWBCs changes after blood sampling, even if the samples are refrigerated. A systemic alteration is detected at eight hours post blood collection and follows a pattern where PWBCs increase the transcription of genes related to chromatin compaction. This compaction is likely to reduce the transcription of several genes that function across multiple cellular processes in PWBCs. It is evident that this alteration in transcriptome profiles after prolonged storage can mask the transcriptome signature of a specific physiological phenomenon.
Our findings can be used as a guide for the establishment of protocols for blood processing when samples are supposed to be used for genome-wide quantification of transcripts in PWBCs. Blood samples collected for mRNA-based biomarker discovery should be refrigerated immediately and processed within six hours post-sampling. This recommendation can be considered by investigators working in diverse several areas of life sciences.

Methods
The reporting in this study follows the recommendations in the ARRIVE guidelines 35 . Please see Supplementary table S6 for catalog number of kits and reagents used in this work.

Animal handling and sample collection. All animal handling and use was approved by the Institutional
Animal Care and Use Committee (IACUC) at Virginia Tech. All procedures involving animal handling were performed in accordance with IACUC guidelines and regulations. Eleven crossbred beef heifers (Angus x Simmental cross), averaging 14 months of age, located at Kentland Farm (Virginia Tech, Blacksburg, VA) were subjected to estrus synchronization. On day zero we administered an intramuscular injection of gonadotrophin-releasing hormone (GnRH, 100 μg; Factrel ® ; Zoetis Incorporated, Parsippany, NJ) and inserted a controlled internal drug release (CIDR, 1.38 g Progesterone; Eazi-Breed™ CIDR ® ; Zoetis Inc.) device in each heifer. On day seven we removed the CIDR insert and administered an intramuscular injection of prostaglandin F2alpha (PGF2α, 25 μg; Lutalyse ® ; Zoetis Inc.), which was followed by a second injection of GnRH on day ten of the protocol. We used estrus synchronization to mitigate possible effects that the stages of the estrus cycle may have on gene expression 36 .
We collected blood samples from heifers that expressed estrus (n = 5) at the time artificial insemination would normally be performed. Fifty mL of blood were sampled from the jugular vein and 10 mL from the coccygeal vein of each heifer using vacutainers containing 18 mg K2 EDTA (Becton, Dickinson, and Company, Franklin Lakes, NJ). Each tube was inverted several times to prevent blood coagulation and placed on ice immediately until processing.
Experimental design and blood processing. Blood tubes were sprayed thoroughly with a disinfectant (Lysol ® ) prior to storage. While on ice, tubes containing samples from the jugular vein were randomly assigned into five groups: 1 h, 3 h, 6 h, 8 h, and 24 h, which correspond to the time the samples remained at 4 °C prior to processing. We processed blood samples from the coccygeal vein in group 1 h for comparison of gene expression with samples from the jugular vein.
The buffy coat was separated from whole blood by centrifugation for 20 min at 2000xg at 4 °C. The buffy coat of each sample was aspirated and dispensed into 14 mL of a red blood cell lysis buffer solution (1.55 M ammonium chloride, 0.12 M sodium bicarbonate, 1 mM EDTA, Cold Spring Harbor Protocols). The mixture was gently mixed on a rocker for 10 min at room temperature, and then centrifuged for 10 min at 800xg at 4 °C. The supernatant was removed, and each sample was mixed with 200 µL of TRIzol™ Reagent (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). The mixture of TRIzol™ and PWBCs was transferred into cryotubes (Corning Incorporated, Corning, New York) and then snap frozen in liquid nitrogen prior to storage at − 80 °C 20,21 . Total RNA extraction. Total RNA was extracted from the PWBCs using the acid guanidinium thiocyanate-phenol-chloroform procedure 37,38 , with the aid of Phasemaker™ tubes (Invitrogen™, Thermo Fisher Scientific, Waltham, MA), following the manufacturer's instructions. Briefly, the samples were thawed on ice and 800 μL of TRIzol™ was added to each. Once homogenized, the mixture was transferred into Phasemaker™ tubes, where it was mixed with 200 μL of chloroform and centrifuged for 5 min at 12,000xg at 4 °C to complete phase separation. Next, the aqueous phase was collected into 1.7 mL microtubes and mixed with 0.5 μL of glycoblue. Then, 500 μL of 100% isopropanol was added to each tube and they were centrifuged for 10 min at 12,000xg at 4 °C to precipitate the RNA. The RNA pellet was collected and washed twice with 1 mL of 75% ethanol and centrifuged for 2 min at 7,500xg at 4 °C. Then, the RNA pellet was air-dried briefly and eluted in nuclease free water and maintained on ice for quantification and assessment of quality.
We quantified the total RNA concentration (A 260 ) and purity (A 260 /A 280 ratios) using a NanoDrop™ 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA). We also quantified the RNA using a Qubit RNA High Sensitivity Assay Kit (Invitrogen™, Thermo Fisher Scientific, Waltham, MA) assayed on a Qubit 4 Fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). Next, we evaluated the RNA integrity by assaying a sample on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA) using the Agilent RNA 6000 Pico Kit (Agilent, Santa Clara, CA).
Library preparation and high-throughput sequencing. We diluted the RNA samples to 1 ng/mL for library preparation and confirmed the concentration using a Qubit RNA High Sensitivity Assay Kit (Invit-rogen™, Thermo Fisher Scientific, Waltham, MA) and Qubit 4 Fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). Five hundred ng were used as starting material for library preparation using the TruSeq ® Stranded mRNA Library Prep (Illumina, Inc, San Diego, CA) and the IDT-ILMN TruSeq UD indexes. Sequencing was assayed in a NovaSeq 6000 sequencing platform (Illumina, Inc, San Diego, CA) using a NovaSeq 6000 SP Reagent Kit v1.5, to produce paired-end reads 150 nucleotides long. Preparation of libraries and sequencing assays was performed by staff at the Virginia Tech Genomics Sequencing Center.
Alignment of sequences and filtering. We removed the sequencing adapters using cutadapt (v. 2.8) and the sequences indicated by the manufacturer (Illumina, Inc, San Diego, CA). Next, we aligned the sequences to the cattle genome 39 Quantification of transcript abundance and gene filtering. We used featureCounts (subread v. 2.0.1 46 ) to count the fragments that matched to the Ensembl cattle annotation gene (Bos_taurus.ARS-UCD1.2.103). Genes annotated as protein coding, long non-coding RNA and pseudogene were retained. Following the calculation of counts per million (CPM) and reads per million per kilobase (FPKM) we retained genes that presented FPKM and CPM greater than one in five or more samples.
Quantification of library properties. We calculated the 3'/5' bias in our libraries using RNA-SeQC (v. 2.4.2 30 ), and the proportion of reads assigned to annotation by dividing the number of reads mapped to the Ensembl annotation divided by the total number of reads sequenced.

Statistical analyses. RNA metrics (RIN, A 280 , A 260 and A 280 /A 260 ) and number of genes detected per li-
brary. We used paired Student's t 47,48 and Wilcoxon 49 tests to access the null hypothesis of no difference between two sampling locations (H 0 :μ jugular = μ coccygeal ). Within the samples obtained from the jugular vein, we used a generalized linear mixed model to access the null hypothesis of no difference between groups of delayed processing (H 0 :μ (T1h) = μ (T2h) = … = μ (T24h) ). The model included time of processing (T (1 h, 3 h, 6 h, 8 h or 24 h) ) as fixed effect and animal as random variable (A (1,2,3,4 or 5) ). When the model indicated significance of the fixed effect (P < 0.05), we used the Z-test 50 and the Dunnett's approach 51 for simultaneous tests for general linear hypothesis 52,53 to compare the average of the groups T (3 h, 6 h, 8 h or 24 h) with the baseline T (1 h) . Averages were inferred as statistically different when Bonferroni adjusted P < 0.05. Library 3'/5' bias, proportion of reads assigned to annotation and genes detected. We used a generalized linear mixed model, with a binomial family and a logistic regression function to access the null hypothesis of no difference between groups of delayed processing (H 0 :μ (T1h) = μ (T2h) = … = μ (T24h) ). The time of processing was included in the model as fixed effect and animal was set as random effect. Averages were inferred as statistically different when P < 0.05. Differential transcript abundance. We compared the transcript abundance from samples obtained from the jugular and coccygeal veins by using a paired-sample structure (H 0 :μ jugular = μ coccygeal ). Next, we compared the transcript abundance from samples obtained from the jugular vein that were processed at different times. The analyses were performed with the R packages 'edgeR' 54 using the quasi-likelihood F-test and 'DESeq2′ 55 using the Wald's test. In the case of the delayed processing, we set up contrasts to compare the different processing times versus T 1h (H 0 :μ (T1h) = μ (T2h) ; ….; H 0 :μ (T1h) = μ (T24h) ). We adjusted nominal P values for multiple hypothesis testing using the Benjamini-Hochberg false discovery rate 56 . We assumed a difference in transcript abundance to be significant when FDR < 0.05 in the results obtained by both 'edgeR' and 'DESeq2' packages and absolute Log (fold-change) > 0.5. We utilized this approach to report robust results of differential transcript abundance independent of algorithm biases or limitations 20,21,57,58 .
Gene ontology enrichment analysis. We tested lists of genes for enrichment of gene ontology terms using the R package 'GOseq' 59 and the genes retained after filtering as a background list 60,61 . Nominal P values were adjusted for multiple hypothesis testing by family wise error rate 62,63 .
Contrasts of transcript coverage. We quantified the relative position of each nucleotide in relation to the total number of nucleotides in the transcript, given in percentage. In addition, we calculated the relative proportion of occurrence of each nucleotide in relation to the total coverage of the gene. Then, for each gene in different groups, in a pair-wise manner, we compared the relative position of each nucleotide weighed by the relative coverage using the weighted Kolmogorov-Smirnov test, as described elsewhere 29 .