Introduction

Blood is a fluid connective tissue that links the entire biological system of an individual, and is composed of plasma and red and white blood cells1. Liew and colleagues1 coined the idea of the “sentinel principle”, whereby blood can harbor molecular indicators of physiological changes in organs, tissues, and cells. Gene transcripts in peripheral white blood cells (PWBCs) are among these molecular indicators. The transcriptome profile of PWBCs is distinct from one individual to another2,3, and the profile is as dynamic as the physiological changes that an individual experiences4,5,6. Most importantly, changes in gene expression are detected in the blood relative to several environmental and pathological factors (reviewed in1,7).

Liquid biopsy, including from blood, has emerged as a powerful source of biological material for studying messenger RNA (mRNA) based biomarkers8. For instance, mRNAs have been associated with chemo-sensitivity in advanced gastric cancer patients9, non-small cell lung cancer10, acute ischemic stroke11, neuroendocrine tumor12, prostate cancer13, hepatocellular carcinoma14,15, and Huntington’s disease16. In reproductive health, several studies have focused on changes in genes expressed in PWBCs. A cohort of women who were enrolled in the PREVIENI project17, and identified as infertile, presented altered levels of diverse genes expressed in the PWBCs relative to fertile women18,19. Recently, we have identified several genes that are differentially expressed when contrasting heifers of different pregnancy outcomes (pregnant by AI, pregnant by natural breeding, or not pregnant)20,21. Therefore, the analysis of mRNAs is one possible avenue for the determination of bloodborne molecules that serve as biomarkers of health.

The processing of blood samples for the separation of the buffy coat followed by resuspension in TRIzol Reagent and immediate cryopreservation at − 80 °C is very efficacious for the extraction of RNAs with high quality and purity20,21,22,23 suitable for producing data by RNA-sequencing20,21,24,25. However, when the site of collection cannot be used to process the blood samples, there is a window of time between sampling and collection. Malentacchi et al. detected the alteration of transcript abundance of one gene, out of seven tested by polymerase chain reaction, when samples were stored for 24 h at 4 degrees Celsius (°C)26. To date, no study has carried out a systematic interrogation of the transcriptome of PWBCs to understand the consequences of storing blood on the alteration of transcript abundance.

Here, we designed an experiment to systematically interrogate the consequences of storing blood samples at different periods up to 24 h at 4 °C on RNA degradation and the transcriptome profile in PWBCs. We hypothesized that the preservation of blood samples for 24 h at 4 °C would lead to RNA degradation, which would result in an alteration in the transcriptome profile of PWBCs.

Results

Overview of the experimental design

We collected blood samples from five estrus-synchronized heifers. Ten mL of blood were drawn from the coccygeal vein and five samples of 10 mL were drawn from the jugular vein within seconds among all samplings within each animal. All samplings were performed within 45 min. All tubes were preserved on ice and the samples from the jugular vein were randomly assigned to different groups for delayed processing (one, three, six, eight or 24 hours (h), Fig. 1A). At the assigned time, PWBCs were isolated, pelleted and resuspended in TRIzol™ Reagent for cryopreservation at  − 80 °C. We extracted total RNA from all samples in one batch, assessed quantity and quality and submitted all samples for library preparation prior to freezing (Fig. 1B).

Figure 1
figure 1

Experimental design and workflow. (A) Overview of the experimental design with the number of subjects and tubes processed at different times after storage at 4 °C. (B) Workflow of the assays from sample processing to RNA-sequencing. Elements of this figure were created with BioRender.com.

Parameters of total RNA based on sampling location and processing delay

We extracted total RNA from 30 samples in one batch, with an average yield of 11.8 µg ± 4.5. There was no difference (P > 0.05) of the parameters from the samples obtained from the coccygeal versus jugular vein (Table 1, Supplementary Fig. S1). We then compared the effect of delayed processing on the parameters from samples obtained from the jugular vein. There was no difference (P > 0.05) for values of absorbance (A260 and A280) and the ratio (A260/A280). However, there was an effect (P = 0.03) of the time for delayed processing on the RNA integrity number (RIN). The samples processed 24 h post-collection presented lower RIN relative to the samples processed 1 h post-collection (\({\overline{x} }_{RIN.1hr}\)=8.52 ± 0.37, \({\overline{x} }_{RIN.24hr}\)=8 ± 0.37, P = 0.03, Z-test, Table 1, Supplementary Fig. S2).

Table 1 RNA parameters obtained from peripheral white blood cells.

Quality of libraries produced based on sampling location and processing delay

Because the lowest value for RIN was 7.4, which is suitable for transcriptome analysis27, we proceeded with RNA-sequencing and produced genome-wide transcriptome data for all 30 samples. On average, we produced 29,871,716 ± 3,365,045 pairs of reads per sample (ranging from 21,139,000 to 34,856,707, median 29,941,861, Table 2).

Table 2 Metrics for RNA-sequencing data produced from peripheral white blood cells.

There was no difference (P > 0.05) on library parameters of 3’/5’ bias, efficiency of reads assigned to the annotation, number of genes in relationship to the location of blood sampling, nor the delayed processing of the samples (Table 2, Supplementary Fig. S3 and S4). We noted, that among the samples with delayed processing, the library with the lowest efficiency of reads assigned to the annotation (40%) did not originate from the sample with lowest RIN (7.4), but both samples were processed after 24 h of preservation at 4 °C. Further interrogation of the relationship between RIN and library percentage of reads assigned to the annotation showed only a moderate correlation between these two metrics (Pearson’s r = 0.3799, P = 0.0610).

There was no difference (P > 0.05) for number of genes detected pre- or post-filtering in relationship to sampling location or delayed processing (Table 2, Supplementary Fig. S3 and S4). After filtering for lowly expressed genes (FPKM > 1 and CPM > 1 in five or more samples), we quantified transcript abundance for 12,414 protein-coding genes, followed by 287 long non-coding RNAs and 109 pseudogenes.

Differential transcript abundance based on sampling location and processing delay

First, we tested whether transcript abundance would be distinguishable based on the location of sampling. The results show no difference (FDR > 0.05) between transcripts from samples obtained from coccygeal or jugular veins. Next, we assessed the consistency of transcript abundance within each animal by calculating the correlation of the transcript abundance between the two sources of sampling. The Pearson’s correlation coefficients were greater than 0.99 for all subjects. Both results convergently show no variation in transcript abundance within subject based on sampling source (Fig. 2). Thus, mRNA quantitation data collected from liquid biopsies are consistent regardless of which vein is used for sampling.

Figure 2
figure 2

Transcript abundances obtained from the jugular and coccygeal veins from different subjects. The values presented are variance stabilizing read counts obtained from the “DESeq2” package55.

Second, we asked if the transcript abundance in PWBCs would change if blood samples remained stored at 4 °C for different periods of time, relative to the processing of the blood samples within one hour of collection. There was no differential transcript abundance between the samples stored for three or six hours at 4 °C relative to the samples processed within one hour of collection (FDR > 0.05, Fig. 3A).

Figure 3
figure 3

Comparison of transcript abundance in PWBCs from blood samples processed after prolonged storage at 4 °C. (A) M-A plots of the contrasts between each of the prolonged storage times versus samples processed within one hour. Gray shapes indicate genes whose transcript abundance are not significantly altered following the preservation of blood samples (FDR > 0.05). Squares indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on the DESeq2 algorithm. Triangles indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on the edgeR algorithm. Circles indicate genes whose transcript abundance were significantly altered (FDR < 0.05) based on both DESeq2 and edgeR algorithms. (B) Transcript abundance for genes with differential abundance in both the ‘8 h vs 1 h’ and ‘24 h vs 1 h’ contrasts.

By comparison, we identified four and 515 genes with differential transcript abundance between samples stored for eight and 24 h, respectively, at 4 °C relative to the samples processed within one hour of collection (Fig. 3A). Notably, the four genes detected in the ‘8 h vs 1 h’ contrast were also detected in the ‘24 h vs 1 h’ contrast with higher transcript abundance in the preserved samples relative to those processed within one hour of collection (Fig. 3B). Furthermore, 291 and 224 genes presented greater and lower abundance, respectively, for the ‘24 h vs 1 h’ contrast (please see Supplementary Table S1 for the lists of genes, and Supplementary Fig. S5 for the individual graphs of transcript abundance for all genes for the contrast ‘24 h vs 1 h’). These results show that storage of blood samples for ≥ 8 h prior to cryopreservation of PWBCs causes significant changes in the transcriptome profile.

Analysis of the relationship between the decline in transcript abundance and mRNA coverage

Considering the results of differential transcript abundance, we asked if lower values of FPKM for transcripts in the samples processed after 24 h of storage at 4 °C were caused by reduced transcript coverage, which can be indicative of RNA degradation28. First, we inspected whether there was a global trend of transcripts to have prominent reduced coverage in one of the extremities (3’ or 5’). Coverage plots for the 224 genes with lower transcript abundance at 24 h of delayed processing showed a relative nucleotide sequencing depth similar to the 12,264 genes that were not differentially abundant (Fig. 4A, B).

Figure 4
figure 4

Comparison of overall transcript sequencing coverage from samples that were processed within 1 h and 24 h after blood collection. (A) Genes with no change in transcript abundance 24 h of storage at 4 °C. (B) Genes with lower transcript abundance after 24 h of storage at 4 °C. (C) Plots of the Delta D-statistic for all genes separated by their results of the contrast ‘24 h vs 1 h’.

We further tested whether the nucleotide coverage of the 224 genes with lower transcript abundance at 24 h of delayed processing was statistically different from distribution of the same genes observed at 1 h of processing (Fig. 4B). First, we calculated the Kolmogorov–Smirnov D-statistic29 for the relative nucleotide coverage of genes from samples obtained from the jugular and coccygeal veins (both processed within 1 h of blood collection), which we referred to as (\({D}_{j,c, 1hr}\)). We also calculated the D-statistic for the relative nucleotide coverage of genes from samples obtained from the jugular vein processed at 24 h and 1 h post-sampling, which we referred to as (\({D}_{j,24h, 1hr}\)). Next, we calculated the difference between the two statistics (\(Delta \;D=({D}_{(j,24h, 1hr)}{-D}_{(j,c, 1hr)})\)). We reasoned that, for a given gene, \(Delta \;D\) would approximate to zero if the variation in the sequencing coverage was similar between the samples processed at different times (24 h vs 1 h) and the samples processed at the same time (1 h). Indeed, only seven out of the 224 genes (24 h < 1 h) had \(Delta \;D\) within the range of -0.25 and 0.25 (Fig. 4C, left plot). Furthermore, the range of \(Delta \;D\) calculated for the genes with lower transcript abundance at 24 h of storage (24 h < 1 h) was within the range of \(Delta \;D\) calculated for the genes with no transcript variation with the passing of 24 h post-collection (Fig. 4C, center plots). Altogether, these results provided strong evidence that the overall sequencing coverage of transcripts was similar between samples processed after 24 h storage at 4 °C and within one hour of sampling.

Gene ontology enrichment analysis of differentially expressed genes

Because we did not observe a systematic reduction in transcript coverage, we reasoned that the differential transcript abundance was a cellular regulatory response to the preservation of blood samples ex vivo. It was noteworthy that three out of four genes with greater transcript abundance at ‘8 h vs 1 h’ (H1-4, H2AC10 and H4C3) were involved in chromatin configuration, specifically annotated with the gene ontology terms ‘nucleosome assembly’ (H1-4 and H4C3), and ‘chromatin silencing’ (H2AC10).

Further interrogation of the 291 genes that had greater abundance at ‘24 h vs 1 h’ also revealed an enrichment of the category ‘nucleosome assembly’ with a series of histone related genes (H1-4, H2BC12, H2BC13, H2BC14, H2BC18, H2BC4, H2BU1, H4C14, H4C3, H4C4 and H4C8, fold enrichment = 9.78, Fig. 5A, please see Supplementary Table S2 for a complete list of categories and gene annotation). All these genes overlapped with the molecular function ‘DNA binding’ (BSX, DNMT3B, H1-4, H2AC21, H2AC6, H2AW, H2BC12, H2BC13, H2BC14, H2BC18, H2BC4, H2BU1, H3C6, H4C14, H4C3, H4C4, H4C8, MSH5, PROX2, SNAPC4, TEAD3 and TERT, fold enrichment = 1, Fig. 5B, Supplementary Table S3). Other biological processes significantly enriched were ‘neutrophil chemotaxis’ (fold enrichment = 7.34), ‘cell adhesion’ (fold enrichment = 3.19), and ‘transport membrane’ (fold enrichment = 2.4) (Fig. 5A, Supplementary Table S2).

Figure 5
figure 5

Gene ontology categories enriched in the genes with greater transcript abundance at 24 h versus one h. (A) Biological processes and (B) Molecular functions. To improve readability, only categories with more than five genes are displayed on the graphs, please, see Supplementary Table S2 and S3 for a full list of categories and the associated genes with their annotation.

We also asked if there was enrichment of gene ontology categories within the 224 genes that had less transcript abundance after 24 h of storage at 4 °C, and there were several categories significantly enriched (FWER < 0.05, Fig. 6A, please see Supplementary Table S4 for a complete list of categories and gene annotation). Notably, there were a series of signaling related categories such as ‘positive regulation of interferon-gamma production’, ‘positive regulation of interleukin-8 production’, ‘negative regulation of interferon-gamma production’, ‘positive regulation of ERK1 and ERK2 cascade’, ‘positive regulation of MAPK cascade’ ‘positive regulation of interleukin-1 beta production’, ‘positive regulation of interleukin-12 production’, ‘positive regulation of interleukin-6 production’, ‘positive regulation of NF-kappaB transcription factor activity’, ‘positive regulation of NIK/NF-kappaB signaling’, ‘positive regulation of peptidyl-tyrosine phosphorylation’, and ‘positive regulation of phosphatidylinositol 3-kinase signaling’.

Figure 6
figure 6

Gene ontology categories enriched in the genes with lower transcript abundance at 24 h versus one h. (A) Biological processes and (B) Molecular functions. To improve readability, only categories with more than five genes are displayed on the graphs, please see Supplementary Table S4 and S5 for a full list of categories and the associated genes with their annotation.

Also within the 224 genes that had less transcript abundance after 24 h of storage at 4 °C, there was significant enrichment of a series of categories involved in regulation of transcription and gene expression (FWER < 0.05, Fig. 6A, Supplementary Table S4), such as ‘positive regulation of transcription by RNA polymerase II’, ‘regulation of transcription, DNA-templated’, ‘regulation of transcription by RNA polymerase II’, ‘positive regulation of gene expression’, and ‘positive regulation of transcription, DNA-templated’.

In parallel with the identification of the significant enrichment of the categories involved in regulation of signaling and gene expression, the test for enrichment of molecular functions identified that many of those 224 genes were associated with functions that involve interaction with DNA to regulate gene expression, such as ‘RNA polymerase II cis-regulatory region sequence-specific DNA binding’, ‘DNA-binding transcription factor activity, RNA polymerase II-specific’, and ‘DNA-binding transcription factor activity’ (Fig. 6B , Supplementary Table S5).

Discussion

The main purpose of our study was to understand the dynamics of RNA degradation and the consequences of this RNA degradation on the quantification of transcript abundance in PWBCs from samples stored in the fridge (4 °C). We collected multiple samples from the same subject and proceeded with a strategic delay in the processing of samples, followed by immediate cryopreservation of PWBCs. Our methodical interrogation of the RNA quality and systematic analysis of transcriptome data lead us to identify critical factors related to the short-term preservation of blood samples for RNA analysis: (i) the vein used for sampling blood is not a source of significant and systematic changes in the transcriptome profiling of PWBCs; (ii) storing blood samples under refrigeration for 24 h does reduce their RIN values by approximately one unit, however the drop in RIN values does not interfere with the quantification of transcripts from protein-coding genes or long non-coding RNAs produced in PWBCs; (iii) even if blood samples are refrigerated, the abundance of gene transcripts produced in PWBCs starts to drop irregularly as early as three hours past blood sampling, but changes are consistent across samples after eight hours of refrigeration; and (iv) the transcriptome of PWBCs is severely altered after blood samples are refrigerated for 24 h post-collection.

According to our hypothesis, we expected that storage of blood tubes at 4 °C for a long period of time would reduce the RNA quality through degradation. Indeed, there was a reduction in RIN values from RNA obtained from PWBCs after blood samples were preserved at 4 °C for 24 h (from 8.52 ± 0.37 at 1 h to 8 ± 0.37 at 24 h). The relatively high values of RIN after the preservation of blood samples at 4 °C for 24 h are similar to RIN values reported elsewhere23. However, the results observed from the RNA-sequencing did not show indication of reduced RNA quality. The values of the 3’/5’ bias for all libraries ranged from 0.47 to 0.56, with no effect of the processing time on the averages. In samples with degraded RNA, there is a bias towards transcript coverage on the 3’ end, whereas samples with 3’/5’ bias values close to 0.5 have balanced coverage of RNA extremities and are only observed in samples with high RNA quality30. Thus, there was no systematic coverage bias towards the 3’ end of polyadenylated transcripts in our samples.

Also based on our hypothesis, we anticipated that transcripts with significantly lower quantification would be a consequence of RNA degradation following a period of storage of blood samples at 4 °C. Here we reasoned that coverage plots for the 224 genes with lower abundance at 24 h in the contrast ‘24 h vs 1 h’ would be distinct between the libraries produced from the 24 h group versus the 1 h group. Contrary to our expectation, the coverage charts (Fig. 4A) showed a virtually identical coverage of transcripts with significantly lower quantification whether on samples processed within one hour or 24 h of collection. Furthermore, a comparison of the distributions using the Kolmogorov–Smirnov test confirmed no significant changes in transcript coverage based on the amount of time that samples were preserved.

A possible explanation for the discrepancy between the significantly lower values of RIN for samples preserved for 24 h and the consistent sequencing coverage across transcripts is on the source of data. The RIN values are computed based on data collected from a series of features of an electropherogram, most of which involve information from ribosomal RNAs (5S, 5.8S, 18S and 28S)31. On the other hand, RNA-sequencing libraries were produced with enrichment of polyadenylated transcripts, and thus, the results from 3’/5’ bias nor coverage plots do not account for ribosomal RNA. Our results indicate that, although a correlation between transcript coverage and RIN values have been identified28,31, this relationship may be prominent in samples with RIN values less than 8.

Our results show that there is a prominent systematic alteration of transcript abundance in PWBCs when blood samples are preserved for 24 h at 4 °C, which is aligned with previous reports23,26. Interestingly, we determined that this alteration of transcript abundance across individuals starts as early as eight hours post-collection.

Because we could not find indication that RNA degradation was a cause of these alterations, we reasoned that the alteration in transcript abundance was a consequence of the PWBCs responding to the cold temperature (4 °C) and lack of oxygen. The consequences of long-term exposure of mammalian cells at 4 °C have not been well studied, but Al-Fageeh and Smales32 proposed that the active transcription of a selected group of genes would cause a wide-spread reduction in transcription activity. Well-aligned with this possible mechanism, three out of four genes up regulated in PWBCs after the storage of blood samples for eight hours at 4 °C have a role in chromatin organization, including nucleosome assembly, which can be related to a compaction of the chromatin and reduction in transcriptional activity.

The alteration of transcript abundance in PWBCs after storing blood samples at 4 °C for 24 h has been observed before26. However, our genome-wide transcriptome analysis shows that the changes are more prominent after 24 h of storage of blood samples at 4 °C. The genes with greater transcript abundance after 24 h of storage of blood samples at 4 °C relative to those processed within one hour of collection seem to be enriched for few biological processes and again with a high enrichment for genes involved in nucleosome assembly. It is possible that the cells increase the transcription of histone related genes to increase the genome-wide compaction of chromatin. The greater number of biological processes enriched for genes with lower transcript abundance after 24 h of storage of blood samples at 4 °C relative to those processed within one hour of collection corroborate the notion of a global silencing in transcription.

Considering the results of significant differential transcript abundance observed in the present study, we reasoned that the prolonged storage of blood samples at 4 °C would be relevant for investigations searching for mRNA markers in PWBCs. The overlap of our results with transcript abundance of genes also expressed in PWBCs and previously associated with fertility in heifers20,21 identified two genes (NKG2A, PPP1R3B, Supplementary Fig. S6) whose analysis of differential transcript abundance would have been compromised by storage of blood samples at 4 °C for eight hours or longer. These results strongly indicate that blood samples collected for studies of mRNA biomarkers should: (i) be preserved on ice as soon as they are collected and processed as early as possible, preferably within six hours of collection, for the proper cryopreservation of PWBCs, or (ii) if possible, collected in tubes that allow for the immediate preservation of RNA transcript abundance in the whole blood. However, we note that the chemical or cryopreservation of whole blood for RNA extraction requires further depletion of hemoglobin transcripts if the samples will be used for RNA-sequencing33,34.

The transcriptome of PWBCs changes after blood sampling, even if the samples are refrigerated. A systemic alteration is detected at eight hours post blood collection and follows a pattern where PWBCs increase the transcription of genes related to chromatin compaction. This compaction is likely to reduce the transcription of several genes that function across multiple cellular processes in PWBCs. It is evident that this alteration in transcriptome profiles after prolonged storage can mask the transcriptome signature of a specific physiological phenomenon.

Our findings can be used as a guide for the establishment of protocols for blood processing when samples are supposed to be used for genome-wide quantification of transcripts in PWBCs. Blood samples collected for mRNA-based biomarker discovery should be refrigerated immediately and processed within six hours post-sampling. This recommendation can be considered by investigators working in diverse several areas of life sciences.

Methods

The reporting in this study follows the recommendations in the ARRIVE guidelines35. Please see Supplementary table S6 for catalog number of kits and reagents used in this work.

Animal handling and sample collection

All animal handling and use was approved by the Institutional Animal Care and Use Committee (IACUC) at Virginia Tech. All procedures involving animal handling were performed in accordance with IACUC guidelines and regulations.

Eleven crossbred beef heifers (Angus x Simmental cross), averaging 14 months of age, located at Kentland Farm (Virginia Tech, Blacksburg, VA) were subjected to estrus synchronization. On day zero we administered an intramuscular injection of gonadotrophin-releasing hormone (GnRH, 100 μg; Factrel®; Zoetis Incorporated, Parsippany, NJ) and inserted a controlled internal drug release (CIDR, 1.38 g Progesterone; Eazi-Breed™ CIDR®; Zoetis Inc.) device in each heifer. On day seven we removed the CIDR insert and administered an intramuscular injection of prostaglandin F2alpha (PGF2α, 25 μg; Lutalyse®; Zoetis Inc.), which was followed by a second injection of GnRH on day ten of the protocol. We used estrus synchronization to mitigate possible effects that the stages of the estrus cycle may have on gene expression36.

We collected blood samples from heifers that expressed estrus (n = 5) at the time artificial insemination would normally be performed. Fifty mL of blood were sampled from the jugular vein and 10 mL from the coccygeal vein of each heifer using vacutainers containing 18 mg K2 EDTA (Becton, Dickinson, and Company, Franklin Lakes, NJ). Each tube was inverted several times to prevent blood coagulation and placed on ice immediately until processing.

Experimental design and blood processing

Blood tubes were sprayed thoroughly with a disinfectant (Lysol®) prior to storage. While on ice, tubes containing samples from the jugular vein were randomly assigned into five groups: 1 h, 3 h, 6 h, 8 h, and 24 h, which correspond to the time the samples remained at 4 °C prior to processing. We processed blood samples from the coccygeal vein in group 1 h for comparison of gene expression with samples from the jugular vein.

The buffy coat was separated from whole blood by centrifugation for 20 min at 2000xg at 4 °C. The buffy coat of each sample was aspirated and dispensed into 14 mL of a red blood cell lysis buffer solution (1.55 M ammonium chloride, 0.12 M sodium bicarbonate, 1 mM EDTA, Cold Spring Harbor Protocols). The mixture was gently mixed on a rocker for 10 min at room temperature, and then centrifuged for 10 min at 800xg at 4 °C. The supernatant was removed, and each sample was mixed with 200 µL of TRIzol™ Reagent (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). The mixture of TRIzol™ and PWBCs was transferred into cryotubes (Corning Incorporated, Corning, New York) and then snap frozen in liquid nitrogen prior to storage at  − 80 °C20,21.

Total RNA extraction

Total RNA was extracted from the PWBCs using the acid guanidinium thiocyanate-phenol–chloroform procedure37,38, with the aid of Phasemaker™ tubes (Invitrogen™, Thermo Fisher Scientific, Waltham, MA), following the manufacturer’s instructions. Briefly, the samples were thawed on ice and 800 μL of TRIzol™ was added to each. Once homogenized, the mixture was transferred into Phasemaker™ tubes, where it was mixed with 200 μL of chloroform and centrifuged for 5 min at 12,000xg at 4 °C to complete phase separation. Next, the aqueous phase was collected into 1.7 mL microtubes and mixed with 0.5 μL of glycoblue. Then, 500 μL of 100% isopropanol was added to each tube and they were centrifuged for 10 min at 12,000xg at 4 °C to precipitate the RNA. The RNA pellet was collected and washed twice with 1 mL of 75% ethanol and centrifuged for 2 min at 7,500xg at 4 °C. Then, the RNA pellet was air-dried briefly and eluted in nuclease free water and maintained on ice for quantification and assessment of quality.

We quantified the total RNA concentration (A260) and purity (A260/A280 ratios) using a NanoDrop™ 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA). We also quantified the RNA using a Qubit RNA High Sensitivity Assay Kit (Invitrogen™, Thermo Fisher Scientific, Waltham, MA) assayed on a Qubit 4 Fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). Next, we evaluated the RNA integrity by assaying a sample on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA) using the Agilent RNA 6000 Pico Kit (Agilent, Santa Clara, CA).

Library preparation and high-throughput sequencing

We diluted the RNA samples to 1 ng/mL for library preparation and confirmed the concentration using a Qubit RNA High Sensitivity Assay Kit (Invitrogen™, Thermo Fisher Scientific, Waltham, MA) and Qubit 4 Fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). Five hundred ng were used as starting material for library preparation using the TruSeq® Stranded mRNA Library Prep (Illumina, Inc, San Diego, CA) and the IDT-ILMN TruSeq UD indexes. Sequencing was assayed in a NovaSeq 6000 sequencing platform (Illumina, Inc, San Diego, CA) using a NovaSeq 6000 SP Reagent Kit v1.5, to produce paired-end reads 150 nucleotides long. Preparation of libraries and sequencing assays was performed by staff at the Virginia Tech Genomics Sequencing Center.

Alignment of sequences and filtering

We removed the sequencing adapters using cutadapt (v. 2.8) and the sequences indicated by the manufacturer (Illumina, Inc, San Diego, CA). Next, we aligned the sequences to the cattle genome39,40 (Bos_taurus.ARS-UCD1.2.99) obtained from the Ensembl database41 using hisat2 (v. 2.2.042) with the –very-sensitive parameter. Using samtools (v. 1.1043), we filtered the alignment to remove unmapped reads, secondary alignments, alignments whose reads failed quality control, and duplicates. We then utilized biobambam2 (v. 2.0.9544) to mark and eliminate duplicates.

For the estimation of transcript coverage, we aligned the sequences trimmed from adapters to transcript sequences obtained from the Ensembl database41 with bowtie2 (v.2.4.245) using the –very-sensitive-local parameter.

Quantification of transcript abundance and gene filtering

We used featureCounts (subread v. 2.0.146) to count the fragments that matched to the Ensembl cattle annotation gene (Bos_taurus.ARS-UCD1.2.103). Genes annotated as protein coding, long non-coding RNA and pseudogene were retained. Following the calculation of counts per million (CPM) and reads per million per kilobase (FPKM) we retained genes that presented FPKM and CPM greater than one in five or more samples.

Quantification of library properties

We calculated the 3’/5’ bias in our libraries using RNA-SeQC (v. 2.4.230), and the proportion of reads assigned to annotation by dividing the number of reads mapped to the Ensembl annotation divided by the total number of reads sequenced.

Statistical analyses

RNA metrics (RIN, A280, A260 and A280/A260) and number of genes detected per library

We used paired Student’s t47,48 and Wilcoxon49 tests to access the null hypothesis of no difference between two sampling locations (H0jugular = μcoccygeal). Within the samples obtained from the jugular vein, we used a generalized linear mixed model to access the null hypothesis of no difference between groups of delayed processing (H0(T1h) = μ(T2h) = … = μ(T24h)). The model included time of processing (T(1 h, 3 h, 6 h, 8 h or 24 h)) as fixed effect and animal as random variable (A(1,2,3,4 or 5)). When the model indicated significance of the fixed effect (P < 0.05), we used the Z-test50 and the Dunnett’s approach51 for simultaneous tests for general linear hypothesis52,53 to compare the average of the groups T(3 h, 6 h, 8 h or 24 h) with the baseline T(1 h). Averages were inferred as statistically different when Bonferroni adjusted P < 0.05.

Library 3’/5’ bias, proportion of reads assigned to annotation and genes detected

We used a generalized linear mixed model, with a binomial family and a logistic regression function to access the null hypothesis of no difference between groups of delayed processing (H0(T1h) = μ(T2h) = … = μ(T24h)). The time of processing was included in the model as fixed effect and animal was set as random effect. Averages were inferred as statistically different when P < 0.05.

Differential transcript abundance

We compared the transcript abundance from samples obtained from the jugular and coccygeal veins by using a paired-sample structure (H0jugular = μcoccygeal). Next, we compared the transcript abundance from samples obtained from the jugular vein that were processed at different times. The analyses were performed with the R packages ‘edgeR’54 using the quasi-likelihood F-test and ‘DESeq2′55 using the Wald’s test. In the case of the delayed processing, we set up contrasts to compare the different processing times versus T1h (H0(T1h) = μ(T2h); ….; H0(T1h) = μ(T24h)). We adjusted nominal P values for multiple hypothesis testing using the Benjamini–Hochberg false discovery rate56. We assumed a difference in transcript abundance to be significant when FDR < 0.05 in the results obtained by both ‘edgeR’ and ‘DESeq2’ packages and absolute Log(fold-change) > 0.5. We utilized this approach to report robust results of differential transcript abundance independent of algorithm biases or limitations20,21,57,58.

Gene ontology enrichment analysis

We tested lists of genes for enrichment of gene ontology terms using the R package ‘GOseq’59 and the genes retained after filtering as a background list60,61. Nominal P values were adjusted for multiple hypothesis testing by family wise error rate62,63.

Contrasts of transcript coverage

We quantified the relative position of each nucleotide in relation to the total number of nucleotides in the transcript, given in percentage. In addition, we calculated the relative proportion of occurrence of each nucleotide in relation to the total coverage of the gene. Then, for each gene in different groups, in a pair-wise manner, we compared the relative position of each nucleotide weighed by the relative coverage using the weighted Kolmogorov–Smirnov test, as described elsewhere29.