Research into forensic transcriptome profiling has grown exponentially over the past decade and studies have demonstrated an increasing number of applications (for a review see Salzmann et al.1). Body fluid tests based on RNA markers have been successfully implemented as routine testing methods in forensic laboratories and provide additional information in cases where the source of the cell could be crucial2, while applications such as the determination of the age of stains3, the age of the donor4 and wound healing5 are also of interest to the community. RNA analysis of postmortem (PM) tissues, also called thanatotranscriptomics, can be used for multiple forensic purposes such as organ identification6,7,8,9 and post-mortem interval (PMI) estimation. In forensic medicine, the PMI is the time elapsed from the Time of Death (ToD) of an individual until the discovery of the body. Time of death estimation from PMI is a daily task of forensic pathologists, most crucial when a possible murder victim is found. In those cases, there is a need to estimate ToD quickly and accurately, since it can determine the course and success of a criminal investigation10,11.

Studies using mouse, zebrafish and human tissue samples from the Genotype-Tissue Expression (GTEx) database suggest that some biological processes (BP) are still active up to 48 h after the death of an individual12,13,14, either because of the activation of regulatory genes that allow the transcription of genes that were not active before13 or because of gene regulation through induced changes in chromatin structure12. Other studies involving human tissue samples have demonstrated a bi-modal wave in PM prostate tissues initiated by over-expressed anti-apoptotic genes followed by apoptosis-inducing genes15, or the simultaneous up-regulation of pro-apoptotic and down-regulation of anti-apoptotic genes in PM liver tissues16. Such studies have provided insight into PM gene regulation and have demonstrated the potential of the analysis of PM gene expression in forensic cases.

Exploiting PM processes has been the basis for PMI predictions in forensic investigations. The most widespread techniques are based on algor, rigor and livor mortis17,18. As these methods are based on the physiological state of a body, their accuracy depends on the degree of body decomposition. At the same time, the body decomposition rate is affected by many factors, such as environmental conditions, insects, body weight or clothing19. Furthermore, the longer the PMI, the higher the inaccuracy produced by any ToD prediction method (e.g., the body temperature of a corpse is only useful until it reaches the ambient temperature20). An alternative, molecular method has been on the wish list of forensic scientists for decades. In particular, RNA degradation has been a topic of interest for predicting PMI using the expression patterns of single genes (for reviews see1,21,22). For example, RNA degradation as measured through the expression levels of fatty acid synthase messenger RNA (FASN) in whole blood and brain samples showed a significant correlation with the storage interval of the blood samples and with PMI in autopsy cases23. In another study, a PMI prediction model was developed based on \(\beta\)-actin gene expression in rat brain while taking temperature factors into account24. Partemi and colleagues, on the other hand, showed that transcript expression of GUSB, COL1A1 and COLIII in the human heart was independent of PMI while NOS3 gene expression was found to be down-regulated with longer PMI (> 24 h)25.

The current study is, to our knowledge, the first to perform a transcriptomic analysis on blood obtained through longitudinal sampling of deceased human individuals. The aims of this study were to (1) identify clusters or patterns of up- or down-regulated transcripts in the human blood thanatotranscriptome; (2) perform gene ontology (GO) analysis on the identified gene clusters; and (3) develop a PMI prediction model using generalized linear regression based on PM gene expression patterns.

Figure 1

General pipeline of the study. After quality control and filtering of the initial transcripts, different methods were used to check data clustering. ANOVA-Dunnett’s tests and a regression analysis were performed in parallel to further analyze the 2 clusters of up- and down-regulated genes. Results were filtered with a power analysis and the common genes identified with both methods were then used as input for a GO enrichment analysis and to train a PMI prediction model.

Figure 2

Dotplot of the sample distribution, per individual, over the time period covered. Each horizontal line represents a single individual (right y-axis), and each dot, a sample. The histogram represents the number of samples per bin in the ANOVA-Dunnett analysis. The early and late PMI bins are indicated in blue.


A summary of the analysis pipeline can be found in Fig. 1. A total of 54 samples were collected from 7 individuals, 5 men and 2 women with ages ranging from 56 to 89 years old and PMI ranging from 2 h 21 min to 37 h 50 min (Fig. 2). The samples were distributed into bins in order to compare the mean expression values of 10,635 RNA transcripts of each postmortem blood sample. Each bin spanned 6 h, except for the first bin, which covered an interval of 8 h. The number of samples in each bin, from lowest to highest PMI were 6, 10, 13, 13, 7 and 5 samples, respectively (Fig. 2).

Figure 3

PCA graphical representation of the variables of the model using significant transcripts after the ANOVA-Dunnett test and power analysis filtering. Principal components (PC) 1 and 2 together represent 96.2% of the variability of the data. The transcripts form two clearly opposed clusters along the Dim1 axis, which explains most of the variability (74.1%). The cluster on the positive side of this axis corresponds to the up-regulated cluster (Fig. 4a), while the transcripts on the negative side of the Dim1 axis are those that decrease in expression after death (Fig. 4b).

Figure 4

Mean of the expression value of the transcripts included in (a) up-regulated cluster and (b) down-regulated cluster shown in Fig. 3 after the ANOVA-Dunnett’s test and power filtering. For each transcript, the mean expression in every bin is represented. Transcripts included in the first cluster (227 transcripts) show an increase in expression after death while transcripts included in the second cluster (108 transcripts) show a decrease in expression over time. The black line represents the mean of all transcripts per bin. Colors show whether those transcripts were also identified using linear regression (blue) or not (red). Only blue lines represent the input for downstream analyses.

Figure 5

Venn diagram of significant transcripts after power filtering. (a) Up-regulated transcripts identified with ANOVA-Dunnett (n = 227) and linear regression analysis (n = 499). (b) Down-regulated transcripts identified with ANOVA-Dunnett (n = 108) and linear regression analysis (n = 465). Only genes common to both methods (n = 99 and n = 89) were used in downstream analyses.

Identification of PM gene expression patterns

Clustering of the PM gene expression data was suggested by the Hopkins statistic (H = 0.60). ANOVA with a Dunnett post-hoc test between the early (2 h 21 min–8 h 00 min) and the late PMI bin (PMI 34 h 00 min–40 h 30 min) identified 227 up-regulated and 108 down-regulated transcripts after power analysis (see Supplementary spreadsheet 1). PCA of these transcripts showed two separated and clearly opposed clusters, while 3 transcripts clustered together but were separated from the two main groups (Fig. 3). The mean expression of those two clusters confirmed our initial hypothesis of up- and down-regulated transcripts after death (Fig. 4a,b). Moreover, the 3 independent arrows (corresponding to genes GBF1, ABCB8 and TAOK3) also showed decreased expression after death. These transcripts were therefore included in the down-regulated cluster of transcripts for downstream analysis. Linear regression analysis, on the other hand, identified 499 up-regulated and 465 down-regulated transcripts after power filtering (see Supplementary spreadsheet 2). Finally, 99 up-regulated transcripts (Fig. 5a) and 89 down-regulated transcripts (Fig. 5b) were identified with both methods, as they showed a continuous change in expression (Supplementary Tables S1 and S2) with a significant difference between the early (2 h 21 min–8 h 00 min) and late PMI bins (PMI 34 h 00 min–40 h 30 min). Supplementary Figures S2a and S2b show the patterns of the mean expression of the transcripts included in the up-regulated and down-regulated clusters, respectively, while examples of gene expression patterns of the top three up- and down-regulated transcripts are depicted in Supplementary Figures S3A and S3B.

GO enrichment analysis

GO enrichment analyses were performed for both significantly up-regulated and down-regulated transcripts. The highest enriched processes within the up-regulated transcripts were the establishment of mitochondrion localization, the positive regulation of insulin receptor signaling and nucleotide excision repair (Table 1). The significant genes included in each of the top enriched processes are listed in Table 2. A representation of the expression of these genes can be seen in Fig. 6. On the other hand, processes involved in the death-inducing signaling complex assembly, the positive regulation of macrophage differentiation, the toll-like receptor 3 signaling pathway and the regulation of necrotic/necroptotic cell death were most enriched among the down-regulated genes (Table 3). These enriched GO terms were represented through the same 3 genes (CASP8, RIPK1, and FADD).

Table 1 Top enriched pathways, ordered by decreasing odds ratio, with their corresponding GO term identifier and pathway. The input data were the differentially expressed transcripts in the up-regulated cluster according to the ANOVA-Dunnett’s test and regression (99 genes). The expected count and the actual number of genes of the input that are included in a GO term are those used in the statistical analysis, in order to calculate the p-value. Size is the total number of genes included in that particular GO term.
Table 2 Top 6 enriched processes found with the GO enrichment analysis using up-regulated transcripts in the late PMI bin (PMI 34 h 00 min–40 h 30 min) compared to the early (2 h 21 min–8 h 00 min) PMI bin. Term ID corresponds to the pathway ID and the genes listed are those involved in that specific biological process that were also found in the cluster.
Figure 6

Top 6 processes enriched by genes found to be significantly expressed in the up-regulated cluster between the early (2 h 21 min–8 h 00 min) and the late PMI bin (PMI 34 h 00 min–40 h 30 min). For each process, the mean expression, per bin, of every gene involved is represented.

PMI prediction modelling

The third objective of this paper was to develop a PMI prediction model from the transcript expression data obtained in the current longitudinal study. The mean RMSE obtained from a 10-repeat 5-fold CV for all alpha values tested ranged from 4.75 to 5.56 h (Table 4). The number of transcripts used in the models ranged from 16 to 189 (all the transcripts identified in common between ANOVA and regression plus the intercept). The lowest RMSE of 4.75 h was obtained using 30 genes (alpha = 0.3) (Table 5). The mean of their coefficient values ranged from − 0.75 to 1.11; transcripts with a positive coefficient belonged to the up-regulated cluster and those with a negative coefficient belonged to the down-regulated cluster. Predicted vs. expected PMI values were compared between models using 188 vs. 30 transcripts and showed an overall overestimation of lower PMI values and an underestimation of late PMIs after 20 h PM (Fig. 7, Table 6).

Table 3 Top enriched pathways, ordered by decreasing odds ratio, with their corresponding GO term identifier and pathway. The input data were the differentially expressed transcripts in the down-regulated cluster according to the ANOVA-Dunnett test and regression (89 genes). The expected count and the actual number of genes of the input that are included in a GO term are those used in the statistical analysis, in order to calculate the p-value. Size is the total number of genes included in that particular GO term.
Table 4 Summary of the results of the 10-repeat 5-fold predictions with glmnet. The results represent the mean of 10 repetitions. alpha represents the elastic net factor, RMSE is the mean RMSE found in each of the 10 repetitions, SD is the mean SD found in each repetition, Num. factors is the number of transcripts (plus the intercept) used to calculate each model and the lambda is also a mean between the 10 repetitions.
Table 5 Coefficients of the contributing transcripts when the prediction model was fitted with alpha = 0.3. Shown are the 30 transcripts that were present in all the repetitions; the coefficient is the mean of the 10 repetitions. All transcripts were found to be significant with the linear regression method, after power filtering. Transcripts with a positive slope were included in the up-regulated cluster and transcripts with a negative slope were included in the down-regulated cluster.
Figure 7

Actual vs. predicted PMI values for alpha 0 (using all genes, red color) and 0.3 (lowest RMSE obtained, blue color) in the glmnet 10-repeat-5-fold CV test. The linear regression of both alpha values is also shown, together with the formula and the R2 value.

Table 6 Mean of the 10-repeat prediction values for alpha = 0 and alpha = 0.3. For each sample, the known PMI is shown and then the mean PMI predicted in each of the 10 repetitions with alpha = 0 and alpha = 0.3.


PM gene expression patterns

Our data are the first (to our knowledge) to report on a longitudinal human thanatotranscriptomic study. PM human blood samples were collected from individuals with a known ToD at regular time intervals over a period of at most 38 h after death, at constant room temperature. The gene expression data were produced by 3\('\) end mRNA-seq using a standard sequencing depth (5 million reads per sample) after globin RNA depletion. This strategy aimed to detect more highly expressed transcripts even when the 5\('\) end of the transcripts is no longer available due to degradation. Because the sequencing was done towards the 3\('\) end of the mRNA, shorter reads were detected more efficiently, and the number of reads aligned per transcript did not depend on the original transcript length. Furthermore, the eclipsing effect of the globin transcripts was avoided by depleting them during the library preparation steps. As decomposition is known to start almost immediately after death, when cells initiate the process of autolysis for self-destruction through enzymatic digestion26, rapid changes in gene expression can be expected due to RNA degradation. Transcriptome sequencing of the PM blood samples collected in the current study did indeed identify genes that showed a continuous decrease after death. Interestingly, genes were also identified that showed a clear up-regulated pattern. Similar patterns have been observed previously in several animal13,27 and human tissues9,12,15,16 but this is the first time these patterns were demonstrated in a longitudinal human study.

Three individual transcripts (ABCB8, GBF1, TAOK3) clustered together but were separated from the two main clusters in the PCA plot (Fig. 3). These three transcripts showed decreased expression over time and were therefore included in the down-regulated cluster for downstream analysis. We hypothesize that these three transcripts cluster separately because their decreasing pattern differs from that of the other genes (Fig. 4). The ABCB8 gene encodes an ATP-binding cassette protein and is essential for iron homeostasis in mitochondria28, possibly by acting as an iron exporter29. The GBF1 gene produces a guanine nucleotide exchange factor and is localised in the Golgi apparatus. It regulates vesicular trafficking and mitochondrial positioning in cells in a microtubule-dependent manner30. TAOK3 encodes a serine/threonine kinase and was demonstrated to play a role in insulin resistance31. The different clusters identified through PCA suggest that transcripts might have different susceptibilities to degradation or are actively regulated in the dying cell. To assess which of these hypotheses is more likely, a GO enrichment analysis was performed.

GO enrichment analysis

Gene ontology enrichment analysis was performed in order to shed light on the processes that are up- and down-regulated after death. Up-regulated genes comprised several processes involved in the regeneration of cells. The nucleotide excision repair (NER) pathway was part of 3 of the top 6 enriched pathways and is essential in repairing damaged DNA at the single-nucleotide level. An early rise in postmortem oxidative stress was previously observed in PM muscle tissue obtained from adult male Wistar rats27. Oxidative stress, as measured through the levels of intracellular reactive oxygen species (ROS) and reactive nitrogen species (RNS), is well known to induce DNA damage via oxidation and DNA strand breaks32. As a response to the increase in cellular damage, the cell might therefore initiate the NER pathway to reduce or minimize the damage done. A second highly enriched GO term was protein demannosylation. While mannosidic glycoepitopes are vital for maintaining proteostasis33, aberrant glycosylation patterns have been hypothesised to be part of stress-induced or danger signals and thereby reflect the cell’s phenotypic status to incite an immunological response34. Finally, the process of mitochondrial localization has a role in the maintenance of DNA within mitochondria by binding the nucleoids (containing the mitochondrial DNA) to the inner membrane35. As an example, optic atrophy protein 1 (OPA1) has been demonstrated to have a role in mitochondrial inner membrane fusion and knockdown of OPA1 resulted in mitochondrial fragmentation, potential dissipation, and cristae disorganization, all of which were associated with cytochrome-c release located at the inner membrane and caspase-dependent apoptotic cell death36,37,38.

Multiple processes involved in cell death were enriched among the down-regulated genes but were represented through 3 common genes: CASP8, RIPK1 and FADD (Table 3). Cell death in mammals can be activated through two distinct pathways: (i) the intrinsic or mitochondrial pathway and (ii) the extrinsic pathway through cell death receptors39. Caspase-8 (CASP8) is involved in programmed cell death as a proapoptotic protease. The Death Effector Domain of FADD, a signal transducer downstream of cell death receptor CD95 (also called Fas), binds to the N-terminal prodomain of caspase-8, resulting in cell death40,41. Both FADD and CASP8 are essential during murine embryogenesis, while cells in mice deficient in FADD or CASP8 are resistant to death receptor-induced apoptosis42. CASP8 has also been shown to interact with RIPK1 to induce apoptosis43,44,45. Whereas apoptosis is mediated by the caspase pathway, necroptosis is a form of regulated necrotic cell death that can be activated under apoptosis-deficient conditions46. RIPK1 can play a role in both processes depending on the cell type and context47,48. CASP8 and RIPK2, but not RIPK1, were also shown to be down-regulated by Javan and colleagues16 in PM liver samples. In the same study, however, FADD was up-regulated.

Gallego Romero and colleagues suggested that actively mediated degradation of transcripts may occur during necrosis; namely, that degradation of RNA in a dying tissue may not be a completely random process and that the relative importance of stochastic degradation may increase such that at later time-points degradation becomes increasingly uncoupled from biological function49. Ferreira and colleagues stated that their pathway analysis suggested active regulation12, whereas Pozhitkov and colleagues considered that their data suggested a step-wise shutdown after death13. Pozhitkov and colleagues detected an increase of developmental genes13 while Ferreira and colleagues detected the deactivation of the immune system and an increase of processes related to blood coagulation and responses to stress12.

Taken together, our results suggest that soon after death cells actively switch to survival mode through the activation of DNA repair pathways and suppress apoptosis and necroptosis-related pathways and that mitochondria take a lead role in these PM processes. Further studies with larger data sets might reveal additional PM gene expression patterns besides the ones uncovered in the current study. These patterns might also shed more light on RNA degradation rates of specific transcripts.

PMI prediction

The third objective of this report was to develop a prediction model based on PM gene expression data to estimate the PMI. We were able to develop a model with an RMSE of 4.75 h using a small number of transcripts (n = 30). Figure 7 shows that PMI is overestimated during the first hours after death, while after 20 h PM the predictions tend to underestimate it. Other studies have previously reported the use of RNA-seq data to predict PMI from gene expression after death. Ferreira and colleagues obtained an R2 of 0.77 between observed and expected PMI when they used the information of all the tissues available in the GTEx project12. They do not report a correlation parameter for blood alone. Nevertheless, we consider that obtaining an R2 = 0.741 (when alpha = 0.3), despite the limited number of samples, supports the hypothesis that this type of data holds great promise for PMI prediction modeling. Whether the model might be valid for younger individuals is currently unknown, but as specific mRNA markers are associated with age and can thus be used to estimate age, this would be an interesting topic for future research50.

Limitations of the study

The experimental design used in this study has some critical advantages and disadvantages that need to be taken into account when interpreting the results of the study. Because a known ToD was required, only a limited number of participants could be recruited. In addition, the PM blood coagulation process starts as soon as the blood flow ceases and reaches the dependent parts of the body after 6–8 h51. This process depends on many factors, such as the amount of available blood in the veins, and hampered the collection of samples in individual 7, from whom only 3 PM blood samples could be collected. For the same reason, a maximum incubation time of 36 h was chosen, as blood is rarely available for collection through needle aspiration after 36 h.

In our study bodies were kept at room temperature for up to 38 h PM. It is currently unknown what the effect is of either the endogenous or exogenous microbiome on the thanatotranscriptome and therefore also on the prediction accuracy of the PMI model. Evidence for a healthy blood-microbiome is still sparse52 but translocation of intestinal bacteria to extra-intestinal locations such as cardiac blood has been observed in sacrificed mice after 5 min53. Future research could investigate the effect of migrating microorganisms and their effect on the thanatotranscriptome of the host while taking care to avoid contamination from other locations than the intended sampling site by employing validated decontamination procedures.

Because the study was carried out in a controlled environment with an older age group and a limited PMI range, the effects of a changing environment, age and late PMI (> 38 h) on gene expression patterns could not be evaluated, which restricts the suitability of the model for real-life cases.


This study reports on a novel approach to increase our understanding of gene transcription after death using a longitudinal experimental design whereby multiple human PM blood samples were collected at regular time intervals from the same individuals with a known ToD. We demonstrate that at least two main distinguishing types of behavior, up- and down-regulation of transcripts, are present up to 38 h after death. Moreover, we provide evidence of actively regulated processes involved in the regeneration of the cell through DNA damage repair and the suppression of apoptotic and necroptotic pathways. Finally, a PMI prediction model with an RMSE of 4.75 h up to 38 h PM was developed based on 30 RNA transcripts.

Materials and methods

Sample collection

PM blood samples were collected from deceased individuals with a known time of death who donated their body to science. All individuals had died within the hospital (University Hospitals Leuven, Leuven, Belgium) and were immediately transferred to a private room upon arrival at the morgue of the hospital. From the ToD until the end of the entire sampling procedure the bodies were kept at room temperature (\({18}\,^\circ\mathrm{C}\)). The causes of death of the individuals were heart failure, septic shock, staphylococcus infection, intracranial bleeding, septic shock and euthanasia for patients 1 to 7 respectively. No specific mRNA markers could be linked to any cause of death and therefore no transcripts were excluded on the basis of cause of death. Blood samples were taken during a time frame of 24 h starting upon arrival and with a 3 h interval (i.e. 7 blood samples per individual if possible). PMI was calculated as Sampling time (h) − Time of death (h). Blood was alternately collected from the vena femoralis (left and right) and vena subclavia (left and right). If blood collection failed at one location, sampling continued from another location until 2.5 mL was collected in a PAXgene Blood RNA Tube (IVD). Ethical approval for this study was obtained from the Ethical Commission of the University Hospitals Leuven (case number S58486) and informed consent was obtained from the participants before donation. The study was performed in accordance with the relevant internal guidelines as well as the Declaration of Helsinki.
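As a minimal illustration of the PMI definition given above (sampling time minus time of death), the following Python sketch computes the interval in hours from two timestamps. The function name `pmi_hours` and the example dates are hypothetical and are not taken from the study records.

```python
from datetime import datetime

def pmi_hours(time_of_death: datetime, sampling_time: datetime) -> float:
    """Return the post-mortem interval (sampling time minus time of death) in hours."""
    if sampling_time < time_of_death:
        raise ValueError("sampling time precedes time of death")
    return (sampling_time - time_of_death).total_seconds() / 3600.0

# Hypothetical example: death at 08:00, first sample at 10:21 on the same day.
tod = datetime(2020, 1, 1, 8, 0)
first_sample = datetime(2020, 1, 1, 10, 21)
print(round(pmi_hours(tod, first_sample), 2))  # 2.35
```

Working with full timestamps rather than clock hours avoids errors when sampling continues past midnight.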

RNA preparation and sequencing

Blood samples were incubated at room temperature for 5 h, after which they were stored at \({-20}\,^\circ\mathrm{C}\) for 24 h and transferred to \({-80}\,^\circ\mathrm{C}\) as described by the manual. RNA extraction was performed using the PAXgene Blood RNA Kit (IVD) according to the manufacturer’s instructions. RNA quantification was performed using NanoDrop spectrophotometry (Thermo Fisher Scientific) and RNA quality was assessed on the Bioanalyzer 2100 using the RNA 6000 Nano Kit (Agilent). RNA samples were stored at \({-80}\,^\circ\mathrm{C}\) until cDNA library preparation. cDNA libraries were created using the BlueBee mRNA-Seq Library Prep Kit FWD for Illumina (Lexogen) including the Globin Block module to remove the majority of the globin mRNA transcripts. 100 ng RNA was used during first strand cDNA synthesis. Library amplification was performed using 16 PCR cycles. cDNA pools were quantified using the Qubit dsDNA HS assay (Thermo Fisher Scientific). Sizing of the pools was performed on the Bioanalyzer 2100 using the High Sensitivity DNA assay (Agilent). Final cDNA libraries were pooled and sequenced on the Illumina NextSeq platform (single read, high output, 75 bp). The bioinformatics pipeline was run on the BlueBee platform. This platform has a specific pipeline to process reads originating from Lexogen’s QuantSeq 3\('\) mRNA Library Prep Kit and includes quality control steps, trimming and read alignment to the human reference genome (GRCh38) using STAR alignment54. The number of reads per sample can be seen in Supplementary Figure S1. Raw read counts were normalised in BlueBee using the DESeq2 pipeline with the median-of-ratios method55. After normalisation, transcripts with zero reads in 6 samples or more were filtered out to avoid random counts. Out of a total of 60,199 transcripts, 10,635 remained and were used as input for the downstream analyses. All data were standardised using z-scores.
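The filtering and standardisation steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R/BlueBee pipeline used in the study; the function names, the toy counts and the use of the sample standard deviation are assumptions made for the example.

```python
from statistics import mean, stdev

def filter_transcripts(counts, max_zero_samples=5):
    """Keep transcripts with zero reads in at most `max_zero_samples` samples,
    i.e. drop those with zero reads in 6 samples or more."""
    return {
        tid: values
        for tid, values in counts.items()
        if sum(1 for v in values if v == 0) <= max_zero_samples
    }

def z_scores(values):
    """Standardise one transcript across samples to mean 0 and sample SD 1."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("constant expression cannot be standardised")
    return [(v - mu) / sd for v in values]

# Hypothetical normalised counts for 8 samples (the study used 54).
counts = {
    "tx_kept":    [12.0, 15.0, 9.0, 20.0, 18.0, 0.0, 14.0, 16.0],
    "tx_dropped": [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0],
}
kept = filter_transcripts(counts)
standardised = {tid: z_scores(v) for tid, v in kept.items()}
print(sorted(standardised))  # ['tx_kept']
```

After this step every retained transcript has unit standard deviation, which is why a difference between bin means can later be read directly as a standardised effect size.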

Identification of PMI-associated genes

All statistical analyses were performed in R (version 3.6.3)56.

An overview of the study pipeline is provided in Fig. 1. A total of 54 samples from 7 individuals were collected and used in the statistical analyses.

Clustering assessment

Two methods were used to evaluate whether the data were clustered rather than randomly or uniformly distributed: mathematically, with the Hopkins statistic, and visually, with a PCA in relation to PMI using Factoextra (v. 1.0.5)57 and FactoMineR (v. 1.42)58.
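To make the idea behind the Hopkins statistic concrete, the sketch below compares nearest-neighbour distances of real data points with those of points drawn uniformly from the bounding box of the data. It assumes the convention in which values near 0.5 indicate spatial randomness and values near 1 indicate clustering; some implementations report the complement or raise the distances to the power of the dimension, so absolute values are not necessarily comparable with those returned by the R packages cited here. The function names and the simulated data are hypothetical.

```python
import math
import random

def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest element of `others`."""
    return min(math.dist(point, other) for other in others)

def hopkins(data, n_probe, rng):
    """Hopkins statistic: near 0.5 for random data, near 1 for clustered data."""
    dims = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dims)]
    highs = [max(row[d] for row in data) for d in range(dims)]
    sampled_idx = rng.sample(range(len(data)), n_probe)
    # w: nearest-neighbour distances of real points (the point itself excluded)
    w = [nearest_distance(data[i], data[:i] + data[i + 1:]) for i in sampled_idx]
    # u: distances from uniform random points in the bounding box to the data
    u = [
        nearest_distance([rng.uniform(lows[d], highs[d]) for d in range(dims)], data)
        for _ in range(n_probe)
    ]
    return sum(u) / (sum(u) + sum(w))

# Simulated example: two tight, well separated clusters in two dimensions.
rng = random.Random(0)
cluster_a = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(30)]
cluster_b = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(30)]
h = hopkins(cluster_a + cluster_b, n_probe=10, rng=rng)
print(f"H = {h:.2f}")
```

In strongly clustered data, random probe points tend to fall far from any observation while real points have close neighbours, which pushes the statistic towards 1.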

ANOVA and Dunnett’s test

All samples were ordered by increasing PMI and then grouped in bins. Analysis of Variance (ANOVA) was performed on the means of all transcript expression values between each bin. With the package multcomp (v. 1.4.10)59 a Dunnett’s post-hoc test was used to select results which were significant at least between the early PMI and late PMI bins. All PMI bins were therefore included in the ANOVA and subsequent Dunnett’s test. The significant transcripts were divided into clusters according to the previous PCA results. Additionally, Supplementary spreadsheet 3 provides the ANOVA-Dunnett’s test results between increasing bins (e.g., 1 vs. 2–6, 1–2 vs. 3–6, 1–3 vs. 4–6, 1–4 vs. 5–6 and 1–5 vs. 6). The numbers of significant genes for these comparisons were 1247, 1364, 1142, 2155 and 1646, respectively.

Regression analysis

Regression was also used to analyze the PMI correlation with gene transcription. For each transcript, the linear model parameters (the slope and a constant) were estimated from the samples, and a hypothesis test determined whether the slope was significantly different from zero. The p-value was calculated with a built-in-function of the stats package56 and those transcripts with a value lower than 0.05 were considered significant, and then assigned to a cluster depending on the sign of the slope.
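The per-transcript fit described above can be sketched in a few lines. This Python illustration uses ordinary least squares and reports the slope, intercept, R2 and the t statistic of the slope; the p-value itself (Student's t distribution with n − 2 degrees of freedom) is left to a statistics library and is not computed here. The function names and the toy expression values are assumptions made for the example, and the study itself used the R stats package.

```python
import math

def fit_line(pmi, expression):
    """Ordinary least squares fit of expression on PMI.
    Returns slope, intercept, R^2 and the t statistic of the slope."""
    n = len(pmi)
    mean_x = sum(pmi) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in pmi)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pmi, expression))
    syy = sum((y - mean_y) ** 2 for y in expression)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(pmi, expression))
    r_squared = 1.0 - rss / syy
    # Standard error of the slope with n - 2 residual degrees of freedom.
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return slope, intercept, r_squared, t_stat

def cluster_for(slope):
    """Assign a transcript to a cluster according to the sign of its slope."""
    return "up-regulated" if slope > 0 else "down-regulated"

# Hypothetical z-scored expression of one transcript at six PMI values (h).
pmi = [3, 9, 15, 21, 27, 33]
expression = [-1.2, -0.7, -0.1, 0.2, 0.8, 1.0]
slope, intercept, r_squared, t_stat = fit_line(pmi, expression)
print(cluster_for(slope), round(slope, 4), round(r_squared, 3))
```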

Power analysis

Power analysis was performed twice: a first time with the ANOVA-Dunnett’s significant results, and a second time for the regression analysis significant results. In both cases the significant results were filtered again, and only those with power greater than 0.8 were included in the pathway analysis.

For the first power test, the package used was pwr (v. 1.2.2)60. The parameters needed were the number of groups (k = 2, the control bin and the last bin), the sample size (the mean of the number of samples included in the control group and in bin number 6), and the effect size. Because the SD was 1 for all the transcripts, the effect size in this analysis was the mean of the expression value in the last bin minus the mean of the expression value in the control group. The significance level, alpha, was set to the standard 0.05. For the power calculation over the regression analysis results, the same significance level was chosen. The package used in this case was WebPower (v. 0.5.2)61. The other parameters were the sample size (n = 54), the number of predictors (p = 1, referring to PMI) and the effect size f, which was calculated for each transcript as proposed in the manual of the package62 (i.e., \(f^2 = R^2/ (1-R^2)\)).
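The two effect sizes described above reduce to short formulas. The sketch below only computes the effect sizes; the power values themselves were obtained with the pwr and WebPower packages and are not reproduced here. Function names and input values are hypothetical.

```python
def cohens_f2(r_squared):
    """Effect size f^2 = R^2 / (1 - R^2) for a regression with the given R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1.0 - r_squared)

def standardised_difference(mean_last_bin, mean_control_bin, sd=1.0):
    """Difference between bin means in SD units (SD is 1 for z-scored data)."""
    return (mean_last_bin - mean_control_bin) / sd

print(cohens_f2(0.5))                      # 1.0
print(standardised_difference(1.0, -0.5))  # 1.5
```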

GO enrichment analysis

The biomaRt package (v. 2.40.5)63 was used to perform a GO term enrichment analysis. This package performs a statistical analysis to find out which BP or pathways are enriched with genes contained in the input clusters. Before analysis, all transcript IDs were converted to the respective Entrez Gene identifiers and used as the gene universe. The transcripts contained in the clusters used as input were also translated into genes. The database used as reference for gene annotation and mapping was the Homo sapiens genes (GRCh38). In this context, the p-value refers to the probability of seeing a specific number of genes present in a given GO term just by chance. The pathways recovered were those with a p-value equal to or lower than 0.05 and with at least 3 genes from the input present in the specific GO term. The process was performed twice: once with the significantly up-regulated transcripts identified using the ANOVA-Dunnett’s tests and linear regression, and once with the significantly down-regulated genes, both after power filtering. Lastly, a False Discovery Rate (FDR) correction was calculated for the p-values of each cluster.
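The meaning of the enrichment p-value and the expected count can be illustrated with a hypergeometric test, which is a common basis for GO over-representation analysis. The sketch below is an assumption about the general technique and not a reproduction of the package used in the study; likewise, the Benjamini-Hochberg procedure is shown as one widely used FDR correction, since the specific method is not named in the text. The universe size and GO term size in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, term_size, input_size, overlap):
    """Upper-tail hypergeometric probability of observing at least `overlap`
    genes of a GO term among `input_size` genes drawn from the universe."""
    upper = min(term_size, input_size)
    total = comb(universe, input_size)
    hits = sum(
        comb(term_size, k) * comb(universe - term_size, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total

def expected_count(universe, term_size, input_size):
    """Number of input genes expected in the GO term by chance."""
    return input_size * term_size / universe

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical GO term of 40 genes in a universe of 10,000 genes, with 3 of
# the 99 input genes annotated to it.
p = enrichment_p_value(universe=10_000, term_size=40, input_size=99, overlap=3)
print(expected_count(10_000, 40, 99), p <= 0.05)
```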

PMI prediction modelling

A generalized linear model was implemented to predict PMI using Root Mean-Squared Error (RMSE) as the optimization parameter. The glmnet package (v. 3.0.1)64 was used to build the prediction model. Multiple values of alpha were then tested to find the value yielding the lowest RMSE. Cross-validation (CV) was applied as a validation technique and repetition was added to increase its robustness.
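The validation scheme (10 repetitions of 5-fold CV scored by RMSE) can be sketched independently of the model being validated. In the Python illustration below, a single-predictor least squares fit is used purely as a stand-in; it is not an elastic net and does not reproduce glmnet, in which alpha mixes the ridge (alpha = 0) and lasso (alpha = 1) penalties. All names and the simulated data are hypothetical.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean-squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def repeated_kfold(n_samples, n_folds, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield train_idx, test_idx

def fit_stand_in(x, y):
    """Stand-in model (single-predictor least squares), NOT an elastic net."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((v - mean_x) ** 2 for v in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

# Simulated data: one z-scored transcript rising with PMI, plus noise.
rng = random.Random(1)
pmi = [rng.uniform(2, 38) for _ in range(54)]
expr = [0.08 * t - 1.6 + rng.gauss(0, 0.3) for t in pmi]

fold_rmse = []
for train_idx, test_idx in repeated_kfold(len(pmi), n_folds=5, n_repeats=10):
    slope, intercept = fit_stand_in([expr[i] for i in train_idx],
                                    [pmi[i] for i in train_idx])
    predicted = [intercept + slope * expr[i] for i in test_idx]
    fold_rmse.append(rmse([pmi[i] for i in test_idx], predicted))

print(f"mean RMSE over {len(fold_rmse)} folds: {sum(fold_rmse) / len(fold_rmse):.2f} h")
```

Reshuffling the samples before each repetition gives different fold assignments, so averaging over repetitions reduces the dependence of the RMSE estimate on any single split.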