Direct formalin fixation induces widespread transcriptomic effects in archival tissue samples

Sequencing technologies now provide unprecedented access to genomic information in archival formalin-fixed paraffin-embedded (FFPE) tissue samples. However, little is known about artifacts induced during formalin fixation, which could bias results. Here we evaluated global changes in RNA-sequencing profiles between matched frozen and FFPE samples. RNA-sequencing was performed on liver samples collected from mice treated with a reference chemical (phenobarbital) or vehicle control for 7 days. Each sample was divided into four parts: (1) fresh-frozen, (2) direct-fixed in formalin for 18 h, (3) frozen then formalin-fixed, and (4) frozen then ethanol-fixed and paraffin-embedded (n = 6/group/condition). Direct fixation resulted in 2,946 differentially expressed genes (DEGs) vs. fresh-frozen, 98% of which were down-regulated. Freezing prior to formalin fixation had ≥ 95% fewer DEGs vs. direct fixation, indicating that most formalin-derived transcriptional effects in the liver occurred during fixation. This finding was supported by retrospective studies of paired frozen and FFPE samples, which identified consistent enrichment in oxidative stress, mitochondrial dysfunction, and transcription initiation pathways with direct fixation. Notably, direct formalin fixation in the parent study did not significantly impact response profiles resulting from chemical exposure. These results advance our understanding of FFPE samples as a resource for genomic research.

Formalin-induced gene expression profiles in archived tissue samples. Study 1. Direct formalin fixation resulted in a large shift in transcriptional profiles for FFPE mouse liver samples compared to FR (Fig. 2a). Differential gene expression analysis identified that direct fixation (FFPE vs. FR) resulted in 2,946 DEGs, 98% of which were down-regulated (Supplementary Figure S1, Supplementary Table S1). The degree of downregulation was relatively modest with average fold-change of − 2.8 ± 1.1 (s.d.). Ingenuity Pathway Analysis showed the formalin effect impacts canonical pathways consistent with enrichment in oxidative stress, mitochondrial dysfunction, sirtuin signaling pathway (associated with cellular bioenergetics), and translation initiation signaling (EIF2) (Supplementary Table S2). Freezing the tissue prior to fixation in ethanol (FR > OH) or formalin (FR > FFPE) did not show the same effects of direct formalin fixation (Fig. 2b), with FR > OH and FR > FFPE samples clustering close to FR and accounting for less variance in the dataset (Fig. 2a). This trend was also reflected at the pathway and upstream regulator level, where FR > FFPE samples tended to display more similarities with FR > OH as opposed to FFPE samples based on hierarchical clustering of p-values (Supplementary Figure S2, Supplementary Table S2, S3).
Strikingly, freezing prior to fixation resulted in 94% (FR > OH vs. FR) or 95% (FR > FFPE vs. FR) fewer DEGs, indicating that the vast majority of formalin effects occurred at the time of fixation for fresh tissue only, suggesting a direct transcriptional response of the cells being fixed rather than a post-mortem artifact (Fig. 2b). Moreover, there were only 62 common DEGs across the three preservation groups compared to FR, indicating that mechanical damage associated with formalin fixation and/or tissue processing in paraffin (after fixation) had very minor effects on transcriptomic profiles in this study (Fig. 2b).
Despite the large effect of direct formalin fixation on global transcriptional profiles (FFPE vs. FR), it did not have a clear impact on phenobarbital (PB) responses. PB treatment induced 173 DEGs in FR, compared to 180 for FFPE, 159 for FR > FFPE, and 261 for FR > OH (Fig. 3a, Supplementary Table S4). Sixty percent of PB DEGs in FR (104) were shared with the three other preservation groups, and of those 104 DEGs only 39 were identified in common with the 2,908 formalin effect DEGs (Supplementary Figure S3). The overall pattern of DEGs changed Scientific Reports | (2020) 10:14497 | https://doi.org/10.1038/s41598-020-71521-w www.nature.com/scientificreports/ by PB was similar across preservation groups and included well-established biomarkers of PB effect such as Cyp2b10, Cyp2c50, and Cyp3a11 (Supplementary Figure S4, Supplementary Table S5). For instance, Cyp2b10 in FR was induced 5,765-fold by PB and 1,744-to 4,111-fold across the other preservation groups. As a less dramatic example, Aldh1a1 in FR was induced fourfold by PB and three-to fivefold in the other preservation groups (Supplementary Table S4). Because preservation-related reductions in PB and vehicle control (Con) gene counts were relatively consistent across marker genes, significant PB effects compared to Con were still detectable (Fig. 3b). Based on the Matthew's correlation coefficient (MCC), FR > FFPE showed the highest concordance of PB DEGs with FR at 0.77. The MCC assumes all the DEGs in FR were true positives, which is not likely the case but needed as a benchmark for assessing preservation-related changes in quality with respect to gene expression. While FR > OH appeared to have more false positives (113), this group also had fewer false negatives (25) than FFPE (55), resulting in an MCC of 0.69. The FFPE group ranked lowest amongst the preservation groups  When PB DEGs were mapped to potential upstream regulators, gene patterns were consistent with enrichment and predicted activation of constitutive androstane receptor (CAR) and pregnane X receptor (PXR) and PB exposure across all preservation groups (    Table S11). This was also the case for predicted upstream regulators, some of which also happen to be regulators of oncogenesis such as MYC, MYCN, and INSR. The formalin effect overlapped with gene signatures of some drug/chemical compounds including pirnixic acid, ciprofibrate, gentamicin, and mono(2-ethylhexyl) phthalate, the primary metabolite of DEHP. Freezing prior to preservation in Study 1 mostly negated the gene response profile for these predicted upstream regulators (Supplementary Figure S7b Table S12).
When focusing on individual molecules of FFPE vs. FR impacted pathways, effects on mitochondrial dysfunction and oxidative phosphorylation included downregulation of several components in the electron transport chain such as multiple ATP synthase subunits, ubiquinone oxidoreductase subunits, and ubiquinol-cytochrome c reductase, which were not impacted in FR > FFPE or FR > OH samples (Supplementary Tables S13 and S14). Effects specific to direct formalin fixation (FFPE vs. FR) on the sirtuin signaling pathway and EIF2 signaling included genes related to controlled cell death, RNA synthesis, DNA damage recognition, and translation processes (Supplementary Tables S15 and S16).
As in Study 1, direct formalin fixation reduced overall gene counts across control and treated groups but had minimal impact on detection of DEHP (2A) and Furan (2B) effects at gene and pathway levels. For example, of the 163 and 1,173 formalin effect DEGs from Studies 2A and 2B, respectively, only 8 DEGs overlapped with DEHP treatment genes and 27 DEGs overlapped with Furan treatment genes (Supplementary Figure S8). Additional analysis of the chemical treatment gene response from Studies 2A and 2B are reported in detail elsewhere 8,9 . Table 1. Significant phenobarbital (PB)-induced changes in gene expression compared to control (Con) confusion matrix across preservations. DEGs identified by FDR-adjusted p-value < 0.05 and absolute foldchange value ≥ 2. MCC coefficient = 1 represents a perfect prediction; 0 represents no better than random; -1 indicates disagreement between observation and prediction. DEGs differentially expressed genes, TP true positives, FP false positives, FN false negatives, TN true negatives, MCC Mathews correlation coefficient. a Assuming FR control contains the actual PB-induced DEGs; which may not be the case.

Discussion
The primary goal of this study was to investigate the effects of formalin fixation on gene expression profiles in archival FFPE samples. Our findings clearly demonstrate that direct formalin fixation results in widespread transcriptomic effects, as detected by RNA-seq analysis. We show for the first time that formalin effects on total RNA-seq profiles occur at the time fresh liver tissue is fixed and that this effect is largely abolished by freezing first, suggesting the gene response to formalin is in part a biological response and not simply post-mortem artifact. Gene profiles resulting from direct fixation indicated disruption of fundamental pathways of cell metabolism, including oxidative stress, mitochondrial dysfunction, and bioenergetic pathways. However, these effects did not confound the detection of chemical treatment-related responses. These findings could have important implications for studies using FFPE samples, especially if adequate matched controls are not available. Sequencing analysis of FFPE specimens provides an opportunity to study samples from well-characterized archives, ranging from toxicology repositories 13 to clinical tumor biobanks 14,15 . At the gene level, direct formalin fixation resulted in a broad low-magnitude effect on global transcriptomic profiles. Several lines of evidence suggest that this response may be due to disruption of cellular processes during fixation of fresh cells. First, the effect was eliminated by freezing the tissues prior to fixation in Study 1. Second, FFPE DEGs showed widespread down-regulation of expression on RNA-seq analysis, as expected with formaldehyde-induced cross-linking. Third, pathway-level and functional analysis of FFPE DEGs showed disruption of key metabolic functions related to cellular respiration, mitochondrial toxicity, and bioenergetics, which are consistent with agonal effects of cells accumulating formaldehyde crosslinks. Finally, formaldehyde is a ubiquitous endogenous metabolite of many different cell types, which have evolved sensitive mechanisms to detect and respond to it 16 .
Another recent study described formalin responses on RNA-seq profiles while investigating the impact of delay-to-fixation on FFPE clinical tumor resections 17 . Interestingly, the act of preservation for standard FFPE samples, regardless of delay-to-fixation times, had a larger impact on gene response when compared to freshfrozen samples. Jones et al. also reported an upregulation of genes in FFPE RNA that was consistent with structural DNA modifications, as opposed to the metabolic effects identified in the present study 17 . This discrepancy could be partly due to differences in tissues analyzed (e.g. tumor vs. normal) and/or differences in tissue type or species investigated. Cancerous tissues are notoriously heterogeneous and exhibit altered RNA profiles relative to normal tissues, including genes related to cellular bioenergetics, potentially obscuring the relatively lowmagnitude formalin effects described in the current analysis. Whatever the case, these studies highlight the importance of preanalytical variables in determining the response of tissues to formalin fixation and warrant additional work to determine whether similar effects are seen across different tissue types and species under different study conditions.
The observation of global transcriptomic effects due to formalin fixation is not entirely unexpected considering the dynamics of tissue preservation. Most importantly, covalent binding of formaldehyde within cells of fixed tissue does not occur immediately. The chemical modifications of RNA, including cross-linking, can continue for up to 24 h following immersion of a tissue 4,18,19 . Formalin penetrates tissue quickly, but a lag in tissue penetration of formalin into the middle of a specimen may still result in some cells remaining unfixed or underfixed for minutes or even hours after initial immersion. This lag may provide sufficient time-to-cell death for agonal transcriptomic effects to occur. However, several studies, including Study 2B and others 9 , also indicate that age in FFPE block can have a marked impact on RNA-Seq profiles. Collectively, this evidence suggests that a combination of events is occurring, direct "agonal" effects at the time of fixation as well as non-specific RNA degradation with time.
The downregulation in genes identified by RNA-seq in FFPE tissue from mouse liver is consistent with formalin fixation-related covalent modifications of nucleic acid. RNA-seq involves sequencing all isolated RNA transcript pieces including exons, introns, and intergenic regions. Pre-mRNA or partially processed transcripts will be detected using this method, which may explain why previous work identified a modest decrease in percent reads fully aligned to exonic regions of the transcriptome from more recently archived FFPE samples compared to paired FR with a corresponding increase in the percent of intronic and intergenic alignments 9,20 . Other groups have identified a similar pattern when comparing FFPE to FR samples (e.g. Refs. 8,17 ). The cause for a higher percentage of intronic reads in FFPE compared to FR is not well characterized but could be due to a variety of formalin effects on library preparation and specific cellular processes, such as having proteins bound to intronic regions, resulting in interference with transcription and translation. This disruption can impact the detection of genes in FFPE samples compared to FR samples because reads are diverted to intronic and intergenic regions. Newer technologies such as TempO-sequencing may be able to bypass part of this problem through the use of short probes that bind to specific exonic regions within genes, maintaining greater specificity 21 .
The large transcriptional effects of formalin fixation did not confound detection of chemical responses. Despite some loss in gene count signal across chemical treated and control samples in FFPE tissue samples from Study 1, well known marker genes (Cyp2b10, Cyp3a11, Cyp2c50, etc.) were identified and DEGs were mapped to known receptors involved in PB mode of action like CAR and PXR. This finding was also observed and previously reported for Study 2A 9 and 2B 8 and their respective chemical treatments. When contrasting chemical treatment vs. vehicle control, Hester et al. reported FFPE samples demonstrated 31-45% loss in gene counts relative to paired FR but still showed a significant dose-dependent upregulation of DEHP target genes Acot and Cyp4a and pathway-level enrichment consistent with peroxisome proliferator-activated receptor-alpha (PPARα) 9  www.nature.com/scientificreports/ in overrepresentation in Nrf2-mediated oxidative stress response, xenobiotic metabolism, and p53 signaling pathways, consistent with the furan mode of action 8 . However, with Study 1 there were potential spurious PBrelated DEGs in the FFPE dataset that were absent in FR. These potential false-positive DEGs highlight the need for matched disease/normal or chemical/vehicle controls when working with gene expression data from FFPE tissue samples. More research is needed to better develop methods that may assist in informatically subtracting out such formalin effects when matched FFPE controls are not available. Current sequencing technologies provide unprecedented opportunities to bridge genomic information with high-quality phenotypic data from toxicological and clinical biorepositories. However, our understanding of formalin fixation and how it interacts with new methods such as sequencing in liver remains quite limited. This study provides new insights into formalin effects on RNA-seq data from liver archival tissues. While our results highlight potential widespread effects of direct formalin fixation (and marked variability across studies), they also show that, with adequate controls, FFPE samples can be effectively used to produce reliable transcriptomic data. These findings should inform retrospective mechanistic investigations and precision medicine initiatives using archival tissue specimens.

Methods
Experimental overview. Study 1: prospective investigation of direct formalin fixation on transcriptomic profiles in mouse liver. Paired frozen and FFPE mouse liver samples used for this study were originally collected from 10-to 11-week old male B6C3F1 mice treated daily for 7 days with 600 ppm phenobarbital (PB) (99.8% purity, lot# SLBF7347V; Sigma-Aldrich, St. Louis, MO) or vehicle control (0 ppm, Con) in the drinking water (n = 6 mice/treatment group). Complete details of this study are described in Lake et al. 22 . All procedures involving animals in this study were conducted under protocols approved by the Institutional Animal Care and Use Committee of the U.S. EPA. Phenobarbital was used as a reference chemical given its well-characterized transcriptomic effects in mouse liver (e.g. 23 .).
Each liver tissue sample was processed four different ways ( Fig. 1). At the time of sample collection, the tissue was either flash frozen in liquid nitrogen followed by storage at − 70 to − 80 °C or directly fixed in 10% neutral buffered formalin (NBF) for ~ 18 h prior to embedding in paraffin block (FFPE group). Fresh 10% NBF (ThermoFisher Scientific; cat. #SF100-4; Fairlawn, NJ) contains approximately 4% formaldehyde (weight/volume). Frozen and fixed mouse liver samples included pieces of left lateral, caudate, and right medial liver lobes. Frozen liver samples were later divided on dry ice into three portions (~ 20-30 mg each), which were returned to − 80 °C storage (FR), fixed for ~ 18 h in 70% ethanol followed by paraffin-embedding (fixative control, FR > OH), or fixed for ~ 18 h in 10% NBF followed by paraffin-embedding (FR > FFPE) (n = 12 mice per condition, 6 Con and 6 PB-treated).
Histologic processing and sectioning of FFPE, FR > OH, and FR > FFPE samples followed standard methods 20 . Briefly, tissues were washed two times in 80% ethanol (30 min each); two times in 95% ethanol (45 min each); three times in 100% ethanol (45 min each); three times in xylenes (ThermoFisher Scientific, Fairlawn, NJ) (30 min each); two times in Paraplast Tissue Embedding Media (ThermoFisher Scientific, Fairlawn, NJ) at 58-60 °C (30 min each); and, finally, two times in Paraplast at 58-60 °C under a vacuum (1 h each). After processing, FFPE blocks were stored in a research laboratory at room temperature. At the time of RNA isolation, tissue age in block was ~ 16 months.
While tissue cross-linking by formaldehyde begins almost immediately upon exposure 4 , the specific timing of formalin effects on biological processes such as RNA transcription is not understood. Freezing prior to fixation was intended to test whether observed the transcriptomic effect of formalin fixation occurs as an agonal event (i.e. during cell death, when certain aspects of metabolic and transcriptional machinery may still be operational) or as a post-mortem artifact (i.e. after cell death has occurred and transcription has stopped). Ethanol was used as a fixative control as it does not produce the cross-links and other covalent modifications seen with formalin fixation 4 .

Study 2: retrospective evaluation of formalin fixation on transcriptomic profiles in mouse liver.
For Study 2A, paired frozen and FFPE mouse liver samples were collected from male B6C3F1 mice treated with di(2-ethylhexyl) phthalate (DEHP) (CAS No: 117-81-7; 99.5% purity, Chem Service, West Chester, PA) in the feed for 7 days at dietary doses of 0; 1,500 (1.5 k); 3,000 (3 k); and 6,000 (6 k) ppm. Experimental design and background information are described in Lake et al. and Hester et al. 9,22 . Each liver sample was divided and directly flash frozen in liquid nitrogen followed by storage at − 70 to − 80 °C (FR) or directly fixed in 10% NBF for 18-24 h followed by processing to FFPE block. A total of 16 sample pairs were used (n = 16 FR, n = 16 FFPE, n = 4 per dose group). After histologic processing, FFPE blocks were stored at ambient temperature for < 2 years. For Study 2B, paired frozen and FFPE mouse liver were collected from female B6C3F1 mice exposed to 0 or 8 mg/kg furan in corn oil by daily gavage for 3 wks. Experimental design and background information are described in Webster et al. 8 . At collection, each liver sample was divided and directly flash frozen in liquid nitrogen and maintained at ≤ − 70 °C or fixed in 10% NBF for 18 h. or 3 wks. before processing to FFPE block. A total of 8 sample pairs were used (n = 8 FR, n = 8 FFPE 18 h., n = 8 FFPE 3 wks., at n = 4 per dose group). FFPE blocks were stored at ambient room temperature for < 1 yr. Note: Ischemia time was not investigated in these studies. Sample collection was well controlled and time to freezing or fixation did not exceed 15 min for any samples, in accordance with recommendations to limit ischemia time to < 12 h. for FFPE samples undergoing RNA analysis 23,24 . RNA isolation. For frozen tissue samples, total RNA was extracted and purified using homogenization in RNAzol RT (Molecular Research Center, Cincinnati, OH) and elution with RNeasy MinElute columns (Qiagen GmbH, Hilden, Germany) 25  Any read with one base > 95% frequency, homopolymers ≥ 4 within a read, and an average Q-score below 25, or length < 25 bases were also filtered by fastq-mcf3 tool. For Study 2B, read ends were trimmed by quality (Q-scores ≤ 7) and reads with length ≤ 25 or composed of any one base at > 95% frequency were removed by Partek Flow. Total RNA-seq reads were aligned to External RNA Controls Consortium (ERCC) spike-ins to assess the success of library construction and sequencing. A subset of the reads (~ 1 million) was aligned to other added control sequences (PhiX and other Illumina controls used during library preparation), residual sequences (globin and rRNA), and poly-A/T sequences that persisted after clipping. Reads were also aligned to a sampling of intergenic regions to assess the level of DNA contamination.
Subsequent data analysis was carried out using Partek Flow NGS v 6.17.1128 (Partek Inc., St. Louis, MO). Total RNA-seq reads were aligned using STAR v2.5.3a and counts matrices were generated using the Expectation-Maximization algorithm 26 implemented in Partek Flow. Clipped FASTQ files were aligned to the Mus musculus reference genome (GRCm38/mm10) and quantified to the transcriptome (RefSeq transcript 81-2017-05-02). A 0.0001 offset was added adjust for zero counts. Gene features with geometric mean of ≤ 1 count across all samples were filtered prior to counts per million normalization (CPM). This involves dividing feature counts by total sample library size and multiplication by 10 6 . Low expression filtering to exclude unexpressed or lowly expressed gene features across all samples which may interfere with downstream differential gene expression analysis removed 11,173/24,543 gene features for Study 1; 11,107/24,543 gene features for Study 2A; and 11,368/24,543 gene features for Study 2B. Filtered, normalized gene counts were then analyzed for differential gene expression using Partek Gene-Specific Analysis algorithm (GSA). Significance was defined as false discovery rate (FDR)-adjusted p-value of < 0.05 and absolute fold-change count ≥ 2.
Other statistical analyses. Other data were analyzed using R statistical software (v3.5.3) 27 , pheatmap (v1.0.12), VennDiagram (v1.6.20), ggplot2 (v3.1.1), stats (v3.5.3), and car (v3.0-2) packages. Continuous data were first screened for normality using the Shapiro-Wilk Test and homogeneity of variance using the Levene's Test. As most statistical comparisons for the sequencing quality metrics did not meet normality or homogeneity of variance criteria, all data were analyzed using non-parametric two-tailed Wilcoxon signed-rank tests with a Holm multiple comparison adjustment. An adjusted p-value < 0.05 was the threshold for significance. To assess the effects of chemical exposure (Study 1), phenobarbital treatment was contrasted to the respective control within the same preservation condition (i.e., FR, FR > FFPE, FFPE, etc.). Where chemical treatment and preservation interactions were not expected or observed for pre-sequencing, pre-alignment, and preanalytical comparisons, PB and Con samples were combined for analyses. Pairwise comparisons included each preservation group vs. FR control.