Genome reconstruction of white spot syndrome virus (WSSV) from archival Davidson’s-fixed paraffin embedded shrimp (Penaeus vannamei) tissue

Formalin-fixed paraffin-embedded (FFPE) tissues are a priceless resource for diagnostic laboratories worldwide. However, DNA extracted from these tissues is often not optimal for most downstream molecular analyses due to fragmentation and chemical modification. In this study, the complete genome of white spot syndrome virus (WSSV) was reconstructed from ~2-year-old archived Davidson’s-fixed paraffin-embedded (DFPE) shrimp tissue using Next Generation Sequencing (NGS). A histological analysis was performed on archived DFPE shrimp tissue and a sample showing a high level of WSSV infection was selected for molecular analysis. The viral infection was further confirmed by molecular methods. DNA isolated from DFPE and fresh frozen (FF) tissues was sequenced by NGS. The complete genome reconstruction of WSSV (~305 kbp) was achieved from both DFPE and FF tissue. Single nucleotide polymorphisms, insertions and deletions were compared between the genomes. Thirty-eight mutations that differed from the reference genome were identified in the WSSV genomes from the DFPE and FF tissues. This is the first study that has successfully sequenced the complete genome of a virus of over 300 kbp from archival DFPE tissue. These findings demonstrate that DFPE shrimp tissue represents an invaluable resource for prospective, retrospective and evolutionary studies and opens avenues for pathogen discovery.

Scientific Reports | (2020) 10:13425 | https://doi.org/10.1038/s41598-020-70435-x | www.nature.com/scientificreports/

In human clinical research, many researchers have successfully extracted nucleic acids from FFPE tissue for PCR-based amplification work despite the degradation of nucleic acids, thereby gaining access to a previously untapped resource. However, it must be clarified that in most cases the successful use of FFPE tissue for molecular analysis depends largely on how the sample was fixed (temperature, time, pH). In cancer research, FFPE tissues have been extensively used for genome sequencing of tumor tissues for copy-number and mutation analysis, expression profiles, screening for mutational hotspots, single-cell sequencing and genome sequencing from Laser Capture Microdissected cells [1][2][3][12][13][14]. FFPE tissues have also been used in pathogen discovery and in uncovering novel genetic features in pathogen genomes. For example, the Spanish Influenza Pandemic Virus was reconstructed from FFPE tissue from 1918 15. The virus involved in the 1918 pandemic showed large differences from the contemporary human influenza H1N1 strain 15. Recently, FFPE tissue has been used for the detection and discovery of a novel rotavirus 5. In another recently published retrospective study, FFPE tissue was used to sequence the ~15 kb RNA genome of the Newcastle disease virus (NDV), which naturally infects many avian species 16. The study revealed the continuous evolution and previously unrecognized genetic diversity in NDV 16.
The first study of the use of FFPE material in aquatic organisms dates back to 1995, when Krafft et al. 17 used fixed tissue to detect morbillivirus in lung tissue of bottlenose dolphins. In finfish, mollusk and crustacean pathology, so far no attempt has been made to explore the feasibility of using FFPE/DFPE tissues for any retrospective genetic studies or pathogen discovery. In shrimp aquaculture, existing as well as emerging diseases are a threat to the sustainable growth of the industry worldwide. Outbreaks of diseases in shrimp aquaculture cause major economic losses to shrimp farmers directly, and indirectly impact the lives and livelihood of those who depend on shrimp farming, especially in developing nations with large coastal boundaries. There is an urgent need to understand the origin and evolution of pathogens in shrimp aquaculture to prevent epizootics that are becoming more common than ever. Archived Davidson's-fixed paraffin-embedded (DFPE) tissues in the Aquaculture Pathology Laboratory of The University of Arizona are an untapped, invaluable resource for pathogen discovery and for metagenomic and evolutionary studies to understand the origin, evolution and spread of shrimp pathogens worldwide. In this study, we demonstrated the feasibility of using DFPE tissue in pathogen discovery by reconstructing the complete genome of a large dsDNA-containing virus, white spot syndrome virus (WSSV), with a genome size of ~305 kbp, from DFPE tissues. To our knowledge, this is the first report of genome reconstruction of such a large genome from archived tissue for any virus known to infect humans, animals or plants. This study shows the utility of DFPE tissues in shrimp pathology research, opens avenues for novel pathogen discovery and enables us to address questions related to the origin and evolution of shrimp viral pathogens that continue to cause catastrophic losses to farmers globally.

Results
Selection of Davidson's-fixed paraffin embedded blocks.
Histopathological evaluation of the experimentally infected Penaeus vannamei shrimp revealed a severe infection graded as G3-G4 in all tissues examined. A representative histology section showing the intranuclear inclusions in the gut epithelial cells of P. vannamei that are pathognomonic of WSSV infection is shown in Fig. 1.

Sequence analysis of the WSSV genome from DFPE shrimp tissue and annotation.
A total of 308,724,322 reads were generated from the second round of sequencing in a PE 2 × 150 bp format. Of these, 3,056,486 unique reads (approximately 1.0% of reads) were mapped to the WSSV China reference genome and generated complete coverage of the entire genome with a mean coverage of 1,550.6×. The maximum and minimum coverage obtained were 5,219× and 317×, respectively. The reconstructed WSSV genome was 305,094 bp and presented a pairwise identity of 99.90% with the genome of the China reference strain. A total of 193 DNA coding sequences were annotated using RAST and Geneious Prime (Fig. 2). Additionally, a histogram of read size distribution is presented in Supplementary Fig. 2.
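The read statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it recomputes the mapped fraction from the reported read counts and derives a nominal depth under the assumption that every mapped read contributes a full 150 bp. The function names are illustrative only.

```python
# Values reported in the text above.
TOTAL_READS = 308_724_322
MAPPED_READS = 3_056_486
GENOME_LENGTH_BP = 305_094
READ_LENGTH_BP = 150  # nominal read length for PE 2 x 150 bp (assumption: no trimming)


def mapped_fraction(mapped: int, total: int) -> float:
    """Fraction of all reads that mapped to the reference."""
    return mapped / total


def nominal_mean_depth(n_reads: int, read_len: int, genome_len: int) -> float:
    """Rough depth estimate: total mapped bases divided by genome length."""
    return n_reads * read_len / genome_len


frac = mapped_fraction(MAPPED_READS, TOTAL_READS)
depth = nominal_mean_depth(MAPPED_READS, READ_LENGTH_BP, GENOME_LENGTH_BP)

print(f"mapped fraction: {frac:.2%}")    # about 0.99%
print(f"nominal depth:   {depth:.0f}x")  # about 1,503x
```

The nominal depth is of the same order as the reported mean coverage of 1,550.6×. An exact match is not expected, because the estimate ignores read trimming, the length of the reference used for mapping and how the mapper counts paired reads.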

Confirmation of SNPs and sequence variations.
The WSSV genome sequence obtained from the DFPE tissue sample was aligned with the GenBank sequence of the WSSV China reference strain and the SNPs were identified. A total of 38 sequence variations, including SNPs, deletions and insertions, were detected. The genomic regions that contained the SNPs and sequence variations were amplified by PCR, sequenced and aligned with the reference WSSV genome and the WSSV genome reconstructed from DFPE tissue (Table 1, Fig. 3). A SNP or sequence variation was considered confirmed when the Sanger sequence and the DFPE-derived sequence matched. All 38 of the mutations found in the WSSV genome reconstructed from DFPE were confirmed by Sanger sequencing.
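The confirmation logic described above can be sketched in a few lines. This is an illustrative example only, assuming gapped pairwise alignments are already available as equal-length strings; the study itself used Geneious Prime for alignment, and the toy sequences and function names below are hypothetical.

```python
def call_variants(ref_aln: str, query_aln: str):
    """List differences between two gapped, equal-length alignment strings.

    Returns (reference_position, kind, ref_base, query_base) tuples, one per
    alignment column. Positions are 1-based; an insertion is reported at the
    reference position it follows.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            variants.append((ref_pos, "insertion", "-", q))
        elif q == "-":
            variants.append((ref_pos, "deletion", r, "-"))
        else:
            variants.append((ref_pos, "SNP", r, q))
    return variants


def confirm(ngs_variants, sanger_variants):
    """Keep only NGS variants that are also present in the Sanger calls."""
    sanger = set(sanger_variants)
    return [v for v in ngs_variants if v in sanger]


# Toy alignments (hypothetical sequences, not WSSV data).
reference = "ACGT-ACGTA"
dfpe_ngs = "ACCTGAC-TA"
sanger = "ACCTGAC-TA"

ngs_calls = call_variants(reference, dfpe_ngs)
confirmed = confirm(ngs_calls, call_variants(reference, sanger))
print(confirmed)
```

Each alignment column is reported separately, so a multi-base insertion or deletion appears as several consecutive records. Grouping adjacent records would be needed to count such an event once, as in the 38 variations reported here.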

Discussion
Detection and characterization of novel viruses is frequently hampered by the lack of properly stored materials 5. For the retrospective identification of viruses associated with disease outbreaks, often only formalin-fixed paraffin-embedded (FFPE) tissue samples are available 5. Retrieving genomic information from FFPE has always been a challenge, but the availability of genome sequencing technologies such as NGS has now made it possible to reconstruct genetic information from archived samples. In the case of viral diseases of shrimp, the etiologic agent was often identified long after the disease was initially reported and had spread within and across countries worldwide. While samples originally collected during disease outbreaks were used for histopathological and ultrastructural studies to elucidate the possible etiologic agent, those samples were almost never used for genetic characterization of the pathogen associated with the disease. As a result, it remains unknown how the pathogen evolved between the time the disease outbreaks were initially reported and the time the genomic characterization of the pathogen was accomplished. For example, the Infectious Hypodermal and Hematopoietic Necrosis Virus (IHHNV) was initially detected in 1981 by Lightner et al. 18; however, the genetic characterization of the virus was carried out 19 years later by Shike et al. 19. IHHNV was the first shrimp virus for which the genome sequence was determined 19. Considering the high rate of nucleotide substitution (1.39 × 10⁻⁴ substitutions/site/year) 20 of IHHNV, it is possible that the strain that caused massive mortalities of blue shrimp (P. stylirostris) in the early 1980s in Mexico and later in the rest of the Americas is different from the strains that have been characterized later. Another observation consistent with this hypothesis is that the current strains of IHHNV do not cause mortalities or major histological alterations in P. vannamei and P. stylirostris shrimp and no major epizootics have been attributed to IHHNV in recent years 20,21. This could be due to accumulation of mutations in the IHHNV genome and/or development of host resistance/tolerance over time. By reconstructing the WSSV genome from DFPE tissues, we have shown that the genetic characterization of pathogens containing large genomes is possible from archived fixed tissue (DFPE). This opens the door for future retrospective studies to better understand the genomic properties of pathogens from the past that once caused mass mortalities but now cause little to no mass mortality. These studies would enable a better understanding of the evolution of host-pathogen interactions not only in shrimp but also in viruses infecting other animals and humans.
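To give a sense of scale for the IHHNV example, the reported substitution rate can be converted into an expected amount of divergence over the 19-year gap between detection and sequencing. This is a back-of-the-envelope sketch that assumes a constant rate and ignores repeated substitutions at the same site. The genome length used below is a hypothetical round number chosen for illustration, not a value from this study.

```python
SUBSTITUTION_RATE = 1.39e-4   # substitutions/site/year, as cited in the text
YEARS_ELAPSED = 19            # detection (1981) to genome characterization
ASSUMED_GENOME_LENGTH_NT = 4_000  # hypothetical round figure for illustration only


def expected_divergence_per_site(rate: float, years: float) -> float:
    """Expected substitutions per site, assuming a constant rate."""
    return rate * years


def expected_substitutions(rate: float, years: float, genome_length: int) -> float:
    """Expected number of substituted sites across a genome of the given length."""
    return expected_divergence_per_site(rate, years) * genome_length


per_site = expected_divergence_per_site(SUBSTITUTION_RATE, YEARS_ELAPSED)
total = expected_substitutions(SUBSTITUTION_RATE, YEARS_ELAPSED, ASSUMED_GENOME_LENGTH_NT)

print(f"expected divergence: {per_site:.5f} substitutions/site")
print(f"expected substitutions over {ASSUMED_GENOME_LENGTH_NT} nt: {total:.1f}")
```

Under these assumptions, roughly 0.26% of sites would be expected to change over 19 years, which is consistent with the argument that the strain sequenced later may differ from the one behind the original outbreaks.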
In this study, the DNA extracted from the DFPE tissue was used for pathogen detection via PCR and qPCR. WSSV was successfully detected by qPCR and nested PCR following protocols recommended by the OIE (Paris, France), and WSSV genomic fragments ranging from 69 bp (for real-time PCR) to 1,477 bp (for the first step of the nested PCR) were amplified. However, it is important to mention that for the nested PCR protocol the increase in the amount of template DNA was key to obtaining amplification in all samples (N = 7). In preliminary assays, amplification was only obtained in two samples (17-702 A6 and 17-702 A7) when using > 130 ng of DNA per reaction. All the fragments designed to amplify the areas where the SNPs were located ranged between 100 and 250 bp and thus fell within the size range of qPCR and nested PCR diagnostics. A previously published study involving human housekeeping genes by Ludyga et al. 8 has shown that products between 100 and 300 bp can be reliably amplified from FFPE. That study also showed that the amplifiable fragment size decreases with storage time, with the maximum amplifiable fragment decreasing from 687 bp in samples from the year 2000 to 129 bp in samples from 1971 8. Our results show that a relatively short storage time (~2 years) of DFPE shrimp tissue, as used in this study, can provide DNA of sufficient quality to amplify DNA fragments of almost 1,500 bp. In addition, these results suggest that Davidson's fixative is not as damaging to nucleic acids as previously postulated by Hasson et al. 11 and that DNA from DFPE shrimp tissue does not undergo severe degradation over short storage times (~2 years). However, we should underscore that some degradation does occur, as indicated by the percentage of virus-mapping reads, which was higher in the FF sample (~4%) than in the DFPE samples (~1-1.3%), and by our inability to amplify large PCR products at lower DNA concentrations.
Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence; therefore, detecting them within metagenomes is vital 22. [...] replication and pathogenicity. In Taura syndrome virus (TSV) of shrimp, it was shown that a single nucleotide mutation changed the predicted tertiary structure of the RNA-dependent RNA polymerase in a highly virulent strain compared to a less virulent strain 23. The error rates for some NGS data sets generated by Illumina technologies are very low: a rate of 0.0021 errors per base 22. Despite the low error rate, it was critical to confirm that the detected SNPs, insertions and deletions were present in the WSSV genome reconstructed from DFPE and were not sequencing errors. By amplifying and sequencing the regions where the variations were located, we were able to confirm all the SNPs, insertions and deletions. These results further highlight the robustness of this methodology, since they show that small sequence variations can be efficiently detected from DFPE tissue, and they underscore its value for retrospective phylogenetic analysis.

In recent years, the availability of novel genome sequencing technologies has increased the chance and speed of detection of unknown viruses in samples collected from humans and animals. In particular, NGS has played an important role in the discovery and characterization of many novel viruses 5,24-26. Next generation sequencing using DNA isolated from FFPE tissue has enabled pathogen detection, identification of endogenous viral elements, genome sequencing, and exome and transcriptome sequencing in animals and humans [1][2][3][4][5]7,12,14,16,27,28. Although FFPE tissues have been used to detect known viral sequences, the application of FFPE tissues for detection of novel viruses is very limited. Recently, Bodewes et al. 5 showed that sequence-independent amplification in combination with NGS can be used to detect sequences of known and unknown viruses in herring gull and ferrets, although with relatively low sensitivity. The findings of Bodewes et al. 5 indicate that NGS from FFPE is a viable approach to detect known DNA (Adenovirus) and RNA (influenza A/H1N1) viruses, and unknown RNA viruses (novel herring gull rotavirus). Our results confirm that NGS from DNA extracted from DFPE tissue is also a viable approach to detect known viral sequences. Moreover, our results show that this approach is robust and can generate enough data to sequence very large viral genomes with very high coverage. Furthermore, our results suggest that, with sufficient data, even the sequencing of complete bacterial genomes from this type of sample might be possible.
Unlike plant and human virology, shrimp virology is a relatively new field. The first shrimp viral disease was reported only about 50 years ago (Baculovirus penaei) 29 and the first shrimp virus was sequenced about 20 years ago (IHHNV) 19. However, as shrimp aquaculture has evolved from a subsistence level of farming to a major industry providing jobs, directly and indirectly, to millions of people around the world, especially in countries with large coastal boundaries, viral diseases pose a serious threat to the sustainable growth of this nascent industry. As of now, viral disease prevention through biosecurity and early disease diagnosis remain the cornerstones of mitigating losses in shrimp aquaculture. Since these diseases primarily spread through the movement of infected broodstock and post-larvae across countries and continents, it is critical to understand how these pathogens evolve in new environments as virus-infected animals are moved across continents and how naïve hosts adapt to new pathogens. The ability to reconstruct DNA viral genomes as large as 300 kbp from DFPE tissues shows the feasibility of generating baseline genetic data from archived tissue and determining how pathogens have evolved over time. To our knowledge, this is the first study that shows the feasibility of using NGS for the genetic characterization of shrimp pathogens, and potentially for the discovery of novel pathogens, from samples stored in pathology laboratories worldwide.

DNA extraction, quantification and PCR.
DNA was extracted using the commercial FFPE DNA Purification Kit (NORGEN BIOTEK CORP) in accordance with the manufacturer's recommendations, with some modifications. During the deparaffinization step, the xylene washes were doubled, and the pellet was air dried for 20 min. Finally, during the lysate preparation step, the incubation at 90 °C was increased from 1 h to 1 h 15 min. Two elutions were obtained from each sample. Additionally, DNA was extracted from the samples frozen in liquid nitrogen using the Genomic DNA isolation kit (NORGEN BIOTEK CORP) following the manufacturer's instructions. The quantity and quality of the DNA were determined using a NanoDrop 2000. The presence of WSSV was further confirmed by qPCR and nested PCR following published protocols 30,31. For the nested PCR protocol published by Lo et al. 30, one modification was made: the input volume of DNA was increased to 2.5 µl (191.25-349.25 ng per reaction).
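The template amounts quoted for the nested PCR follow directly from the DNA concentration and the input volume. As a minimal sketch (the helper names are illustrative and not part of the published protocol), the stock concentrations implied by 191.25-349.25 ng in 2.5 µl can be recovered as follows:

```python
INPUT_VOLUME_UL = 2.5                       # template volume per reaction, from the text
MASS_RANGE_NG = (191.25, 349.25)            # template mass per reaction, from the text


def template_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Mass of template DNA delivered to one reaction."""
    return concentration_ng_per_ul * volume_ul


def implied_concentration(mass_ng: float, volume_ul: float) -> float:
    """Stock concentration (ng/µl) implied by a template mass and input volume."""
    return mass_ng / volume_ul


low, high = (implied_concentration(m, INPUT_VOLUME_UL) for m in MASS_RANGE_NG)
print(f"implied stock concentration: {low:.1f}-{high:.1f} ng/µl")
```

The implied range is about 76.5 to 139.7 ng/µl, which is the kind of NanoDrop reading that would produce the reported per-reaction amounts.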

Next generation sequencing (NGS).
To test the feasibility of performing NGS using DNA isolated from DFPE shrimp tissue, we conducted two rounds of NGS. The first sample was sequenced using an Illumina MiSeq System (PE 2 × 300 bp) (Illumina). Once we determined it was possible to efficiently sequence DNA from archived DFPE tissues using an Illumina MiSeq System, we sequenced two additional samples. The second round of sequencing was done using an Illumina HiSeq 2500 System (PE 2 × 150 bp) (Illumina) to generate a more robust data set. DNA extracted from both DFPE tissue and FF tissue was sent for NGS to OmegaBioservices, Norcross, GA. Libraries for the DNA samples were generated at OmegaBioservices using the KAPA Hyper Prep library kit for WGS (Roche). For the DNA extracted from the DFPE tissue, the fragmentation step prior to library generation was omitted since the isolated DNA was already fragmented.

Mapping and annotation.
The DNA reads were paired and duplicate reads were removed using the Dedupe plugin in Geneious Prime 32. DNA reads from the WSSV-infected shrimp tissue (DFPE and FF) were checked for quality and trimmed before being mapped to the China WSSV reference genome (GenBank accession number: AF332093) using Geneious Prime (Biomatters) 32. The Geneious mapper was used for the mapping analysis and the settings were chosen to detect structural variants 32. The WSSV isolate of the APL originated from China and hence the China WSSV reference genome was utilized; however, it is unknown whether they represent the same strain. The mean coverage of each base was calculated. The contigs generated from the mapping were annotated using the RAST Server and Geneious Prime 32,33. The complete WSSV genome reconstructed from DFPE was submitted to GenBank (accession: MN840357). The MAUVE software was used to perform whole genome alignments and comparisons 34. The genomes of WSSV obtained from the DFPE tissue, the FF tissue and the WSSV reference genome were compared to identify differences among the sequences.
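The duplicate removal and per-base coverage steps can be illustrated with a short, self-contained sketch. This is not the Geneious Prime implementation used in the study. It is a simplified stand-in that assumes duplicates are exact sequence matches and that each mapped read is represented by a hypothetical 0-based, half-open (start, end) interval on the reference.

```python
from itertools import accumulate


def dedupe(reads):
    """Drop exact duplicate read sequences, keeping first-seen order."""
    return list(dict.fromkeys(reads))


def per_base_coverage(intervals, genome_length):
    """Depth at every reference position from mapped (start, end) spans.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth at each base.
    """
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval out of range: {(start, end)}")
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff[:-1]))


def summarize(coverage):
    """Mean, minimum and maximum depth across the reference."""
    return {
        "mean": sum(coverage) / len(coverage),
        "min": min(coverage),
        "max": max(coverage),
    }


# Toy example on a 10-base reference (hypothetical data).
unique_reads = dedupe(["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"])
coverage = per_base_coverage([(0, 5), (3, 10), (4, 6)], genome_length=10)
stats = summarize(coverage)
print(unique_reads)
print(coverage)
print(stats)
```

The same three summary values (mean, minimum, maximum) are the ones reported for the DFPE-derived genome in the Results. In practice they would be computed from the mapper's alignment output and not from hand-written intervals.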