Recovery of the first full-length genome sequence of a parapoxvirus directly from a clinical sample

We recovered the first full-length poxvirus genome, including the terminal hairpin region, directly from complex clinical material using a combination of second generation short read and third generation nanopore sequencing technologies. The complete viral genome sequence was directly recovered from a skin lesion of a grey seal thereby preventing sequence changes due to in vitro passaging of the virus. Subsequent analysis of the proteins encoded by this virus identified genes specific for skin adaptation and pathogenesis of parapoxviruses. These data warrant the classification of seal parapoxvirus, tentatively designated SePPV, as a new species within the genus Parapoxvirus.

extracted directly from skin lesion material. We employed a combination of second (Illumina MiSeq) and third (Oxford Nanopore MinION 20 ) generation sequencing to recover the full-length genome of this seal parapoxvirus, including telomere sequences and hairpin structure. To our knowledge, this is the first report of a full-length seal parapoxvirus genome sequence, as well as the first report of recovery of a full-length parapoxvirus sequence directly from clinical material. Thus, the full-length genome sequence reported here can be assumed to faithfully reflect the naturally acquired pathogenic parapoxvirus strain, without any adaptions resulting from in vitro culturing 18,21 .

Results and Discussion
Identification of a parapoxvirus from a grey seal (Halichoerus grypus). In April 2015, a young grey seal taken to a seal rehabilitation center showed large spherical dermal lesions with severe ulcerations at both flippers and in the mucosa of the oral cavity ( Figure 1A). Histological examination of the affected skin areas showed extensive ulcerations and necrosis with epidermal loss and infiltration of neutrophils and few macrophages. Ballooning degeneration with severe swelling of hair follicular epithelial cells was observed. Occasionally, cytoplasmic eosinophilic inclusion bodies were present ( Figure 1B). Transmission electron microscopical analysis of follicular epithelial cells clearly identified densely packed, multifocal clusters of viral particles in the cytoplasm ( Figure 1C) indicative of parapoxvirus particles based on the elongated, ovoid shaped core surrounded by a membrane and a superficial membrane. Enveloped particles had a length of approximately 200 to 250 nm and a width of 100 to 150 nm.
Parapoxvirus infection was confirmed by performing a pan-parapoxvirus PCR targeting a 552 bp sequence of the genomic region encoding the major envelope protein 5 . Sequence analysis established 97.3% homology to the partial seal parapoxvirus (SePPV) sequences described earlier 5 . In situ-hybridization (ISH) with a probe specific for the amplified region revealed abundant parapoxvirus DNA-positive hair follicle epithelial cells corresponding to the cells with ballooning degeneration ( Figure 1D). In contrast, cells of the basal cell layer lacked a specific hybridization signal ( Figure 1D).
Unfortunately, virus isolation using a seal tissue-derived cell line was not successful. Limiting amount of tissue and suboptimal condition of the tissue most likely contributed to the toxic effects observed during cultivation processes.
Recovery of the full-length seal parapoxvirus genome applying high throughput short read sequencing together with Nanopore MinION sequencing. DNA from skin lesion was subjected to high throughput multiplex sequencing on an Illumina MiSeq Instrument, as well as nanopore sequencing on a third generation Oxford Nanopore MinION device 20 . Approximately 5% out of a total of 2,272,653 short reads generated on the MiSeq instrument did not map to host sequences and thus were considered to be of potential exogenous origin. De novo assembly and iterative mapping of the non-host reads yielded 19 contigs (sizes between 590 bp and 23,767 bp) that showed distant sequence similarity to the virus family Poxviridae, genus Parapoxvirus. These contigs accounted for 68.33% (71,680 reads) of all non-host reads; there were no other contigs or reads indicative of the presence of other pathogenic viruses in this sample. Iterative mapping of all sequences to full-length genome of ORFV allowed the assembly of a single contig of 127,941 bp (minimal coverage: 260 over 99% of the contig) ( Table 1). The sequence was classified as a seal parapoxvirus due to its close homology to a short fragment from the major envelope protein-coding sequence described in 2002 5 . We used nanopore sequencing to verify the assembly of short sequencing reads along the coding region and the termini of the virus. Nanopore sequencing produces relatively high error rates and thus is of limited use for de novo sequencing. However, the long read lengths which can be produced by this technique make it ideally suited to confirm the overall structure of sequence contigs. As shown in Figure S1 and Table S1, nanopore sequencing of the primary clinical material produced a total of 48 reads which mapped across at least 30 kbp of the viral genome, with the longest read covering a continuous stretch of 56.2 kbp. Together, although there were a few gaps at the right end of the genome, the nanopore reads covered more than 92% of the assembled genome, thus confirming the accuracy of the short read assembly.
Nucleotide sequence alignments among all fully sequenced parapoxvirus genomes revealed that the seal virus is only distantly related to the other genus members, showing the closest homology (77.3% sequence identity) to the bovine papular stomatitis virus (BPSV) sequence ( Figure 2A, Table 2). The seal parapoxvirus, tentatively named SePPV, has the smallest genome (128 kbp) among the fully sequenced members within this genus, followed by bovine papular stomatitis virus with 138 kbp ( Figure 2B). Similar to other parapoxviruses, the SePPV genome sequence shows a relatively high GC content. However, with 55.9% the GC content is the lowest known so far within this genus ( Figure 2B; Table 1). In addition to the core sequence, we were able to resolve the hairpin termini of the parapoxvirus genome, including the telomere resolution sequence important for effective replication of the virus ( Figure 2C). The inverted terminal repeats (ITR) of SePPV encompass 2,087 bp, coordinates 1-2,087 sense orientation and 125,855-127,941 antisense orientation. Interestingly the ITR contains a complete duplication of the ORF encoding for a dUTPase (similar to ORF007 dUTPase, Supplementary Dataset S1). In addition, the ITR contains a partial duplication of the ankyrin repeat (similar to ORF008 ankyrin repeat, Supplementary Dataset S1). While a partial duplication of an ankyrin repeat has been described for Camelpox virus (AF438165), the observed duplication of the ORF007 has not been described for parapoxviruses or poxviruses in general.
The contig generated by de novo assembly contained 230 single open reading frames (ORFs) of which 120 were identified as putative SePPV genes. Of these, 116 ORFs are coding for proteins with significant homology to annotated ORFs in other parapoxviruses (Table 1; Supplementary Datasets S1, S2). The relative order of the genes is similar to other parapoxviruses thereby supporting the classification as a novel species within the genus Parapoxvirus. Similar to the other members of the genus Parapoxvirus, the SePPV contains ORFs encoding for proteins involved in pathogenesis (Supplementary Datasets S1, S2). SePPV encodes a viral homologue of IL10 (SePPVgORF114), which has been shown in ORFV to be a potent anti-inflammatory virokine since deletion of the ORF results in an attenuated virus 22,23 . In addition, SePPVgORF101 expresses an anti-inflammatory chemokine binding protein CBP, which in ORFV plays a role in disrupting chemotactic recruitment of leukocytes 24 . SePPV contains an ORF, SePPVgORF013, of which the gene product has significant homology to an inhibitor of interferon response which blocks activation of the dsRNA dependent protein kinase 25 . SePPV also encodes proteins, encoded by SePPVgORF017 and SePPVgORF109 with significant homology to factors described for other parapoxviruses involved in NFκB signaling, ORF24 and ORF121. The function of the ORF24 gene product in ORFV is described to decrease TNFα induced phosphorylation whereas ORF121 of ORFV most likely is involved in the inhibition of NFκB-p65 phosphorylation and nuclear translocation 26,27 . As described for all established parapoxvirus species, we also identified an ORF encoding a vascular endothelial growth factor (VEGF) homologue [28][29][30][31] . The SePPVgORF118 encoded VEGF shows closest homology to BPSV, however different to BPSV with regard to the location of the ORF coding for VEGF, SePPV VEGF is located at the right end of the genome 29,32 . Viral VEGF most likely enhances viral growth by promoting cellular regeneration of the epidermis. Viruses devoid of VEGF do not induce extensive blood vessel formation and dermal swelling which is discussed to provide protection for immune cells 24 . Interestingly, BPSV encoded VEGF different to all other parapoxvirus encoded VEGFs shows higher homology to mammalian VEGF-A 24, 29 . SePPV encodes for 4 unique open reading frames not identified in ORFV and other parapoxviruses. The hypothetical proteins encoded by these ORFs do not show any significant homology with known proteins from the family Poxviridae (Supplementary Dataset S1). In comparison to ORFV, 16 ORFs including 9 hypothetical proteins, 2 ankyrin repeat proteins, 2 putative IMV membrane proteins and 2 proteins involved in virion morphogenesis are not present in SePPV (Supplementary Table S2). In addition, different to ORFV and other parapoxviruses we did not identify inhibitors of granulocyte-macrophage colony stimulating factor and interleukin-2, which are known as GIF (Supplementary Table S2) and play a role in the regulation of the adaptive immune response of the host 33,34 .
As shown in Tables 3 and 4, on the protein level SePPV demonstrates relatively uniform distances to other members of the genus (84.3 and 84.5% mean amino acid identity for DNA polymerase and DNA topoisomerase, respectively), with the closest relative again being BPSV.
For phylogenetic analysis all proteins in SePPV sequence were aligned with proteins identified in 14 representative genomes of the subfamily Chordopoxvirinae. Proteins which showed an alignment of >90% of the length of the individual protein within SePPV were used to construct a concatenated polyprotein. This polyprotein consisting of 47 proteins was used for the phylogenetic tree analysis (Figure 3). In addition, phylogenetic analysis was also applied with complete coding sequences of the DNA polymerase and DNA Topoisomerase I confirming the results obtained with the concatenated polyprotein (supplementary material Figure S2A,B, Tables 3 and 4). Thus, the results of our phylogenetic tree analyses together with the description of the gene order and annotated ORFs clearly warrant the classification of SePPV as a distinct species within the genus Parapoxvirus.

Conclusion
We report the first reconstruction of a full-length genome sequence of a member of the family Poxviridae directly from a clinical sample using Illumina short read sequencing combined with Oxford Nanopore sequencing. Poxviruses are challenging to grow in culture and culture adapted genomes might not faithfully represent those of virulent field strains. The power of combining high throughput short read sequencing with nanopore sequencing to recover a large and complex viral genome directly from complex clinical material as demonstrated here should be useful also for other virus families in particular those with large viral genomes.   Table 4. Percentage amino acid identity between the DNA topoisomerase I within the genus parapoxvirus. Upper comparison gradient indicates percentage identity between two sequences; lower comparison gradient indicates the distance between two sequences as calculated by the distance measure Jukes-Cantor. Percentage identity higher than 90% is shown in bold numbers.

Methods
the front flippers followed by lesions around mouth and nose ( Figure 1A). Skin samples from lesions of the left front flipper were sent to the University of Veterinary Medicine, Hannover, Germany, for further diagnosis in the framework of veterinary microbiological diagnostics in accordance to the German legislation. Therefore no ethical approval was required for the use of these samples. Upon symptomatic treatment, the lesions declined and the seal could eventually be discharged in June 2015.
Polymerase chain reaction (PCR) and Sanger Sequencing. Total DNA was extracted from skin lesions using a High Pure PCR Template Preparation Kit (Roche Diagnostics). Conventional PCR was done as described before 5 , amplifying a part of the putative major envelope gene. To confirm the specificity of PCR, amplicons were sequenced (LGC Genomics GmbH, Berlin, Germany) and aligned using the Clustal W multiple alignment tool implemented in BioEdit 35 . Histology and in-situ hybridization. Formalin-fixed tissue biopsies of the altered skin were processed routinely into paraffin wax. Tissue section of 3-4 µm thickness were cut and stained with hematoxylin and eosin (HE) for light microscopical examination.
For transmission electron microscopy a direct pop-off technique from the HE-stained slide was used as previously described 36 .
Illumina library preparation and MiSeq short read sequencing. Total DNA from 75 mg tissue was mechanically homogenized in PBS using disposable tissue grinder pestles. After low speed pelleting, supernatant was filtered (0.2 µm) and DNA was subsequently isolated using DNeasy blood & tissue Kit (Qiagen).
DNASeq library compatible with short read Illumina sequencing was generated using the NEB Ultra DNA library Kit (NEB) starting with 500 ng DNA, as measured by Qubit (Invitrogen) and following the manufacturer's instructions. Briefly, DNA was fragmented, end repaired and subsequently the adapter were ligated. Agencourt AMPure XP beads were used to size select the DNA fragments containing the adapters. Finally, the library was amplified by 15 PCR cycles. The fragment size distribution of the library was analyzed on a BioAnalyzer High Sensitivity LabChip showing a size range between 400 and 446 bp with the main peak of the library at 401 bp. The library was diluted to 2 nM and multiplex-sequenced together with five samples on the Illumina MiSeq (2 × 250 bp paired end run, estimated 4.3 million reads/sample).
Oxford Nanopore library preparation and MinION sequencing. Nanopore sequencing library preparations using Nanopore Sequencing Kits SQK-MAP005 and SQK-MAP006 were essentially performed as described in the protocols and guidelines provided by Oxford Nanopore Technologies (ONT). Briefly, 1 µg of the genomic DNA isolated from skin lesions was fragmented to an average size of 8-15 kb using g-TUBEs (Covaris). DNA fragments were end-repaired and adenylated using NEBNext Ultra II End-Repair/dA-tailing Module (NEB) followed by cleanup with Ampure XP beads (Beckmann Coulter). Sequencing and hairpin adapters (ONT) were ligated using NEB Blunt/TA Ligase Master Mix (NEB) followed by incubation with the hairpin tether (ONT). Prior to sequencing, 6 μl of the eluate (pre-sequencing mix), 75 μl running buffer (ONT), 60 µl nuclease free H2O and 4 μl fuel mix (ONT) were combined gently and were immediately loaded onto the prepared MinION flowcells. Sequencing was performed using 48 hr sequencing run scripts with addition of freshly prepared input material to the MinION flowcell every 12 hrs until no further active pores were available anymore.
Nanopore events were converted into FastQ containing Fast5 data using Metrichor basecalling with the respective 2D workflows. Fasta files were extracted from Fast5 data using poretools 25 . Long reads (>3000 bp) were subsequently aligned with LAST 26 to the SePPV assembly generated from Illumina sequencing data using the following parameters: -s2 -T0 -Q0 -a1 -f1. Alignments were further filtered by a minimum length of 3,000 bp to reduce false positive results due to low complexity region alignment. The joint assembly of MinION and MiSeq was performed using SPAdes (v3.6.0) using the 'careful' option and otherwise standard parameters 37 . Amino acid sequences of single proteins (DNA polymerase and DNA topoisomerase I) were aligned using the CLUSTAL W multiple alignment tool, CLC Main workbench, version 7.6.4. For phylogenetic analyses, genomes were trimmed manually and neighbor-joining trees were calculated using nucleotide distance measurement Jukes-Cantor parameters. Bootstrap analysis was performed with 1000 iterations.
Following an approach described before 38 phylogenetic analysis of a concatenated protein sequences was performed by maximum-likelihood tree construction using PHYML3 39