Nanopore sequencing and de novo assembly of a misidentified Camelpox vaccine reveals putative epigenetic modifications and alternate protein signal peptides

Saud, Zack; Hitchings, Matthew D.; Butt, Tariq M.

doi:10.1038/s41598-021-97158-x

Published: 07 September 2021

Scientific Reports volume 11, Article number: 17758 (2021)

Abstract

DNA viruses can exploit host cellular epigenetic processes to their advantage; however, the epigenome status of most DNA viruses remains undetermined. Third generation sequencing technologies allow for the identification of modified nucleotides from sequencing experiments without specialized sample preparation, permitting the detection of non-canonical epigenetic modifications that may distinguish viral nucleic acid from that of their host, thus identifying attractive targets for advanced therapeutics and diagnostics. We present a novel nanopore de novo assembly pipeline used to assemble a misidentified Camelpox vaccine. Two confirmed deletions of this vaccine strain in comparison to the closely related Vaccinia virus strain modified vaccinia Ankara make it one of the smallest non-vector derived orthopoxvirus genomes to be reported. Annotation of the assembly revealed a previously unreported signal peptide at the start of protein A38 and several predicted signal peptides that were found to differ from those previously described. Putative epigenetic modifications around various motifs have been identified and the assembly confirmed previous work showing the vaccine genome to most closely resemble that of Vaccinia virus strain Modified Vaccinia Ankara. The pipeline may be used for other DNA viruses, increasing the understanding of DNA virus evolution, virulence, host preference, and epigenomics.

Introduction

DNA viruses are those that have DNA genomes and replicate using a DNA-dependent DNA polymerase. They are grouped into two classes: single-stranded DNA viruses and double-stranded DNA viruses. The latter group contains the infamous Variola virus (VARV), the causative agent of smallpox, which belongs to the family Poxviridae, subfamily Chordopoxvirinae and genus Orthopoxvirus. There are currently 12 accepted species within the genus. Other notable members include Vaccinia virus (VACV), the prototype orthopoxvirus used as a vaccine to eradicate human smallpox and which has no known natural host¹; Cowpox virus (CPXV), administered by Edward Jenner as the first documented successful vaccine²; Monkeypox virus (MPXV), a zoonotic virus endemic to the African subcontinent³; and Camelpox virus (CMLV), the most genetically similar extant species to VARV⁴.

Poxviruses have linear, double-stranded DNA genomes that vary from 130 to 230 kbp⁵. The termini of the genome form covalently closed hairpin structures⁶. Each hairpin lies at the end of a long inverted terminal repetition (ITR) containing sets of short, tandemly repeated sequences⁵. For orthopoxviruses, the size of the ITRs ranges from approximately 200–500 base pairs for variola viruses to almost 12,000 base pairs for several vaccinia virus strains⁷. Large ITR regions can pose problems for first-generation Sanger sequencing⁸ and second-generation Illumina sequencing⁹, which produce read lengths of up to around 1000 bp and 300 bp (or around 500 bp for linked paired-end reads), respectively. Such tracts of repetitive sequence can be resolved by third-generation long-read sequencing technologies^10,11,12, which are capable of producing read lengths in excess of 100,000 bp.

The central portions of most poxvirus genomes are highly conserved, and contain essential genes involved in key functions such as transcription, DNA replication and virion assembly¹³. In contrast, genes that cluster at the ends of the genome are usually species or host specific, and encode virulence factors that modulate the host immune system^13,14. Various proteins encoded by the genome have been shown to interact with DNA or precursor nucleotides⁵. The K7 protein has been shown to promote histone methylation associated with heterochromatin formation¹⁵. Furthermore, vaccinia virus (VACV) C4¹⁶, C6¹⁷, C16¹⁸, B14¹⁹, E3²⁰, F16²¹, and N2²² gene products can be detected in the host nucleus, thus implicating them in some form of transcriptional regulation. To our knowledge, no research has been aimed towards assessing whether these proteins epigenetically modify the viral DNA. Furthermore, despite what is known of the capability of DNA viruses to exploit host cellular epigenetic processes to their advantage during infection^23,24, the epigenome status of most DNA viruses remains unknown.

Third-generation sequencing technologies have advanced epigenomic research by providing platforms that allow modified nucleotides to be identified from sequencing experiments without the need for specialized secondary sample preparation protocols^25,26,27. Such a direct approach for interrogating an epigenome is particularly beneficial for viral epigenetic research, as samples often contain high amounts of contaminating host DNA, which can complicate specialized DNA methylation probing techniques such as bisulfite sequencing²⁸ and antibody-based approaches²⁹. Furthermore, motifs with non-canonical epigenetic modifications can be identified by detecting a deviation of the raw signal from that of a standard model at a given nucleotide sequence^26,30. Such non-canonical epigenetic modifications would distinguish viral DNA from host DNA, making them attractive targets for advanced therapeutics and diagnostics³¹. A drawback of Nanopore sequencing is that reads generally suffer from a higher error rate than those of other sequencing technologies (particularly in regions containing homopolymers), although advances in library preparation chemistry, pore technology and algorithms (basecalling, assembly and polishing) have greatly improved overall assembly error rates³².

In this study, we use nanopore sequencing to assemble the genome of a live attenuated CMLV strain, Ducapox, that was stated to comprise a CMLV isolate from the United Arab Emirates (CaPV298-2)³³. The vaccine has since been found to contain two gene regions that more closely resemble those of VACV strain Modified Vaccinia Ankara (VACV-MVA)³⁴. A separate study of the strain using second-generation WGS found that the vaccine genome matches that of VACV-MVA, with the exception of two genomic deletions (5195 and 890 bp in size); however, the authors questioned the authenticity of these deletions because of both the reference-based assembly approach adopted and the low sequencing coverage of the genome³⁵. We present a sequencing and annotation pipeline for long-read de novo assembly of poxvirus genomes and identify putative epigenetic modifications within the genome. Using the latest version of signal peptide prediction software, we identify a predicted protein with a previously undescribed signal peptide, and present several predicted signal peptides that were found to differ from previously described sequences. The pipeline may be used for other DNA viruses, increasing the understanding of DNA virus epigenomics.

Results

Sequencing statistics and de novo assembly

A total of 405,925 basecalled sequences were produced from the MinION sequencing run, of which 16,059 (3.95%) remained after size filtering and removal of non-viral DNA (Table 1). Most of the non-viral DNA was found to be of simian origin, consistent with the virus having been propagated in Vero cells. The Flye assembler produced a viral contig that was 195,695 bp in length. After ITR correction and all polishing steps, the assembly was 159,696 bp in length. Read coverage was more uniformly distributed in the final assembly than in the initial assembly (Flye assembly using the > 3000 Viral DNA read set), which had uneven read coverage at the contig ends (Fig. 1). This suggests that the terminal repeat lengths in the final polished assembly more closely match the true sequence. Furthermore, when reads were mapped to the Ducapox short-read assembly, a large excess of reads mapped to the ITR at the 3′ end of the genome, indicative of poor ITR assembly (supplementary information 1a). The mappings highlight the shortcomings of reference-based assembly from short reads, as the same excess of reads at the 3′ ITR region was also observed when the > 3000 Viral DNA read set was mapped to VACV Acambis 3000 MVA (supplementary information 1b).

Table 1 Read metrics of sequencing before and after non-viral DNA removal.

Whole genome sequence comparisons

A BLAST search of the final polished assembly revealed the genome to most closely match that of Vaccinia virus strain Acambis 3000 MVA (GenBank accession: AY603355.1), with a BLAST percentage identity of 99.99%. A dotplot comparison of the Ducapox long-read assembly against VACV Acambis 3000 MVA revealed genomic deletions of 5449 bp and 916 bp in the Ducapox genome, corresponding to VACV Acambis 3000 MVA genome positions 3735–9183 and 23,219–24,134, respectively (Fig. 2). These deletions were confirmed by visualizing the mapping of reads to the genome assembly and confirming that unbroken reads traversed the deletion sites (supplementary information 2a and 2b). The VACV Acambis 3000 MVA genome was also found to be 227 bp and 435 bp longer at its ends than the Ducapox genome. The deletions are further illustrated by a multiple sequence alignment of the Ducapox long-read assembly, the Ducapox short-read assembly, and the VACV Acambis 3000 MVA genome (supplementary information 2c). Both average and median identity scores were higher, and error rates lower, when the > 3000 Viral DNA read set was mapped to the Ducapox genome than when it was mapped to VACV Acambis 3000 MVA (Table 2). Two proteins predicted in the initial long-read assembly corresponded to a single protein in the short-read assembly. The difference was a frameshift caused by an additional adenine residue in a homopolymer tract, which was 6 adenine residues long in the short-read assembly and 7 in the long-read assembly, truncating the first protein (supplementary information 2d). Remarkably, in the long-read protein set, a second open reading frame within the frameshifted first protein gave rise to a second protein that was in frame with the end portion of the truncated protein (supplementary information 2d).
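The deletion sizes quoted above follow directly from the reference coordinates, provided the intervals are read as 1-based and inclusive (an assumption on our part, although it is the only reading under which the stated sizes and positions agree). A minimal Python sketch of that arithmetic, with illustrative names that do not come from the original pipeline:

```python
# Inclusive reference coordinates (VACV Acambis 3000 MVA, AY603355.1) of the
# two regions reported as absent from the Ducapox long-read assembly.
# The dictionary keys are illustrative labels, not names used in the paper.
DELETIONS = {
    "deletion_1": (3735, 9183),
    "deletion_2": (23219, 24134),
}


def inclusive_length(start: int, end: int) -> int:
    """Number of bases in a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


deletion_lengths = {
    name: inclusive_length(start, end) for name, (start, end) in DELETIONS.items()
}

for name, length in deletion_lengths.items():
    print(f"{name}: {length} bp")
```

Under this convention the two intervals span 5449 bp and 916 bp, matching the dotplot comparison.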

Table 2 Read alignment identity and error metrics of exclusive viral read set to the long-read assembly and to VACV Acambis 3000 MVA.

Genome annotations and functional analyses

The Ducapox genome was found to contain a total of 186 predicted protein-coding genes (Fig. 3). A total of 194 genes were initially predicted by Prodigal; however, 8 of these contained no functional domain and had no significant percentage identity to any protein in the SwissProt database, and hence were removed from subsequent analyses. Of the 186 proteins, 13 were found to contain predicted signal peptides (Table 3, supplementary information 3). A comparison of the signal peptides predicted by SignalP v5.0 (the latest version) with those listed in the UniProt database revealed that SignalP v5.0 predicted one previously unreported signal peptide, in the protein A38L. Two proteins (A39R and HA) had signal peptides predicted by SignalP v5.0 that matched those in the UniProt database. The remaining 10 proteins contained signal peptides predicted by SignalP v5.0 that differed from those in the UniProt database (predicted mature protein sequences in supplementary information 4). StructRNAfinder predicted a single structural RNA, the Pox_AX_element (RF00385), which is involved in directing the efficient production and orientation-dependent formation of late RNAs^36,37. The predicted proteins from the long-read assembly were compared against those generated from the short-read assembly using a pairwise protein BLAST (BLOSUM62 comparison matrix; gap costs: existence 11, extension 1). A total of 176 proteins had equal length and 100% identity between the two protein sets. An additional 7 proteins had equal length and 100% identity except that the short-read protein set contained ambiguity letters allowing multiple amino acids at some positions, bringing the total of identical proteins to 183 (supplementary information 5a). Of the remaining 3 proteins, 2 from the long-read protein set had better hit scores to VACV proteins in the UniProt database, and 1 from the short-read protein set had a better hit score to VACV proteins in the UniProt database (supplementary information 5a). Of the additional 10 proteins in the short-read protein set, 13 were found to either have no hit to VACV proteins in the UniProt database, or had hits that were less than half the length of a given protein.
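The distinction drawn above between strictly identical proteins and proteins that are identical once ambiguity letters are taken into account can be illustrated with a short Python sketch. It is not the BLAST-based comparison used in the study. It assumes the standard amino acid ambiguity codes (B, Z, J and X), and the function names and toy sequences are hypothetical.

```python
# Standard amino acid ambiguity codes and the residues each may stand for.
# X (any residue) is handled separately below.
AMBIGUOUS = {
    "B": {"D", "N"},
    "Z": {"E", "Q"},
    "J": {"I", "L"},
}


def residues_compatible(a: str, b: str) -> bool:
    """True if two one-letter codes could denote the same residue."""
    if a == b or "X" in (a, b):
        return True
    return bool(AMBIGUOUS.get(a, {a}) & AMBIGUOUS.get(b, {b}))


def classify_pair(long_read: str, short_read: str) -> str:
    """Classify a pair of protein sequences from the two assemblies."""
    if long_read == short_read:
        return "identical"
    if len(long_read) == len(short_read) and all(
        residues_compatible(a, b) for a, b in zip(long_read, short_read)
    ):
        return "identical_allowing_ambiguity"
    return "different"


# Toy sequences for illustration only; they are not Ducapox proteins.
print(classify_pair("MKDL", "MKDL"))  # identical
print(classify_pair("MKDL", "MKBL"))  # identical_allowing_ambiguity (B = D or N)
print(classify_pair("MKDL", "MKEL"))  # different
```

In this scheme the 176 proteins would fall into the first class and the additional 7 into the second.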

Table 3 Predicted proteins containing predicted signal peptides. For each predicted protein, the conventional signal peptide as stated by Uniprot is listed, as well as the signal peptide predicted by SignalP v5.0. A novel signal peptide was predicted by SignalP v5.0 for the protein A38L.

Assessment of putative epigenetic modification sites

A total of three motifs were identified in the Ducapox genome that consistently produced raw signals that diverged from the standard model. The AGAAGRC motif was found at 31 regions within the genome of which 24 regions had a coverage > 50. Signal fluctuations differing from the canonical model were observed around the central AAG nucleotides (Fig. 4). A Tomtom search of the motif detected no similar known motifs. The AARRRGATKH motif was found at 61 regions within the genome of which 48 regions had a coverage > 50. Signal fluctuations differing from the canonical model were observed around the central GA nucleotides (Fig. 5). A Tomtom search of the motif showed the reverse-complement to most closely match MA0467.1 (Crx binding motif; Mus musculus) in the JASPAR database.

The WWAATGWC motif was found to be present at 114 regions within the genome of which 90 regions had a coverage > 50. Signal fluctuations differing from that of the canonical model were observed around the central TGT nucleotides (Fig. 6). A Tomtom search of the motif showed the reverse-complement to most closely match MA1112.1 (NR4A1; Homo sapiens) in the JASPAR database. For each putatively modified motif detected by Tombo, the coverage, genomic position, signal fluctuations compared to a standard model, and number of regions containing each motif can be found in the TomboResultsOutput folder of the project Git (https://github.com/zacksaud/Ducapox-Assembly-Project/tree/master/TomboResultsOutput). No methylation sites with a frequency above 0.5 were detected with Nanopolish (supplementary information 6). No evidence of 5mC methylation was detected by Megalodon (supplementary information 7).
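The motifs above are written with IUPAC degenerate nucleotide codes (for example R = A or G, W = A or T, K = G or T, H = A, C or T). As a hedged illustration of how the number of regions containing such a motif could be counted in an assembly, the following Python sketch expands a motif into a regular expression and scans both strands. It is not how Tombo or MEME locate motifs, and the function names and toy sequence are our own.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    # A lookahead is used so that overlapping occurrences are all reported.
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str) -> list:
    """Return sorted (0-based forward-strand start, strand) for every match."""
    seq = seq.upper()
    pattern = motif_to_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Matches on the reverse strand are converted back to forward coordinates.
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Toy sequence for illustration only: one forward and one reverse-strand site.
toy = "TTAGAAGACTTGTCTTCTAA"
print(find_motif(toy, "AGAAGRC"))
```

A palindromic motif would be reported once per strand at the same position. None of the three motifs here is palindromic, so no de-duplication is attempted.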

Discussion

Except for two confirmed genomic deletions, the whole genome sequence of this vaccine was shown to closely resemble that of VACV-MVA, supporting our earlier study in which we reported that two gene regions of this vaccine most closely resembled those of the aforementioned strain³⁴. Our findings also corroborate a previous study that used short-read Illumina sequencing and a reference-guided assembly to generate a partial Ducapox genome, wherein the authors noted the putative deletions but could not confirm their validity owing to both the assembly pipeline and the sequencing technology used³⁵. At 159,695 bp in length, the vaccine genome is, to our knowledge, the smallest amongst the non-vector derived orthopoxviruses. We postulate that the deletions may have been a result of passage of a misidentified VACV-MVA strain, as it is known that poxvirus genomes tend to decrease in size with serial passage³⁸. It has been demonstrated that VACV has a defined origin of replication, which supports a model for poxvirus genome replication that involves leading and lagging strand synthesis³⁹. Studies on poxvirus DNA replication described putative Okazaki fragments of about 1,000 nt in length (strikingly similar in size to the 916 bp deletion in the Ducapox sequence) and RNA primers on the 5′-ends of newly made chains of VACV DNA^40,41.

We predicted a previously unreported signal peptide in protein A38L. The A38L gene product is a 33 kDa integral membrane glycoprotein⁴². Overexpression of the protein has been shown to promote Ca²⁺ influx into infected cells⁴³. The latest version of SignalP predicted alternate signal peptides for 10 other proteins. These are the gene products of C8L, the function of which remains unknown; B19R, a type 1 interferon decoy⁴⁴; E10R, which is associated with membranes of intracellular mature virions and plays a role in morphogenesis⁴⁵; B8R, another interferon decoy⁴⁴; B7R, which is involved in virulence⁴⁶; B16R, an IL-1β binding protein⁴⁷; SPI-3, a cell fusion inhibitor protein⁴⁸; PS/HR, which plays a role in the dissolution of the outermost membrane of extracellular enveloped virions to allow virion entry into host cells and also participates in wrapping mature virions to form enveloped virions⁴⁹; and A43R, which enhances intradermal lesion formation⁵⁰. Signal peptides play a range of roles within cells, including marking proteins for secretion, intracellular translocation, and keeping catalytic proteins in an inactive precursor form until the signal peptide is cleaved⁵¹. Further research is needed to determine whether biochemical analyses of these new mature proteins yield any further insight into protein function.

We have presented regions within the Ducapox genome that contain motifs wherein the Nanopore signal diverges from the standard model, which may indicate that bases within these regions carry epigenetic modifications. Although Nanopore sequencing is a valuable tool for identifying putative epigenetic sites within a genome, it allows neither the individual modified base nor the modifying chemical group to be identified. Further analyses are therefore required to confirm the results, such as isolating and purifying the motifs containing the putative epigenetic modifications and generating amplicons that could be Nanopore sequenced to confirm reversion of the amplicon raw signal to that of the standard model. Modifications that distinguish viral DNA from that of the host may be targets for advanced therapeutics. Should these epigenetic modifications be confirmed and chemically characterized, another important question would be whether the modifications result from a viral protein or a host protein, and whether the base modifications are exclusive to this isolate of Vaccinia virus or more widely distributed amongst poxviruses.

Given the relatively low cost of Nanopore sequencing, future research could investigate the evolutionary trajectory of orthopoxviruses with continued passage. Experiments determining whether different evolutionary trajectories occur when a seed stock of a virus is passaged in different permissive cell lines would be of great interest. Furthermore, Nanopore sequencing would allow changes in epigenome modifications with continued passage to be assessed. Such studies would provide further evidence towards efforts to better understand the origins of Vaccinia virus⁵². Additionally, long-read transcriptomics techniques have recently shed light on the high variation in transcript lengths at certain Vaccinia genome loci, termed chaotic regions^53,54. Long-read sequencing coupled with these transcriptomics techniques could provide greater insight into the loss of poxvirus virulence with passage. Much research has gone into the elucidation of nucleic acid modifying proteins of Vaccinia virus; for instance, the Vaccinia virus K7R protein has been shown to promote histone methylation associated with heterochromatin formation¹⁵. Furthermore, it is postulated that epigenetic and genetic mechanisms may also lead to VACV-induced transcription silencing, and VACV infection induces a global degradation of host and viral mRNA⁵⁵. Also, VACV mRNA capping is carried out in three reactions performed by viral enzymes wherein guanine N-7 methylation occurs, and VACV encodes the VP39 protein (J3R) that is known to add a methyl group at the 2′-O position of the first transcribed nucleotide adjacent to the 5′ cap⁵⁵. Poxviruses are unique among most DNA viruses in that DNA replication occurs in the cytoplasm, independent of the nucleus of the infected host cell, and accordingly the poxvirus genome encodes factors required for both cytoplasmic transcription and DNA replication⁵. Hence, should the putative epigenetic modifications of the viral DNA be validated, it is likely that either viral proteins or host cytoplasmic proteins, rather than host nuclear proteins, would be implicated in the base modification process. Many mammalian cytoplasmic proteins are known to bind viral nucleic acids⁵⁶.

To conclude, we have developed a novel assembly pipeline for long-read sequencing of poxvirus genomes that corrects the lengths of the genome termini. The two confirmed deletions of this vaccine strain in comparison to VACV-MVA make it one of the smallest non-vector derived orthopoxvirus genomes to be reported. We have used the latest software for signal peptide prediction to identify a previously unreported predicted signal peptide in a VACV protein, as well as 10 predicted signal peptides that differ from those previously reported. We have presented putative epigenetic modifications within the Ducapox genome, based on divergence of the raw signals from a standard model for given sequence motifs. The methods we have detailed may be used for other viral genomes, thus aiding the understanding of the molecular mechanisms underpinning viral virulence, evolution and host preferences.

Methods

Source and composition of vaccine

A commercial live attenuated ‘Ducapox’ vaccine was sourced from Al Bashayer Veterinary Supplies (Dubai, United Arab Emirates), manufactured by Design Biologix (Pretoria, South Africa) and commercialized by Highveld Biological Ltd (Johannesburg, South Africa). The CMLV strain CaPV298-2, the parent strain of this vaccine, was originally isolated in the United Arab Emirates and attenuated through serial passage in Vero cell culture³³. Manufacture and expiry dates were July 2018 and June 2019, respectively, and the batch number was DPV0818.

DNA extraction

DNA was extracted using the QIAamp DNA Mini kit (Catalog # 51304, Qiagen, Hilden, Germany), following the DNA purification from tissues protocol. A 180 μL volume of Buffer ATL was added to 25 mg of lyophilized vaccine and the manufacturer's guidelines were followed, with the addition of 5 μg of Carrier RNA Poly A (Catalog # 1017647, Qiagen, Hilden, Germany) to the 200 μL of Buffer AL solution. The DNA preparation was analyzed for purity on a NanoDrop spectrophotometer (ThermoScientific, Rochester, USA), and the concentration was determined using a Qubit dsDNA assay kit (ThermoScientific, Rochester, USA) and a Qubit 4 fluorometer (ThermoScientific, Rochester, USA).

Preparation of nanopore library and sequencing

400 ng of genomic DNA was used for Nanopore library preparation using a Rapid Sequencing Kit (SQK-RAD004, Oxford Nanopore Technologies) and barcode 18 of the Native Barcoding Expansion kit (EXP-NBD114, Oxford Nanopore Technologies). Multiplexed sequencing was performed on a MinION device (Oxford Nanopore Technologies) equipped with an R9.4.1 MinION flow cell. Base calling was performed offline with ONT’s Guppy software pipeline version 4.0.11, enabling the --pt_scaling flag, setting --trim_strategy to DNA, loading the dna_r9.4.1_450bps_hac configuration files, and setting --barcode_kits EXP-NBD114.

Long read pre-processing, assembly, and polishing

Long read adapter trimming was performed with Porechop version 0.2.4 (www.github.com/rrwick/Porechop), setting both --adapter_threshold and --barcode_threshold to 98. The trimmed long reads were filtered to remove reads under 3000 bases in length using NanoFilt version 2.6.0⁵⁷. The adapter-trimmed, filtered long reads were assembled using Flye version 2.8⁵⁸ with the --nano-raw, --meta, --trestle and --keep-haplotypes flags. A fasta file of non-viral assembled contigs (identified using a blast search) was made from the assembly output using Bandage version 0.8.1⁵⁹. The adapter-trimmed, filtered long reads were mapped to the non-viral assembled contigs using minimap2 version 2.17-r941⁶⁰, and the unmapped reads were extracted from the alignment file and converted to FASTQ using samtools⁶¹, thus generating a read set exclusively containing viral DNA. The virus-specific reads were assembled using Flye version 2.8, enabling --nano-raw, setting the minimum overlap to 5000 using the -m 5000 flag, and conducting 3 polishing iterations by setting the -i 3 flag. The assembly was polished, correcting the ITR regions, using the --only-polish flag of the tandemquast tool of the TandemTools package⁶². Long reads were mapped to the assembly using minimap2 version 2.17-r941, and the resulting alignment file was used to polish the assembly with Racon version 1.4.13⁶³ using the following parameters: -m 8 -x -6 -g -8 -w 500 --no-trimming. A total of 3 rounds of mapping and polishing with Racon were done on the assembly, after which no changes were observed. The corrected consensus was further polished with the same long read set using Medaka version 0.11.5 (https://github.com/nanoporetech/medaka), setting the -m r941_min_high_g360 flag. Figure 7 shows a graphical representation of the full assembly pipeline.
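The length filter at the start of this pipeline is conceptually simple. The following Python sketch shows the idea on a toy FASTQ. It is not NanoFilt itself, the helper names are hypothetical, and it assumes plain four-line FASTQ records and that a read of exactly 3000 bases is retained, since only reads under 3000 bases are removed.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_by_length(records, min_length=3000):
    """Keep reads of at least min_length bases (drops reads under 3000 bases)."""
    return [record for record in records if len(record[1]) >= min_length]


# Toy reads of 2999, 3000 and 4500 bases; the sequences are placeholders.
toy_fastq = "".join(
    f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n" for i, n in enumerate([2999, 3000, 4500])
)

kept = filter_by_length(read_fastq(io.StringIO(toy_fastq)))
kept_ids = [header for header, _, _ in kept]
print(kept_ids)
```

Only the two reads of 3000 bases or more survive. In the pipeline itself, the surviving reads then pass to Flye and to the host-read removal step.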

Assessment of assemblies and whole genome comparisons

The non-viral-DNA-free, adapter trimmed, filtered long reads were mapped to both the initial Flye assembly, and the final polished assembly in order to manually assess for the absence of read mapping breaks by plotting read mapping coverage of genome assemblies using pyGenomeTracks version 3.5⁶⁴. Genome comparisons were performed using the nucmer tool of Mummer 3⁶⁵. The final polished assembly was compared against the short-read Ducapox assembly (Genbank accession: MT648498.1) and Vaccinia virus strain Acambis 3000 MVA (Genbank accession: AY603355.1), the closest matching genome to the long-read assembly as determined by an online BLAST search.

Genome annotation

The polished assembly was annotated using Prodigal v2.6.3⁶⁶. The annotation gff3 file was loaded into GenSAS suite version 6.0⁶⁷, after which functional analyses were conducted in the suite using InterProScan version 5.25-68.0⁶⁸. The ab initio predicted proteins were identified using blastp⁶⁹ by conducting a protein vs protein search against the SwissProt protein data set to determine best matches. Protein sequences were analyzed for predicted signal peptides using SignalP v5.0⁷⁰. Non-coding RNAs were detected using StructRNAfinder⁷¹.

Assessment of putative epigenetic modification sites

A total of 2214 Fast5 files (599.9 MB) that mapped to the long-read assembly were extracted using the fast5seek tool (github.com/mbhall88/fast5seek). The Tombo suite²⁶ was used to detect Nanopore raw signals that diverged from the standard model, which could signify epigenetic modification sites. After running Tombo’s resquiggle function using the final polished genome, the detect_modifications function was run using the de_novo model with default parameters (dampened fraction estimation [2, 0]). The results of the stats file were converted to a FASTA file using the text_output function of Tombo, setting --num-regions 1000 and --num-bases 15. The central 7 nucleotides of each entry of the fasta file were plotted using motif_with_stats in Tombo (plotting the standard model, with the default dampened fraction estimation [2, 0]), using the maximum --num-statistics number that would produce a plot for each fasta entry (determined empirically), for all entries with scores > 0.7 for “Frac. Alternate” in the fasta file. The motif_with_stats plots were assessed manually, and only the motifs from plots containing increases in the fraction of modified bases (−log10(P-value)) exclusively around the central motif were kept. These were used to create a separate fasta file containing all motifs for each of the four modified bases that were manually detected from the plots. MEME v5.1.1⁷² was run on each individual fasta file using the -dna and -mod zoops flags to determine motifs. Motifs were compared to known motifs using Tomtom v5.1.1⁷³. Nanopolish v0.13.3⁷⁵ was used to assess for 5mC and 6mA epigenetic modifications, setting a methylation frequency above 0.5 as indicative of evidence for methylation. The presence of 5mC epigenetic modifications was also assessed using Megalodon (github.com/nanoporetech/megalodon).
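The 0.5 methylation-frequency criterion amounts to a simple threshold over a per-site table. The sketch below shows that filter in Python on an in-memory, tab-separated table. The column names are assumptions modeled on a typical per-site frequency table rather than a guaranteed Nanopolish output format, and the three rows are invented purely for illustration.

```python
import csv
import io

# Threshold stated in the text: sites with a frequency above 0.5 count as evidence.
THRESHOLD = 0.5

# Hypothetical table; column names and values are illustrative assumptions only.
toy_table = (
    "chromosome\tstart\tend\tcalled_sites\tcalled_sites_methylated\tmethylated_frequency\n"
    "contig_1\t100\t100\t40\t2\t0.050\n"
    "contig_1\t250\t250\t38\t12\t0.316\n"
    "contig_1\t900\t900\t41\t20\t0.488\n"
)


def sites_above_threshold(handle, threshold=THRESHOLD):
    """Return the rows whose methylated_frequency strictly exceeds the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["methylated_frequency"]) > threshold]


hits = sites_above_threshold(io.StringIO(toy_table))
print(f"{len(hits)} site(s) above {THRESHOLD}")
```

With this toy input no row passes the filter. The same function could be pointed at a real frequency table by replacing the in-memory handle with an open file, after adjusting the column name to match the actual header.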

Data availability

All data generated in this study have been deposited at the NCBI under BioProject PRJNA663037. Nanopore sequencing read data can be accessed at the NCBI SRA using the accession number SRR12667950. Sample information can be accessed at the NCBI BioSample repository using the accession number SAMN16115327. The long-read Ducapox genome assembly generated in this study can be accessed using GenBank accession number MT946551 (the 159,696 bp assembly as version MT946551.1 and the corrected 159,695 bp assembly as version MT946551.2). The short-read Ducapox assembly and protein sequences can be accessed using GenBank accession number MT648498.1. The Vaccinia virus strain Acambis 3000 MVA genome can be accessed using GenBank accession number AY603355.1. Gene and protein names, and functional annotations (GO terms, InterPro, PFAM) are included in GenBank entries. Bioinformatics tool output files have been deposited in the following GitHub repository: https://github.com/zacksaud/Ducapox-Assembly-Project, as well as in the supplementary information.

References

Fenner, F., Henderson, D.A., Arita, I., Jezek, Z. & Ladnyi, I.D. Smallpox and its eradication. Geneva: World Health Organization; 1988. [March 14, 2003]. p. 1460. Reference out-of-print. See the World Health Organization, Communicable Disease Surveillance and Response Web site. www.who.int/emc/diseases/smallpox/smallpoxeradication.html.
Jenner, E. An inquiry into the causes and effects of the variolae vaccinae, a disease discovered in some of the Western Counties of England, particularly Gloucestershire, and known by the name of the cow-pox. London: Sampson Low, 1798.
Sklenovská, N. & Van Ranst, M. Emergence of monkeypox as the most important orthopoxvirus infection in humans. Front. Public Health 6, 241. https://doi.org/10.3389/fpubh.2018.00241 (2018).
Gubser, C. & Smith, G. L. The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox. J. Gen. Virol. 83, 855–872. https://doi.org/10.1099/0022-1317-83-4-855 (2002).
Moss, B. Poxvirus DNA replication. Cold Spring Harb. Perspect. Biol. 5(9), a010199. https://doi.org/10.1101/cshperspect.a010199 (2013).
Winters, E., Baroudy, B. M. & Moss, B. Molecular cloning of the terminal hairpin of vaccinia virus DNA as an imperfect palindrome in an Escherichia coli plasmid. Gene 37, 221–228. https://doi.org/10.1016/0378-1119(85)90276-8 (1985).
Hendrickson, R. C., Wang, C., Hatcher, E. L. & Lefkowitz, E. J. Orthopoxvirus genome evolution: The role of gene loss. Viruses 2(9), 1933–1967. https://doi.org/10.3390/v2091933 (2010).
Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74(12), 5463–5467. https://doi.org/10.1073/pnas.74.12.5463 (1977).
Bennett, S. Solexa Ltd. Pharmacogenomics 5(4), 433–438. https://doi.org/10.1517/14622416.5.4.433 (2004).
Kasianowicz, J. J., Brandin, E., Branton, D. & Deamer, D. W. Characterization of individual polynucleotide molecules using a membrane channel. Proc. Natl. Acad. Sci. USA 93(24), 13770–13773. https://doi.org/10.1073/pnas.93.24.13770 (1996).
Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biol. 17(1), 239. https://doi.org/10.1186/s13059-016-1103-0 (2016).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323(5910), 133–138. https://doi.org/10.1126/science.1162986 (2009).
Gubser, C., Hué, S., Kellam, P. & Smith, G. L. Poxvirus genomes: A phylogenetic analysis. J. Gen. Virol. 85(1), 105–117. https://doi.org/10.1099/vir.0.19565-0 (2004).
Moss, B. Poxviridae: The viruses and their replication. In Fields Virology 4th edn (eds Knipe, D. M. & Howley, P. M.) 2849–2883 (Lippincott Williams & Wilkins, Philadelphia, 2001).
Teferi, W. M. et al. The vaccinia virus K7 protein promotes histone methylation associated with heterochromatin formation. PLoS ONE 12(3), e0173056. https://doi.org/10.1371/journal.pone.0173056 (2017).
Ember, S. W., Ren, H., Ferguson, B. J. & Smith, G. L. Vaccinia virus protein C4 inhibits NF-κB activation and promotes virus virulence. J. Gen. Virol. 93(10), 2098–2108. https://doi.org/10.1099/vir.0.045070-0 (2012).
Unterholzner, L. et al. Vaccinia virus protein C6 is a virulence factor that binds TBK-1 adaptor proteins and inhibits activation of IRF3 and IRF7. PLoS Pathog. 7(9), e1002247. https://doi.org/10.1371/journal.ppat.1002247 (2011).
Fahy, A. S., Clark, R. H., Glyde, E. F. & Smith, G. L. Vaccinia virus protein C16 acts intracellularly to modulate the host response and promote virulence. J. Gen. Virol. 89(10), 2377–2387. https://doi.org/10.1099/vir.0.2008/004895-0 (2008).
Benfield, C. T. et al. Mapping the IkappaB kinase beta (IKKbeta)-binding interface of the B14 protein, a vaccinia virus inhibitor of IKKbeta-mediated activation of nuclear factor kappaB. J. Biol. Chem. 286(23), 20727–20735. https://doi.org/10.1074/jbc.M111.231381 (2011).
Yuwen, H., Cox, J. H., Yewdell, J. W., Bennink, J. R. & Moss, B. Nuclear localization of a double-stranded RNA-binding protein encoded by the vaccinia virus E3L gene. Virology 195(2), 732–744. https://doi.org/10.1006/viro.1993.1424 (1993).
Senkevich, T. G., Koonin, E. V. & Moss, B. Vaccinia virus F16 protein, a predicted catalytically inactive member of the prokaryotic serine recombinase superfamily, is targeted to nucleoli. Virology 417(2), 334–342. https://doi.org/10.1016/j.virol.2011.06.017 (2011).
Ferguson, B. J. et al. Vaccinia virus protein N2 is a nuclear IRF3 inhibitor that promotes virulence. J. Gen. Virol. 94(9), 2070–2081. https://doi.org/10.1099/vir.0.054114-0 (2013).
Knipe, D. M. Nuclear sensing of viral DNA, epigenetic regulation of herpes simplex virus infection, and innate immunity. Virology 479–480, 153–159. https://doi.org/10.1016/j.virol.2015.02.009 (2015).
Tsai, K. & Cullen, B. R. Epigenetic and epitranscriptomic regulation of viral replication. Nat. Rev. Microbiol. 1, 1. https://doi.org/10.1038/s41579-020-0382-3 (2020).
Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7(6), 461–465. https://doi.org/10.1038/nmeth.1459 (2010).
Stoiber, M. H. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. BioRxiv 2017, 094672. https://doi.org/10.1101/094672 (2017).
Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21(1), 30. https://doi.org/10.1186/s13059-020-1935-5 (2020).
Frommer, M. et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl. Acad. Sci. 89(5), 1827–1831. https://doi.org/10.1073/pnas.89.5.1827 (1992).
Feederle, R. & Schepers, A. Antibodies specific for nucleic acid modifications. RNA Biol. 14(9), 1089–1098. https://doi.org/10.1080/15476286.2017.1295905 (2017).
Müller, C. A. et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat. Methods 16, 429–436. https://doi.org/10.1038/s41592-019-0394-y (2019).
Nehme, Z., Pasquereau, S. & Herbein, G. Control of viral infections by epigenetic-targeted therapy. Clin. Epigenet. 11, 55. https://doi.org/10.1186/s13148-019-0654-9 (2019).
Kono, N. & Arakawa, K. Nanopore sequencing: Review of potential applications in functional genomics. Dev. Growth Differ. 61(5), 316–326. https://doi.org/10.1111/dgd.12608 (2019).
Kaaden, D. R., Walz, C.P. Czerny, U. Wernery, U. & Allen, W. R. Progress in the development of a camel pox vaccine. Proceeding of the 1st Int. Camel Conference, 47–49 (1992).
Saud, Z. & Butt, T. M. Another case of mistaken identity? Vaccinia virus in another live Camelpox vaccine. Biologicals 65, 39–41. https://doi.org/10.1016/j.biologicals.2020.04.002 (2020).
Marcacci, M. et al. Genome sequencing of a camelpox vaccine reveals close similarity to modified vaccinia virus ankara (MVA). Viruses 12(8), E786. https://doi.org/10.3390/v12080786 (2020).
Howard, S. T., Ray, C. A., Patel, D. D., Antczak, J. B. & Pickup, D. J. A 43-nucleotide RNA cis-acting element governs the site-specific formation of the 3′ end of a poxvirus late mRNA. Virology 255, 190–204. https://doi.org/10.1006/viro.1998.9547 (1999).
D’Costa, S. M., Antczak, J. B., Pickup, D. J. & Condit, R. C. Post-transcription cleavage generates the 3′ end of F17R transcripts in vaccinia virus. Virology 319(1), 1–11. https://doi.org/10.1016/j.virol.2003.09.041 (2004).
Lefkowitz, E. J. et al. Poxvirus bioinformatics resource center: A comprehensive Poxviridae informational and analytical resource. Nucleic Acids Res. 33, D311-316. https://doi.org/10.1093/nar/gki110 (2005).
Senkevich, T. G. et al. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing. Proc. Natl. Acad. Sci. USA 112(35), 10908–10913. https://doi.org/10.1073/pnas.1514809112 (2015).
Esteban, M. & Holowczak, J. A. Replication of vaccinia DNA in mouse L cells. I. In vivo DNA synthesis. Virology 78(1), 57–75. https://doi.org/10.1016/0042-6822(77)90078-2 (1977).
Pogo, B. G. T. & O’Shea, M. T. The mode of replication of vaccinia virus DNA. Virology 84(1), 1–8. https://doi.org/10.1016/0042-6822(78)90213-1 (1978).
Parkinson, J. E., Sanderson, C. M. & Smith, G. L. The vaccinia virus A38L gene product is a 33-kDa integral membrane glycoprotein. Virology 214(1), 177–188. https://doi.org/10.1006/viro.1995.9942 (1995).
Sanderson, C. M., Parkinson, J. E., Hollinshead, M. & Smith, G. L. Overexpression of the vaccinia virus A38L integral membrane protein promotes Ca2+ influx into infected cells. J. Virol. 70(2), 905–914. https://doi.org/10.1128/JVI.70.2.905-914.1996 (1996).
Alcamí, A., Symons, J. A. & Smith, G. L. The vaccinia virus soluble alpha/beta interferon (IFN) receptor binds to the cell surface and protects cells from the antiviral effects of IFN. J. Virol. 74(23), 11230–11239. https://doi.org/10.1128/jvi.74.23.11230-11239.2000 (2000).
Senkevich, T. G., Weisberg, A. S. & Moss, B. Vaccinia virus E10R protein is associated with the membranes of intracellular mature virions and has a role in morphogenesis. Virology 278(1), 244–252. https://doi.org/10.1006/viro.2000.0656 (2000).
Price, N., Tscharke, D. C., Hollinshead, M. & Smith, G. L. Vaccinia virus gene B7R encodes an 18-kDa protein that is resident in the endoplasmic reticulum and affects virus virulence. Virology 267(1), 65–79. https://doi.org/10.1006/viro.1999.0116 (2000).
Meisinger-Henschel, C. et al. Introduction of the six major genomic deletions of modified vaccinia virus Ankara (MVA) into the parental vaccinia virus is not sufficient to reproduce an MVA-like phenotype in cell culture and in mice. J. Virol. 84(19), 9907–9919. https://doi.org/10.1128/JVI.00756-10 (2010).
Turner, P. C. & Moyer, R. W. The vaccinia virus fusion inhibitor proteins SPI-3 (K2) and HA (A56) expressed by infected cells reduce the entry of superinfecting virus. Virology 380(2), 226–233. https://doi.org/10.1016/j.virol.2008.07.020 (2008).
Roberts, K. L. et al. Acidic residues in the membrane-proximal stalk region of vaccinia virus protein B5 are required for glycosaminoglycan-mediated disruption of the extracellular enveloped virus outer membrane. J. Gen. Virol. 90(Pt 7), 1582–1591. https://doi.org/10.1099/vir.0.009092-0 (2009).
Sood, C. L. & Moss, B. Vaccinia virus A43R gene encodes an orthopoxvirus-specific late non-virion type-1 membrane protein that is dispensable for replication but enhances intradermal lesion formation. Virology 396(1), 160–168. https://doi.org/10.1016/j.virol.2009.10.025 (2010).
Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. & Ghasemi, Y. A comprehensive review of signal peptides: Structure, roles, and applications. Eur. J. Cell Biol. 97(6), 422–441. https://doi.org/10.1016/j.ejcb.2018.06.003 (2018).
Duggan, A. T. et al. The origins and genomic diversity of American Civil War Era smallpox vaccine strains. Genome Biol. 21, 175. https://doi.org/10.1186/s13059-020-02079-z (2020).
Tombácz, D. et al. Dynamic transcriptome profiling dataset of vaccinia virus obtained from long-read sequencing techniques. Gigascience 7(12), 1139. https://doi.org/10.1093/gigascience/giy139 (2018).
Tombácz, D. et al. Long-read assays shed new light on the transcriptome complexity of a viral pathogen. Sci. Rep. 10(1), 13822. https://doi.org/10.1038/s41598-020-70794-5 (2020).
Dhungel, P., Cantu, F. M., Molina, J. A. & Yang, Z. Vaccinia virus as a master of host shutoff induction: Targeting processes of the central dogma and beyond. Pathogens 9(5), 400. https://doi.org/10.3390/pathogens9050400 (2020).
Habjan, M. & Pichlmair, A. Cytoplasmic sensing of viral nucleic acids. Curr. Opin. Virol. 11, 31–37. https://doi.org/10.1016/j.coviro.2015.01.012 (2015).
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: Visualizing and processing long-read sequencing data. Bioinformatics 34(15), 2666–2669. https://doi.org/10.1093/bioinformatics/bty149 (2018).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37(5), 540–546. https://doi.org/10.1038/s41587-019-0072-8 (2019).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: Interactive visualisation of de novo genome assemblies. Bioinformatics 31(20), 3350–3352. https://doi.org/10.1093/bioinformatics/btv383 (2015).
Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34(18), 3094–3100. https://doi.org/10.1093/bioinformatics/bty191 (2018).
Li, H. et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352 (2009).
Mikheenko, A., Bzikadze, A. V., Gurevich, A., Miga, K. H. & Pevzner, P. A. TandemTools: Mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 36(1), i75–i83. https://doi.org/10.1093/bioinformatics/btaa440 (2020).
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27(5), 737–746. https://doi.org/10.1101/gr.214270.116 (2017).
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9(1), 189. https://doi.org/10.1038/s41467-017-02525-w (2018).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5(2), R12. https://doi.org/10.1186/gb-2004-5-2-r12 (2004).
Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119. https://doi.org/10.1186/1471-2105-11-119 (2010).
Humann, J. L., Lee, T., Ficklin, S. & Main, D. Structural and functional annotation of eukaryotic genomes with GenSAS. Methods Mol. Biol. 1962, 29–51. https://doi.org/10.1007/978-1-4939-9173-0_3 (2019).
Jones, P. et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30(9), 1236–1240. https://doi.org/10.1093/bioinformatics/btu03 (2014).
Camacho, C. et al. BLAST+: Architecture and applications. BMC Bioinform. 10, 421. https://doi.org/10.1186/1471-2105-10-421 (2009).
Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37(4), 420–423. https://doi.org/10.1038/s41587-019-0036-z (2019).
Arias-Carrasco, R. et al. StructRNAfinder: An automated pipeline and web server for RNA families prediction. BMC Bioinform. 19, 55. https://doi.org/10.1186/s12859-018-2052-2 (2018).
Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8(2), R24. https://doi.org/10.1186/gb-2007-8-2-r24 (2007).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14(4), 407–410. https://doi.org/10.1038/nmeth.4184 (2017).

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We thank staff at the University of Sharjah for facilitating the use of their facilities for DNA extraction.

Author information

These authors contributed equally: Zack Saud, Matthew D. Hitchings and Tariq M. Butt.

Authors and Affiliations

Department of Biosciences, College of Science, Swansea University, Singleton Park, Swansea, SA2 8PP, Wales, UK
Zack Saud & Tariq M. Butt
Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA2 8PP, Wales, UK
Matthew D. Hitchings

Authors

Zack Saud
Matthew D. Hitchings
Tariq M. Butt

Contributions

Z.S. performed DNA extraction and bioinformatics analyses, and wrote the manuscript. M.D.H. performed Nanopore sequencing. T.M.B. provided oversight, reviewed the manuscript and provided laboratory support.

Corresponding author

Correspondence to Zack Saud.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Supplementary Information 5.

Supplementary Information 6.

Supplementary Information 7.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Saud, Z., Hitchings, M.D. & Butt, T.M. Nanopore sequencing and de novo assembly of a misidentified Camelpox vaccine reveals putative epigenetic modifications and alternate protein signal peptides. Sci Rep 11, 17758 (2021). https://doi.org/10.1038/s41598-021-97158-x

Received: 20 September 2020
Accepted: 19 August 2021
Published: 07 September 2021
DOI: https://doi.org/10.1038/s41598-021-97158-x
