A novel nudivirus infecting the invasive demon shrimp Dikerogammarus haemobaphes (Amphipoda)

The Nudiviridae are a family of large double-stranded DNA viruses that infect the cells of the gut in invertebrates, including insects and crustaceans. The phylogenetic range of the family has recently been expanded via the description of viruses infecting penaeid shrimp, crangonid shrimp, homarid lobsters and portunid crabs. Here we extend this range by presenting the genome of another nudivirus, which infects the amphipod Dikerogammarus haemobaphes. The virus, which infects cells of the host hepatopancreas, has a circular genome of 119,754 bp and encodes a predicted 106 open reading frames. This novel virus encodes all the conserved nudiviral genes (sharing 57 gene homologues with other crustacean-infecting nudiviruses) but appears to lack the p6.9 gene. Phylogenetic analysis revealed that this virus branches before the other crustacean-infecting nudiviruses and shares low levels of gene/protein similarity with the Gammanudivirus genus. Comparison of gene synteny among known crustacean-infecting nudiviruses reveals conservation between Homarus gammarus nudivirus and Penaeus monodon nudivirus; however, three genomic rearrangements in this novel amphipod virus appear to break the gene synteny between it and the viruses infecting lobsters and penaeid shrimp. We explore the evolutionary history and systematics of this novel virus, suggesting that it be included in the novel Epsilonnudivirus genus (Nudiviridae).

histological Section 8,11. Histopathologically, nudiviral infection results in nuclear hypertrophy of the hepatopancreatocytes, caused primarily by a growing viroplasm; this is explored in full for the virus in D. haemobaphes in Bojko et al. 11. Also in D. haemobaphes, infection prevalence of up to 77.7% has been correlated with altered behaviour in infected animals 11. The behavioural change was associated with increased activity, which positively correlated with viral burden, potentially indicating some benefit to viral transmission through increased movement 11. Many of the amphipod hosts in which putative nudiviruses have been reported are non-native or invasive species present outside of their native ranges. Viruses of this family may therefore have potential for transboundary transmission when their hosts are present in their invasive range 7,10,11. Genomic data are currently lacking for all the putative nudiviruses infecting amphipods. In this study, we provide full genome characterisation of a nudivirus infecting the amphipod D. haemobaphes collected from outside of its native range. We use these data to provisionally name the virus Dikerogammarus haemobaphes nudivirus (DhNV) and place it within a newly suggested genus, Epsilonnudivirus, of the family Nudiviridae.

Results
Genome structure of DhNV. The circular genome of DhNV is 119,754 bp and contains 106 hypothetical ORFs, with 56 on the positive strand and 50 on the negative strand (Fig. 1). Fifty-nine of the ORFs met our comparative e-value threshold of < 0.001 and were directly comparable to other members of the Nudiviridae. Up to 17% and 37% of the ORFs aligned most closely with genes from HgNV and PmNV, respectively. Two ORFs, DhNV_008 and DhNV_002, scored 50% similarity or above to protein sequences on BLASTp. DhNV_008 was 52.33% similar to pif-2 from HgNV (QBB28614), with per os infectivity as the only identified protein domain. DhNV_002 was 50% similar to a hypothetical protein from PmNV (YP_009051845), in which cytoplasmic, non-cytoplasmic, Tmhelix, and transmembrane domains were identified. Among the 47 ORFs that provided no similarity to other protein sequences within the threshold, InterProScan assessment identified 20 ORFs with functional domains. A protein signature match to the inhibitor of apoptosis repeat superfamily in DhNV_059 may indicate the presence of a homolog of the Iap nudivirus gene; however, BLASTp annotations did not yield any similarity results to the Iap gene found in other nudiviruses. The remaining 19 ORFs encoded proteins with a zinc finger domain (DhNV_045), a Tmhelix (DhNV_044), a signal peptide (DhNV_084), a P-loop containing nucleoside triphosphate hydrolase domain (DhNV_047), a non-cytoplasmic domain (DhNV_025, 028, and 057), disorder predictions (DhNV_011, 020, 024, 072, 077, and 078), a predicted cytoplasmic localisation (DhNV_019 and 065), or 'coil' feature(s) (DhNV_022, 026, 042, and 086).
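The partition of ORFs by the e-value threshold described above can be illustrated with a short sketch. This is not the authors' pipeline: the function name `partition_by_evalue` and the toy e-values are hypothetical, and only the < 0.001 cut-off and the reported counts are taken from the text.

```python
from typing import Dict, List, Optional, Tuple

E_VALUE_THRESHOLD = 0.001  # cut-off stated in the text


def partition_by_evalue(
    best_hits: Dict[str, Optional[float]],
    threshold: float = E_VALUE_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split ORF ids into (with_hit, without_hit).

    best_hits maps an ORF id to the e-value of its best BLASTp hit,
    or to None when no hit was returned at all.
    """
    with_hit: List[str] = []
    without_hit: List[str] = []
    for orf_id, evalue in sorted(best_hits.items()):
        if evalue is not None and evalue < threshold:
            with_hit.append(orf_id)
        else:
            without_hit.append(orf_id)
    return with_hit, without_hit


# Toy input: hypothetical ids and e-values, for illustration only.
toy_hits = {"ORF_A": 1e-30, "ORF_B": 0.5, "ORF_C": None, "ORF_D": 9e-4}
with_hit, without_hit = partition_by_evalue(toy_hits)

# Arithmetic check on the counts reported in the text.
reported = {"total": 106, "plus": 56, "minus": 50, "with_hit": 59, "without_hit": 47}
counts_consistent = (
    reported["plus"] + reported["minus"] == reported["total"]
    and reported["with_hit"] + reported["without_hit"] == reported["total"]
)
```

The final check only confirms that the strand counts (56 + 50) and the similarity counts (59 + 47) each sum to the 106 ORFs reported.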
Gene synteny among the Epsilonnudivirus, Gammanudivirus and Deltanudivirus genera. A comparison of gene synteny between DhNV and three other nudiviruses determined that DhNV had a different gene synteny to members of the Gammanudivirus and Deltanudivirus genera (Fig. 2). Comparison between ToNV and DhNV revealed a high level of genomic rearrangement, where the 32 genes that showed genetic similarity (e < 0.001) with those ORFs on the DhNV genome were located across the respective genomes, showing little conserved synteny (Fig. 2a). Comparison between DhNV and PmNV/HgNV revealed higher levels of gene synteny (Fig. 2b,c). A comparison using all three viruses identified 12 major regions of genetic novelty in the DhNV genome (Fig. 2d). This included 47 hypothetical ORFs that were unique to DhNV and showed little genetic/protein relatedness to other nudiviruses within the e-value threshold of < 0.001, one of which (DhNV_070) showed highest similarity to a gene from Pyricularia oryzae, a fungal plant pathogen, and another (DhNV_029) with highest similarity to Sucra jujuba nucleopolyhedrovirus (Table 1).
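One simple way to quantify how far two gene orders differ is to count adjacency breakpoints. The sketch below is an illustration under stated assumptions, not the gene-block comparison used for Fig. 2: it treats each genome as a circular list of uniquely named homologous genes, ignores strand, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Set


def circular_adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered neighbour pairs of a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def count_breakpoints(order_a: List[str], order_b: List[str]) -> int:
    """Adjacencies of genome A (shared genes only) that are absent from genome B.

    Assumes each gene name occurs at most once per genome.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]
    adjacencies_b = circular_adjacencies(b)
    return sum(1 for pair in circular_adjacencies(a) if pair not in adjacencies_b)


# Toy example: genome B carries one inverted segment (C-D-E) relative to A.
genome_a = ["A", "B", "C", "D", "E", "F"]
genome_b = ["A", "B", "E", "D", "C", "F"]
breakpoints = count_breakpoints(genome_a, genome_b)
```

A single inversion disrupts two adjacencies, so a small breakpoint count relative to the number of shared genes indicates largely conserved synteny, whereas widely dispersed homologues give a count approaching the number of shared genes.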

Morphological and phylogenetic comparison to other Nudiviridae. A concatenated maximum likelihood phylogenetic analysis of eight nudiviruses and one baculovirus (outgroup) using 18 core nudivirus genes (see Sect. 4) supported the positioning of DhNV outside of the two crustacean-infecting nudiviruses with bootstrap values of 100% (Fig. 3). Within this grouping, DhNV is an early branching member of the Gammanudivirus genus and may constitute a different genus altogether. The Betanudivirus genus branches outside of the Gammanudivirus cluster and the Deltanudivirus member ToNV is the earliest branching member of these three genera. The Alphanudivirus group represents the most phylogenetically distinct nudivirus genus represented on our diagram (Fig. 3).
Illustrations of virus morphology provide another dimension of comparison among the Nudiviridae species (Fig. 3). DhNV virions consist of a double membrane surrounding an electron-dense core measuring (n = 30, mean ± SD) 302 ± 13 nm in length and 55 ± 4 nm in diameter 11. The rod-shaped structure is maintained across all the nudiviruses. DhNV represents one of the larger nudiviruses discovered to date, second to HzNV2, which has a length of 382 ± 30 nm.
A second concatenated maximum likelihood phylogenetic analysis of putative iap and pif-2 genes supported DhNV as an earlier branch of the crustacean-infecting nudiviruses. Macrobrachium rosenbergii nudivirus CN-SL2011 (MrNV) (NCBI:txid1217568), for which only the aforementioned genes are available, was added to this analysis and branched within the Gammanudivirus genus. ToNV (Deltanudivirus) is the earliest branch of these genera, followed by HzNV2 (Betanudivirus), and the four crustacean-infecting nudiviruses. The Alphanudiviruses represent the most phylogenetically distinct lineage among nudiviruses in this tree (Fig. 4), consistent with the general topology shown in Fig. 3.

Discussion
We provide a full genome characterisation of DhNV, a novel member of the Nudiviridae infecting the freshwater amphipod host, Dikerogammarus haemobaphes. The genome size, ORFs and morphology of this virus correspond with related viruses from crustaceans and insects. The identification of this virus is discussed relative to its genetic and protein content, its gene synteny and the gene synteny of related viruses, and finally, its phylogenetic relatedness to other Nudiviridae. These combined data suggest a novel genus may be appropriate: Epsilonnudivirus.
A novel member of the Nudiviridae (Epsilonnudivirus) from an amphipod. Using a combination of the core genes conserved across the Nudiviridae, we show that DhNV is most closely related to the Gammanudivirus genus; however, with a low level of protein similarity at most loci (< 50%), it seems pertinent to explore the erection of a new genus. Our concatenated phylogenetic analysis of eight nudiviruses representing four genera is concordant with previously published trees 2. DhNV appears to branch early from the three marine nudiviruses (Figs. 3, 4), suggesting a position ancestral to HgNV, MrNV and PmNV. The DhNV genome encoded all core nudivirus genes apart from p6.9, which encodes a nucleotide-binding protein. The core gene products function in DNA processing, RNA transcription, and per os infectivity 13. The p6.9 gene, which is responsible for the encapsulation of the viral genome, is characterized by a serine-arginine repeat region that could not be identified in the DhNV genome and was not present at the predicted locus where p6.9 lies in other Gammanudivirus members: between lef-5 (DhNV_051) and vlf-1 (DhNV_055). In addition to the core genes (n = 24, including three repeat homologues), most genes show similarity to other Nudiviridae under a conservative e-value threshold (< 0.001), providing strong evidence that this virus belongs within the Nudiviridae (Table 1).
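A serine-arginine rich region of the kind described for p6.9 could, in principle, be screened for with a simple sliding-window scan over each translated ORF. The sketch below is only an illustration: the function name, the 20-residue window and the 0.6 cut-off are arbitrary assumptions, and it is not one of the tools (BLASTp, ExPASy, GeneMarkS, InterProScan) used in this study.

```python
from typing import List, Tuple


def sr_rich_windows(
    protein: str, window: int = 20, min_fraction: float = 0.6
) -> List[Tuple[int, float]]:
    """Return (start, fraction) for every window whose S + R content meets the cut-off.

    start is the 0-based index of the first residue in the window.
    Sequences shorter than the window yield no hits.
    """
    seq = protein.upper()
    hits: List[Tuple[int, float]] = []
    for start in range(0, len(seq) - window + 1):
        chunk = seq[start:start + window]
        fraction = (chunk.count("S") + chunk.count("R")) / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits


# Toy sequence (hypothetical): ten SR dipeptides flanked by alanine runs.
toy_protein = "M" + "A" * 10 + "SR" * 10 + "A" * 10
strict_hits = sr_rich_windows(toy_protein, window=20, min_fraction=1.0)
```

An empty result across all predicted ORFs would be consistent with, but would not prove, the absence of a p6.9-like protein, since a divergent homologue could fall below any fixed composition threshold.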
The primary source of protein similarity information for DhNV ORFs came from PmNV and HgNV, two genomically characterised viruses from the Gammanudivirus genus. Gammanudivirus members contain unique apoptosis inhibitor genes that lack a predicted RING domain 1 and appear twice in the HgNV genome. DhNV_059 represents a homolog of the Iap nudivirus gene in DhNV, where an inhibitor of apoptosis repeat domain was detected but is relatively different from existing baculovirus homologues 1. In addition to family-level gene conservation, we identified 11 "crustacean-infecting nudivirus" genes that are conserved among those that infect crustaceans. Using a gene-block approach, we identified that PmNV and HgNV share gene synteny, whereas the DhNV genome exhibits three reorganization events, termed 'X', 'Y' and 'Z' (Fig. 3). These rearrangements are visible only in this virus and, alongside a low average protein similarity of ~ 50%, may indicate that a fifth nudivirus genus could be erected to hold peracarid-infecting nudiviruses. We suggest Epsilonnudivirus. In further work, greater genomic availability of viruses from peracarid hosts could help to better define these demarcation criteria. Further genomic characterisation of peracarid-infecting nudiviruses may also help to identify the evolutionary history of DhNV, especially with regard to genes that show relatedness outside the Nudiviridae. Examples include DhNV_029, which shares 35.19% similarity with the cg30-1 gene (YP_009186763) from Sucra jujuba nucleopolyhedrovirus (Table 1), a lepidopteran-infecting baculovirus. This is the first of two ORFs with zinc finger, RING-type domains in the DhNV genome, DhNV_045 being the second (Table 1). These show some relation to HgNV and PmNV, each of which contains three proteins with zinc finger, RING-type domains: HgNV_019, 064, and 067 and KN57gp_003, 033, and 049, respectively 1,2. DhNV_070 also shows its highest similarity to a sequence from a non-nudivirus organism. A hypothetical protein from Pyricularia oryzae (Table 1), the fungal pathogen that causes rice blast disease, shows 41.18% similarity to DhNV_070. Protein domain analysis using InterProScan revealed mainly cytoplasmic and disorder protein domains in the P. oryzae sequence (XP_003712544), while DhNV_070 yielded a detailed signature match to proline-rich extensin, commonly found in plant cell walls. This extensin domain does not appear in any Gammanudivirus protein. Finally, DhNV_076 shows some similarity to a homologue in HgNV (LOC108666550-like protein); however, both also show high levels of similarity to ORFs of invertebrate taxa, which lack an identified domain or function. Such a conserved gene encoded by these viruses may reflect an ancient horizontal gene acquisition from a host during their evolutionary history.

Table 1 (partial).
DhNV_083  +  92627  93763  378  -----  no predictions
DhNV_084  -  93788  94495  235  -----  SignalP-TM
DhNV_085  -  94601  95626  341  -----  no predictions
DhNV_086  -  95803  98343  846  -----

New perspectives surrounding the Nudiviridae. Nudivirus infections often delay development of their arthropod hosts, eventually causing death 4. However, high prevalence of nudiviruses in hosts apparently displaying few clinical signs of infection may also suggest some host benefit of retaining such sub-clinical infections 2,5,10,11. While the exact relationship between DhNV and host survival still requires testing, a significant association with increased activity may subsequently increase invasive capabilities of the host 11. Examining the genome of DhNV revealed several conserved and convergent traits of crustacean nudiviruses, highlighting potential genes for diagnostic development and further research into functional roles during host infection and survival within the environment.
Further sequencing and characterisation of many hypothetical proteins will provide more insight into the evolutionary history and host relationship of DhNV relative to other Nudiviridae. Through genomic analysis, phylogeny, and virion morphology it is evident that the Nudiviridae in Crustacea are highly derived from their insect relatives and that a great diversity of currently undescribed taxa likely resides in other arthropod hosts on land and in water.

[...] 14 and pooled into paired and unpaired sequence files. The paired sequence data from each technique were merged using PEAR v0.9.8 (settings: overlap similarity minimum, 20 bp) 15 to increase read length by combining each read pair into a single sequence read. These reads were assembled using SPAdes v3.13.0 16 with default parameters and k-mer lengths 21, 33, 55, 77, 99 and 127, to produce 228,433 scaffolds with a maximum scaffold length of 119,824 bp and a minimum scaffold length of 128 bp.
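Assembly summaries of this kind (number of scaffolds, longest, shortest) and a length filter such as the 100,000 bp cut-off applied in this study can be computed directly from a FASTA file. The following is a minimal standard-library sketch, not the authors' workflow; the function names and the toy sequences are hypothetical.

```python
import io
from typing import Dict, Iterable, Iterator, List, Tuple, Union


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarise_lengths(
    records: Iterable[Tuple[str, str]], min_length: int
) -> Dict[str, Union[int, List[str]]]:
    """Count records, report extreme lengths, and list names longer than min_length."""
    lengths = {name: len(seq) for name, seq in records}
    return {
        "count": len(lengths),
        "longest": max(lengths.values(), default=0),
        "shortest": min(lengths.values(), default=0),
        "kept": sorted(n for n, length in lengths.items() if length > min_length),
    }


# Toy FASTA (hypothetical sequences); a real run would use min_length=100_000.
toy_fasta = io.StringIO(">s1 example\nACGTACGT\nACGT\n>s2\nACG\n")
summary = summarise_lengths(read_fasta(toy_fasta), min_length=5)
```

For a real assembly the same functions could be applied to an open file handle instead of the in-memory example.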
Identification and annotation of the viral genome. Scaffolds above 100,000 bp were extracted from the dataset and annotated using PROKKA v1.11 17. The subsequent output was assessed for similarity to existing sequence data using BLASTp against the NCBI nr database. This identified a raw contiguous sequence of 119,824 bp as the genome of DhNV, which was subsequently circularized and checked for average coverage using CLC Genomics Workbench v11, resulting in a genome of 119,754 bp (coverage: 157.93X). PROKKA v1.11 17 and GeneMarkS 18 were used to annotate the viral genome (parameters: virus). A combination of these two tools resulted in 95 identical open reading frames (ORFs), eight frames with high similarity but different gene size, and three ORFs identified only by PROKKA. Combined, this provided 106 ORFs for annotation. The protein products of the 106 ORFs were compared to existing information using BLASTp via the NCBI repository (GenBank) with a cut-off e-value of < 0.001. The protein sequences were also assessed using the InterProScan tool (ebi.ac.uk/interpro/) to identify domains and predicted function. Twenty-one conserved core baculovirus/nudivirus genes were identified; however, p6.9 was not found within the genome of DhNV after analysis using BLASTp, ExPASy 19, GeneMarkS and InterProScan.
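The reconciliation of two gene callers into identical, similar-but-different-size, and caller-specific ORFs can be sketched as a set comparison on coordinates. The text does not state how "high similarity but different gene size" was defined, so the rule used here (same strand and same stop position, different start) is an assumption, as are all names in the code; ORFs spanning the origin of a circular genome are not handled.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Orf(NamedTuple):
    start: int   # 1-based, start < end regardless of strand
    end: int
    strand: str  # "+" or "-"


def stop_anchor(orf: Orf) -> Tuple[str, int]:
    """Strand plus the coordinate at the stop-codon end of the ORF."""
    return (orf.strand, orf.end if orf.strand == "+" else orf.start)


def reconcile(calls_a: Set[Orf], calls_b: Set[Orf]) -> Dict[str, Set[Orf]]:
    """Classify caller A's ORFs against caller B's (assumed shared-stop rule)."""
    identical = calls_a & calls_b
    rest_a, rest_b = calls_a - identical, calls_b - identical
    stops_a = {stop_anchor(o) for o in rest_a}
    stops_b = {stop_anchor(o) for o in rest_b}
    same_stop = {o for o in rest_a if stop_anchor(o) in stops_b}
    return {
        "identical": identical,
        "same_stop_different_start": same_stop,  # caller A's coordinates
        "only_a": rest_a - same_stop,
        "only_b": {o for o in rest_b if stop_anchor(o) not in stops_a},
    }


# Toy calls (hypothetical coordinates).
caller_a = {Orf(1, 300, "+"), Orf(400, 900, "+"), Orf(1000, 1300, "-"), Orf(2000, 2300, "+")}
caller_b = {Orf(1, 300, "+"), Orf(460, 900, "+"), Orf(1000, 1240, "-")}
result = reconcile(caller_a, caller_b)
total_orfs = sum(len(group) for group in result.values())
```

Summing the four groups gives the combined ORF set, in the same way that 95 identical, eight differing and three PROKKA-only predictions give the 106 ORFs reported above.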
The annotated viral genome is available through NCBI accession: MT488302.

Data availability
Sequence data from this study are available through NCBI as stated herein. Biological materials from the host are available from the Cefas Aquatic Registry and Repository.