Genome sequence and organization of the Mythimna (formerly Pseudaletia) unipuncta granulovirus Hawaiian strain

Purified occlusion bodies (OBs) of Mythimna (formerly Pseudaletia) unipuncta (the true armyworm) granulovirus Hawaiian strain (MyunGV-A) showed typical GV morphological characteristics under scanning and transmission electron microscopy (EM). The genome of MyunGV-A was completely sequenced and analysed. The genome is 176,677 bp in size, with a G+C content of 39.79%. It contains 183 open reading frames (ORFs) encoding 50 or more amino acids with minimal overlap. Comparison of MyunGV-A with the TnGV, XcGV and HearGV genomes revealed extensive sequence similarity and collinearity, and the four genomes contain the same nine homologous regions (hrs) with conserved structures and locations. Three unique genes, 12 baculovirus repeated ORF (bro), 2 helicase and 3 enhancin genes were identified. In particular, two repeated genes (ORF39 and ORF49) are present in the genome in reverse-complementary orientations. Twenty-four OB proteins were identified from the putative protein database of MyunGV-A. In addition, MyunGV-A belongs to the Betabaculovirus group and is most closely related to TnGV (99% amino acid identity) according to a phylogenetic tree based on the combined amino acid sequences of 38 core genes.

Sequence and genome characteristics of MyunGV-A. The size of the MyunGV-A genome is 176,677 bp (GenBank accession no. NC_013772), with a G+C content of 39.79%. MyunGV-A is the second largest GV sequenced to date, with only XcGV (178,733 bp) 6 being larger. Computer-assisted ORF analysis detected 372 ORFs of 50 or more codons and 9 homologous regions (hrs) in the MyunGV-A genome; 189 of these ORFs overlap significantly with, or are completely contained within, other MyunGV-A ORFs. The deduced protein sequences of these 189 ORFs show no significant homology to protein sequences in GenBank. The remaining 183 ORFs and the 9 hrs are shown in Table 1 according to location, orientation, size of the predicted amino acid sequence, potential baculovirus homologues, best-matched baculovirus ORF and BLAST score (bits).
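The two basic quantities reported above, G+C content and the number of ORFs of at least 50 codons, can be illustrated with a minimal Python sketch. This is not the software used in the study: the function names are hypothetical, the sequence is treated as linear (a circular genome would also need ORFs spanning the origin), and only ATG-initiated ORFs ending at a stop codon are reported.

```python
# Illustrative sketch only; not the ORF-prediction software used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """List ATG-initiated ORFs with at least min_codons codons (stop excluded).

    Returns (start, end, strand) tuples with 0-based, end-exclusive
    coordinates on the forward strand. Assumption: the sequence is linear.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop codon
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        end = i + 3  # include the stop codon
                        if strand == "+":
                            orfs.append((start, end, "+"))
                        else:
                            # map reverse-strand coordinates back to forward
                            orfs.append((n - end, n - start, "-"))
                    start = None
    return sorted(orfs)


# Toy sequence: one 6-codon ORF on the forward strand.
toy = "CC" + "ATG" + "GCA" * 5 + "TAA" + "GG"
print(gc_content(toy), find_orfs(toy, min_codons=6))
```

With the default threshold of 50 codons, overlapping candidates would still need to be filtered by homology, as described above for the 189 ORFs that were excluded.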
The first nucleotide of the granulin start codon was defined as nucleotide 1, and the ORF encoding granulin was accordingly designated as the first ORF. The putative ORFs were numbered sequentially in this orientation. Ninety-nine ORFs are in the granulin-sense orientation and 84 in the opposite orientation. The 183 putative ORFs of MyunGV-A were searched for promoter motifs within 180 bp upstream of the initiation codon of each ORF; only 42 were found to have a canonical baculovirus early gene promoter motif (a TATA box followed by a CAGT or CATT motif 20 to 40 bp downstream) 24,25. Seventy-five ORFs possess only a late promoter motif ((A/T/G)TAAG); 75 contain both early and late promoter motifs, which might allow transcription during both early and late stages of infection. Thirty-four lack any recognizable canonical promoter motif.
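The promoter classification described above can be sketched as a pattern search over the 180-bp region upstream of each start codon. The following Python sketch is illustrative only and rests on two assumptions that the text does not specify: the TATA box is matched as the literal string TATA, and the 20 to 40 bp spacing is counted from the end of the TATA box to the start of the CAGT or CATT motif. The function name is hypothetical.

```python
import re

# Assumption: "TATA box" is matched literally as TATA, and the 20-40 bp
# spacing runs from the end of TATA to the start of CAGT/CATT.
EARLY = re.compile(r"TATA(?=[ACGT]{20,40}CA[GT]T)")
# Late promoter motif (A/T/G)TAAG as given in the text.
LATE = re.compile(r"[ATG]TAAG")


def classify_promoter(upstream):
    """Classify an upstream region as 'early', 'late', 'early+late' or 'none'.

    upstream is expected to be the sequence (up to 180 bp) immediately
    upstream of the initiation codon, read in the sense of the ORF.
    """
    region = upstream.upper()
    has_early = EARLY.search(region) is not None
    has_late = LATE.search(region) is not None
    if has_early and has_late:
        return "early+late"
    if has_early:
        return "early"
    if has_late:
        return "late"
    return "none"


# Toy upstream regions (not taken from the MyunGV-A genome).
print(classify_promoter("TATA" + "C" * 25 + "CAGT"))            # early motif only
print(classify_promoter("CCATAAGCC"))                           # late motif only
print(classify_promoter("TATA" + "C" * 25 + "CATT" + "GTAAG"))  # both motifs
```

For ORFs in the orientation opposite to granulin, the upstream region would first have to be taken from the reverse-complemented genome sequence.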

Comparison of MyunGV-A ORFs to other baculoviruses. Comparison of gene organization and homology between MyunGV-A and other baculovirus genomes provides insight into gene conservation and implications for the diversity of baculoviruses. MyunGV-A shares 88 ORFs with AcMNPV, 166 with XcGV, 169 with HearGV and TnGV, 139 with MyunGV-B and 104 with CpGV (Table 1). The average amino acid sequence identities of homologous ORFs between MyunGV-A and AcMNPV, XcGV, HearGV, TnGV, MyunGV-B and CpGV are 34%, 79%, 79%, 98%, 62% and 44%, respectively. A total of 180 ORFs were assigned a function or are homologous with ORFs of other baculoviruses, of which three ORFs (68, 69 and 147) have homologues only in TnGV. ORF68 and ORF147 share 100% sequence identity with their TnGV homologues, whereas ORF69 shares 94%. In addition, ORF69 shares 37% sequence identity with a sequence from the bacterium Zooshikella ganghwensis. Three ORFs, ORF113, -133 and -166, were identified as unique to MyunGV-A.
www.nature.com/scientificreports/
GeneParityPlot analysis. The gene order of MyunGV-A was compared with that of AcMNPV, XcGV, HearGV, TnGV, MyunGV-B and CpGV by GeneParityPlot analysis (Fig. 2) 20. The gene organization of MyunGV-A is distinctly different from that of AcMNPV, except for two reverse collinear gene clusters, one of which is a 12-gene group that includes the core cluster of four genes, lef-5, 38K (ac98), ac96 and helicase, whose relative positions are conserved in baculovirus genomes 26. In contrast, the gene order of MyunGV-A exhibits extensive collinearity with XcGV, HearGV, TnGV, MyunGV-B and CpGV, with the highest collinearity to TnGV; the few genes found in a different order are almost all bro genes or genes located near bro genes. Interestingly, the arrangement of the MyunGV-A genome shows lower collinearity to MyunGV-B, a virus from the same host, than to XcGV, HearGV and TnGV.
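A gene parity plot pairs the position of each ORF in one genome with the position of its homologue in another, so that collinear regions appear as diagonal runs and inverted regions as anti-diagonal runs. The Python sketch below shows one way to build such points and to split them into forward and reverse collinear blocks. It is not the GeneParityPlot tool cited above; the function names and the toy homologue map are hypothetical.

```python
def gene_parity_points(homologue_map):
    """Return sorted (orf_in_A, orf_in_B) pairs for shared homologues.

    homologue_map maps an ORF number in genome A to the ORF number of its
    homologue in genome B; ORFs without a homologue are simply absent.
    """
    return sorted(homologue_map.items())


def collinear_blocks(points):
    """Split parity points into maximal collinear blocks.

    Positions in genome B are first converted to ranks among the shared
    genes, so that genes lacking a homologue do not interrupt a block.
    Returns (first_index, last_index, direction) tuples, where the indices
    refer to the sorted points and direction is +1 (same order),
    -1 (reverse order) or 0 (single isolated gene).
    """
    points = sorted(points)
    rank_in_b = {b: r for r, b in enumerate(sorted(b for _, b in points))}
    ranks = [rank_in_b[b] for _, b in points]
    blocks = []
    start, direction = 0, 0
    for i in range(1, len(ranks)):
        step = ranks[i] - ranks[i - 1]
        if step in (1, -1) and direction in (0, step):
            direction = step  # the block continues in the same direction
        else:
            blocks.append((start, i - 1, direction))
            start, direction = i, 0
    if ranks:
        blocks.append((start, len(ranks) - 1, direction))
    return blocks


# Toy example: three genes in the same order, then three in reverse order.
toy_map = {1: 1, 2: 2, 3: 3, 4: 9, 5: 8, 6: 7}
pts = gene_parity_points(toy_map)
print(pts)
print(collinear_blocks(pts))
```

Plotting the points from `gene_parity_points` as a scatter plot gives the usual visual form of the analysis; the block list is only a compact numerical summary of the same information.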

Homologous regions (hrs).
A typical feature of most baculovirus genomes is the presence of homologous regions (hrs) interspersed throughout the genome. The number of hrs in 82 complete baculovirus genomes ranges from none to 17, with 12 baculovirus genomes lacking typical hr sequences (Table S1). In general, hrs are AT-rich and contain imperfect, reiterated palindromic sequences, which may be replaced by direct repeats. Eight major hr sequences (hr1-8) and one short hr sequence (hr5a) were identified in the MyunGV-A genome (Table 1). Each of hr1-8 contains two to five imperfect direct repeats of approximately 120 bp, whereas hr5a does not contain multiple repeated sequences. It is interesting to note that hr5a is located within ORF122 (vp91), and the same situation exists in the XcGV and HearGV genomes. Six hrs were identified in the MyunGV-B genome, which lacks sequences corresponding to hr1 and hr5/5a of MyunGV-A 27. No hrs were found in the TnGV genome deposited in 2018 (NC_038375.1), and no analysis of that sequence has been published.
Although the nucleotide sequences of the repeats vary between hrs, and even within the same hr, two highly conserved 10-bp core sequences (TTAAT(G/A)TCGA) were found at roughly the same position (approximately 35 bp) of each repeat 6. In the MyunGV-A genome, the core sequences in each repeat of hr1, -2, -4, -7 and -8 are in the same direction, while those of hr3, -5, -5a and -6 are in opposite directions (Fig. 3).
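Locating the conserved core sequence and its direction within an hr can be sketched as a search for TTAAT(G/A)TCGA and its reverse complement, TCGA(C/T)ATTAA. The Python sketch below is a minimal illustration under that assumption; the function names are hypothetical and the test sequences are toy data, not MyunGV-A hr sequences.

```python
import re

# Core sequence TTAAT(G/A)TCGA on the forward strand and its reverse
# complement TCGA(C/T)ATTAA; lookaheads allow overlapping matches.
CORE_FORWARD = re.compile(r"(?=(TTAAT[GA]TCGA))")
CORE_REVERSE = re.compile(r"(?=(TCGA[CT]ATTAA))")


def core_hits(seq):
    """Return sorted (position, strand) pairs for every core sequence.

    Positions are 0-based offsets on the forward strand; strand is '+'
    for the core as written and '-' for its reverse complement.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in CORE_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in CORE_REVERSE.finditer(seq)]
    return sorted(hits)


def same_direction(hits):
    """Return True if all core sequences lie on the same strand."""
    return len({strand for _, strand in hits}) == 1


# Toy sequences: one with cores in opposite directions, one with both
# cores in the same direction.
mixed = "CC" + "TTAATGTCGA" + "GGGG" + "TCGATATTAA" + "CC"
uniform = "TTAATATCGA" + "AAAA" + "TTAATGTCGA"
print(core_hits(mixed), same_direction(core_hits(mixed)))
print(core_hits(uniform), same_direction(core_hits(uniform)))
```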
Hrs have been reported to function as replication origins 28,29 and to serve as enhancers of transcription of early genes 30. In addition, the number of hrs is connected to the replication efficiency or pathogenicity of a baculovirus. Deletion of one to five hrs of AcMNPV had little or no effect on virus infection, while deletion of six or seven hrs resulted in a 90% reduction in BV production. Deletion of all eight hrs caused a 99.9% reduction in BV production and delayed early and late gene expression but did not completely inhibit virus production 31.

Baculovirus repeated ORFs (bro genes). Bro genes have been identified in most baculovirus genomes sequenced to date. The number of bro genes in different baculovirus genomes varies considerably. Thirteen of 82 complete baculovirus genomes have only one bro gene, whereas Lymantria dispar MNPV (LdMNPV) has 16 bro genes. Bro genes are entirely absent from 19 baculovirus genomes sequenced to date (Table S1).
The exact function of bro genes is not yet clear, though their presence is very significant for baculoviruses. Studies on the function of bro genes have mostly focused on BmNPV and have found that BRO-A and C proteins can bind to DNA in infected cells 32 ; BRO-A may be involved in influencing host DNA replication, similar to a laminin-binding protein 33 .
In addition, BmNPV BRO proteins act as nucleocytoplasmic shuttling proteins via the CRM1-mediated nuclear export pathway 34 . Recently, BmNPV BRO-B and E proteins associated with host T-cell intracellular antigen 1 homologue (BmTRN-1) were shown to be involved in the inhibitory regulation of certain mRNAs at the post-transcriptional level during infection 35 . The function of other baculovirus BRO proteins has seldom been reported.
Two repeat genes in MyunGV-A. Two repeat genes (ORF39 and ORF49), with amino acid sequence identities of 100%, were found in the MyunGV-A genome; the former is in the granulin-sense orientation and the latter in the opposite orientation.
These two genes have no homologues in the XcGV, MyunGV-B and CpGV genomes. Only one gene of the TnGV genome, ORF43, matches them, with an amino acid sequence identity of 97%. Two genes in the HearGV genome, ORF53 and ORF157, are homologous to them, with amino acid sequence identities of 86% and 85%, respectively, and the amino acid sequence identity between ORF53 and ORF157 of HearGV is 99%. Genes present in two copies within one genome have also been found in other baculoviruses, such as odv-e66, p26 and dbp of EcobNPV 36 and odv-e66 and p26 of SfMNPV 37. BLAST searches of the NCBI database with the amino acid sequences of these two MyunGV-A genes suggested that they match hr3 and hr4 of Heliothis virescens ascovirus 3e (amino acid sequence identities both 49%). In addition, they match the 70.4-kDa C-terminal Zn-finger DNA-binding domain of Spodoptera frugiperda ascovirus 1a (amino acid sequence identities of 48%), which suggests that their function may be associated with DNA binding.
ORFs with no homologues in other baculoviruses. Three ORFs (ORF113, -133 and -166) were identified as having no homologues in other baculoviruses (Table 1). These three unique ORFs have no recognizable promoter. Protein homology analysis using HHpred showed that GP133 (aa 50-359) is a likely homologue of mannan-binding lectin serine peptidase 1 (probability, 99.97%; E value, 1.1e-28). Mannan-binding lectin serine peptidase 1 plays a central role in the initiation of the complement lectin pathway 38. This homology indicates that ORF133 might be related to the complement lectin pathway, which deserves further research. ORF113 encodes an 8.5-kDa protein with one transmembrane domain (aa 5-27, predicted with TMHMM server v2.0) at the N terminus and with no similarity to any protein in the nonredundant protein database. ORF166 encodes a 7.7-kDa protein with no similarity to any protein in the nonredundant protein database.
The large gene in MyunGV-A. In most cases, helicase is the largest gene in baculovirus genomes; however, in the MyunGV-A genome, ORF45, which encodes 1213 amino acids (longer than helicase-1, 1158 aa), is the largest gene. Similar situations are present in the HearGV (ORF44, 1279 aa), TnGV (ORF39, 1213 aa) and MyunGV-B (ORF45, 1507 aa) genomes, whereas in XcGV the corresponding region is divided into two genes, ORF47 and ORF48 6. Compared with XcGV, the MyunGV-A genome has an additional adenosine (A) at position 40315, resulting in a reading-frame shift. Protein homology analysis using HHpred and SWISS-MODEL showed that Myun45 has no significant similarity to any known sequence.

Enhancins in MyunGV-A.
It was first observed in Mythimna (formerly Pseudaletia) unipuncta that GV can increase the rate of infection and fatality of NPV and decrease the larval survival time when GV and NPV coinfect larvae 10 . Subsequent studies found that the factor responsible for synergistic interaction is a GV protein that shows a synergistic effect only when larvae are infected with NPV; it was identified as a synergistic factor (SF) 39 . The synergistic effect of viral enhancing factor (VEF) was also observed in TnGV 40 . The location and sequence of the VEF gene of TnGV have been identified 41 . This enhancing protein (enhancin) can disrupt the midgut peritrophic membrane (PM), thereby resulting in the more efficient passage of virions to host midgut cells 12 .
Enhancin was identified as a metalloprotease via the discovery of a zinc-binding site as well as by inhibition with a metal chelator and reactivation with divalent ions 42. The MyunGV-A genome has three enhancin genes (Myun157, -159 and -170). Similarly, three enhancin genes were found in MyunGV-B and in TnGV, but their amino acid sequence identities to the MyunGV-A enhancins differ greatly: MyunGV-B enhancins are only 35% to 55% identical to those of MyunGV-A, whereas TnGV enhancins are as much as 99% identical. Four enhancin genes were found in the XcGV and HearGV genomes, of which enhancin-1, -3 and -4 have high homology (amino acid sequence identities all above 74%) to the three enhancin genes of MyunGV-A. The MyunGV-A enhancin gene (enhancin-3) encoding 901 amino acids has been sequenced and characterized 12. The canonical sequence HEXXH, the zinc-binding site in most metalloproteases, was found in enhancin-3 but not in the other two enhancins. It is not clear why three enhancins are present in MyunGV-A, and the roles of these three enhancins in promoting NPV infection remain unclear.
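Checking a protein sequence for the HEXXH zinc-binding signature amounts to a simple pattern search, where X stands for any residue. The Python sketch below is a minimal illustration; the function name is hypothetical and the sequences are toy peptides, not the MyunGV-A enhancins.

```python
import re

# HEXXH: His-Glu-any-any-His. The lookahead allows overlapping matches.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(protein):
    """Return (position, motif) pairs for each HEXXH match.

    Positions are 1-based residue numbers, as is usual for proteins.
    """
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(protein.upper())]


# Toy peptides (not real enhancin sequences).
print(find_hexxh("MKTHEAGHLL"))  # one motif starting at residue 4
print(find_hexxh("MKTAAA"))      # no motif
```

A match indicates only that the signature is present; whether the site binds zinc or is catalytically active cannot be inferred from the motif alone.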
Enhancins are found mainly in GVs and a few NPVs. They are localized within the granulin matrix in granuloviruses and released to increase virus pathogenicity by acting in the midgut. In contrast, LdMNPV enhancins are located within ODV envelopes and help ODVs to pass the host defence barrier by acting directly on the peritrophic membrane as the nucleocapsids move through the barrier 43.
Twenty-four OB proteins were identified from the putative protein database of MyunGV-A (Table 2). Among the 24 proteins, 20 were detected with two or more peptides, and the other four were detected with one matching peptide. In addition, 15 of the 24 identified proteins were detected in more than one sample. Granulin was found in 28 of the 29 samples (Table S2). Similar observations were reported for CuniNPV 54, HearsNPV 48 and AgMNPV 50. Notably, the identified proteins were not distributed in the SDS-PAGE gels according to their molecular mass, which has been attributed to incomplete denaturation of OBs and to the breakdown of protein complexes or protein processing 54.
Of the 24 identified proteins, six are encoded by additional genes conserved in GVs, namely ORF16, ORF17, ORF18, ORF120, ORF174 and ORF175 55. Among them, the proteins encoded by two contiguous ORFs (ORF16 and ORF17) belong to the CpGV ORF16 L family 56, and the protein encoded by ORF18 is similar to P10 and contains a baculovirus polyhedron envelope protein (PEP) C domain (pfam04513). In addition to structural proteins and proteins implicated in DNA replication and transcription, four important auxiliary proteins were identified: SOD, cathepsin and two enhancins. Enhancin-1 and enhancin-3 were detected in our proteomic studies; enhancin-3 was present in 16 samples, while enhancin-1 was present in only 1 sample. Most baculovirus enhancins, including those of MyunGV-A, are located in the OB matrix, whereas LdMNPV enhancins were found to be associated with ODV envelopes 43,57. In this study, we did not attempt to determine the specific location of the enhancins.
Moreover, four proteins (Myun29, Myun32, Myun44 and Myun67) with unknown functions were detected (Table 2). An increasing number of baculovirus proteomic studies can provide valuable insight into baculovirus structure, infectious mechanisms and interactions with their hosts.
A phylogenetic tree based on the combined amino acid sequences of 38 core genes of 82 baculoviruses with complete genome sequences (Table S1) classified MyunGV-A into clade "a" of Betabaculovirus, which clusters viruses infecting larvae of the lepidopteran family Noctuidae. Within this clade, MyunGV-A falls into a subcluster together with TnGV, its closest neighbour, with which it shares a common hypothetical ancestor. XcGV and HearGV form another subcluster next to the MyunGV-A and TnGV subcluster. However, MyunGV-B, another granulovirus from the same host, groups into a subcluster with SpfrGV and lies slightly away from MyunGV-A, on the other side of MolaGV (Fig. 4). This is consistent with the above comparison of gene organization, in which MyunGV-A is similar to TnGV, XcGV and HearGV in genome size, ORF number and gene order.

Conclusion
The purified OBs of MyunGV-A show typical GV morphological characteristics under EM. The complete MyunGV-A genome (NC_013772.1) is 176,677 bp, with a G+C content of 39.79%, and is the second largest baculovirus genome sequenced to date. It contains 183 ORFs with a minimal size of 50 codons. The genome of MyunGV-A exhibits extensive sequence similarity and collinearity with TnGV, XcGV and HearGV. Three unique genes, 12 bro, 2 helicase and 3 enhancin genes were identified. In particular, two repeated genes (ORF39 and ORF49) are present in the genome in reverse-complementary orientations. Twenty-four OB proteins were identified from the putative protein database of MyunGV-A. According to our phylogenetic tree, MyunGV-A belongs to the Betabaculovirus group and is most closely related to TnGV.
Figure 4. Phylogenetic tree of 82 baculoviruses with complete sequences. The phylogenetic tree was generated using MEGA X 58 with the maximum likelihood method and the JTT matrix-based model 59. The result was visualized using iTOL 60.