A comprehensive list of the Bunyavirales replication promoters reveals a unique promoter structure in Nairoviridae differing from other virus families

Members of the order Bunyavirales infect a wide variety of host species, including plants, animals and humans, and pose a threat to public health. Major families in this order have tri-segmented negative-sense RNA genomes, the 5′ and 3′ ends of which form complementary strands that serve as a replication promoter. Elucidation of the mechanisms by which viral polymerases recognize the promoter to initiate RNA synthesis is important for understanding viral replication and pathogenesis, and developing antivirals. A list of replication promoter configuration patterns may provide details on the differences in the replication mechanisms among bunyaviruses. By using public sequence data of all known bunyavirus species, we constructed a comprehensive list of the replication promoters comprising 40 nucleotides in both the 5′ and 3′ ends of the genome that form a specific complementary strand. Among tri-segmented bunyaviruses, members of the family Nairoviridae, including the highly pathogenic Crimean-Congo hemorrhagic fever virus, have evolved a GC-rich promoter structure differing from that of other families. The unique promoter structure might be related to the large genome size of the family Nairoviridae among bunyaviruses, and the large genome architecture might confer pathogenic advantages. The promoter list provided in this report is useful for predicting the virus family-specific replication mechanisms of bunyaviruses.


Scientific Reports | (2022) 12:13560 | https://doi.org/10.1038/s41598-022-17758-z | www.nature.com/scientificreports/
The families Arenaviridae, Hantaviridae, Nairoviridae, Peribunyaviridae, and Phenuiviridae include several important pathogens that can cause severe diseases in animals, including humans, while the families Fimoviridae, Phasmaviridae, Phenuiviridae, and Tospoviridae include pathogens associated with plant diseases. Major groups of bunyaviruses possess tri-segmented negative-sense RNA genomes, and share the same genetic organization consisting of three segments, i.e., the small (S), medium (M), and large (L) segments, named for their relative sizes. Each segment acts as a template for the replication of a positive-sense antigenome, and for the transcription of mRNA. The S segment encodes the nucleocapsid protein (NP), the M segment encodes a glycosylated polyprotein precursor (GPC) that is cleaved into envelope spike proteins Gn and Gc, and the L segment encodes the L protein, an RNA-dependent RNA polymerase (RdRp) responsible for the transcription and replication of the three RNA segments. The RNA synthesis activities of the three RNA segments are regulated by nucleotide (nt) sequences within the 3′ and 5′ untranslated regions (UTRs), which flank the S, M, and L open reading frames. The terminal nts of the 3′ and 5′ UTRs exhibit complementarity, and such sequences have been shown to bind to, and influence the activity of, viral RdRp, promoting transcription to yield a 5′-capped mRNA by using cleaved host mRNA as a primer, and replication that results in the synthesis of a full-length copy of the genome template.
Bunyavirus promoters are composed of two promoter elements, i.e., promoter element 1 (PE1), the complementary region at the genomic extremes, and PE2, the complementary region located internal to PE1, which was first described in Bunyamwera virus (BUNV) of the family Peribunyaviridae 1,2. PE1 comprises approximately 10-15 nts located at the extreme termini of the genome that are strictly conserved among all three segments. These nts have been shown to interact with L protein at separate sites in La Crosse virus (LACV) of the family Peribunyaviridae 3,4. PE2 comprises segment-specific nts at subsequent positions that are required to form canonical Watson-Crick base-pairing with corresponding nts at the opposite end of the template 1,2,5. These RdRp-RNA and RNA-RNA interactions are thought to account for the pseudocircular form of viral ribonucleoprotein complexes 6,7. In BUNV, sequence changes within PE1 have a significant effect on promoter function, but adjacent nts within PE2 are highly resistant to sequence changes, provided that their interterminal Watson-Crick base-pairing potential is maintained 1. For the family Nairoviridae, little is known about the roles of the 3′- and 5′-terminal UTRs in regulating RNA synthesis. As with other Bunyavirales members, the UTRs of all nairoviruses comprise highly conserved terminal proximal nts (PE1) shared by all three segments, followed by less conserved, segment-specific nts (PE2). The importance of these segment-specific nts in RNA synthesis has been partially examined using a minigenome reporter assay in non-pathogenic Hazara virus (HAZV), which is closely related to Crimean-Congo hemorrhagic fever virus (CCHFV) 8. PE1 and PE2 were found to be separated by a spacer region, which exhibited a critical requirement to be short in length and lack base-pairing ability. Taken together, the accumulated data indicate that the promoter structure of bunyaviruses differs among virus families.
Understanding these properties is essential for developing antivirals targeting viral RNA synthesis and processing.
To characterize the promoter structure of diverse bunyaviruses, we constructed a list of all viral promoters that exhibit complementarity and each of the nt counts within the first 40 nts of both the 5′ and 3′ (anti)genomic ends. We found that the promoters of the family Nairoviridae differ from those of other virus families, and have characteristics unique to the family, which has a large genome size.

Results
Construction of the list of promoters in the order Bunyavirales. This study aimed to characterize the promoter structure of all virus species in the order Bunyavirales, including the families Arenaviridae, Cruliviridae, Fimoviridae, Hantaviridae, Leishbuviridae, Mypoviridae, Nairoviridae, Peribunyaviridae, Phasmaviridae, Phenuiviridae, Tospoviridae, and Wupedeviridae, which are tri-segmented or multi-segmented viruses, as summarized in Fig. 1. The complementarity of the extreme 40 nts at the 5′ and 3′ genomic ends of virus genomes registered in the ICTV list (https://ictv.global/taxonomy) was analyzed. Because some genome sequences in the National Center for Biotechnology Information (NCBI) database are incomplete and do not precisely cover the genome extremes, we selected viral sequences with complete complementarity in the terminal +1 to +3 nts (some exceptions with a non-complementary +1 nt are included). The complementary structure of the 5′ and 3′ genomic ends (positive-sense form) as well as the genome length, counts of G:C/A:U complementarity, and counts of each nt (A, U, G and C) in the promoter region were calculated by using an automatic calculating system based on an Excel file (Supplementary Table 1), and the results are tabulated in Supplementary Table 2. A dataset was generated for each virus species that had complete data for all segments, including: tri-segmented bunyaviruses Arenaviridae (2 species), Cruliviridae (3 species), Hantaviridae (24 species), Mypoviridae (1 species), Nairoviridae (20 species), Peribunyaviridae (56 species), Phasmaviridae (2 species), Phenuiviridae (47 species), Tospoviridae (19 species) and Wupedeviridae (1 species); multi-segmented bunyaviruses Fimoviridae (17 species) and Phenuiviridae (14 species); and di-segmented bunyavirus Arenaviridae (23 species; characterized in Supplementary Fig. 1).
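The per-segment calculation described above can be illustrated with a minimal Python sketch. It is not the Excel-based system of Supplementary Table 1; the function name promoter_stats, the returned dictionary keys, and the ungapped position-by-position alignment of the two ends are assumptions made only for illustration, and the sequence is used exactly as supplied (no conversion between genomic and antigenomic sense).

```python
# Illustrative sketch only; NOT the authors' Excel-based calculating system.
# Watson-Crick pairs considered (G:U wobble pairs are deliberately ignored).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def promoter_stats(segment, n=40):
    """Summarize base pairing between the 5' and 3' terminal n nts.

    `segment` is one genome segment written 5'->3'; DNA-style T is read as U.
    Position i from the 5' end is compared with position i counted inward
    from the 3' end (an ungapped, antiparallel alignment; an assumption).
    """
    rna = segment.upper().replace("T", "U")
    five = rna[:n]
    three = rna[::-1][:n]  # 3' terminus read inward (3'->5')
    gc_pairs = au_pairs = 0
    paired = []
    for a, b in zip(five, three):
        is_pair = (a, b) in PAIRS
        paired.append(is_pair)
        if is_pair and {a, b} == {"G", "C"}:
            gc_pairs += 1
        elif is_pair:
            au_pairs += 1
    return {
        "five_prime": five,
        "three_prime": three,
        "paired": paired,          # per-position complementarity flags
        "gc_pairs": gc_pairs,      # count of G:C complementarity
        "au_pairs": au_pairs,      # count of A:U complementarity
        "nt_counts_5p": {x: five.count(x) for x in "AUGC"},
        "nt_counts_3p": {x: three.count(x) for x in "AUGC"},
        "length": len(rna),
    }


if __name__ == "__main__":
    # Toy sequence (hypothetical, not a real virus segment).
    toy = "UCUCAAAGAC" + "GGGGGGGGGG" + "GACUUUGAGA"
    stats = promoter_stats(toy, n=10)
    print(stats["gc_pairs"], stats["au_pairs"], stats["paired"])
```

A gapped alignment, such as the 1-nt shift noted for some Phenuiviridae, would need an extra offset parameter that this sketch does not model.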

Characteristics of the replication promoters of five major virus families.
For five major tri-segmented virus families, i.e., Peribunyaviridae, Phenuiviridae, Tospoviridae, Hantaviridae, and Nairoviridae, we examined the conservation of the nts in the 40 nts at the promoter region using the sequence logo generator WebLogo 9 (Fig. 2A, which shows representative M segments, and Supplementary Figs. 2 and 3). The promoters differed among viruses: the initial nt was adenosine (A) in Peribunyaviridae, Phenuiviridae and Tospoviridae, and was uridine (U) in Hantaviridae and Nairoviridae. These promoters were further categorized into those starting with a tri-nt repeat (5′-AGUAGU and 5′-UAGUAG) and those starting with a di-nt repeat (5′-ACAC, 5′-AGAG and 5′-UCUC) (Fig. 2A,B). We next examined the percentages of G:C and A:U complementarity at every nt position in the promoter region among the virus species in each virus family (Fig. 2A). The complementarity conformation was remarkably different among virus families. G:C complementarity was relatively higher in the virus genomes with a promoter starting with U than in those with a promoter starting with A. The genomes of the families Hantaviridae and Nairoviridae contain high G:C complementarity at the 13-16 nt and 17-21 nt positions, respectively. In some viruses belonging to the family Phenuiviridae, a shift of 1 nt at the 10th position from the 5′ extreme appeared to increase the complementarity of the subsequent 5′ and 3′ ends 10, but it did not increase the total G:C complementarity frequency in the promoter region of Phenuiviridae (data not shown). It has been reported that HAZV has a promoter composed of two complementary regions, PE1 and PE2, separated by a spacer region formed by non-complementary sequences at the 13-16 nt position 8. We found the same feature in most virus species of the family Nairoviridae (Fig. 2 and Supplementary Fig. 3).
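The two-way categorization described above (initial nt, then di-nt versus tri-nt repeat) can be expressed as a short Python sketch. The function names leading_repeat and classify_promoter are hypothetical, and the rule that a repeat means the first unit is immediately duplicated is an assumption drawn from the five motifs listed in the text.

```python
# Illustrative sketch only; function names and the repeat rule are assumptions.
def leading_repeat(five_prime_end):
    """Return 'di', 'tri', or None for the repeat at the 5' terminus.

    'di'  : nts 1-2 equal nts 3-4 (e.g. ACAC, AGAG, UCUC)
    'tri' : nts 1-3 equal nts 4-6 (e.g. AGUAGU, UAGUAG)
    DNA-style T is read as U.
    """
    s = five_prime_end.upper().replace("T", "U")
    if len(s) >= 4 and s[:2] == s[2:4]:
        return "di"
    if len(s) >= 6 and s[:3] == s[3:6]:
        return "tri"
    return None


def classify_promoter(five_prime_end):
    """Return (initial nt, repeat type) for a 5' terminal sequence."""
    s = five_prime_end.upper().replace("T", "U")
    return (s[0] if s else None, leading_repeat(s))


if __name__ == "__main__":
    for motif in ("AGUAGU", "UAGUAG", "ACAC", "AGAG", "UCUC"):
        print(motif, classify_promoter(motif))
```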
To analyze the promoter structure in more depth, the G:C and A:U complementarity in the 40-nt promoter region was determined in three segments of the five virus families. The average complementarity count of virus species in each virus family is shown in Fig. 3A. The A:U complementarity counts were higher than the G:C complementarity counts in all segments for all virus families. However, G:C complementarity was particularly higher in the promoters of the family Nairoviridae than in those of other virus families (Fig. 3A). Each nt (A, U, G, and C) in the promoter region was counted, and the average value within each virus family is shown in Fig. 3B. In Phenuiviridae, Tospoviridae, and Hantaviridae, A in the 5′ end and U in the 3′ end were frequent in all segments. In Peribunyaviridae, both A and U were abundant at the 5′ and 3′ ends. In contrast, in the family Nairoviridae, C and G were more frequent at the 5′ and 3′ ends, respectively, than in other virus families. We demonstrated that the family Nairoviridae had more G:C complementarity as well as higher G/C counts in the 40 nts of the promoter region than other virus families, which is suggestive of stronger affinity for base pairing at both genomic ends.

Genome length of virus families in the order Bunyavirales. The promoter structure of Nairoviridae differed from that of other tri-segmented virus families in that it had high G:C complementarity and a noncomplementary nt spacer region. To investigate the relationship between these features and the characteristics of the viral genomes, the genome lengths of all viral species of the five virus families were studied. Figure 4A shows the average total genome length (combined length of the L, M, and S segments) of all virus species in the five virus families. The full genome lengths of the families Peribunyaviridae, Phenuiviridae, and Hantaviridae were comparable, while the genome length of the family Tospoviridae was larger, and that of the family Nairoviridae was the largest. The length of each segment was also examined in all virus species, and the average lengths within virus families are shown in Fig. 4B. The family Tospoviridae had relatively large L, M, and S segments. The L segment of Nairoviridae was the largest among all segments of all viruses.
Genome length of viruses in the family Nairoviridae. Among tri-segmented viruses belonging to the order Bunyavirales, only the family Nairoviridae includes highly pathogenic viruses categorized as biosafety level (BSL)-4 pathogens that cause hemorrhagic fever in humans, such as CCHFV 11. We hypothesized that the high pathogenicity of this virus family in mammals may be related to its large genome size. We examined the length of the available sequences annotated as "Nairoviridae" in the NCBI database. We first selected virus genome sequences possessing 5′-UCUC-GAGA-3′ ends, which are the most conserved genomic end sequences in nairoviruses (Fig. 5A). The family Nairoviridae contains two highly pathogenic viruses in mammals, i.e., CCHFV and Nairobi sheep disease virus (NSDV), which have mortality rates of 30% and 90% in humans and small ruminants, respectively 12,13. The lengths of the CCHFV and NSDV sequences are shown as red and yellow bars, respectively, in Fig. 5A. The M segments of CCHFV and NSDV were found to be the longest among virus genome sequences possessing 5′-UCUC-GAGA-3′ ends.

Discussion
In this study, we tabulated the replication promoter structures of all known virus species in the order Bunyavirales. Our analysis focused on five major tri-segmented virus families, and the results indicated that the genomes can be divided into two categories: those with a promoter starting with A (families Peribunyaviridae, Phenuiviridae, and Tospoviridae) and those with a promoter starting with U (families Hantaviridae and Nairoviridae; Fig. 2). It has been shown that viral RNA polymerases have the ability to initiate RNA synthesis with a purine (G or A), but not with a pyrimidine (U or C) 14. Therefore, the 5′-U genomic end of Hantaviridae and Nairoviridae is unconventional. The genomes of LACV and Rift Valley fever virus (RVFV) of the families Peribunyaviridae and Phenuiviridae, respectively, contain a 5′-triphosphate end that starts with A (5′-pppA) 15,16. The 5′-pppA is generated by viral RdRp that recognizes the opposite U as the template. As seen in several segmented and nonsegmented RNA viral polymerases 17-19, bunyaviral RdRp synthesizes RNA from an internal nt, and not from the terminus of the template. In LACV, RNA synthesis is initiated with A using the U at the +4 position of the antigenome (3′-UCAUCA) as the template during genome replication 4. The elongated product, 5′-pppAGU, is realigned to the +1 to +3 position of the antigenome template (3′-UCAUCA), and is further elongated to generate 5′-pppAGUAGU. Accordingly, the position of U responsible for RNA synthesis initiation is presumably the +3 position in the Phenuiviridae (3′-UGUG) and Tospoviridae (3′-UCUC) antigenomes. This indicates that the 5′-pppAC and 5′-pppAG products realign to the 3′-UGUG and 3′-UCUC of the antigenomes, respectively, and are further elongated to generate 5′-pppACAC and 5′-pppAGAG, respectively, which are precise complementary chains of the antigenome templates.
In contrast, the genomes of Hantaan virus (HTNV) in the family Hantaviridae, and CCHFV in the family Nairoviridae contain a 5′-monophosphate end 15,20, suggesting an unconventional RNA processing event during replication. In HTNV, RNA synthesis is initiated with an internal G at the +3 position by using a C of the 3′-AUCAUC of the antigenome as the template 20. Subsequently, the elongated 5′-pppGUA product realigns to the 3′-AUCAUC to further produce 5′-pppGUAGUA. Then, the extreme 5′-pppG is removed by an endoribonuclease activity of viral RdRp to produce 5′-pUAGUA (5′-monophosphate end) 20. The endoribonuclease activity of RdRp is responsible for the cap snatching that cleaves the 5′ end of the host mRNA for use as a transcription primer 21. The −1 position of viral mRNA of HTNV is G, indicating that viral RdRp can cleave host mRNA after the G nt (cleave GpN to produce G/pN) during transcription. In Nairoviridae, the −1 position of viral mRNA is C 22,23, and it is also generated via the cap-snatching mechanism. Similar to the RNA synthesis in Hantaviridae, it is supposed that nairoviral RNA synthesis is internally initiated with 5′-pppC at the +2 position by using the G of 3′-AGA of the antigenome as the template. Subsequently, the 5′-pppCU product would realign to the 3′-AGA, and be further elongated to generate 5′-pppCUCU. The 5′-pppC would then be removed, resulting in the production of a 5′-monophosphate end. Therefore, although the hantaviral RdRp is a conventional enzyme that initiates RNA synthesis with a purine (G), the nairoviral RdRp is considered to be an unconventional enzyme that can initiate with a pyrimidine (C). Such a difference may be important for the targeting of novel antivirals specific for nairoviral diseases. Our analysis additionally confirmed that most bunyaviral genomes begin with a di- or tri-nt repeat (Fig. 2), which has been suggested previously 20,24.
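The prime-and-realign steps described above for LACV, Phenuiviridae, Tospoviridae, HTNV and nairoviruses can be traced with a toy Python model. This is a string-level sketch under stated assumptions, not a biochemical model: the function name prime_and_realign and its parameters are hypothetical, the realignment is assumed to shift the primer back by exactly its own length, and the optional trimming step stands in for the proposed removal of the extreme 5′ nt.

```python
# Toy string model of prime-and-realign; names and parameters are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _copy(template_part):
    """Product nts (5'->3') templated by a stretch written 3'->5'."""
    return "".join(COMPLEMENT[b] for b in template_part)


def prime_and_realign(template_3to5, init_pos, primer_len, trim_5p=0):
    """Return the 5' end of the product strand, written 5'->3'.

    template_3to5 : 3' end of the template, written 3'->5' (e.g. 'UCAUCA')
    init_pos      : 1-based template position where synthesis is initiated
    primer_len    : nts made before realignment (one repeat unit; assumed)
    trim_5p       : nts removed from the 5' end afterwards (0 or 1 here)
    """
    t = template_3to5.upper().replace("T", "U")
    i = init_pos - 1
    if len(t) < i + primer_len:
        raise ValueError("template too short for the requested primer")
    primer = _copy(t[i:i + primer_len])
    # Assumed realignment: the primer slides back by its own length, so
    # elongation resumes at the original initiation position.
    product = primer + _copy(t[i:])
    return product[trim_5p:]


if __name__ == "__main__":
    print(prime_and_realign("UCAUCA", init_pos=4, primer_len=3))             # LACV-like
    print(prime_and_realign("UGUG", init_pos=3, primer_len=2))               # Phenuiviridae-like
    print(prime_and_realign("AUCAUC", init_pos=3, primer_len=3, trim_5p=1))  # HTNV-like
    print(prime_and_realign("AGAG", init_pos=2, primer_len=2, trim_5p=1))    # nairovirus-like
```

In this model the number of nts overhanging the template 3′ end after realignment is primer_len − (init_pos − 1), which is 0 for the A-initiating cases and 1 for the HTNV-like and nairovirus-like cases, matching the single nt that is proposed to be removed.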
The repeats can determine the initiation site for RdRp (e.g., +2 in Nairoviridae, +3 in Hantaviridae, Phenuiviridae and Tospoviridae, and +4 in Peribunyaviridae), which is important for the prime-realign RNA synthesis mechanism. The biological significance of the internal position of RNA synthesis initiation is unclear. It is likely that the di-nt repeat is restricted to virus families possessing an ambisense genome, such as Phenuiviridae and Tospoviridae, as well as Nairoviridae (for which only CCHFV has been reported) 25. This suggests that the ambisense coding property may be related to the di-nt repetition.
The Nairoviridae promoter appears to have high G:C complementarity in the 17 to 21-nt region (Fig. 2A and Supplementary Fig. 2), and this likely reflects the high G:C complementarity rate at the promoter region (Fig. 3A). Interestingly, this GC-rich dsRNA region is located after a spacer region composed of non-complementary bases around the 14th position in all three segments, as has been reported previously in HAZV and CCHFV 8,26. We have previously suggested the possibility that the HAZV polymerase can recognize this GC-rich dsRNA as a promoter element essential for RNA synthesis initiation via an unidentified domain of the L protein 8. This kind of specific protein-RNA interaction has been proposed to be a suitable target for antivirals against CCHFV, which is closely related to HAZV. Our comprehensive analysis of the promoter list also suggested that this kind of strategy may be applicable to all viruses belonging to the family Nairoviridae.
In bunyaviruses, genome replication in each segment is regulated by the segment-specific promoter strength, but the variations in nts (A, U, G, and C) in each promoter region do not differ significantly among the L, M, and S segments in all virus families, except for Nairoviridae (Fig. 3B). It is possible that the promoter strength among segments is determined by slight differences in the promoter structure that do not affect the total complementarity counts or nt variations. Viruses in the family Phenuiviridae and CCHFV of the family Nairoviridae have an ambisense S segment, but there is no nt variation pattern in the promoter that is unique to the S segment (Fig. 2B). This suggests that the nt variation in the promoter was not affected by the presence of the ambisense segment during the viral evolution process. On the other hand, the nt variation in the promoter of the nairoviral L segment was different from that of the M and S segments, i.e., it was observed to have less G and C at the 5′ and 3′ ends, respectively (Fig. 2B). The nairoviral L segment is remarkably long when compared to other nairoviral segments and the genomes of other virus families (Fig. 4B). This large genome size may be associated with the promoter structure.
It remains unclear why the genome of the family Nairoviridae is so large. Nairoviridae is the only tri-segmented virus family that includes hemorrhagic fever viruses classified as BSL-4 pathogens, such as CCHFV. We hypothesized that the large genome size of the family Nairoviridae may be related to its high pathogenesis in mammals. Although the length of the L segment in Nairoviridae is the longest among all bunyaviruses, it is not particularly long among the highly pathogenic viruses in this family (Fig. 5A). Rather, our analysis confirmed that among viruses in the family Nairoviridae, the M segment is the largest segment in two highly pathogenic viruses in mammals, CCHFV and NSDV, suggesting that the M segment contains factors involved in viral pathogenesis. The M segment encodes GPC that is first translated as a polyprotein from mRNA, and further cleaved into Gn, Gc, and other accessory or uncharacterized proteins. A schematic diagram of several representative nairovirus GPCs is shown in Fig. 5B. GPC contains an N-terminal signal peptide and multiple membrane-spanning domains, and is processed by signal peptidases to generate an N-terminal pre-Gn protein, C-terminal pre-Gc protein, and a double-membrane-spanning NSm protein. The pre-Gn and pre-Gc are subsequently processed by furin-like or subtilisin kexin isozyme-1 proteases to generate a mucin-like protein containing a large number of O-glycosylation sites, a protein designated as GP38 (-like), virion envelope glycoprotein Gn, and virion envelope glycoprotein Gc 27 . We showed that although the sizes of Gn and Gc are similar among virus species, those of the O-glycosylation sites and GP38-like protein are different; in particular, they are larger in CCHFV and NSDV (Fig. 5B). This suggests that these regions may be determinants of the pathogenicity of Nairoviridae. It has been proposed that GP38 is involved in CCHFV particle formation and viral infectivity 27 . 
Analysis of convalescent patient sera showed high titers of CCHFV GP38 antibodies, which indicated the immunogenicity of this protein in humans during natural CCHFV infection. In a mouse model, an antibody against GP38 could protect the animals from a heterologous CCHFV challenge, indicating an association between GP38 and the high pathogenesis of CCHFV 28 . Our present analysis indicates that there is an association between the N-terminal GPC region and viral pathogenesis not only in CCHFV, but also in other highly pathogenic nairoviruses, including NSDV.
In conclusion, we constructed a comprehensive list of the promoters in Bunyavirales that included all virus families in this order. Studies on the RNA synthesis mechanism of Bunyavirales have been limited to only a few virus species. Analysis of the conservation in all promoter structures is useful for the prediction of RNA synthesis mechanisms in uncharacterized and newly identified bunyaviruses. The automatic promoter-characterizing system (Supplementary Table 1) is applicable for all bunyaviruses for which the precise genomic end sequences are known.

Methods
List of bunyavirus promoters. In total, 590 bunyavirus species were registered in the ICTV list (https://talk.ictvonline.org/) on December 7th, 2021. The complete sequences of the L, M and S segments of bunyaviruses available on the NCBI associated with the GenBank accession numbers listed in Supplementary Table 2 were used for the analysis. After obtaining the full-length genome sequences, the sequences were input in the "Sequence" column in Supplementary Table 1, and the extreme 40 nts of each of the 5′ and 3′ ends, the complementarity between the 5′ and 3′ ends of the sequences, and the counts of G:C and A:U complementarity and each of the nts (A, U, G, and C) in the promoter region were calculated automatically. In Supplementary Table 1, the results of the L segments were input as representative data. Conservation of the nts in the promoter was analyzed by using the sequence logo generator WebLogo (https://weblogo.berkeley.edu/logo.cgi).
Analysis of the genome length of nairoviruses. Sequences annotated as "Nairoviridae" were downloaded from the NCBI RefSeq database on January 16th, 2022. There were 5272 Nairoviridae sequences in the database. We first checked for the presence of the extreme promoter sequence 5′-UCUCA in the 8-nt ends of the sequences. The promoter sequence was present in both ends of 368 sequences, and the lengths of these sequences were calculated using a custom Python script. The codes used for this analysis are available on GitHub (https://github.com/shohei-kojima/Arenaviridae_overhang_analysis_2022).
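The filtering step described above can be sketched in Python as follows. This is not the script in the GitHub repository cited above; the function names has_terminal_motif and lengths_with_motif are hypothetical, and the sketch assumes that "both ends" means the motif occurs within the first 8 nts and within the reverse complement of the last 8 nts, which is consistent with the 5′-UCUC-GAGA-3′ ends described in the Results.

```python
# Illustrative sketch only; NOT the authors' script. The 3'-end check via
# reverse complement is an assumption about how "both ends" was evaluated.
_COMPLEMENT = str.maketrans("AUGC", "UACG")


def has_terminal_motif(seq, motif="UCUCA", window=8):
    """True if `motif` lies in the 5' window and in the reverse complement
    of the 3' window. DNA-style T (as in GenBank records) is read as U."""
    rna = seq.upper().replace("T", "U")
    five = rna[:window]
    three_rc = rna[-window:].translate(_COMPLEMENT)[::-1]
    return motif in five and motif in three_rc


def lengths_with_motif(records):
    """Map record name -> sequence length for records passing the filter.

    `records` is a dict of {name: sequence}; reading a FASTA file into
    this form is left to the caller.
    """
    return {name: len(seq) for name, seq in records.items()
            if has_terminal_motif(seq)}


if __name__ == "__main__":
    # Hypothetical toy records, not real accessions.
    toy = {
        "both_ends": "TCTCAAAG" + "GGGGGGGGGGGG" + "CTTTGAGA",
        "five_only": "TCTCAAAG" + "GGGGGGGGGGGG",
    }
    print(lengths_with_motif(toy))
```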
Amino acid sequence map of the nairovirus glycoprotein. The structural characteristics of the nairovirus glycoprotein were predicted using TMHMM-2.0 for the transmembrane protein 29, SignalP-6.0 for the signal cleavage site 30, and NetOGlyc-4.0 for the O-linked glycosylation sites 31. Data on the glycoprotein sequences were collected from UniProt (https://www.ebi.ac.uk/uniprot/index). The UniProt accession numbers were: CCHFV, Q8JSZ3; NSDV, A0A0A7H8l1; Dugbe virus, Q02004; Tofla virus, A0A0U5AG15; HAZV, A6XIP3; and Erve virus, J3S7E1. GP38-like regions were found using the Protein Basic Local Alignment Search Tool (BLASTp) based on the amino acid sequences of the CCHFV and Dugbe virus GPC.
Statistical analysis. Statistical analyses were performed with Prism software (version 9.1.2; GraphPad, San Diego, CA, USA). Statistical significance was assigned when p values were < 0.05. Inferential statistical analysis was performed by a two-tailed unpaired Student's t-test or one-way analysis of variance followed by Tukey's test, as appropriate.

Data availability
All databases used in this study are available from DDBJ/ENA/GenBank (https://www.ddbj.nig.ac.jp/about/insdce.html). The accession numbers of viral sequences used in this study are listed in Supplementary Tables 1, 2 and 3.