Dengue is a mosquito-borne disease caused by a group of viruses collectively known as Dengue virus, belonging to the Flaviviridae family. It affects nearly 390 million people worldwide every year1, with symptoms ranging from mild fever to severe shock syndrome. Although different vaccination strategies have recently achieved reasonable levels of protection, a vaccine that protects uniformly against all circulating serotypes is not yet available2. This is partly due to the high variability among these viruses, which can lead to partially protective immune responses and to antibody-dependent enhancement (ADE) of infection, in which non-neutralising antibodies facilitate virus entry via Fc gamma receptors3,4,5,6.

Dengue viruses (DV) are enveloped, ssRNA viruses with a single open reading frame encoding three structural and seven non-structural proteins7. The genetic variance amongst DV results in diverse immune responses, such that they are classified into four serotypes (DV1–4) based on antigenic diversity8,9. Although mutations occur randomly in the genome, viral proteins show a combination of regions that are permissive to multiple mutations, which enable immune evasion through antigenic variation, and regions where amino acid residues critical for structure and viral fitness are conserved10. In general, regions exposed to the immune system are prone to variation; however, even the envelope (E) protein, the main antigenic determinant on the virion, retains highly conserved cryptic peptides10,11. Such conserved regions in structural proteins have been shown to play an important role in viral fitness and may be targets of broadly neutralising antibodies in viruses such as HIV and influenza A virus. For instance, although the majority of the protective immune response against influenza virus is provided by antibodies against the head of haemagglutinin (HA), new classes of broadly neutralising antibodies have been isolated that target the highly conserved HA stalk region12,13,14,15,16,17. Antibodies with similar properties have also been found that target functionally conserved regions of HIV glycoprotein 120 (gp120)18,19,20,21,22. In both cases these regions are being evaluated as vaccine targets, and the antibodies elicited have been used to study immunoprophylaxis strategies.

In this context, the sequence conservation of DV was evaluated with the aim of identifying conserved regions in the E protein. All complete DV4 genome sequences available in NCBI on the access date (120 sequences) were analysed, along with the same number of genomes for each of the other DV serotypes, randomly selected through their NCBI sorting numbers (accessed on 26 November 2013). This unbiased dataset, comprising 480 sequences (all sequence IDs in Supplementary Data), allowed ample representation of the known variability observed in this taxonomic unit. The 480 GenBank files were processed using custom Perl scripts written with the BioPerl module23 to extract protein and coding sequences. Protein sequences were then aligned with MUSCLE using default parameters, and these alignments, together with the CDS data, were used to create a codon alignment. For both protein and codon alignments, conservation scores were calculated as the ratio of the count of the most frequent character (amino acid or nucleotide, with gaps included) at a given position to the total number of sequences evaluated (480). To detect locally conserved regions, a smooth curve was fitted to the protein and codon conservation score data using the smooth.spline function implemented in R, with the smoothing parameter set to 0.4. The conservation scores varied from 0.4 to 1 (scores for all peptides are available in the Supplementary Table), where a score closer to 1 corresponds to higher conservation of the region across all analysed sequences (Fig. 1a). Although structural proteins are more variable in general, some regions in E proved to be as conserved as the non-structural protein average. To analyse the E protein at greater resolution, we repeated this procedure with its sequences separated from the full genome. This revealed two main peaks of conservation in the envelope protein (Fig. 1b).
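The per-position conservation score described above can be illustrated with a minimal Python sketch. This is not the authors' Perl/R pipeline: the function names and the toy alignment are assumptions made for illustration, and the centred moving average is only a simple stand-in for the smoothing spline fitted with R's smooth.spline.

```python
from collections import Counter


def conservation_scores(alignment):
    """Per-column score: count of the most frequent character / number of sequences.

    Gaps ('-') are counted as ordinary characters, as stated in the text.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_sequences = len(alignment)
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / n_sequences)
    return scores


def moving_average(values, window=3):
    """Centred moving average (illustrative stand-in for spline smoothing)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        smoothed.append(sum(values[start:stop]) / (stop - start))
    return smoothed


# Toy alignment of four hypothetical sequences (not real dengue data).
toy_alignment = ["MKV-L", "MKVAL", "MRVAL", "MKIAL"]
scores = conservation_scores(toy_alignment)
smoothed = moving_average(scores, window=3)
print(scores)
print(smoothed)
```

With 480 aligned sequences, the same function would return the fraction of sequences sharing the most frequent residue at each column, so fully invariant columns score 1.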
Further analyses revealed that the first of these sites is the fusion peptide. This peptide, first described in 1989, comprises a hydrophobic loop that is highly conserved in all flaviviruses, is normally buried in the E dimeric form and becomes exposed at the tip of the fusogenic trimer (Fig. 2a,b)24,25,26. The second most significant peak of conservation also mapped to domain II of the E protein and comprised residues 250–270 (polyprotein residues 530–551). This peptide (hereafter E250–270) is located internally in the pre-fusion E dimer and covers the final residues of the ij loop, the αB helix and the beginning of the kl loop; its structure was visualised using available structures of DV E proteins in pre- and post-fusion forms (Fig. 2a,b). E250–270 is the longest conserved peptide that is solvent-exposed in the E protein trimer, which is responsible for fusion of the viral envelope with the host membrane (Fig. 2b). Comparison of the localisation and structure of this DV peptide with those of other flavivirus E proteins shows its cross-species conservation (Fig. 2a,b). To visualise this conservation at the amino acid sequence level, WebLogo27 was used to create a representation of the region spanning the peptide (columns 530–551). WebLogo was run using standard parameters and assuming equiprobable amino acid frequencies. The same strategy was employed to compare the conservation of E250–270 in other flaviviruses. For this, 120 sequences each of Japanese encephalitis virus (JEV), Tick-borne encephalitis virus (TBEV) and West Nile virus (WNV) were randomly selected from GenBank, together with all available sequences of Yellow fever virus (YFV; 70) and Zika virus (ZKV; 63); these were aligned with MUSCLE and visualised together with WebLogo. Sixteen out of 21 amino acids in this peptide were 100% conserved in every DV sequence evaluated, with DV4 being the most divergent among the four serotypes. 
When compared with other flaviviruses, at least seven amino acids from E250–270 are highly conserved (V250, L253, G254, Q256, G258, L264, G266; Fig. 2c and Supplementary Figure). These data indicate that E250–270 is conserved at both the sequence and structural levels across multiple flaviviruses.
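As a rough illustration of what a sequence logo encodes, the sketch below computes the information content of an alignment column under an equiprobable 20-amino-acid background, which is the assumption stated above. It is a simplified sketch rather than WebLogo's own implementation: the function name and toy columns are hypothetical, gaps are simply ignored (an assumption), and the small-sample correction that sequence-logo tools normally apply is omitted.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one column with a uniform background.

    Computed as log2(alphabet_size) minus the Shannon entropy of the observed
    residue frequencies. Gap characters are ignored (assumption), and no
    small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


# Toy columns (hypothetical data): invariant, two residues at 50% each,
# and invariant with one gap.
toy_columns = ["VVVV", "VVII", "LLL-"]
bits = [column_information(col) for col in toy_columns]
print(bits)
```

A fully conserved position reaches the maximum of log2(20), about 4.32 bits, which is why invariant residues appear as the tallest letters in a logo.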

Figure 1

Genome-wide analysis of DV conserved sites.

Conservation profile of nucleotides (red line) and amino acids (black line) of 480 dengue coding sequences (CDS) (a) and their respective envelope proteins (b). The “x” axis represents the position of amino acid residues in the DV CDS (a) or envelope protein (b), and the “y” axis represents the conservation score, where 1 indicates the highest conservation. The two most highly conserved peptides in the envelope protein (b) are the fusion peptide (residues E99–112, highlighted in the blue box) and E250–270 (red box). The schematic DV CDS is shown above “a”. C: capsid; prM: membrane precursor; E: envelope; NS: non-structural. The scheme of the E protein domains (I, II, III), the stem segment and the transmembrane anchor (TM) is shown above “b”.

Figure 2

E250–270 sequence and structural conservation in different flaviviruses.

(a) The fusion peptide (in blue) and E250–270 (in red) are highlighted in envelope monomer structures of DV1, JEV and ZKV and (b) in the trimeric form of DV1 and TBEV. PDB IDs: 4GSX, 3P54, 5JHM and 1URZ. (c) WebLogo schematic showing the amino acid composition per site in E250–270 of DV and other flaviviruses. Polar amino acids (G,S,T,Y,C) are coloured in green, neutral (Q,N) in purple, basic (K,R,H) in blue, acidic (D,E) in red and hydrophobic (A,V,L,I,P,W,F,M) in black.

As no specific function has been reported for the region of DV in which E250–270 is located, a DV with a mutated E250–270 peptide was generated to establish the significance of this highly conserved peptide (Fig. 3a). Several hydrophobic side chains were removed by mutation of five amino acid residues to alanine. These changes were designed to remove some of the most conserved side chains across all E protein sequences whilst minimising structural disruption across the peptide. The effect of these substitutions on the three-dimensional structure of the envelope protein was predicted with Modeller28, using the complete E protein X-ray crystallographic structure as a template29 (Fig. 3b). The PROCHECK30, WHAT IF31 and Verify3D32 algorithms were used to confirm that stereochemical restraints were satisfied, indicating that the amino acid substitutions could be tolerated by the E protein without gross structural disruption. More studies will be necessary to fully evaluate the impact of these mutations on E protein biology.

Figure 3

CSMut1 design, modelling and infectivity.

(a) The wild type E250–270 and CSMut1 sequences are shown with mutated amino acids highlighted in red. (b) Wild type (on the left) and CSMut1 model (on the right) with E250–270 in red. (c) Relative quantification of DV NS5 RNA in Huh7.5 cells transfected with wild type and CSMut1 genomes at different time points. Cell-associated RNA is shown in the left panel and supernatant-extracted RNA in the right panel. (d) DV E protein immunofluorescence staining of WT and CSMut1 120 hours after genome transfection in Huh7.5 cells (bars correspond to 32 μm).

An infectious clone of DV1, strain BR/90, was used to construct the mutant virus33. First, three silent nucleotide changes were introduced to add restriction enzyme cleavage sites (C368T, T1663G, G1822C, based on GenBank AF226685.2), and then alanine substitutions were introduced at E protein amino acids 250–253 and 255 by gene synthesis and conventional cloning (Fig. 3a). The resultant infectious clone, named Conserved Surface Mutant 1 (CSMut1), was fully sequenced and no modifications other than those desired were observed. CSMut1 and the matching wild type (WT) were transcribed in vitro using the MegaScript T7 synthesis kit (Ambion), supplemented with m7G(5′)ppp(5′)G RNA Cap Structure Analog (New England Biolabs). RNA was purified (RNeasy Mini Kit, Qiagen) and transfected into Huh7.5 cells with Lipofectin (Invitrogen). Cells and supernatant were recovered at the indicated hours post transfection (h.p.t.) and RNA was extracted using the RNeasy Mini Kit or the QIAamp Viral RNA Mini Kit, respectively (both Qiagen). Using RT-qPCR for non-structural protein 5 (NS5) mRNA (5′ GCAAACATCTTCAGGGGAAGT 3′, 5′ GCTCCCGTACCTCTCCTACC 3′), only decreasing quantities of CSMut1 NS5 transcript were observed in both cell-associated RNA and culture supernatant, suggesting an impaired ability of this mutant virus to replicate (Fig. 3c). The WT virus, on the other hand, retained detectable levels of cell-associated DV RNA and showed increasing amounts of DV RNA in the culture supernatant, consistent with efficient replication with the expected kinetics for this virus (Fig. 3c). Furthermore, CSMut1 growth was not detected by plaque assay (data not shown).
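The substitution pattern used for CSMut1 (alanine at E residues 250–253 and 255) can be expressed as a small Python helper. This is only a sketch of the bookkeeping involved, not the cloning procedure: the function name is hypothetical, and the ten-residue peptide below is an arbitrary placeholder numbered from residue 250, not the actual E250–270 sequence.

```python
def substitute_alanine(protein, positions, first_residue=1):
    """Return a copy of `protein` with alanine at the given residue numbers.

    `positions` uses biological numbering; `first_residue` is the residue
    number of protein[0], so a peptide fragment can keep its E protein
    numbering.
    """
    residues = list(protein)
    for pos in positions:
        index = pos - first_residue
        if not 0 <= index < len(residues):
            raise ValueError(f"residue {pos} is outside the supplied sequence")
        residues[index] = "A"
    return "".join(residues)


# Placeholder peptide (NOT the real dengue sequence), numbered from 250.
placeholder_peptide = "MKTLLGSQEW"
csmut1_positions = [250, 251, 252, 253, 255]  # positions taken from the text
mutant = substitute_alanine(placeholder_peptide, csmut1_positions, first_residue=250)
print(mutant)
```

Keeping the E protein numbering explicit through `first_residue` avoids off-by-one errors when only a fragment of the protein is handled.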

The impact of CSMut1 on DV1 was also assessed by immunofluorescence for the viral envelope antigen in Huh7.5 cells34. Briefly, Huh7.5 cells were transfected with wild type or CSMut1 RNA using Lipofectin, fixed after 120 hours, and stained with monoclonal anti-E antibody (4G2 - ATCC® HB-112), using Alexa Fluor 488 rabbit anti-mouse IgG (H + L, Life Technologies) as a secondary antibody and DAPI as a counterstain (Molecular Probes). In agreement with RNA detection (Fig. 3c), E protein expression was not observed in cells transfected with CSMut1 RNA, suggesting that the mutant virus was not able to spread in Huh7.5 cells (Fig. 3d). To understand whether the impairment caused by CSMut1 was restricted to the mammalian host, experiments similar to those described above were performed in the C6/36 Aedes albopictus cell line. The results were remarkably similar to those in human cells (data not shown), suggesting that the defect caused by the mutations was not due to a host-specific factor.

These data show that, despite the great diversity among the serotypes of dengue viruses, there are at least two polypeptides within their main antigenic determinant that are highly conserved at both the sequence and structural levels. These two regions were also reported as potential immune-relevant T cell determinants during analysis of pan-DV sequences11. These peptides are generally buried in the dimeric E form and are thought to become fully exposed only in the fusogenic E form of the mature virion35. This would suggest that, being hidden from the humoral immune response, these regions remain conserved owing to the absence of selective pressure from host immunity. To test this, the presence of antibodies against E250–270 was assessed in patient sera. Serum samples were obtained from an outbreak in Paraná state, Brazil (2013) according to the approved guidelines of Fiocruz (Fundação Oswaldo Cruz), within Instituto Carlos Chagas (Curitiba, Brazil). Experiments involving human subjects were approved by the committee of ethics and research (Comitê de Ética e Pesquisa – CEP) of Fiocruz-RJ under protocol 617/11. Informed consent was obtained from all donors. Seventy serum samples were evaluated for DV infection (by NS1, IgM and IgG ELISAs). The samples comprised 54 DV1 sera, designated positive by at least one assay and taken from patients in the acute phase of disease (up to the seventh day after the onset of symptoms), and 16 control sera from non-infected individuals. An indirect ELISA protocol was used to identify anti-E250–270 IgG antibodies36 (Fig. 4). Briefly, plates were coated with synthetic E253–270 peptide (it was not possible to solubilise a full-length E250–270 peptide), blocked and incubated with serum samples. An HRP-linked goat anti-human IgG (H + L, Invitrogen) was used as the secondary antibody, and the plates were treated with o-phenylenediamine in citrate phosphate buffer containing 30% hydrogen peroxide. 
These ELISA data demonstrated that this peptide is antigenic in natural human infections, despite its predicted buried nature in the dimeric E form. One explanation for the development of antibodies to this site would be the dynamic movement of the virus particle, described as ‘breathing’37,38. In agreement, Ramanathan et al.39 predicted the peptide to be a potential linear epitope, and the Rey group described a broadly neutralising monoclonal antibody that interacts with the first valine of E250–270 (ref. 40). Therefore, two other reasons for conservation could be explored: (i) structure and (ii) function. Modelling of the mutant E protein did not show evidence of structural destabilisation, even though the mutated residues were conserved across almost all analysed virus sequences, suggesting that E250–270 might have a functional role in the DV life cycle. In agreement with our data, other groups have suggested that amino acids within this region are key to E protein biology. Accordingly, a single V251A substitution moderately restricts DV replication and viral particle production in C6/36 and Vero cells, and reduces viral E protein detection by immunofluorescence in transfected C6/36 and BHK-21 cells41. Moreover, substitution of G266 and I270 with tryptophan affected viral replication in mammalian and insect cells, and I270W reduced fusion42. In contrast, the study of Christian et al., using replicons with random point mutations in the E protein, did not find significant differences in E expression, budding or infectivity when mutating residues of the peptide's N-terminus (such as V251 and others mutated in CSMut1) or residues 266 and 27043. This discrepancy could be due to the amino acids used in the substitutions, limitations of the replicon method, and the contribution of multiple mutations, as present in CSMut1. 
On the other hand, they suggest that other residues in the peptide's C-terminus, M258, H259 and A265, could have an important role in fusion and that, if mutated, they greatly reduce or practically abolish viral replication43. These residues could form latch contacts with M proteins, preventing the premature triggering of the E protein. Together with other residues, M258 forms a hydrophobic path that appears to be important for interaction with F400 of the E stem region in the DV1 trimer, and the protonation of histidines, among them H259, enables the dissociation of E and M protein contacts29,43. The undetectable level of replication observed for CSMut1 demonstrates the importance of E250–270 for viral infectivity, and further investigations are needed to elucidate the exact function of this conserved peptide. Moreover, the detection of anti-E250–270 antibodies in naturally infected patients points to the possible use of this peptide as an immunological target, as recently shown36.

Figure 4

E250–270 antigenicity. Presence of antibodies against E250–270 analysed by ELISA.

Fifty-four serum samples positive for DV infection and 16 from non-infected controls were analysed. Absorbance values were read at 450 nm. ***p = 0.0004; error bars are ± standard deviation.

Our strategy enables the identification of the most conserved regions in DV genomes and in those of other flaviviruses such as ZKV, as well as the rational design of mutant viruses to investigate the importance of these regions for viral fitness and infectivity. As carried out for the E250–270 peptide, a systematic analysis of other highly conserved regions could suggest potential immunological or pharmacological targets for dengue treatment and control. Moreover, this knowledge can be extrapolated to other ssRNA viruses and contribute to understanding the evolution of their cryptic conserved peptides.

Additional Information

How to cite this article: Fleith, R. C. et al. Genome-wide analyses reveal a highly conserved Dengue virus envelope peptide which is critical for virus viability and antigenic in humans. Sci. Rep. 6, 36339; doi: 10.1038/srep36339 (2016).
