Genome-wide analyses reveal a highly conserved Dengue virus envelope peptide which is critical for virus viability and antigenic in humans

Targeting regions of proteins that show a high degree of structural conservation has been proposed as a method of developing immunotherapies and vaccines that may bypass the wide genetic variability of RNA viruses. Despite several attempts, a vaccine that protects evenly against the four circulating Dengue virus (DV) serotypes remains elusive. To find critical conserved amino acids in dengue viruses, 120 complete genomes of each serotype were selected at random and used to calculate conservation scores for nucleotide and amino acid sequences. The identified peptide sequences were analysed for their structural conservation and localisation using crystallographic data. The longest surface-exposed, highly conserved peptide of the Envelope protein was found to correspond to amino acid residues 250 to 270. Mutation of this peptide in DV1 was lethal, since no replication of the mutant virus was detected in human cells. Antibodies against this peptide were detected in patients naturally infected with DV, indicating its potential antigenicity. Hence, this study has identified a highly conserved, critical peptide in DV that is a target of antibodies in infected humans.

and Influenza A virus. For instance, although the majority of the protective immune response against influenza virus is provided by antibodies against the head of haemagglutinin (HA), new classes of multi-neutralising antibodies have been isolated that target the highly conserved HA stalk region [12][13][14][15][16][17]. Antibodies with similar properties have also been found that target functionally conserved regions of HIV glycoprotein 120 [18][19][20][21][22]. In both cases these regions are being evaluated as vaccine targets, and the antibodies elicited have been used to study immunoprophylaxis strategies.
In this context, the sequence conservation of DV was evaluated, with the aim of identifying conserved regions in the E protein. All complete genome sequences available for DV4 in NCBI on the access date (120 sequences) were analysed along with the same number of genomes for each of the other DV serotypes, randomly selected through their NCBI sorting numbers (accessed on 26 November 2013). This unbiased dataset, comprising 480 sequences (all sequence IDs in Supplementary Data), allowed ample representation of the known variability observed in this taxonomic unit. The 480 GenBank files were processed with custom Perl scripts, written using the BioPerl module 23, to extract protein and coding sequences (CDS). MUSCLE was then used with default parameters to align the protein sequences, and these alignments, together with the CDS data, were used to create a codon alignment. For both protein and codon alignments, conservation scores were calculated as the ratio of the count of the most frequent character (amino acids or nucleotides, plus gaps) at a given position to the total number of sequences evaluated (480). To detect locally conserved regions, a smooth curve was fitted to the protein and codon conservation scores using the smooth.spline function implemented in R, with the smoothing parameter set to 0.4. The conservation scores varied from 0.4 to 1 (scores for all peptides are available in the Supplementary Table), where a score closer to 1 corresponds to higher conservation of the region across all analysed sequences (Fig. 1a). Although structural proteins are more variable in general, some regions in E proved to be as conserved as the non-structural protein average. To analyse the E protein with greater resolution, we repeated this procedure with its sequences separated from the full genome. This revealed two main peaks of conservation in the envelope protein (Fig. 1b).
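The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' Perl/R pipeline: the function names (thread_codons, conservation_scores, moving_average) and the toy sequences are hypothetical, and a centred moving average is used as a simple stand-in for the smooth.spline fit, which it does not reproduce.

```python
from collections import Counter


def thread_codons(aligned_protein, cds):
    """Lay an ungapped CDS over a gapped protein row, three bases per residue."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = len(aligned_protein.replace("-", ""))
    if len(codons) == n_residues + 1:
        codons = codons[:-1]  # drop a trailing stop codon absent from the protein
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    remaining = iter(codons)
    # Each protein gap becomes a three-base gap; each residue consumes one codon.
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


def conservation_scores(alignment):
    """Per column: count of the most frequent character (gaps included)
    divided by the number of sequences."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    n = len(alignment)
    return [Counter(column).most_common(1)[0][1] / n for column in zip(*alignment)]


def moving_average(scores, window=5):
    """Centred moving average; an assumed stand-in for a spline fit."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy data (hypothetical, not DV sequences): three aligned proteins and their CDS.
proteins = ["MK-V", "MKLV", "MR-V"]
cds = ["ATGAAAGTT", "ATGAAACTGGTT", "ATGCGTGTT"]

codon_alignment = [thread_codons(p, c) for p, c in zip(proteins, cds)]
protein_scores = conservation_scores(proteins)
codon_scores = conservation_scores(codon_alignment)
smoothed = moving_average(protein_scores, window=3)

print(codon_alignment)
print(protein_scores)
print(smoothed)
```

Because gaps are counted as characters, a column that is mostly gaps still receives a high score; in this sketch the denominator is simply the number of rows, which corresponds to 480 in the analysis described here.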
Further analyses revealed that the first of these sites is the fusion peptide. This peptide, first described in 1989, comprises a hydrophobic loop that is highly conserved in all flaviviruses, is normally buried in the dimeric form of E and becomes exposed at the tip of the fusogenic trimer (Fig. 2a,b) [24][25][26]. The second most significant peak of conservation also mapped to domain II of the E protein and comprised residues 250-270 (polyprotein residues 530-551). This peptide (hereafter E 250-270) is located internally in the pre-fusion E dimer and covers the final residues of the ij loop, the αB helix and the beginning of the kl loop; its structure was visualised using available structures of DV E proteins in pre- and post-fusion forms (Fig. 2a,b). E 250-270 is the longest conserved peptide that is solvent-exposed in the E protein trimer, which is responsible for fusion of the viral envelope to the host membrane (Fig. 2b). Comparison of the localisation and structure of this DV peptide with those of other flavivirus E proteins shows its cross-species conservation (Fig. 2a,b). To visualise this conservation at the amino acid sequence level, WebLogo 27 was used to create a representation of the region spanning the peptide (columns 530-551). WebLogo was run using standard parameters and assuming equiprobable amino acid frequencies. The same strategy was employed to compare the conservation of E 250-270 in other flaviviruses: 120 sequences each of Japanese encephalitis virus (JEV), Tick-borne encephalitis virus (TBEV) and West Nile virus (WNV) were selected randomly from GenBank, together with the 70 available sequences of Yellow fever virus (YFV) and the 63 available sequences of Zika virus (ZKV); these were aligned with MUSCLE and visualised together with WebLogo. Sixteen of the 21 amino acids in this peptide were 100% conserved in every DV sequence evaluated, with DV4 being the most divergent of the four serotypes.
When compared with other flaviviruses, at least seven amino acids of E 250-270 are highly conserved (V250, L253, G254, Q256, G258, L264, G266; Fig. 2c and Supplementary Figure). These data indicate that E 250-270 is conserved at both the sequence and structural levels across multiple flaviviruses.
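The per-position conservation displayed in a sequence logo can be approximated as follows. This is a sketch under the assumption of equiprobable amino acid frequencies, as stated for the WebLogo run, so that a fully conserved column reaches log2(20) bits. Any small-sample correction that a logo tool may apply is omitted, gaps are ignored, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Bits of information at one column: log2(alphabet_size) minus the
    Shannon entropy of the observed residues (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def fully_conserved_columns(alignment):
    """0-based indices of columns where every sequence carries the same residue."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Toy alignment (hypothetical peptides, not DV data).
toy_alignment = ["VGLA", "VGIA", "VGLA", "VALA"]

heights = [information_content(column) for column in zip(*toy_alignment)]
conserved = fully_conserved_columns(toy_alignment)

print([round(h, 2) for h in heights])
print(conserved)
```

Applied to the 21 columns spanning the peptide, a count of fully conserved columns of this kind corresponds to the "100% conserved" positions reported here, while the column heights correspond to what the logo displays.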
As no specific function has been reported for the region of DV in which E 250-270 is located, a DV with a mutated E 250-270 peptide was generated to establish the significance of this highly conserved peptide (Fig. 3a). Several hydrophobic side chains were removed by mutation of five amino acid residues to alanine. These changes were designed to remove some of the most conserved side chains across all E protein sequences whilst minimising structural disruption across the peptide. The effect of these substitutions on the three-dimensional structure of the envelope protein was predicted with Modeller 28, using the complete E protein X-ray crystallographic structure as a template 29 (Fig. 3b). The PROCHECK 30, WHAT IF 31 and Verify3D 32 algorithms were used to verify that stereochemical restraints were satisfied, indicating that the amino acid substitutions could be tolerated by the E protein without gross structural disruption. More studies will be necessary to fully evaluate the impact of these mutations on E protein biology.
An infectious clone of DV1, strain BR/90, was used to construct the mutant virus 33. First, three silent nucleotide changes were introduced to add restriction enzyme cleavage sites (C368T, T1663G, G1822C, based on GenBank AF226685.2), and then alanine substitutions were introduced at E protein amino acids 250-253 and 255 by gene synthesis and conventional cloning (Fig. 3a). The resultant infectious clone, named Conserved Surface Mutant 1 (CSMut1), was fully sequenced and no modifications other than those desired were observed. CSMut1 and the matching wild type (WT) were transcribed in vitro using the MegaScript T7 synthesis kit (Ambion), supplemented with m7G(5′)ppp(5′)G RNA Cap Structure Analog (New England Biolabs). RNA was purified (RNeasy Mini Kit, Qiagen) and transfected into Huh7.5 cells with Lipofectin (Invitrogen). Cells and supernatant were recovered at the indicated hours post transfection (h.p.t.) and the RNA was extracted using the RNeasy Mini Kit or the QIAamp Viral RNA Mini Kit, respectively (both Qiagen). Using RT-qPCR for non-structural protein 5 (NS5) mRNA (5′-GCAAACATCTTCAGGGGAAGT-3′ and 5′-GCTCCCGTACCTCTCCTACC-3′), only decreasing quantities of CSMut1 NS5 transcript were observed in both cell-associated RNA and culture supernatant, suggesting an impaired ability of this mutant virus to replicate, whereas WT showed replication with the expected kinetics for this virus (Fig. 3c). Furthermore, CSMut1 growth was not detected by plaque assay (data not shown).
The impact of CSMut1 on DV1 was also assessed by immunofluorescence for the viral envelope antigen in Huh7.5 cells 34. Briefly, Huh7.5 cells were transfected with wild type or CSMut1 RNA using Lipofectin, fixed after 120 hours, and stained with monoclonal anti-E antibody (4G2, ATCC® HB-112™), using Alexa Fluor 488 rabbit anti-mouse IgG (H+L, Life Technologies) as the secondary antibody and DAPI as the counterstain (Molecular Probes). In agreement with RNA detection (Fig. 3c), E protein expression was not observed in cells transfected with CSMut1 RNA, suggesting that the mutant virus was not able to spread in Huh7.5 cells (Fig. 3d). To understand whether the impairment caused by CSMut1 was restricted to the mammalian host, experiments similar to those described above were performed in the C6/36 Aedes albopictus cell line. The results were remarkably similar to those in human cells (data not shown), suggesting that the defect caused by the mutations was not due to a host-specific factor.
Figure 1. The x axis represents the position of amino acid residues in the DV CDS (a) or Envelope protein (b) and the y axis represents the conservation score, where 1 indicates the highest conservation. The two most highly conserved peptides in the envelope protein (b) are the fusion peptide (residues E 99-112, highlighted in the blue box) and E 250-270 (red box). Above (a), the schematic DV CDS. C: capsid; prM: membrane precursor; E: envelope; NS: non-structural. Above (b), the scheme of the E protein domains (I, II, III), the stem segment and the transmembrane anchor (TM).
These data show that, despite the great diversity among the serotypes of dengue viruses, there are at least two polypeptides within its main antigenic determinant that are highly conserved at both the sequence and structural levels. These two regions were also reported during analysis of pan-DV sequences as potential immune-relevant T cell determinants 11. These peptides are generally buried in the dimeric E form and are thought to become fully exposed only in the fusogenic E form of the mature virion 35. This suggests that, being hidden from the humoral immune response, these regions remain conserved due to the absence of selective pressure from host immunity. To test this, the presence of antibodies against E 250-270 was assessed in patient sera. Serum samples were obtained from an outbreak in Paraná state, Brazil (2013) according to the approved guidelines of Fiocruz (Fundação Oswaldo Cruz), within Instituto Carlos Chagas (Curitiba, Brazil). Experiments involving human subjects were approved by the committee of ethics and research (Comitê de Ética e Pesquisa, CEP) of Fiocruz-RJ under protocol 617/11. Informed consent was obtained from all donors. Seventy serum samples were evaluated for DV infection (by NS1, IgM and IgG ELISAs). The samples comprised 54 DV1 sera, designated positive by at least one assay and taken from patients in the acute phase of disease (up to the seventh day after the onset of symptoms), and 16 control sera from non-infected individuals. An indirect ELISA protocol was used to identify anti-E 250-270 IgG antibodies 36 (Fig. 4). Briefly, plates were coated with synthetic E 253-270 peptide (it was not possible to solubilise a full-length E 250-270 peptide), blocked and incubated with serum samples. An HRP-linked goat anti-human IgG (H+L, Invitrogen) was used as the secondary antibody, and the plates were treated with o-phenylenediamine in citrate phosphate buffer containing 30% hydrogen peroxide. These ELISA data demonstrated that this peptide is antigenic in natural human infections, despite its predicted buried nature in the dimeric E form. One explanation for the development of antibodies to this site would be the dynamic movement of the virus particle, described as 'breathing' 37,38. In agreement, Ramanathan et al. 39 predicted the peptide to be a potential linear epitope, and the Rey group described a broadly neutralising monoclonal antibody that interacts with the first valine of E 250-270 40. Therefore, two other reasons for conservation could be explored: (i) structure and (ii) function. Modelling of the mutant E protein did not show evidence of structural destabilisation, even though the mutated residues were conserved across almost all analysed virus sequences, suggesting that E 250-270 might have a functional role in the DV life cycle. In agreement with our data, other groups suggest that amino acids within this region are key to E protein biology. Accordingly, a single V251A substitution moderately restricts DV replication and viral particle production in C6/36 and Vero cells, and reduces viral E protein detection by immunofluorescence in transfected C6/36 and BHK-21 cells 41. Moreover, it was demonstrated that substitution of G266 and I270 with tryptophan affected viral replication in mammalian and insect cells, and that I270W reduced fusion 42. In contrast, the study of Christian et al., using replicons with random point mutations in the E protein, did not find significant differences in E expression, budding or infectivity when mutating residues of the peptide's N-terminus (such as V251 and others mutated in CSMut1) or residues 266 and 270 43. This discrepancy could be related to the amino acids used in the substitutions, limitations of the replicon method, or the contribution of multiple mutations, as present in CSMut1. On the other hand, they suggest that other residues in the peptide's C-terminus (M258, H259 and A265) could have an important role in fusion and that their mutation greatly reduces or practically abolishes viral replication 43. These residues could form latch contacts with M proteins, preventing the premature triggering of the E protein.
Together with other residues, M258 forms a hydrophobic path that appears to be important for interaction with F400 of the E stem region in the DV1 trimer, and protonation of histidines, among them H259, enables the dissociation of E and M protein contacts 29,43. The undetectable level of replication observed for CSMut1 demonstrates the importance of E 250-270 to viral infectivity, and further investigations are needed to elucidate the exact function of this conserved peptide. Moreover, the detection of anti-E 250-270 antibodies in naturally infected patients points to the possible use of this peptide as an immunological target, as recently shown 36.
Our strategy enables identification of the most conserved regions in DV genomes and in those of other flaviviruses such as ZKV, as well as the rational design of mutant viruses to investigate the importance of these regions in viral fitness and infectivity. As carried out for the E 250-270 peptide, a systematic analysis of other highly conserved regions could suggest potential immunological or pharmacological targets for dengue treatment and control. Moreover, this knowledge can be extrapolated to other viruses with ssRNA genomes and contribute to understanding the evolution of their cryptic conserved peptides.