Whole genome sequencing, variant analysis, phylogenetics, and deep sequencing of Zika virus strains

The recent emergence of Zika virus (ZIKV) has been concentrated in the Caribbean, Southeastern United States, and South- and Central America; resulting in travel-based cases being reported around the globe. As multi-disciplinary collaborations are combatting the ZIKV outbreak, the need to validate the sequence of existing strains has become apparent. Here, we report high-quality sequence data for multiple ZIKV strains made publicly available through the National Institutes of Health- (NIH) funded biorepository, BEI Resources (www.beiresources.org). Next-generation sequencing, 3′ rapid amplification of cDNA ends (RACE), and viral genome annotation pipelines generated GenBank sequence records for 16 BEI Resources strains. Minor variants, consensus mutations, and consensus insertions/deletions were identified within the viral stocks using next-generation sequencing (NGS) and consensus changes were confirmed with Sanger sequencing. Bioinformatics analyses of the sequencing results confirm that the virus stocks available to the scientific research community through BEI Resources adequately represent the viral population diversity of ZIKV.

Genome Sequencing. Three sets of custom PCR primers were designed to generate amplicons from consensus Zika genome sequences and primers were diluted to 2 μM and pooled in equal volumes. Amplicon 1 was generated with primers 5′-GCTAACAACAGTATCAACAG-3′ and 5′-GATCTTTGTGGTCATTCTCTTC-3′. Amplicon 2 was generated with primers 5′-GTATGGAATGGAGATAAGGCC-3′ and 5′-ATGGTCTCTARGGTCTCCGG-3′. Amplicon 3 was generated with primers 5′-GTWGCATCTGCCGGAATAAC-3′ and 5′-GGCTGCACA GCTTTCCCCAA-3′. Three independent PCR reactions were performed on 2 μL of the reverse transcription products using Phusion 2x Hot Start Mix (New England Biolabs) to generate three overlapping ~3 kb amplicons across the genome. Amplicons were verified on 1% agarose gels. Amplicons were quantitated using a SYBR Green dsDNA detection assay (SYBR Green I Nucleic Acid Gel Stain, Thermo Fisher Scientific), and all three amplicons per genome were pooled in equal concentrations. The RNA-based and DNA-based sequence-independent single-primer amplification (SISPA) methods were used to generate complete genome sequence data for these BEI Resources strains with 300 bp paired-end reads on the Illumina MiSeq instrument.
For DNA SISPA, 50 to 200 ng of pooled amplicon DNA were combined with dimethyl sulfoxide and a random hexamer oligonucleotide with unique barcodes for each sample. The mixture was incubated at 95 °C for 5 min and immediately placed on ice. The denatured DNA template was then incubated with the Klenow fragment 3′−5′-exo (New England BioLabs) at 37 °C for 60 min, followed by 75 °C for 10 min. The resulting DNA was amplified by PCR using AmpliTaq Gold (Life Technologies) for 35 cycles (94 °C for 30 s, 55 °C for 30 s, and 68 °C for 45 s). PCR mixture contained primers specific for each barcode with either 3 or 4 N's at the 5′ end using an equimolar ratio combination. The resulting DNA was then treated with exonuclease I (New England BioLabs) at 37 °C for 60 min.
For RNA SISPA, 5 μl of viral RNA and uniquely barcoded random hexamer oligonucleotides were combined with SuperScript III reverse transcriptase/Platinum Taq high-fidelity enzyme mix (ThermoFisher Scientific) for first-strand cDNA synthesis (50 °C, 5 minutes 0.4 °C, 5 minutes, 25 °C, 15 minutes, 50 °C, 30 minutes, 55 °C, 10 minutes, 70 °C, 15 minutes, 4 °C, 10 minutes-1 h). The cDNA (first strand) was incubated with RNase H and Klenow fragment 3′−5-exo (New England Biolabs) at 37 °C for 60 min, followed by 80 °C for 10 min for second-strand DNA synthesis. The product was then amplified by polymerase chain reaction for 45 cycles (94 °C for 30 s, 55 °C for 30 s, and 68 °C for 30 s). The PCR mixture contained primers specific for each barcode with either 3 or 4 N's at the 5′ end using an equimolar ratio combination.
The DNA or RNA SISPA products were then normalized and pooled into a single reaction mixture that was purified using a QIAquick PCR purification kit (Qiagen). SPRIselect (Beckman Coulter) bead selection method was used to select for SISPA products that were 300 to 1000 bp in length prior to sequencing. 3′ Race. 3′ Rapid Amplification of cDNA Ends (RACE) was performed by poly-adenylating the ZIKV RNA using E-PAP (Thermo Fisher Scientific) followed by cDNA synthesis and PCR using FirstChoice RLM-RACE Kit (Thermo Fisher Scientific). The sequence of the ZIKV-specific primer used at the PCR step was 5′-AGAGTGTGGATTGAGGAGAACGAC-3′. PCR products were then cloned and sequenced with Sanger technology prior to incorporation into the respective consensus genome sequence. Sequences obtained by 3′ RACE extended those obtained by NGS by approximately 70 nucleotides.
Read Assembly. After sequencing, reads from each sample were deconvoluted by barcode and trimmed to eliminate low-quality regions consisting of a Phred score <30 with an ascii offset of 33, minimum length of 64, and minimum nucleotide quality score of 20. SISPA hexamer primers and barcode sequences were also removed in the trimming process. Trimmed reads were subjected to de novo assembly with CLC Bio software. The resulting contigs were then queried against a custom full-length Zika virus reference database to determine the closest reference sequence. Contigs were then mapped to the selected reference sequence for each sample using the CLC Bio software suite. Parameters for this reference-based assembly consisted of mismatch cost, 2; gap cost, 3; reads mapping to multiple loci, placed randomly; fraction of the read that must be placed for assembly, 0.5; similarity in the fraction of the read to be placed, 0.95; paired read information, not used; alignment mode, local. For sites where the majority of reads disagreed with the sequence from the reference strain, the reference sequence was updated accordingly to improve read mapping in subsequent assemblies. A final mapping of all next-generation reads to the selected reference sequences was then performed within the CLC Bio software. Curated assemblies were validated and annotated with the Viral Genome ORF Reader (VIGOR) version 3 annotation software 12 before submission to GenBank. VIGOR was used to predict genes, perform alignments, ensure the fidelity of open reading frames, correlate nucleotide polymorphisms with amino acid changes, and detect any potential sequencing errors. The annotation was subjected to manual inspection and quality control before submission to GenBank.
Consensus Variation. The nucleotide and translated protein sequences obtained from various virus stocks provided by BEI Resources were grouped together with GenBank sequences having the same strain name. Each group of sequences was then aligned using MAFFT 13 , and checked for any observed differences at the consensus level across all accession numbers for the same strain. Each assembled position that differed from the previously-published sequences were manually inspected and verified.

Minor Variant Detection.
Deep sequencing analysis was performed on all strains that were processed at the J. Craig Venter Institute (JCVI) to identify the minor variants circulating in the population. This involved generating a consensus sequence from all sequence reads for each sample using CLC mapping assembly (clc_ ref_assemble_long) as discussed above. A custom script was then used to map all of the high-quality trimmed sequence reads from each viral strain to the associated consensus sequence by parsing the assembly output. Separate scripts were then used to calculate the major and minor alleles, determine the nucleotide location of the observed allele(s), and predict whether the change in each codon would result in an amino acid substitution. These annotations were then used to provide biological context and meaning to all detected minor variants. To find statistically significant variations in the population, all forward and reverse reads covering each position were checked and a statistical model using a binomial distribution was generated to ensure that each minor variant was observed above a specified frequency threshold (3%) with a 95% confidence interval followed by multiple-hypothesis correction using the Bonferroni method. Positions lacking sufficient coverage to call a minor allele with 95% confidence (p < 0.05) were not reported in the output. The variations observed in the sequencing reads for each strain were reported against the consensus sequence of the same strain to get the percentage of minor alleles in the population.
Recombination Detection. The 16 new ZIKV sequences were combined with 90 publicly-available ZIKV sequences in GenBank, aligned with MAFFT 13 , and subjected to recombination analysis using RDP4 14 .
The recombination detection methods included in the software suite were RDP 15 , GENECONV 16 , MaxChi 17 , CHIMAERA 18 , SiScan 19 , and 3SEQ 20 . Only recombination events with a Bonferroni-corrected p-value <= 0.01, which were confirmed by at least four methods were reported.

Sequence Analysis.
A correlation analysis between the root-to-tip genetic divergence and date of sampling was performed in Path-O-Gen v.1.4 with multiple sequence alignments generated with MAFFT 13 . The analyses were done separately for the Asian (n = 86) and the African (n = 20) lineages as well as for a combined dataset. Linear regressions of root-to-tip divergence as a function of sampling time were also performed with Path-O-Gen.
To perform the analysis, coding complete regions from 106 ZIKV genomes were extracted from the National Center for Biotechnology Information (NCBI) representing 86 and 20 sequences from the Asian and African lineages, respectively. The coding regions were aligned with MAFFT and manually trimmed 13 . Maximum likelihood (ML) phylogenies were reconstructed using the heuristic tree search algorithm implemented in PhyML 21 . ML bootstrapping was performed with 100 replicates to assess the robustness of tree topologies. A general time reversible (GTR) nucleotide substitution model with a proportion of invariant sites were used to generate the Maximum Likelihood tree.
Bayesian Evolutionary Analysis Sampling Trees (BEAST v.1.8.2) was used to calculate the phylogenetic relationships and time to the most recent common ancestor for the coding region of all 106 ZIKV sequences 13,22,23 . The necessary parameters and data were prepared in XML format using BEAUti version 1.8.3. Sequences were separated according to Asian or African lineage and the dates of strain isolation were pulled either from GenBank or from the peer-reviewed literature. The GTR substitution model was selected with all estimated base frequencies and Gamma + Invariant site heterogeneity model. BEAST was run using both strict and relaxed uncorrelated models. Multiple combinations of molecular clock and coalescent models were run for the chain length of 100 million with a tree sampling every 10000 chains. The time to the most recent common ancestor and the nucleotide substitutions per site per year was calculated. Using Tracer v.1.6 24  Reconstructing the changes in effective population size over time was performed with Bayesian skyline using the GMRF Bayesian Skyride model, followed by model evaluation using AICM values in the Tracer program 23 . The differences in the AICM values showed that the coalescent GMRF Skyride model was superior than the coalescent Bayesian Skyline Piecewise-constant population model 25 .
Selection pressure in the coding region was calculated on the same set of 106 ZIKV genomes, consisting of the 16 sequences from this study together with others in GenBank, using the SLAC, FEL, IFEL, and MEME algorithms within the HyPhy package 26 . Two sequences (NC_012532 and KF383118) were removed from the selection analysis due to multiple insertions/deletions (indels) causing frameshifts in the codon alignment. Nine identical sequences were subsequently removed prior to applying default parameters to the analysis including: global dN/dS value estimated at 1.0 and significance level of 0.05. Code Availability. With the exception of the commercially-licensed programs, all custom scripts used to identify minor variants are available in the Elvira project through SourceForge.

Sequence Comparison of Various Stocks.
To allow researchers to efficiently apply molecular biology methods on ZIKV genetic material, we used next-generation sequencing to generate coding-complete consensus genome sequences for 16 ZIKV strains available at BEI, without additional passaging. These strains represent historical samples belonging to the African lineage 27 , as well as strains within the Asian lineage from the ongoing ZIKV outbreak (Supplementary Table S1). These virus stocks were provided to JCVI by BEI Resources for sequence validation and authentication on behalf of the scientific research community. The ZIKV coding-complete sequence data from this report enhances our understanding of the epidemiological and evolutionary dynamics of ZIKV from the recent outbreak. We subsequently performed additional downstream bioinformatics analyses of these assembled sequence data to better understand how the reagents available from BEI Resources compare to earlier passages of similar material, and to determine how these sequences relate to all other ZIKV strains with publicly-accessible genomes.
The four African lineage samples that were sequenced in this study were provided by BEI Resources. The MR-766 virus (BEI Resources: NR-50065) was isolated from the blood of a sentinel rhesus monkey in the Zika forest near Entebbe, Uganda, on April 20, 1947. Additional genomic sequences for MR-766 have been previously determined: KU720415.1, KX830960.1, KX377335.1, LC002520.1, DQ859059.1, KX601169.1, and AY632535/NC_012532. The GenBank accession number for the sequence of this isolate reported in this study, KU963573.2, complements these previous sequence data. The sequence for DQ859059.1 is very distinct from the rest of the MR-766 sequences, therefore it was not included in the comparison. Across all seven stocks from this isolate, we observed differences in 23 positions and six insertions/deletions (indels) at the nucleotide level (Table 1) as well as 17 substitutions and one indel at the amino acid level ( Table 2). Fifteen of these substitutions were only observed in the reference sequence NC_012532.1 compared to all the other sequences from this isolate ( Table 2). We found an insertion at position 118 followed by a deletion at position 139, resulting in a short frameshift within the reference sequence NC_012532.1 compared to all the other MR-766 isolates in NCBI, including the KU963573 sequenced at JCVI (Table 1). Similarly, a 12-base indel at genome position 1433 in NC_012532.1 was found for most sequences, including a deletion at this locus in the KU963573 sequence. The PRI/PRVABC59/2015 virus was originally isolated from the blood of a human in Puerto Rico during December 2015. The sequence generated from this stock in the current study (KX087101) was compared with the other existing sequences for PRVABC59 from NCBI (KU501215, KX601168, and KX377337). Only three substitutions and one indel were observed among all four whole genome sequences as shown in Table 3, which resulted in a total of two differences at the amino acid level, one each in the Envelope (E) and NS1 protein as shown in Table 4. The consensus sequence generated by JCVI is shorter by 28 and 1 bases for KX087101.3 at the 5′ and 3′ ends, respectively, compared to KX377337. These differences can primarily be attributed to SISPA-based methods not yielding sufficiently high coverage at the end of the template to generate a  Table S12).
To computationally confirm the results of the consensus variations, we reviewed the assembled reads and did not observe any reads that extend into the region containing a deletion in the relevant strains. To further validate our results with laboratory methods, we performed Sanger sequencing on two amplicons (spanning positions 1080-3151 and 8130-8912) that include representative indels and nucleotide substitutions in the ZIKV/Macaca mulatta/UGA/MR-766_SM150-V8/1947, ZIKV/Homo sapiens/NGA/IbH-30656_SM21V1-V3/1968, and ZIKV/ Homo Sapiens/PRI/PRVABC59/2015 stocks provided by BEI Resources. The sequenced amplicons confirmed the 12 nucleotide insertion/deletion, as well as the consensus variants that were detected by NGS sequencing in these regions of the selected strains, which were reported in our comparison.
Deep Sequencing for Minor Variants. Taking advantage of next-generation sequencing at a high read depth enables a better understanding of microevolution within an experimental system by identifying the minor variants (i.e. quasispecies) that exist within a population of viral genomes. We therefore set out to identify observed minor variants that displayed patterns which may contribute to viral adaptation to its host. To do so, we increased the frequency cutoff such that only minor variants in at least 20% of the mapped reads for any position were included in subsequent analyses. Given that ZIKV consists of a positive-sense single-stranded RNA genome, this cutoff should minimize the number of variants detected from viral transcripts as well as artifacts generated during PCR and/or reverse-transcription, and instead focus on those variants that are present at relatively high levels in the population of genomes within a given stock. To begin, we tabulated the number of minor variants compared to the consensus genome and calculated which coding regions contained the most minor variants after normalizing for gene length (Fig. 1). Interestingly, when both the African and Asian lineages were analyzed together, the cleaved pr region of the prM coding region contained the highest frequency of minor variants (0.053 variants per position). Separating the two lineages revealed that the prM coding region contained the highest frequency of minor variants in the Asian lineage (0.043 variants per position), while the E gene contained the highest frequency of minor variants in the African lineages (0.014 variants per position). While examining the results across the coding region at the codon level, we found a larger overall number of total minor variants for each of the Asian strains (Fig. 2), than for strains belonging to the African lineages (Fig. 3). We also observed 11 minor variants shared among multiple strains at codon positions 81 (C), 168 (prM),     691, and 1263) were identical among all strains that contained minor variants at these positions ( Table 5). The sample size is insufficient for providing statistical significance to these observations and the majority of the positions harboring variation appear to be mostly randomly distributed. Given their frequency and shared nature, it is unlikely that these positions have a negative effect on the viral population.   Phylogenetic Analysis. To better understand the evolutionary relationships between these sequences, we used the polyprotein coding region to reconstruct the phylogenetic relationships between the many ZIKV strains. Examination of the phylogenetic tree confirmed the existence of East African and West African lineages as was previously described 28 ( Supplementary Fig. S17). Among Asian lineage strains, the sequences responsible for the recent outbreak cluster very closely together. The basal relationships between the pre-outbreak FSM Using Bayesian methods, we calculated the ZIKV African and Asian lineages to have diverged from each other in the 17 th century, while the ancestor of all East and West African strains existed in the late 18 th century, and the most recent common ancestor for strains belonging to the Asian lineage existed in the late 19 th century. The East African sequences and West African sequences diverged in approximately 1816 and 1898 respectively, which differ only slightly from previously-reported dates 29 . Estimated mean evolutionary rates for the ZIKV genomes varied from 3.52E-04 (95% HPD 3.017E-4, 4.1099E-4)   substitutions per site per year based on strict and relaxed molecular clock estimates with different priors. These dates are similar to those generated using linear regression ( Supplementary Fig. S18). Interestingly, a subset of the sequences had stronger than expected phylogenetic relationships due to either the time or place of strain isolation. For example, the Asian lineage sequences that were isolated during the 1966 Malaysian, 2010 Cambodian, or 2014 Thai outbreaks are all monophyletic and basal to the extant outbreak sequences. In addition, sequences from strains isolated from the Pacific islands are closer to each other than they are to those from other locations, including Latin America. In almost all cases, the sequences cluster together with the geographical origin of the virus. No clustering pattern was observed based on the source organism of the virus (i.e. the ZIKV sequence extracted from human did not cluster any differently from those extracted from mosquito). Selection Analysis. We used four algorithms to calculate global dN/dS values for the coding region of multiple ZIKV genomes belonging to either the Asian or African lineages (see Supplementary Table S14). Out of the 3424 codons that we included in our selection pressure analysis, 227, 579, and 233 were identified as undergoing negative (i.e. purifying) selection by one, two, or three algorithms, respectively. Conversely, there were 49, 2, and 1 codons with characteristics of positive (diversifying) selection identified by one, two, or three algorithms, respectively. Specifically, aligned codon 3163 was predicted by 3 algorithms as undergoing positive selection, while codons 2456 and 2808 were predicted by 2 algorithms as having a genetic signature of undergoing positive selection. After calculating dN/dS across the ZIKV coding region, we normalized the values according to the number of codons in each coding region (Supplementary Table S13). These results revealed that the 4% of M and 2.8% of NS1 codons in ZIKV had displayed evidence of undergoing positive selection, while 43% of codons in 2K and 35% of codons in E were identified as undergoing negative selection pressure. The strain names and GenBank accession numbers for sequences used in these phylogenetic, selection, and other comparative genomics analyses are provided (Supplementary Table S14).

Discussion
Herein, we have identified consensus variants that differ between multiple sequence records for the same strain as well as minor variants within multiple virus stocks. The consensus sequences for stocks that are available through BEI Resources adequately represent the phylogenetic diversity of ZIKV, including the major ZIKV lineages (East African, West African, and Asian). The growing number of sequences from the recent outbreak are closely related while individual subclades share a high degree of phylogenetic relatedness. Recombination does not appear to play a significant role in the evolution of ZIKV, while evidence of selection pressure is present across viral proteins that are exposed to the host immune system.
It is important to note that the various Zika virus stocks have been sequenced at various points in time on various platforms. For example, the early isolates from Uganda, Malaysia, Micronesia, and possibly others were sequenced using Sanger technology 30,31 . Some of these same isolates have been sequenced again, together with viruses collected from the most recent outbreak using NGS instruments. These data together with other reported metadata, which describe the date and location of collection, are important to enable the accurate interpretation of computational results.
Our computational analyses show that these ZIKV strains are authentic and suitable for additional experiments. Although MR-766 (East African lineage) is the ZIKV reference sequence, other strains that belong to the Asian lineage would be much better suited as models for studying the recent Asian lineage-based outbreak. Moreover, it has been suggested previously that the 12-base deletion at positions 1435-1446 in the envelope of the MR-766 isolate seen in the stocks sequenced is because of extensive mouse brain passage of the isolate and therefore not a true representative of a natural strain 31 .
The fact that MR-766 is still one of the official Zika reference sequences is noteworthy because (1) MR-766 belongs to the African lineage (all recent outbreak strains fall into the Asian lineage) and (2) there are several bases in disagreement between the official reference sequence NC_012532.1 and other sequences derived from this isolate. We are confident in our data given the high-quality sequence read data and good coverage spanning the region of disagreement, which is reflected in the consensus coding region sequence. Moving forward, it will be important to take these sequence discrepancies into consideration since MR-766 is a common laboratory strain and has been used to establish ZIKV animal models 32 . The insertion/deletion reported here should be investigated further to determine whether its presence or absence affects viral binding and/or membrane fusion for viral entry in different hosts and cell types. Similar indels have been reported previously in other MR-766 sequences 27 . Most of the non-synonymous changes were located in the Envelope E and NS5 mature proteins. It is for these reasons that a transition to an Asian-lineage strain such as PRVABC59, especially for research in human-derived model systems should be carefully considered. Future phylogenetic and comparative genomics studies should ensure that any variants that are phylogenetically associated with the divergence between African and Asian sequences are not mistaken as "driver" mutations for increased neurovirulence.
We cannot definitively conclude that the predicted recombination event is accurate since both parent strains and the daughter strain are extremely similar to each other, and there is a lack of variation in this region among the three sequences. In addition, such a recombination event would require two template switch events in close proximity to each other to generate the predicted recombinant daughter sequence, which decreases the likelihood that this prediction is accurate.
The basal positioning and monophyletic nature of the Asian 1966 and 2007-2010 ZIKV sequences, suggest that these are unique ZIKV lineages that are not necessarily direct ancestors of the recent outbreak sequences. Instances where Brazilian sequences are the closest phylogenetic relative to strains isolated from recently-returned travelers to Italy likely indicate trans-Atlantic carriage of virus by a person that was later diagnosed clinically. In addition, the lack of clustering between viruses obtained from either host or vector organisms reflects the known biology of this virus and its continuous cycling between primates and mosquitoes, with very rare primate-to-primate transmission. The deep sequencing analysis revealed novel insight into the seemingly non-random nature of minor variants that are present at higher frequencies (>20%). The fact that a subset of these minor variants is observed across multiple strains is of interest. How these high-and low-frequency variants contribute to viral evolution, host specificity, and/or the host innate and adaptive immune response over time is largely unknown and warrants additional investigation. However, one study reported an increase in cytopathic effects associated with the A1263V substitution among other mutations in a Brazilian strain collected during the recent ZIKV outbreak 33 . A resolved three-dimensional protein crystal structure of the ZIKV capsid (C) protein shows that the I81 residue, for which we identified a minor variant, contributes to hydrophobic interactions within the protein 34 . Although only one minor variant was reported for a given nucleotide position in each stock because of the cutoffs that we imposed, it is important to recognize that multiple co-circulating alleles are likely present in each virus stock. Our 20% frequency threshold and 0.05 corrected p-value cutoff removed many of the minor variants that were likely laboratory artifacts generated by random mutation, reverse transcription, amplification, and sequencing. The reported variants represent a sample of the detectable population of nucleotide variations present in the sequenced sample, which may consist of variants generated by natural virus variation and/or PCR artifacts. Additional work will be required to examine whether the minor variants present in these virus stocks are found at similar levels in a permissive animal host.
Positive selection pressure is primarily driven by the host immune response to viral proteins during infection. The relatively small number of codons undergoing positive selection in ZIKV genomes is somewhat surprising given that at least a subset of codons in other mosquito-borne Flavivirus genomes have been shown to be undergoing such pressure 35,36 . In addition, positive selection in RNA viruses transmitted through an insect vector is somewhat limited due a transmission cycle that involves alternating between different host species, together with the lack of an adaptive immune system in mosquitoes 37,38 . The high degree of negative selection in the M gene is an interesting result since the gene codes only for a structural protein that has minimal exposure to the host adaptive immune system. It is likely that the structural (and other) functions of the M protein are highly specialized and therefore sensitive to amino acid substitutions. The majority of genes that code for enzymes (e.g. NS3, NS5) were found to have an intermediate number of codons undergoing negative selection. We would expect that the amino acids in and around the active site(s) for these enzymes would have high negative selection. However, the majority of amino acid substitutions in other regions of these enzymatic proteins would likely have minimal effect on the function of these proteins.
The findings of this study will minimize the need for the scientific research community to separately authenticate the virus reagents being made available through public repositories such as BEI Resources. These data can be useful in designing and performing future comparative studies that focus on gaining a better understanding of the pathogenesis, neurotropism, spread, potential treatment, and prevention of ZIKV through comparative experiments.

Data Availability
All sequences generated as part of this study were submitted to GenBank under the Bioproject PRJNA314889.