Based on serological evidence and viral isolation, Zika virus (ZIKV) has circulated for many years relatively benignly in a sylvatic cycle in Africa and an urban cycle in South East Asia (SEA). With the recent availability of limited but novel Indian ZIKV sequences to add to the plethora of SEA sequences, we traced the phylogenetic history and spatio-temporal dispersal pattern of ZIKV in Asia prior to its explosive emergence in the Pacific region and the Americas. These analyses demonstrated that the introduction and dispersal of ZIKV on the Pacific islands were preceded by an extended period of relatively silent transmission in SEA, enabling the virus to expand geographically and evolve adaptively before its unanticipated introduction to immunologically naive populations on the Pacific islands and in the Americas. Our findings reveal new features of the evolution and dispersal of this intriguing virus and may benefit future disease control strategies.
Zika virus (ZIKV) was first isolated in 1947 from rhesus monkey serum collected in the Zika Forest of Uganda (Africa)1. Serosurveys conducted from the 1950s onwards suggested that ZIKV circulated in several countries of Africa and Asia. Less than 20 human infections were confirmed within the first 60 years following the discovery of the virus, probably because it causes mild disease in humans and is often misdiagnosed given the existence of several clinically similar diseases, for example, dengue or chikungunya fever2, 3. ZIKV emerged in Africa (African lineage) and subsequently spread through Asia (Asian Lineage). In 2007, an Asian lineage strain of ZIKV unpredictably caused a unique explosive epidemic on the Pacific island of Yap in Micronesia4, involving 70% of the population. However, within months, the epidemic subsided, and there has only been one further report of epidemic ZIKV in Micronesia in the Kosrae State in 20165. However, in October 2013, a more recent ZIKV strain emerged in French Polynesia (FP)6, probably following the accumulation of functionally important amino acid substitutions7, 8, which were predicted to enhance virus transmission by the primary vector species Aedes aegypti7. Subsequently, non-vector borne transmission (mother to child, sexual and blood transfusion transmission) was confirmed with the emergence of ZIKV in the Americas2, 9. Following its establishment on the FP islands, ZIKV radiated eastwards and westwards10 and subsequently emerged in Brazil before dispersing and causing epidemics in the Americas2, 11.
Many new sequences are now available from ZIKV-infected humans in Asia, including Bangladesh12, India13, Indonesia14, the Philippines15, Singapore16, Thailand17, and Vietnam18. These sequences may impact our interpretation of the early evolutionary history of ZIKV in Asia. We therefore re-visited the evolutionary events of ZIKV from its first emergence in Asia. Although additional sequence data will expand our knowledge, it is already possible with the new data from India and South-East Asia (SEA) to interpret the dispersal patterns of ZIKV in terms of its geographic source, mode of spread, and time of circulation.
Using updated phylogeographic and phylogenetic analyses, we illustrate and discuss how the epidemic pattern in FP and the Americas contrasts strongly with the relatively long-term and apparently benign circulation of ZIKV in SEA.
Bayesian phylogenetic analysis of ZIKV sequences from Africa and Asia, including partial data from a recently available Indian ZIKV sequence13, indicate that the Indian virus shares a more recent ancestry with all of the other Asian sequences than with the Malaysian sequence, which is positioned as a sister-group to all of the other Asian sequences (Fig. 1, branch leading to node B, Posterior probability 0.71; Suppl. fig. S1). We also performed separate Bayesian analyses of both the partial capsid and NS2b/NS3-genes (Suppl. fig. S2 and S3, respectively). Our results demonstrated that Indian ZIKV shared a more recent ancestry with the SEA viruses compared to the Malaysian virus. However, a tree based on a partial envelope gene sequence placed the Indian virus in the most basal position of the Asian ZIKV clade (Suppl. fig. S4). Whether this truly represents the phylogenetic position of Indian ZIKV will require a complete genomic analysis of the isolate and more sequence data from geographically distant regions of India and neighbouring countries. Subsequently, and separated by a long branch, ZIKV then diverged to form a clade that harboured the majority of Asian ZIKV diversity (Fig 1, node C). This was followed by a series of diversification events (Fig. 1, nodes C–L) that predominantly occurred in SEA and were characterized by frequent introductions and localized outbreaks that probably reflected the increasing distribution and spread of ZIKV in the primary vector species, i.e., Ae aegypti, and susceptible humans. Notably, both Thailand and Vietnam experienced introductions and outbreaks of multiple ZIKV strains throughout the course of its evolution in SEA and have served as sources of ZIKV in other regions (Fig. 1, Suppl. fig. S1 and S5).
An inclusive phylogenetic and temporal analysis, focusing on isolates positioned after the dispersal of ZIKV from Africa but prior to its emergence on the Pacific islands and the Americas, was prepared. The evolutionary rate of the Asian ZIKV lineage under a strict clock was estimated to be 7.26 × 10−4 substitutions per site per year (95% HPD: 6.28–8.19 × 10−4). Furthermore, the analysis showed that the most recent common ancestor (MRCA) of ZIKV in Asia (Fig. 1, node A) was present 5–12 years earlier than previously estimated7, 19, ~1954–04 (95% highest posterior density: 1948–08 to 1959–05). Then, ~2 years later, circa 1956–10 (95% HPD: 1949–09 to 1964–05), the MRCA of the Indian lineage and all subsequent Asian lineage ZIKV emerged (Fig. 1, node B), surprisingly implying the introduction of Indian ZIKV from an Asian rather than an African source. Subsequently, following a silent period of ~40 years, ZIKV appeared in SEA in early 1997 (95% HPD: 1994–09 to 1999–03; Fig. 1, node C). It should also be noted that the MRCA of the Micronesian and Philippine sequences was estimated to have been present in Asia ca. 1997 (95% HPD: 1995–07 to 1999–12; Suppl. fig. S5). Thus, the ancestor of these two emergent viruses was possibly present in SEA and evolved relatively silently for at least 10 years prior to the Micronesian epidemic in 2007. Following the emergence of ZIKV in SEA (node C), there was a period of circulation of ~15 years (Fig. 1, nodes C–L) prior to its introduction and emergence on the FP islands (Fig. 1, nodes M1–M4) in mid-2012 (95% HPD: 2011–12 to 2013–01). The introduction of ZIKV from FP into America followed about 1 year later, ~2013–03 (95% HPD: 2012–11 to 2013–06) (Suppl. table 1; Suppl. fig. S5). When the Indian sequence data were excluded, the results of the temporal estimates were confirmed by using the HPD values that overlapped in the two analyses, which were supported by time-regression analyses including all data (Suppl. fig. S5–S7).
Following its first isolation in April 1947 from a sentinel rhesus monkey in the canopy of the African Zika Forest of Uganda1, ZIKV was isolated again in 1948 in the same forest but this time from Ae africanus, an indigenous African sylvatic mosquito1. These observations suggest that in Africa, ZIKV circulates between sylvatic Aedes species mosquitoes and monkeys that inhabit the forest canopy, which does not exclude other possible transmission cycles. Subsequent epidemiological and serological studies have revealed that ZIKV was probably also prevalent and widespread in African towns and villages during the 1950s2, 20, 21. Since it is now known that both dengue viruses (DENVs) and ZIKV adapted from sylvatic to urban transmission cycles, likely due to the increasing Ae aegypti population density in urban districts of SEA, our observations provide a rational explanation for how it was possible for ZIKV to be isolated in 1966 from domestic Ae aegypti in populous areas of the Malaysian peninsula22. Importantly, attempts in the 1950s to isolate ZIKV from a wide range of mosquitoes in sylvatic environments in Malaysia22 and other Asian countries did not result in the identification of additional strains of ZIKV, implying that the Malaysian virus was not significantly established in the sylvatic Asian environment, which also appears to be the case for ZIKV in the Pacific and the Americas; that is, ZIKV is not significantly sylvatic except in Africa. Nevertheless, following the detection of ZIKV in Malaysia, subsequent serosurveys conducted between 1952 and 1989, in which 189 of 1011 tests were reported as ZIKV positive2, 23,24,25, support the concept of widespread distribution of ZIKV in SEA2, 19, 26,27,28,29. Importantly, these early studies were performed using the most specific serological assay, viz., neutralization against high virus challenge doses (>50 LD50 to >1000 LD50) in mice, to reveal a high incidence of neutralizing antibody responses in residents of Malaya (19% and 75%)23, 25, Borneo (18%)23 and India (16.8%)24.
Why did ZIKV disperse so successfully throughout Asia for so many years without appearing to have caused major epidemics30? Clearly, the very low frequency of clinically apparent ZIKV infections and the difficulty in clinically discriminating between infections caused by DENV, chikungunya virus (CHIKV), and ZIKV were major factors in the failure to recognize the presence of ZIKV infections. Second, the inapparent spread of endemic ZIKV throughout SEA resulted in the emergence of low-level herd immunity to the virus. Third, antigenically related flaviviruses, including DENV, West Nile virus (WNV), Japanese encephalitis virus, Wesselsbron virus, and Stratford virus, circulating throughout SEA contributed a background of cross-protective reactivity to ZIKV that might have dampened ZIKV transmission. By contrast, with the exception of several islands in the Western Pacific region that have experienced outbreaks of JEV, DENV was the only human pathogenic flavivirus known to have caused outbreaks in islands further east in the Pacific prior to the arrival of epidemic ZIKV. Moreover, it is a widely held view that immunity to DENV alone does not protect humans against infection with ZIKV2. Thus, the evolving ZIKV was probably introduced to populations of immunologically naive inhabitants on most of the Pacific islands, leading to the explosive epidemics that have now been reported widely7. Several human flaviviruses, including DENVs, are known to circulate in the Americas. Thus, the situation in Brazil and neighbouring South American countries should differ significantly from that described in Oceania because several human pathogenic ZIKV-related flaviviruses circulate in Brazil, including DENVs, St. Louis encephalitis virus (SLEV), yellow fever virus (YFV), Ilheus virus (ILHV), Rocio virus (ROCV), and others. Why, therefore, was ZIKV able to cause explosive epidemics in the Americas in the presence of this potential immune background? The most likely explanation is that, with the exception of YFV, which circulates primarily in the sylvatic environment in the Americas, and DENV, which is not believed to stimulate protective immunity to ZIKV2, these other ZIKV-related flaviviruses are not commonly transmitted by domestic Ae aegypti and therefore do not circulate primarily in the urban environment favored by ZIKV.
Moreover, a serosurvey of Indian human sera using the mouse protection test revealed a high incidence of protective immunity against DENV-1 (40.4%: 72/196), DENV-2 (20.2%: 38/188), WNV (35.0%: 123/351), and ZIKV (16.8%: 33/196), while none of 588 sera neutralized YFV and only a few neutralized the American flaviviruses, ILHV (1.4%: 3/206) or SLEV (2.8%: 6/211)24. These results suggest that the presence of an antibody-mediated cross-protection between ZIKV and either YFV or the American flaviviruses is relatively unlikely. In addition, while ROCV caused local epidemics in coastal areas of São Paulo, Brazil from 1973 to 1980, it has subsequently disappeared from this region of Brazil30. Furthermore, WNV, which has only recently been detected in Brazil, does not yet appear to have become established as a significant pathogen in Latin America31. It should also be emphasized that even following the epidemics in the Pacific and subsequent spread of ZIKV to the Americas, Asia has experienced relatively small ZIKV outbreaks compared to the explosive and clinically apparent epidemics in the Pacific and the Americas32.
Thus, ZIKV epidemics in FP and the Americas have occurred in the presence of a high background immunity to DENV. However, as noted above, it is a widely held view that immunity to DENV does not necessarily protect humans against infection with ZIKV2. How and when did African ZIKV gain access to SEA? Like DENV and CHIKV, the major transmission vector of ZIKV is Ae aegypti, the domestic descendant of the sylvatic African Ae africanus2, 33 and the primary conduit for the introduction of DENV and CHIKV to Asia via ships trading out of East Africa during the 18th–20th centuries34,35,36. Thus, by analogy, ZIKV is also likely to have been introduced to Asia, from Africa, in infected humans and/or Ae aegypti, to which ZIKV had adapted, via the same shipping routes. This argument is also supported by contemporary evidence of frequently reported cases of ZIKV being introduced to non-tropical countries by infected individuals returning from areas in the tropics where epidemic ZIKV was known to be circulating37. Based on an earlier study7 with a similar estimated evolutionary rate (Median: 6.30 × 10−4; 95% HPD: 5.48 × 10−4 to 7.10 × 10−4) to that of the present study, the African and Asian ZIKV lineages were estimated to have diverged in ~1834, which is consistent with reports of other Ae aegypti- transmitted viruses, namely, DENV and CHIKV, being introduced via ships trading between East Africa through India to SEA34, 38.
Recently, ZIKV was isolated from a patient in northern India. A neighbour-joining analysis of the partial envelope gene of this virus implied that the Indian sequence was basal in the Asian ZIKV clade13. To investigate this further, we concatenated sequences of the partial capsid, envelope and NS2b/NS3 genes of the Indian ZIKV isolate and analyzed the data in a Bayesian phylogenetic framework. Our results suggested that the Indian sequence was not basal to the Malaysian virus isolated in 1966 (Suppl. fig. S1). By contrast, our analysis suggests that ZIKV could have been introduced to India from SEA, rather than vice versa. Although the currently available sequence data do not robustly resolve the Indian versus Malaysian ancestry, the concept of the introduction of ZIKV from SEA to India seems plausible given that ZIKV also appears to have been introduced to Bangladesh from SEA (Fig. 1; Suppl. fig. S1 and S5). In this context, the highest incidence of human sera that neutralized ZIKV was collected from towns in the Ahmedabad district (north-west India) in the state from which the NIV1720741 strain used in the present analysis was isolated in 201624. This state may, therefore, represent a pocket of endemicity containing divergent ZIKV strains in that region.
Our phylogenetic and temporal analyses suggest that ZIKV was first introduced to SEA during the 1950s, and then dispersed both eastwards and westwards ~5–12 years earlier than was previously predicted7, 19. Our estimate is also compatible with serological studies conducted in Malaysia, India, Indonesia, the Philippines, Thailand and Vietnam, indicating that ZIKV was present in these countries between 1952 and 19542 (and references therein). Based on the current phylogenies (Fig. 1; Suppl. fig. S1 and S5), ZIKV was potentially first introduced from Africa to the Malaysian–Indonesian region and subsequently to India. However, although many new sequences have been published during the past few years, it is still not possible to determine precisely whether any particular country in SEA has played a major role compared with other countries in the evolution and dispersal of ZIKV in SEA.
Between 1954 and 1976, no recorded human clinical ZIKV cases in Asia were confirmed using serological tests or virus isolation procedures. Hence, it is likely that ZIKV remained undetected because it was primarily a sub-clinical disease and diagnostic tests for ZIKV were not routinely performed. Then, in 1977 and 1978 three and four febrile patients on Java, Indonesia, were diagnosed with ZIKV infection, respectively39. Over 35 years later, ZIKV was isolated from a patient from Sumatra, Indonesia14. It is possible that ZIKV remained undetected in the Malaysian–Indonesian region for a long time, circulating mainly between Ae aegypti and human hosts, causing sporadic sub-clinical or mildly clinical human infections, while evolving and adapting to its vector species. Thus, since the late 1950s, ZIKV appears to have circulated silently in SEA according to serosurvey data. The analyses suggest that the MRCA of the Malaysian and Indian isolates is approximately 40 years earlier than that of the Micronesian/Philippine and all subsequent isolates (Fig. 1, node B vs C, Suppl. fig. S5). However, without further sequence data from these neglected areas of Asia, it will continue to be difficult to identify precisely where ZIKV was circulating and, potentially, causing human ZIKV infections. Nevertheless, serological studies suggest that ZIKV may have been present in the Indian–SEA region earlier than the 1950s23, 25, 40.
It is clear that ZIKV has circulated without recognition for many years in Asia. This is supported by evidence from serology, virus isolation2 and phylogenetic analyses, all of which have shown that the ancestral Asian virus continued to evolve and circulate (Suppl. fig. S1 and S5) prior to and after its unexpected emergence as an epidemic virus on Yap island, Micronesia, in 20074. In fact, during a period of 14 years, 1997–2011 (Fig. 1, nodes C–L), this ancestral lineage of Asian ZIKV appears to have dispersed between several other countries in SEA. Notably, Thailand and Vietnam were possibly the source of ZIKV in other regions of Asia. For example, Thailand appears to have been the source of ZIKV in Singapore, which experienced a large outbreak in 201616, and in Bangladesh (Suppl. fig S5). Thus, it is likely that different countries in SEA have experienced different epidemiological situations. Whereas, the Philippines seems to have had a single strain of ZIKV circulating for several years, Thailand and Vietnam appear to have experienced multiple introductions and “waves” of ZIKV that have swept across these countries (Suppl. fig S1 and S5) followed by re-occurring outbreaks29, 41. Consequently, following the isolated Micronesian epidemic in 2007, ZIKV continued to circulate and evolve in the SEA–Indian-region32, 42 before causing a large-scale epidemic in FP in 2013/2014. ZIKV was subsequently introduced into the Americas, apparently via Brazil7, 8, 11.
It is also believed that the ZIKV strain that emerged on Yap Island, Micronesia, did not disperse widely beyond Micronesia. Although the reasons for its failure to spread have not been identified, there are several possible explanations. First, the Micronesian lineage pre-dates the FP lineage by at least six years, during which four unique and functionally significant amino acid substitutions emerged in the FP lineage, i.e., T777M, V763M, S139N, and M/T2634V7. The first three of these unique substitutions has been shown to have functions relating either to receptor attachment or virus fusion activity with cell membranes43. These substitutions could therefore have an impact on the relative transmission efficiency of the FP lineage compared with earlier Asian lineage sequences7. Second, in support of this interpretation, the unique S139N substitution, first identified in 20167, emerged in FP and was detected in virtually every descendant lineage, suggesting a possible link with microcephaly. Subsequently, the unique S139N substitution was associated with increased ZIKV infectivity in humans and enhanced fetal mouse microcephaly44. Third, the relatively small human population size and density on the Micronesian islands and limited connectivity with other Pacific islands might have limited the likelihood of ZIKV dispersal to other distant regions of the Pacific. Fourth, a mutation (A188V) was subsequently shown to potentially contribute to the apparently increased transmission efficiency of ZIKV45. However, more than one amino acid substitution appears to determine the transmission efficiency since the A188V substitution is not consistently associated with it. For example, A188V is the ancestral state for this described substitution in Africa, but the only outbreak reported in Africa was caused by a strain belonging to the Asian lineage46. Nevertheless, the data, in total, are consistent with the hypothesis that ZIKV has circulated, potentially silently in the presence of other related human pathogenic flaviviruses, in sub-Saharan Africa and SEA for many decades47,48,49.
Our analysis also indicates that the MRCA of the lineage that eventually led to the large outbreak on FP might have existed and potentially been introduced in June 2012 (95% HPD: 2011–12 to 2013–01), i.e., ~1 year before the first detection of ZIKV in FP in October 20136, 9, 50, 51. Based on the observed short divergence times between nodes M1 and N (Suppl. table 1; Suppl. fig. S5), a potential explanation for the rapid dispersal of ZIKV in the Pacific and the Americas could be interpreted as being due to relatively low levels of herd immunity in the populations. In addition, a similar cryptic period of about 1 year, before its appearance as an epidemic virus, was predicted for the introduction of ZIKV into Brazil11. Whether this reflects the relatively mild nature of ZIKV infections and consequently low viraemic levels in the majority of the infected population is an intriguing subject for mathematical modelers.
In summary, following its emergence in Asia, after being introduced from Africa, ZIKV circulated as a relatively benign virus in Asia for several decades. ZIKV then emerged in Oceania and the Americas and, largely because of its association with central nervous system malformations in neonates, especially microcephaly, it became an international public health concern. This raises at least two important questions, (i) how did ZIKV circulate successfully throughout Asia during its apparent silent period between 1966 and 2006 and (ii) what are the major determinants of the increased epidemicity of ZIKV in the Pacific and the Americas. With regard to the former question, several possible explanations have been proposed for these differences in the epidemicity of ZIKV in Asia compared to that of ZIKV in the Pacific and the Americas: Stochastic evolution52, adaptive evolution7, NS1 protein enhancement45, S139 enhancement7, 44, and contrasting levels of background immunity due to cross-protective antibodies2. Each of these factors could contribute to the overall differences in the transmission efficiency. However, the relatively high background cross-reactive antigenicity and/or immunity, which are likely present in Asia, contrasting with the relatively low equivalent antigenicity and/or immunity in the Pacific and the Americas, in combination with the key amino acid mutations in the ZIKV genome, can potentially explain the different epidemiological patterns between Asia on the one hand and the Pacific and Americas on the other.
Materials and methods
Partial and complete genome ZIKV sequences were retrieved from the NCBI GenBank (www.ncbi.nlm.nih.gov/genbank/) focusing on Asian lineage ZIKV isolates. Initially, all available complete genomes (as of October 2017) were retrieved. Sequences collected from the Pacific islands and the Americas were subsampled to reduce the data load. Subsequently, partial phylogenetically informative sequences were collected around the Indian–SEA region and added to the dataset. These sequences were then used to construct two curated open reading-frame datasets (sequence information is available in Supplementary Table S2); (i) one open reading-frame dataset included 89 sequences, representing 5 from the African lineage and 84 of the Asian lineage; (ii) the other open reading-frame dataset included 84 sequences of Asian lineage only. The datasets were aligned with Mafft v.7.26653, keeping the reading-frame consistent with amino-acid positions, and were visualized and edited in AliView54. All subsequent analyses were performed with a generalized time-reversible nucleotide substitution model with four gamma distributed rate variation categories and a proportion of invariant sites, as selected by jModeltest v.255.
To analyze the phylogenies of the 84 Asian lineage strains, including the novel Indian sequence, a Bayesian phylogenetic tree was computed using MrBayes v.3.2.656 with the dataset including the 5 African ZIKV lineage strains as an out-group. Two parallel runs with four Metropolis-coupled chains were initiated for 5 M Markov chain Monte Carlo generations using the previously determined models of nucleotide evolution with default flat Dirichlet priors, sampling every 1000 generations and discarding the first 25% as burn-in before computing a consensus tree. Since only partial sequences of the Indian isolate were available, it is likely that phylogenetic placement was also examined by constructing three separate trees based on either the capsid (453 bp), envelope (773 bp), or NS2b/NS3 (1393 bp) sequence alignments.
To estimate the evolutionary rates and time to the most recent common ancestor for the Asian lineage, BEAST v.1.8.357 was employed. Initially, the temporal structure was assessed with TempEst V.1.5.1 (http://tree.bio.ed.ac.uk/software/tempest/) plus MM-type robust regression58. Subsequently, a path-stone and stepping-stone model-test was performed59, determining that a strict molecular clock with a non-informative continuous-time Markov chain (CTMC) prior and a Bayesian skyline coalescent tree prior with a piecewise-constant demographic model best suited the dataset based on Bayes factor evaluation. Based on the resultant Bayesian phylogenetic tree, which included African ZIKV as outgroup sequences, all Asian ZIKV lineages except the Malaysian 1966-sequence were annotated as a monophyletic clade. The robustness of the dating was also evaluated by excluding the Indian sequence. Using the models and parameters suggested by the model-test, two analyses were run in parallel for 100 M MCMC generations, sampling every 10,000 generations per analysis. The convergence of the two runs was assessed with Tracer 1.6 (http://tree.bio.ed.ac.uk/software/tracer/). Tree- and log-files were combined with LogCombiner (BEAST-package57), and a maximum-clade credibility tree was computed with TreeAnnotator (BEAST-package57) after discarding the first 10 M MCMC generations of each run. The resultant consensus tree was visualized and edited in FigTree v.1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/). All computations were run using the CIPRES computational cluster60.
E.A.G., D.M., J.H.-O.P., M.D.-R., V.-M.C-L., V.E., and X.D.L. are members of the ZIKAlliance Programme of the European Union. M.W.G. and A.K.F. are members of the ZikaPLAN Programme of the European Union. This study was supported by the ZIKAlliance project and the ZikaPLAN project under European Union-Horizon 2020 programme (Grant Agreement Nos. 734548 and 734584, respectively). J.H.-O.P. is supported by the Swedish Research Council FORMAS (Grant Agreement 2015-710).