Independent Emergence of the Cosmopolitan Asian Chikungunya Virus, Philippines 2012

Outbreaks involving the Asian genotype Chikungunya virus (CHIKV) caused over one million infections in the Americas recently. The outbreak was preceded by a major nationwide outbreak in the Philippines. We examined the phylogenetic and phylogeographic relationships of representative CHIKV isolates obtained from the 2012 Philippines outbreak with other CHIKV isolates collected globally. Asian CHIKV isolated from the Philippines, China, Micronesia and Caribbean regions were found closely related, herein denoted as Cosmopolitan Asian CHIKV (CACV). Three adaptive amino acid substitutions in nsP3 (D483N), E1 (P397L) and E3 (Q19R) were identified among CACV. Acquisition of the nsP3-483N mutation in Compostela Valley followed by E1-397L/E3-19R in Laguna preceded the nationwide spread in the Philippines. The China isolates possessed two of the amino acid substitutions, nsP3-D483N and E1-P397L whereas the Micronesian and Caribbean CHIKV inherited all the three amino acid substitutions. The unique amino acid substitutions observed among the isolates suggest multiple independent virus dissemination events. The possible biological importance of the specific genetic signatures associated with the rapid global of the virus is not known and warrant future in-depth study and epidemiological follow-up. Molecular evidence, however, supports the Philippines outbreak as the possible origin of the CACV.

Genome structure and molecular signatures of the Philippines Asian CHIKV. We determined nearly complete genome sequence (11542 to 11599 nucleotides) of 16 Philippines Asian CHIKV isolates, consisting of nucleic acids from position 38 at the 5′ non-translated region to position 11649 at the 3′ non-translated region of the CHIKV TH-35. All the Philippines isolates possessed two open reading frames in the coding region that encoded for structural (capsid, E3, E2, 6k and E1), non-structural (nsP1, nsP2, nsP3 and nsP4) polyproteins and an opal stop codon at nsP3 codon 524.
Pairwise comparison of the Philippines Asian genotype CHIKV sequences revealed a unique deletion of 48 nucleotides (11361 to 11408) at the 3′ non-translating region in all isolates collected from Mariduque (CK12-702, CK12-708 and CK12-709). Within the protein-coding region, 29 amino acid substitutions were observed ( Table 1). Twelve of the amino acid substitutions were located within the structural polyproteins and 17 within the non-structural polyproteins, resulting in 0.96% and 0.69% amino acid changes, respectively.
Among the amino acid substitutions, nsP4-107H, nsP4-110S and E2-11V were specific for all isolates collected from Marinduque; nsP3-351D, nsP4-366 S and 6K-46M were unique to all isolates collected from Samar Island. The nsP3-88V and nsP4-101A were detected in isolates collected from Compostela Valley (CK12-684 and CK12-686); the latter was also detected in isolate from Aurora (CK12-921). The E2-297H was detected in samples collected from Romblon and Misamis Oriental. The E3-19R and E1-397L were specific to all the Philippines isolates except those collected from Aurora, Compostela Valley and Laguna (CK12-340). Selection pressure analysis of complete coding region of Asian CHIKV. The selection pressure on all protein-coding genes of Asian CHIKV was examined using HyPhy package as implemented in Datamonkey server. Based on the GARD analysis, there was no recombination in all the sequence alignments. Based on the PARRIS method, no evidence of positive selection was observed in sequence alignments of Asian genotype CHIKV. We observed positive selection codon by using the codon specific selection methods, where IFEL, REL and MEME identified 45 positive selection sites (Tables 2 and  3). Nine and thirty-six positively selected sites were identified in structural and non-structural genes, respectively. Three residues were identified as positively selected site by two out of three abovementioned methods. The nsP3-77 was identified by REL (Bayes factor = 434.6) and MEME (p = 0.034); nsP3-451 was identified by IFEL (p = 0.083) and REL (Bayes factor = 1069.35); and nsP4-81 was identified by REL (Bayes factor = 1770.92) and MEME (p = 0.007).
Molecular signature and phylogenetic relationship of Asian CHIKV. The relationship of the Asian CHIKV, collected globally, was examined using the phylogenetic tree reconstructed with open reading frame regions of the CHIKV genome (11,237 nucleotides). From the phylogenetic tree, the isolates of Asian genotype segregated into two major clades, the Indian and the SEA lineages (Fig. 2). The Indian lineage merely consisted of CHIKV isolates from India (Fig. 2), and the SEA lineage comprised isolates obtained from CHIKV endemic countries in SEA (Thailand, Indonesia, Malaysia, the Philippines), the newly emerged Cosmopolitan Asian CHIKV (CACV) strains from China, Oceania (New Caledonia and Micronesia) and Caribbean (Saint Martin and British Virgin Island). The TH35 strain isolated from Thailand in 1958 occupied the ancestral node of the SEA lineage in the phylogenetic tree, suggesting that segregation of Indian and SEA lineages occurred earlier than 1958. This is in agreement with our analysis where the Indian and SEA lineages shared the most recent common ancestor (MRCA) that could have circulated in this region dating back to 1953 (95% HDP:1951 to 1954).
Subsequently, deduced amino acid sequences of the SEA Asian CHIKV were examined. Simultaneous with the phylogenetic and geographical analyses, we determined that sequential amino acid substitutions occurred during the evolution of SEA CHIKV lineages. These changes reflected the evolutionary path of the CHIKV in SEA region and the recent emergence in the Caribbean area. The representative informative sites of the sequential amino acid changes and geographic clade specific sites are shown in Fig. 2. Within the SEA lineage, the isolates segregated into clades in similar spatial and temporal patterns as previously reported 22 . Distinct clustering between the Thai and Philippines-Indonesia isolates in the phylogenetic tree was observed. The Thai isolates was denoted group A (SEA mainland isolates); the Philippines-Indonesia isolates and its descendant were denoted group B (SEA Island isolates). The MRCA analysis suggested that this divergence could have been dated back to 1981 (95% HDP: 1980 to 1982). Amino acid substitution from threonine to isoleucine at nsP3 position 413 was first observed in the Philippines-Indonesian clade among isolates from 1983. The substitution was inherited by descendant in group B except NC_2011-568 and Leiv.Chik.1, which possessed valine at the position. The substitution was absent in all of the Thai isolates collected so far (until year 1995), supporting our MRCA estimation that the virus diverged into the SEA mainland (Thai) and SEA island (Philippines-Indonesian) clades started as early as 1980s. Table 1. Relevant amino acid substitutions identified between the Philippines Asian CHIKV isolates collected from different provinces in 2012. Specific amino acid substitutions found in (a) single CHIKV isolate used in the study were denoted with blue box, (b) more than one CHIKV isolates that collected from same location were denoted with red box, (c) more than one CHIKV isolates that collected from multiple locations were denoted with yellow box. ChikV.India.

Continued
The SEA island isolates in clade B continued to evolve and can be further delineated into three major clades, B1, B3 and B3 which corresponded to their distinct spatial and temporal characteristics. The B1 clade represented the old Philippines-Indonesia CHIKV obtained during 1980s, which become extinct. B2 and B3 were the most recent circulating Asian CHIKV strains evolved from the B1 clade. According to the MRCA estimation, B2 and B3 clades shared a common ancestor that had been circulating in the regions since 15 years ago (95% HDP: 1998 to 2000). We identified 22 sequential amino acid substitutions spanning the CHIKV genome that represented the B2/B3 common ancestral strains prior to the predicted divergence in 1999 (Fig. 2). Among the 22 evolutionary associated substitutions, two of them were not inherited in all B2/B3 isolates. The nsP3-77 from serine to threonine was absent in NC-2011-568, Leiv.Chik.1 and 0706Tw, while the nsP3-457 substitution from threonine to isoleucine was absent in isolate 0706Tw. All three isolates were reported to have Indonesian origin.
The B2 clade demonstrated a restricted geographical distribution and only consisted of isolates collected from local outbreaks in Malaysia (2006), New Caledonia (2011), and a single isolate from Russia (traveler back from Indonesia). The B2 clade possessed serine at position 248 of E2 protein and valine at position 486 of nsP2 protein, which are clade-specific amino acid substitutions that differentiated them from other SEA Asian CHIKV isolates. Within the B2 clade, the isolates formed geographical clades where all the Malaysian isolates grouped and formed a sub-clade herein named B2a. While the isolates in common with the Indonesian origin (NC_2011-568 and Leiv.CHIKV.1) clustered within the sub-clade B2b. Both of the isolates possessed valine at nsP3-413, which was a unique signature to B2b sub-clade. Result from MRCA analysis suggested, these two sub-clades diverged in year 2000 (95% HDP: 2000 to 2002). Deletion of seven amino acids (376-382) in nsP3 protein was detected in four out of eight isolates in B2 clade.
The B3 clade was the only cosmopolitan Asian CHIKV clade with isolates found geographically dispersed. We identified four unique molecular signatures of the B3 clade: nsP3-Del 379-382, nsP3-I383T, E2-L248F, E2-V371L that confirmed the emergence of B3 clade from a common ancestor dated to 2003 (95% HDP: 2002 to 2004). All the Philippines CHIKV isolated from the nationwide CHIKV outbreaks in 2012 grouped into this clade. Other isolates are CACV described from China (2012), Micronesia (2013) and the Caribbean region (2013-2014) descended from the common ancestor of the B3 clade viruses. By using the unique amino acid substitutions observed, we further classified the virus into several sub-clades (B3a-d). Indonesian isolates, CHIKV 0706aTW (B3a) rooted at the basal of B3 clade suggested the origin of B3 from Indonesia. However, the clade remained silent after the isolation of 0706aTW until the 2012 Philippines outbreak. Analysis of the genome sequences of 25 B3 clade CHIKV revealed additional three amino acid substitutions: nsP1-A121E, nsP3-A437T, nsP4-F269L that were present in all isolates in B3b-d sub-clades compared to the 0706aTW (B3a). We identified two additional evolutionary-associated amino acid substitutions that allowed differentiation of sub-clades B3b, B3c, and B3d. The substitution nsP3-D483N were present in all B3c/B3d isolates. While the substitution of E3-Q19R was unique to all isolates in sub-clade B3d.
CACV described from China, Micronesia and the Caribbean region clustered spatially with viruses identified from the same geographical location and formed a distinct geographic clade. The geographic  Table 3. Positive selection sites identified by IFEL and MEME methods with p-value ≤ 0.1 or, REL methods with Bayes factor ≥50. The sites with Bayes factor ≥100 were showed in italic while three site (nsP3-77, nsP3-451 and nsP4-81) positive for more than one method were showed in bold.
clades were interspersed within the Philippines monophyletic clade (Fig. 2). The two China isolates (JC2012 and China-sy) with SEA origin (likely the Philippines) clustered together and fell into sub-clade B3c. While the Micronesian and Caribbean isolates clustered into sub-clade B3d. The Micronesian isolates grouped closely with CK12-559 identified from Isabela Island in the Philippines. Whereas, the Caribbean CHIKV (Saint Martin and British Virgin Island) clustered with CK12-545 and CK12-335 identified from Romblon and Misamis Oriental group of islands, respectively. This suggested independent emergence of CHIKV from the Philippines, which then spread into China, Micronesia and the Caribbean region.

Discussion
Rapid CHIKV dissemination driven by Aedes albopictus-adapted IOL lineage was reported in many Aedes albopictus infested countries [22][23][24][25] . Epidemiological evidence suggests that through competitive displacement 26 , the emergence of IOL strain in SEA region may eventually lead to the extinction of the endemic Asian genotype, which has mainly circulated locally in the region over the past six decades. While it was true that most outbreaks involving the Asian genotype CHIKV were limited and localized, the continuous reporting of the virus in Indonesia during 2008 to 2009 and New Caledonia in 2011 points to a resilient presence of the Asian genotype CHIKV in Island SEA and Pacific Oceania regions 3,27-29 .
Resurgence of Asian genotype virus in the Philippines beginning in 2011 accentuated the rise of a new CHIKV epidemic in the country after 20 years of quiet inter-epidemic period 21 . This nationwide outbreak provides additional evidence to support the continuous endemicity of the Asian genotype CHIKV in SEA region. It is noted that unlike other SEA countries, Philippines was not affected by the contagion involving the IOL ECSA virus when the epidemic swept the region in 2012 6,10,11,14,30 . Only a single ECSA isolate was obtained for our study and this was from Davao a major port city in the Southern Philippines nearing East Malaysia which had reported large surge of CHIKV infection during 2010 31,32 . The ongoing CHIKV outbreaks in the Caribbean and Micronesia were caused by Asian CHIKV strains 18 . Our phylogenetic data was in agreement with recent report and indicated the outbreak causing strains were closely related to the Philippines isolates responsible for the 2012 CHIKV outbreaks 18 . In contrast to the geographically restricted feature of previous Asian genotype-associated CHIKV outbreaks, the current Asian genotype CHIKV epidemic/outbreaks demonstrated wide dispersing characteristic that resemble the IOL-associated outbreaks in Indian Ocean area 13 . By incorporating the phylogeographical analysis and microevolution on the virus genome, we suggest a possible scenario contributing to the global spread of the CACV. Deletion of similar regions of the hypervariable C-terminal domain in the nsP3 gene was observed in most of the Asian genotype CHIKV isolated after 2006 33,34 . Considering the phylogenetic analysis and temporal distribution of the SEA isolates, our finding herein suggests that the new SEA CHIKV Asian strains have evolved from a common ancestor descended from an ancestral strain of the old Asian lineage (B1 clade) that possibly diverged during the period of 1998 to 2000 (95% HDP). Acquisition of the clade unique molecular signature in the C-terminal domain of the nsP3 gene may have occurred independently in Malaysia (nsP3 376 -382del , B2 clade) and Indonesia (nsP3 379 -382del , B3 clade). This could occur as a result of genetic convergence arisen from evolutionary adaptation to the local setting 33 . The C-terminal domain is an important determinative factor for optimal virus replication in various host cells in Alphavirus. The nsP3 376 -382del was identified in two out of six CHIKV isolates described from Malaysia in 2006, indicating the possibility of adaptive mutations during the localized outbreak. Prolonged circulation of Asian genotype CHIKV in Indonesia 28,29 and ancestral location of the Indonesian isolate in B3 clade support the Indonesian origin of the B3 clade. This may ascribed an important evolutionary event that leads to transmission and spatial distribution of CHIKV in the SEA region. In addition, the amino acid substitution at position 248 in E2 protein is a clade-specific adaptive mutation that allowed the differentiation of the B2 (serine) and B3 (phenylalanine) clades. CHIKV E2 protein is important for virus attachment. The codon 248 is located at the acid-sensitive region of E2 protein that underwent positive selection (p-value ≤ 0.1). A recent study demonstrated that the mutation of E2-L248Q was beneficial for the dissemination of the virus in Aedes albopictus 35 . Whether the naturally occurred amino acid substitution of E2-248 would have beneficial effect on the viral fitness in Aedes aegpti, the mosquito vector in the Philippines and Caribbean region, warrants further investigation.
Traceable microevolution of the viral genome gives rise to the probable transmission route of virus 13 . Sequence analysis of the Philippines CHIKV revealed the acquisition of two unique adaptive amino acid substitutions, the nsP3-D483N, and E3-Q19R during the Philippines outbreaks. An amino acid substitution of Aspartic acid to Asparagine at nsP3 position 483 was present in all strains in B3 clade except the Philippines CK12-275 and Indonesian 0706aTW CHIKV. Whereas, the amino acid substitution of Glutamine to Arginine at E3 position 19 was concomitant with the nsP3-483N and was observed in all virus isolates from the Philippines except isolates collected from Compostela Valley and Aurora. Besides, an additional amino acid substitution of Proline to Leucine at E1 position 397 was observed in all E3-19R-bearing strains and the two China (JC2012 and China-sy) strains collected in 2012. However, the E1-397L was not a unique adaptive substitution observed in the Philippines outbreaks, as it is present in all Indian lineage Asian CHIKV. In the SEA lineage, the E1-397L was only observed in one of the Malaysian isolate, MY021IMR which was isolated from the 2006 outbreak in Bagan Panchor. Whether this substitution happened convergently in Malaysia and the Philippines remained inconclusive. This observation, however, points to unique sequential adaptive mutations of nsP3-483D/E1-397L/E3-19R that accumulated in the viral genome during the localized outbreaks.
Data from this study showed that the Philippines outbreaks could have begun from Compostela Valley, where the ancestral strain nsP3-483D co-existed with variants, nsP3-483N CHIKV. The variant, nsP3-483N strains spread into Laguna and Aurora prior to the acquisition of E3-19R and E1-397L. The unique amino acid substitution of nsP4-V101A was only observed in strains from Compostela Valley (CK12-684 and CK12-686) and Aurora (CK12-921), but not in Laguna strain (CK12-340) suggests that the spread into Laguna happened earlier than Aurora, which is prior to the acquisition of nsP4-101A. The co-existence of E3-19Q/E1-397P (CK12-340) and E3-19R/E1-397L (CK12-148) CHIKV in Laguna suggests that the acquisition of Arginine in E3 protein and Leucine in E1 protein could have happened by local adaptation. On the other hand, the detection of 'intermediate' strains (JC2012 and China-sy), bearing only nsP4D and E1-397L but not the E3-19Q, suggests the acquisition of E1-397L occurred prior to the E3-19R. These 'intermediate' strains could have circulated transiently in a restricted local setting (perhaps Laguna only) and then spread into China prior to the acquisition of the third amino acid substitution at E3-19Q. Thus far, the nsP3-483N/E1-397L/E3-19R-bearing strains have the most diverse geo-distribution. Rapid nationwide spread in the Philippines (Albay, Isabela, Marinduque, Misamis Oriental, Romblon and Samar Island) and globally (Micronesia and Caribbean) following acquisition of the third amino acid substitution (E3-19R).
Unlike the CHIKV epidemic involving the Asian genotype during the 1970 to 1990, the risk of widespread outbreak remains transitory as the human-Aedes aegypti interaction still lacks outbreak sustainability potential at a local scale, therefore requiring continuous virus importation 36 . Although the B3 clade appeared as early as 2007, high magnitude and rapid dissemination of the virus was only observed during the 2012 CHIKV outbreaks in the Philippines (acquisition of nsP3-483N) and later during its nationwide and global spread (acquisition of E1-397L and E3-19R). Out of the three unique adaptive amino acid substitution, two (nsP3-483 and E1-397) were predicted as positively selected sites. Currently, it is still unclear if these amino acid substitutions at nsP3-483, E1-397 and E3-19 are products of CHIKV adaptation to the Aedes aegypti in local transmission setting. Sequential acquisition of mutations that could have provided beneficial effect to the viral fitness has been previously demonstrated 35 . E3 protein is suggested to stabilize the interaction of E1/E2 protein by clamping the acid sensitive region of E2 in place until furin cleavage 37 . A recent study demonstrated the substitution of E3-S18F stabilized the E2 Scientific RepoRts | 5:12279 | DOi: 10.1038/srep12279 protein with E2-198 35 . Whether, the amino acid substitution of E3-19 exerts the same effect, remains to be investigated. It is possible that these mutations confer the newly emerging CHIKV with the ability to sustain epidemic human-Aedes aegypti transmission cycle.
The microevolution history of CHIKV genome reported herein suggests the Philippines as the possible origin of CACV causing the outbreaks in the Caribbean. Unique amino acid substitutions observed among the CACV suggests multiple independent virus dissemination events contributing to the global spread. While the possible biological importance of these mutations are still unknown, the genetic signatures identified in the study represent interesting candidates for future in-depth study and epidemiological follow-up. Sequencing of additional isolates from the outbreak regions and local reservoirs would allow better delineation of the evolution pattern of this globally emerging Asian CHIKV. Sample preparation, genome sequencing and assembly. All laboratory activities involving the virus isolates was conducted following BSL-2 biosafety practices and procedures in BSL-2 laboratory. Viral RNA was extracted and screened for the CHIKV by using Reverse-transcription PCR (RT-PCR) previously described 6 . The E1 (N = 19) genes of positive sample were subsequently sequenced and analyzed to determine the virus genotype. Isolates identified as Asian genotype (N = 18) were subjected to full genome sequencing using the Ion Torrent sequencing platform (Life Technologies, USA). The raw sequence reads were assembled using the GSNAP algorithms as implemented in Sequencher V5.2.2 38 . Table 1 Selection pressure analysis. The sequence alignment of individual gene (nsP1, nsP2, nsP3, nsP4, C, E3, E2, 6K and E1) for 49 CHIKV were analyzed using HyPhy package 42 implemented in the Datamonkey server 43 as previously described 44 . Prior to analysis, the sequence alignment was checked for duplication of genome sequences, which was removed upon identification. GARD 45 method was used to screen for potential recombination in the dataset prior to the selection pressure analysis. The Parris method 46 was used to assess the selection in the sequence alignment. While the specific site selection on each individual gene was analyzed using SLAC, FEL, IFEL, REL, FUBAR and MEME algorithm. Positive selection was defined as p-value ≤ 0.1 for PARRIS, SLAC, FEL, IFEL, MEME; Bayes factor ≥50 for REL or Posterior probability ≥0.9 for FUBAR.