Characterization of the virome associated with Haemagogus mosquitoes in Trinidad, West Indies

Currently, there are increasing concerns about the possibility of a new epidemic due to emerging reports of Mayaro virus (MAYV) fever outbreaks in areas of South and Central America. Haemagogus mosquitoes, the primary sylvan vectors of MAYV are poorly characterized and a better understanding of the mosquito’s viral transmission dynamics and interactions with MAYV and other microorganisms would be important in devising effective control strategies. In this study, a metatranscriptomic based approach was utilized to determine the prevalence of RNA viruses in field-caught mosquitoes morphologically identified as Haemagogus janthinomys from twelve (12) forest locations in Trinidad, West Indies. Known insect specific viruses including the Phasi Charoen-like and Humaiata-Tubiacanga virus dominated the virome of the mosquitoes throughout sampling locations while other viruses such as the avian leukosis virus, MAYV and several unclassified viruses had a narrower distribution. Additionally, assembled contigs from the Ecclesville location suggests the presence of a unique uncharacterized picorna-like virus. Mapping of RNA sequencing reads to reference mitochondrial sequences of potential feeding host animals showed hits against avian and rodent sequences, which putatively adds to the growing body of evidence of a potentially wide feeding host-range for the Haemagogus mosquito vector.

reads followed by Claxton Bay (203.26 RPM) and Ecclesville (100.06 RPM) ( Table 1). The other sites had relatively low proportions of viral reads (0.90-22.83 RMP) except for Catshill, where no viral read was detected. Generally, the majority of the viral RNA reads mapped to known insect specific viruses which included the Phasi Charoen-like and Humaita-Tubiacanga viruses. Additionally, RNA sequencing reads mapped to several viruses associated with a range of insects and feeding hosts and included some of human health importance. Trinity v2.9.1 assembly of mapped viral reads resulted in a total of 171 viral contigs and 4 unassembled reads ( Table 2).

Known Insect Specific Viruses (ISVs).
The metaviromes (Fig. S1) were dominated by one ISV, the Phasi Charoen-like virus (PCLV), which has been previously reported in Aedes aegypti (Ae. aegypti) mosquitoes 7,27,28 . Overall, PCLV was identified in nine locations (1.65-17 064.28 RPM, mean = 1442.02 RPM) while the Humaiata-Tubiacanga virus (HTV), another well-known ISV, was identified in eight (0.09-710.30 RPM, mean = 59.33 RPM) of the twelve locations sampled (Fig. S1). De novo assembly of the viral reads using Trinity v2.9.1 resulted in 3 to 27 contigs (243-6675 bp) of PCLV per site with similarity levels to reference sequences in GenBank ranging from 95.44 to 99.87%. Phylogenetic analysis of segments of the S (551 bp), L (451 bp) and M (439 bp) regions of representative sequences from the different sites showed most of the local sequences clustering into one main branch for all the segments with high bootstrap support (Fig. 2). However, the sequences for all three segments from the Caroni Bird Sanctuary clustered separately from other Trinidad sequences into well-supported clades (81-99% bootstrap support) including reference sequences (Fig. 2a,c), with segments S and M having closest links to sequences from Grenada. Additionally, the local sequences of segment L from Morne Diablo also separated and formed a separate clade ( Fig. 2c; 100% bootstrap support).
Assembly of the HTV viral reads showed 1-3 contigs per site (243-1998 bp) with similarities to reference sequences in GenBank ranging from 97.80 to 99.13%. Phylogenetic analysis of a 188 bp region of segment 1 from the different sites showed all local sequences clustering separately from four references from Guadeloupe, with five (5) of the Trinidad sequences in one distinct sub-clade with 91% bootstrap support, the Caroni Bird Table 1. RNA read mapping against a reference viral RVDB database, ARB/SILVA 16S rRNA gene sequences and potential animal feeding host reference mitochondrial sequences.

Location
Total number of reads www.nature.com/scientificreports/ Sanctuary and Mamoral sequences in a second strongly supported sub-clade (97% bootstrap support), and the Morne Diablo sequence on a weakly-supported branch (Fig. 2d).

Mosquito-borne viruses.
A relatively small proportion of viral reads (4.80 RPM) were classified as alphaviruses (Togaviridae family), which accounted for approximately 0.0005% of all viral reads from the twelve locations (Fig. S1). Among these, 10 reads (4 from Mamon Village and 6 from Ecclesville at 0.06 and 0.10 RPM, respectively) mapped onto the genome of the MAYV. Trinity assembly of the mapped MAYV viral reads showed the six reads (three read pairs) from Ecclesville assembled into three contigs 166-277 bp long ( Table 3) that  www.nature.com/scientificreports/ aligned to different regions of reference MAYV genomes. However, the four MAYV reads from Mamon Village were unassembled. Blastn analysis revealed, all the contigs and reads had 99.29%-100% similarity to MAYV non-structural (nsp1-3 and nsp4) and structural polyprotein coding regions (Table S1). Most of these sequences had highest similarity to reference South American MAYV sequences, the majority of which were from Venezuela, Bolivia, and Peru (Table 3). Unassembled read M_MAYVF2 from Mamon Village had the highest similarity matches to reference MAYV sequences that originated from Trinidad, and these came from strains isolated in the 1950s, when the first infections by MAYV were reported.

Mosquito-associated viruses.
A relatively small number of reads mapped to reference genomes of mosquito-associated viruses (Fig. S1)  All mosquito-associated viral reads identified in the Chaguaramas (Fig. 1) and the Claxton Bay metavirome (Fig. S1) belonged to the unclassified viruses. The Guadeloupe mosquito virus sequence (Sobemovirus) was the only mosquito-associated sequence identified in the Mamoral location. Assembly of the mapped viral reads Table 2. Identification of viral contigs from Haemagogus mosquitoes collected in Trinidad. Assembly was performed on mapped sequencing reads from the twelve locations and identified using the blastn suite from the NCBI nucleotide collection nt/nr database. a In addition to the 3 MAYV contigs from Ecclesville, there were also 4 unassembled MAYV reads from Mamon village. The complete data set, which includes range of percentage nucleic acid identities and contig information, is shown in Table S1 and sequences are available at https:// zenodo. org/ record/ 49324 69#. YMQbQ KhKi01.  S1a). Furthermore, 1 contig (2978 bp) for the Guadeloupe mosquito virus was identified only in the Mamoral metavirome. Phylogenetic analysis of an 1859 bp region of segment 1 (Fig. S1b) showed the sequence separating from all   (Table S1). Kaiowa and Guato viruses were previously found in mosquitoes from Brazil 29 .
Furthermore, 5 contigs (320-4519 nt) from Ecclesville were unique, with highest similarity levels (94.94-96.94%) to the Atrato picorna-like virus. Phylogenetic analysis of a 350 bp region of the polyprotein segment ( Fig. S1c) resulted in all Ecclesville contigs forming a single clade that diverged from the GenBank reference Atrato picorna-like virus sequence. A relatively large number (33) of contigs (217-1020 nt) from 7 locations (Hollis, Cumana, Mamon Village, Ecclesville, Claxton Bay, Rousillac and Morne Diablo) aligned with high nucleotide identity (99-100%) to the avian leukosis virus strains from China. Maximum likelihood phylogeny of the avian leukosis viral contigs resulted in local sequences separating into two main clades with sequences from Cumana and Hollis clustering together (Fig. S1d) and sequences from Rousillac and Mamon village forming a strongly supported clade. This suggests the local avian leukosis viruses are highly diverse and may include unique lineages as compared to the reference sequences available in the GenBank database. Four contigs (289-442 bp) from the Ecclesville location aligned with 94-96% nucleotide identity to unclassified bat sobemoviruses and luteoviruses (Table S1).
Potential feeding host sequences. Mapping of RNA reads to potential mosquito blood-meal host mitochondrial sequences showed that, although levels were relatively low, there were hits to the avian species, Gallus gallus, at four locations (Claxton Bay-0.61 RPM, Mamon Village-1.85 RPM, Rousillac-0.49 RPM, and Hollis-0.75 RPM) and to the rodent species, Mus musculus, at two locations (Chaguaramas-17.73 RPM and Catshill-3.72 RPM). Assembly of the avian reads resulted in 17 contigs (203-982 nt) with > 99% nucleotide identity to Gallus gallus and assembly of the rodent reads resulted in 25 contigs (240-966 nt) with > 99% identity to Mus musculus (Table S1). The contig sequences are available at: https:// zenodo. org/ record/ 49324 69#. YMQbQ KhKi01.
Bacterial sequences. In all twelve locations, a relatively moderate level of reads (717.28 RPM) mapped to the SILVA bacterial 16S rRNA gene sequence database (Version 132). As seen in Fig. 3, the majority of mapped reads from all locations were classified as Proteobacteria (relative abundance = 96.5-57.3%), and within this phylum, the Gammaproteobacteria class dominated at all locations (relative abundance = 61.6-97.0%) except Chaguaramas (7.4%), Claxton Bay (37.9%) and Rousillac (36.6%). Mapping of the RNA sequences against 301 representative 16S rRNA gene sequences of Wolbachia spp. showed a mapping rate ranging from 0.92 Log10

Discussion
This is the first report on the virome of any Haemagogus mosquito species and provides important baseline data on a range of viruses associated with these mosquitoes. The data shows the presence of nucleic acid sequences from arboviruses of human and animal health significance in Haemagogus mosquitoes found in Trinidad, which may have epidemiological implications for this region. Viral sequences identified include those that belong to the families Bunyaviridae, Togaviridae and Rhabdoviridae, which have been previously reported to be a part of the viromes of other mosquito species [3][4][5][6]11 . The insect specific viruses (ISVs) characterized were generally similar for most of the sampling locations on the island apart from Catshill and Chaguaramas, where no ISV's were identified. PCLV and HTV, the two most common ISVs, have also been identified in similar studies associated with field-caught Ae. aegypti mosquitoes from the Caribbean region as well as in Asia and Australia 7,28,30,31 . Although, the average read depth for the two ISV's were relatively low in comparison to Ae. aegypti from other studies, this is the first report of ISV's identified in Haemagogus mosquito species. Furthermore, the broad distribution of both viruses in Haemagogus mosquitoes collected from the majority of sites was similar to observations of other researchers working with in field caught Ae aegypti mosquitoes 7,30 . This suggests that these ISVs may be naturally abundant and have a broad mosquito host range in the wild. However, there is need for further characterization of ISV's from other mosquito species in the wild. The phylogenetic analyses showed relatively distinct sequences for all PCLV segments from Trinidad, except for the sequences from the Caroni Bird Sanctuary which clustered with the reference sequence from Grenada, an island close to Trinidad. This suggests a broad regional distribution for at least some of the PCLV viral lineages. Maximum likelihood phylogeny also showed sequences of Trinidad HTV as a distinct lineage that was separate from reference sequences from Guadeloupe. Due to the limited sample size and the fact that the study was only limited to the island of Trinidad, it is not possible to conclude whether the distinct lineages of viral sequences found were due to evolutionary processes in an isolated ecosystem. Additionally, several reports have shown that viral heterogeneity can be influenced by vector and host interactions 19,32 but the potential role of these types of interactions on the occurrence of distinct clades of the PCLV and HTV sequences in Trinidad needs to be established.
The RNA sequence data from this study shows that the ISVs found in Haemagogus were similar to those reported in other Culicidae vectors and hence they may be broadly associated with mosquitoes. Furthermore, although this report has revealed ISVs known to be associated with other competent disease vectors, there is need for further characterization of ISV's from other mosquito species in the wild.
MAYV was only identified at very low levels in the Ecclesville and Mamon Village locations. This finding was similar to previous studies which also reported isolation of the virus at low infection rates in pools of field caught Hg. janthinomys mosquitoes from South America [33][34][35] . Ecclesville and Mamon Village sites are in areas known to have primate troops (the Red Howler Monkey, Alouatta macconnelli), which are the main reservoir and host feeding preference reported for Haemagogus mosquitoes in the sylvatic cycle of this emerging alphavirus 13,16,19,35 . Hence, the non-human primates could be the possible source of the MAYV in Haemagogus from these sites. Most of the other sampling locations are not associated with non-human primates, which could have accounted for the absence of MAYV. There is limited information on the prevalence of MAYV in mosquitoes or potential hosts present in the island, but it must be noted that the first report of MAYV infections included four out of five patients who were forest workers in Trinidad, and three of these individuals were stationed at Catshill and Moruga sites that are relatively close to Ecclesville and Mamon Village 36 . However, the possibility of other feeding hosts at Ecclesville and Mamon Village sites being the source of MAYV also cannot be discounted since avian and small mammalian species have also been reported to be associated with MAYV isolation in previous studies 37,38 .
Haemagogus mosquitoes have been shown to be very attracted to humans as a blood feeding source 33,35,39 and have become well adapted to breeding in artificial containers [40][41][42] . The forest peripherals in Mamon Village and Ecclesville are used for housing and agricultural activities 42 where humans may also serve as potential intermediary host when MAYV is in circulation.
RNA sequencing of the Haemagogus mosquito pool from Chaguaramas resulted in detection of unclassified Kaiowa, Guato and Cumbaru viruses (Fig. 1a) that were recently isolated from the salivary glands in Culex and other medically important mosquito species 29,43 . Additionally, other mosquito-associated viral sequences were identified in the Ecclesville location including a picorna-like virus, Merida virus and Hubei-virga like 2 virus. The Merida virus was previously initially isolated from the Yucatan's capital state 44 and Hubei virga like 2 from Hubei region in China 45 which confirms a wide distribution and multiple Culicidae hosts for these viruses.
A unique unclassified picorna-like virus (Ecclesville picorna like virus) was also identified from Ecclesville, with the Atrato picorna-like virus 1 (Table S1) isolated from field Psorophora species along the Columbian river bank 46 as its closest relative. Potential new lineages of the Wuhan mosquito virus 6 were also identified in Ecclesville whose location is ecologically disparate to the Hubei, China location where the virus was initially isolated from Culex mosquito species 47 . The Guadeloupe mosquito virus lineage (Sobemovirus) was also found to be associated with Haemagogus mosquitoes from Mamoral only, which suggests this virus may be regionally distributed and may have multiple mosquito hosts occupying different ecosystems since it was previously identified from urban Aedes aegypti collected from households in Guadeloupe 30 .
The findings of this study also showed the association of other viruses with Haemagogus that have not been reported before in any other mosquito species. The Pepper mild mottle virus was the only plant virus identified in this study and was found in the Claxton Bay Haemagogus mosquito pool. Culicidae mosquitoes are known to www.nature.com/scientificreports/ feed on nectar, plant fluids and fruit sap 48,49 , hence the uptake of plant associated viruses is not surprising and has also been reported in previous metagenomic mosquito studies [5][6][7]50 . The Pepper mild mottle virus is a capsicum virus 51 known to be affiliated with insects such as aphids 49,52 , but this is the first record of the virus being associated with a mosquito species. Similarly, small numbers of viral reads were identified as Black queen cell virus (BQCV) from the Quinam location which may have been due to acquisition from environmental sources. Mosquitoes require carbohydrates as an energy source 53 and nectar and plant secretions contains glucose and fructose sugars which are attractive to mosquitoes. Some Haemagogus mosquitoes may have fed on nectar or sugar exudates from plant flowers that infected bees may have previously fed on, resulting in the mosquito ingesting the virus. This insect associated virus has only been reported to be infective to honeybees around the world, including Australia and parts of South Africa 54,55 . This is the first study reporting the association of the BQCV with Haemagogus or any mosquito species. Further, this virus has not been previously reported in Trinidad and Tobago. The apiary industry has been in decline in the country due to the poor health status of hives 56 and the role of the BQCV in this decline needs to be investigated. The avian leukosis virus 57 was also found throughout several sampling sites. This retrovirus is known to be associated with birds and is vertically transmitted (adult to baby chicks) 58 . Although some strains (Type E) are known to be endogenous in the genomes of birds, the sequences obtained in this study were highly similar to exogenous strains (Type J and RSA), suggesting that the mosquitoes may be acquiring the virus from feeding on host animals. The detection of this virus in Haemagogus from several locations strongly suggests birds as potential feeding hosts for these mosquitoes. Previous studies have shown that a mosquito's blood-meal can provide evidence of potential feeding hosts using RNA-Seq data analysis 7,59 . Birds as potential feeding hosts for Haemagogus is further supported by the fact that bird sequences were the most common hits when RNA reads were mapped against common animals from Trinidad. Birds have also been previously suggested to be feeding hosts for Haemagogus in the MAYV sylvatic cycle 33,60 .
Mus musculus RNA was also identified in Haemagogus samples collected from Chaguaramas and Catshill locations suggesting rodents as an alternative feeding host for these mosquitoes. A previous entomological survey 42 found Haemagogus mosquitoes present in high densities at locations in Trinidad with no known non-human primates, the reported major host of these mosquitoes [13][14][15] . Hence, the data from the current study supports previous suggestion that other animals may be serving as feeding hosts 61,62 . Some of the alternative hosts may also be potentially involved in the MAYV sylvatic cycle or may, in time, adapt to serve as reservoirs, which has major epidemiological implications for MAYV fever. Although MAYV has been detected in non-primate animals including birds, rodents and reptiles 33,37,60,63 , the ability of these animals to serve as strong reservoirs of the virus is not known and currently primates are still considered the primary reservoir 15,16,37 . Future outbreaks can be more widespread if the virus evolves to utilize animals like birds or rodents as efficient reservoirs since these animals are present within or are close to major human population centers. Epidemics can be further exacerbated by the widespread occurrence of Haemagogus mosquitoes, including in areas close to human communities in some countries like Trinidad and Tobago, as was noted by Ali et al. (2019).
This study is also the first report on the microbiome of the Haemagogus mosquito species inferred from the RNA-seq data. Although the Ribo-ZeroTM Magnetic kit (Illumina Inc.) was used to deplete ribosomal RNA, the depletion was evidently incomplete since a significant number of reads mapped to the Silva bacterial 16S rRNA database that showed dominance of the Gammaproteobacteria class within the Proteobacteria phylum. The high prevalence of this bacterial class has similarly been reported in other mosquito species including Culex and Anopheles [64][65][66][67][68] . More importantly, mapping against representative Wolbachia 16S rRNA gene sequences provided strong evidence for the association of this bacterial genus with Haemagogus. The presence of Wolbachia species is known to negatively affect insect vector competence, as was demonstrated for Aedes mosquito species transmission of mosquito-borne pathogens such as the Zika virus and dengue virus 67 . Additional studies are needed to determine if members of this bacterial genus may have similar effects in reducing vector competence of Haemagogus and transmissibility of MAYV and other Haemagogus transmitted arboviruses.
The lack of availability of whole genome sequences for any Haemagogus mosquito species made data analysis challenging since it was not possible to filter out the host genome sequences before analysis of sequences of taxonomic importance. Despite the challenges, this study has added valuable baseline data on Haemagogus spp. and highlights the need for further work on characterizing the virome and microbiome of these medically important mosquitoes.

Methods
Mosquito collection. Adult female mosquitoes were collected in twelve forested areas in Trinidad previously described by Ali et al. 2019 (Fig. 1) during the rainy season over the period June to December 2018 using the human bait adult catching method at ground level 69 . Haemagogus mosquitoes (n = 205) were collected and transported to the lab on dry ice. The specimens were morphologically identified using taxonomic keys 23-25 , pooled (3-5 mosquitoes) and stored at − 80 °C until further use. Mosquitoes with distinctive Hg. janthinomys morphological features were pooled and used for RNA sequencing. Subsequent to RNA sequencing, DNA barcoding analysis of mosquitoes from Trinidad based on sequencing of the mitochondrial cytochrome oxidase I (COI) gene and the ITS2 region showed major sequence variations in these genes that suggest the possibility of the occurrence of a species complex (Ali et al., unpublished data). The mosquitoes analyzed in this study were subsequently referred to as Haemagogus spp.
Nucleic acid preparation and sequencing. Mosquito   . Workflow for bioinformatic analysis for viral, bacteria and feeding host read and contig analysis. All analyses were conducted using the Galaxy server (https:// usega laxy. eu/) except NCBI BLAST and generation of phylogenetic trees.  76 and the filtered reads were then extracted as FASTQ files using SAMtools fastx files v1.9 76 for contig assembly. Trinity v2.9.1 77 was used for de novo assembly of the generated FASTQ reads. Nucleotide similarity of viral contigs to published sequences (nucleotide collection nt/nr) was determined using the blastn suite v2.10.0 78 from NCBI (www. ncbi. nlm. nih. gov). The distribution of viral identified reads was visualized using the Krona tool 79 .
Feeding host read and contig analysis. Trimmed and filtered reads were mapped against 142 sequences of mitochondrial bar-coding genes and three full mitochondrial genomes of 116 animals using the Bowtie 2 software v2.3.4.2 74 (Table S1). The animals included were 27 mammals, 10 reptiles and 79 bird species which are commonly found in Trinidad 84 . The sequences used for mapping are available at https:// zenodo. org/ record/ 49324 69#. YMQbQ KhKi01.
Relative abundance of taxonomic groups. The relative abundance of the different taxonomic groups was expressed in reads per million (RPM) of total number of filtered reads per sample. The RPM was calculated using the formula: Phylogenetic analysis. The phylogenetic relationships of selected viral assembled viral contigs and representative reference sequences from GenBank were determined using the MEGA X 10.0.5 software 85 . The contigs of all coding regions were aligned with the closest reference sequence from GenBank and amino acid sequences generated to confirm accuracy of the nucleotide sequences. Sequences were aligned using Clustal W and trimmed before phylogenetic trees were generated using the Maximum Likelihood method (Kimura 2-parameter substitution model) with 1000 bootstrap replications.
RPM = No. of reads of taxonomic group × 1,000,000 Total number of filtered reads in sample