Extensive new Anopheles cryptic species involved in human malaria transmission in western Kenya

A thorough understanding of malaria vector species composition and their bionomic characteristics is crucial to devise effective and efficient vector control interventions to reduce malaria transmission. It has been well documented in Africa that malaria interventions in the past decade have resulted in major changes in species composition from endophilic Anopheles gambiae to exophilic An. arabiensis. However, the role of cryptic rare mosquito species in malaria transmission is not well known. This study examined the species composition and distribution, with a particular focus on the malaria transmission potential of novel, uncharacterized Anopheles cryptic species in western Kenya. Phylogenetic analysis based on ITS2 and COX1 genes revealed 21 Anopheles mosquito species, including two previously unreported novel species. Unusually high rates of Plasmodium sporozoite infections were detected in An. funestus, An. gambiae and eight cryptic rare species. Plasmodium falciparum, P. malariae and P. ovale sporozoite infections were identified, with a large proportion of mixed species infections in these vectors. This study, for the first time, reports extensive new Anopheles cryptic species involved in malaria transmission in western Kenya. These findings underscore the importance of non-common Anopheles species in malaria transmission and the need to target them in routine vector control and surveillance efforts.

Vector control tools, mainly long-lasting insecticide-treated nets (LLIN), indoor residual spraying (IRS), and larval source management (LSM), are key components of malaria control strategies 1 . The massive scale-up of LLINs and IRS in recent years for malaria vector control has led to major changes in biting behaviour, host preference, breeding range, vectorial capacity or vector competence of mosquito vector species in Africa 2 . Vector species-specific differences in ecology and behaviour could substantially affect both malaria transmission and the success of vector control strategies. Accurate identification of malaria vector species and their distribution and bionomics is crucial to devise efficient vector control interventions.
In Africa, most malaria entomological studies focus on the two major vector species complexes, Anopheles gambiae sensu lato (s.l.) and An. funestus s.l. [3][4][5][6][7][8] . Such studies usually rely on morphological species identification with keys, which can lead to misidentification or to the discarding of unexpected or unknown cryptic sibling species (or rare species) 9 . Cryptic rare species are defined as groups of closely related but genetically isolated species that are difficult or impossible to distinguish by morphological traits 10 . They are very common in the Anopheles genus 9,[11][12][13][14][15][16][17] . For example, in southern Africa, Lobo et al. reported that 12 of 18 molecularly identified species (including 7 newly identified cryptic species) carried Plasmodium sporozoites as detected by PCR. In West Africa, a cryptic subgroup of the An. gambiae complex, namely GOUNDRY, which exhibits outdoor resting behaviour, was found to be highly susceptible to Plasmodium falciparum infection 18,19 . Additionally, a new species in the An. gambiae complex, An. fontenillei, was recently discovered in the forested areas of Gabon, Central Africa 20 . In the western Kenya highlands, 17 species of Anopheles (including 9 cryptic species) were identified, and nearly half of the cryptic species carried P. falciparum parasite DNA 12,14,21 . These studies indicated that many cryptic Anopheles species may play an important role in malaria transmission across Africa.
Morphologically indistinguishable cryptic sibling species of mosquitoes pose challenges to malaria control programs because their biology and their role in malaria transmission are incompletely understood. Recent advancements in molecular tools, however, offer unique opportunities for accurate and comprehensive analysis of vector-host-parasite interactions. A number of DNA-based molecular tools have been used to identify cryptic species, including PCR amplification or sequencing of the extrachromosomal mitochondrial DNA (mtDNA) and the ribosomal RNA (rRNA) genes. The rRNA genes have been a preferred target because they are a well-studied gene family, and the sequences of certain domains of the genes are very highly conserved among species. Sequencing of rRNA genes has become an important tool for systematic studies of highly diverged taxa 11 . The intergenic spacer (IGS) and internal transcribed spacer 2 (ITS2) regions of the rRNA gene have been commonly used for development of PCR-based species diagnostic assays and sequence-based cryptic species identifications 9,13,14,16,17,[22][23][24][25] . The intraspecies ITS2 sequence variation of species in the An. gambiae complex ranged from 0.07 to 0.43%, whereas interspecies variation ranged between 0.4 and 1.6% 26 . Mitochondrial DNA fragments from the cytochrome oxidase subunit 1 (COX1) gene have been used as DNA barcodes to identify mosquito species [27][28][29] . However, molecular species identification based on COX1 alone may have difficulty discriminating between closely related species or species within a complex 30 . Indeed, a recent study indicated that ITS2 provides better resolution than COX1 to differentiate An. arabiensis specimens from other An. gambiae complex specimens in eastern Ethiopia 31 . Accurate identification of diverse vector species is crucial to devise tailor-made vector interventions for malaria control based on their specific ecology and behaviour. This is particularly critical as African countries aim to achieve malaria elimination in many endemic areas by 2030 32 .
To understand the role of cryptic rare vector species in malaria transmission, the present study examined the composition, distribution, and bionomics of Anopheles species in western Kenya. The study also determined the Plasmodium sporozoite infection status of cryptic rare Anopheles species in highland and lowland settings of western Kenya. Multiplex PCR and species-specific PCR methods were used to identify major vector species and blood meal sources. The ITS2 and COX1 loci were sequenced and analyzed for cryptic rare Anopheles species. A highly sensitive real-time PCR approach was used to detect Plasmodium infections in mosquito vectors.

Results
Overview of molecular determination of Anopheles species. Out of the 3556 Anopheles mosquitoes, 87.1% (3099/3556) were determined by species-specific PCRs or multiplex PCRs and sequencing to be the major species An. gambiae sensu stricto (hereafter referred to as An. gambiae) (1440), An. arabiensis (718), and An. funestus sensu stricto (hereafter referred to as An. funestus) (941) in the five study sites (Fig. 1, Table 1, Supplementary Fig. S1). A subset of 21 randomly selected individuals from each major species identified by PCR was confirmed by ITS2 sequencing based on similarity (> 98%) to the sequences of anopheline voucher species retrieved from the NCBI GenBank database (Supplementary Fig. S2).
The remaining 457 collected anophelines (12.9%) were classified into 18 rare species groups based on ITS2 sequence homology. Except for two species groups (An. sp.18 and An. sp.19), the ITS2 sequences of all groups were assigned to distinct species based on their similarity (> 98%) to the sequences of Anopheles voucher species retrieved from the NCBI GenBank database (Supplementary Fig. S2). The ITS2 sequences of the two remaining groups did not match any reference anopheline sequence or known vector species in GenBank at the > 98% similarity threshold, suggesting the existence of novel cryptic species.
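The > 98% ITS2 similarity rule used above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used BioEdit, MView and BLAST); the function names `percent_identity` and `assign_species` are hypothetical, the input sequences are assumed to be already aligned to equal length, and skipping gap columns is only one of several conventions for computing identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def assign_species(query: str, references: dict, threshold: float = 98.0):
    """Return (best_reference_name, identity), or (None, identity) when the
    best hit does not exceed the threshold (a candidate novel group)."""
    best_name, best_identity = None, 0.0
    for name, ref in references.items():
        identity = percent_identity(query, ref)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity > threshold:
        return best_name, best_identity
    return None, best_identity
```

With a reference of 100 aligned bases, a query differing at one position (99% identity) would be assigned to that reference, whereas a query differing at three positions (97%) would be returned as unassigned, mirroring how An. sp.18 and An. sp.19 were flagged.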
Pairwise comparison of ITS2 sequence similarities of the 21 Anopheles species indicated that, except for one pair with 98.5% identity between An. gambiae and An. arabiensis, all pairs showed a similarity of 90% or less, consistent with the species classifications (Supplementary Table S1). Phylogenetic tree analysis indicated that the [...] (0.6%), a highland site. An. arabiensis was observed in higher proportion in lowland areas (Homa Bay: 70.8% and Kombewa: 24.9%) than in highland areas, where it ranged from 1.8% (Emutete) to 5.2% (Iguhu). Seventeen of 18 rare species were identified in the highland areas, whereas only six rare species were detected in the lowland areas, suggesting that cryptic species might be more closely associated with the sympatric An. gambiae than with An. arabiensis. In lowland sites, the most abundant rare anopheline species was An. sp.15 (n = 17), followed by An. rufipes (n = 14) and An. cf.rivulorum (n = 14), whereas multiple rare species (such as An. christyi, An. sp.1, and An. sp.17) were identified in the highlands (Fig. 1B, Table 1).

Molecular determination of Anopheles mosquito host blood meal source.
A total of 1,372 bloodfed female mosquitoes from 16 Anopheles species were successfully genotyped for blood meal sources (Table 4). Of these, 41.6% had taken human blood meals, 53.6% had taken bovine blood meals, and the remaining 4.8% had blood meals originating from other animals, e.g., goat, pig, and dog. For the major vector species, the highest human blood index was found in An. funestus (0.72), followed by An. gambiae (0.51), whereas very few samples of An. arabiensis had human blood meals (Pearson's Chi-squared test: χ² = 532.4, df = 8, p < 0.0001). The majority (> 90%) of An. arabiensis blood meals originated from cows (Fig. 3). Similar patterns of blood meal source were observed in the highlands and lowlands. Human blood meal sources were detected in a total of ten rare Anopheles species, including eight rare species from the highlands and two from the lowlands. The blood meal sources of the other three rare species (An. leesoni, An. maculipalpis, and An. pretoriensis) were identified as bovine (Table 4).

Discussion
This study, for the first time, reported eight rare Anopheles species with Plasmodium sporozoites and human blood meal sources in western Kenya. Although An. funestus, An. gambiae and An. arabiensis remained the primary malaria vectors in the region, the presence of 18 cryptic rare Anopheles species, nearly half of them incriminated as malaria vectors, may have serious implications for malaria control programs in western Kenya. Mosquito population modification using innovative vector control techniques such as gene drives or Wolbachia transfection, which rely on mosquito mating to spread, is unlikely to spread into reproductively isolated cryptic species other than the one(s) in which the tool was introduced. The presence of such diversified vector species, coupled with Plasmodium-infected outdoor mosquitoes, may challenge vector interventions and contribute to malaria transmission stability. Understanding the breeding habitats, biting seasonality, resting behaviour and vector competence of these rare vector species is critically required in order to target them in routine malaria control programs.
Phylogenetic tree analysis showed that the seven recently identified and less documented cryptic rare Anopheles species, along with the An. funestus s.l. complex, belong to the Myzomyia series group. Among these, four (An. sp.1, An. sp.6, An. sp.9, and An. sp.17) were positive for human malaria parasites. Similar findings were reported by previous studies that identified new sibling cryptic Anopheles species in the western Kenya highlands 13,14 . However, Plasmodium infections of the rare species An. sp.11 and An. sp.15 are probably reported here for the first time, although these two species were previously reported without parasite detection 14 . These two rare species were classified into the Cellia and Myzorhynchus series groups, respectively, which encompass potential vectors such as An. pharoensis and An. coustani 9,24 . The present study also for the first time reported two other rare species, An. sp.18 and An. sp.19, as novel species, since their sequence data were not found in the GenBank database. However, possibly due to the low number of mosquitoes collected, Plasmodium infections were not detected in these two species, and no host blood meal sources were found for the samples. Unusually high sporozoite rates in rare Anopheles species, including An. sp.1 (2/69) and An. sp.17 (3/56), suggested their potential role in malaria transmission in western Kenya. Additionally, detecting Plasmodium infections in outdoor mosquito collections indicates the presence of outdoor transmission that may not be addressed by indoor-based vector interventions. Further study is, however, required to characterize the breeding habitats, vectorial capacities and biting behaviours of these rare species. Future studies should also investigate the environmental factors in the highlands of western Kenya that led to the presence of diverse Anopheles species. The potential role of vector competition in the highlands and possible increased availability of breeding sites due to climate change require further study.
A high Plasmodium sporozoite rate (nearly 10% or more) for the primary malaria vector species, An. funestus and An. gambiae, indicated a high level of malaria transmission due to these species in both the highlands and lowlands of western Kenya. Sporozoite rates reported in the present study were over two-fold higher than those of a previous report in western Kenya 33 , probably due to the use of highly sensitive molecular techniques 34 and the examination of multiple parasite species in our study. Similarly high sporozoite rates were previously reported for An. funestus from elsewhere 9 . These findings generally confirm that An. funestus and An. gambiae remain the primary malaria vector species in western Kenya.
Although these two species are primarily endophagic, the detection of Plasmodium-infected mosquitoes outdoors might suggest some behavioural change due to continuous use of indoor-based interventions, as reported in neighboring Tanzania 35 . Prolonged and widespread use of LLINs could thereby favour traits such as biting outdoors or early in the evening. In the southwest Pacific, a study found that high levels of malaria transmission have been maintained by An. farauti, a malaria vector that altered its behaviour to blood-feed early in the evening and outdoors, thereby avoiding exposure to the insecticides used in IRS 36 . The potential occurrence of such behavioural change in western Kenya requires further study. On the other hand, a high sporozoite rate does not necessarily translate to high malaria transmission, since disease transmissibility is affected by a number of genetic and environmental factors that determine vector competence 37 . The observed high species diversity in the highland areas may be the result of an increase in new clonal niches that potentially conquered new breeding habitats in the highlands. Unlike An. funestus and An. gambiae, An. arabiensis was more dominant in the lowlands of Homa Bay than in the highlands of western Kenya. This species is a common vector of malaria in the lowlands of East Africa 38,39 . The low human blood index and low sporozoite rate of An. arabiensis in the lowlands indicate that this species primarily feeds on non-human blood meal sources, mainly bovine, as reported in Ethiopia 40 . A previous study has suggested that climate change may push An. arabiensis to encroach on and adapt to the highlands of East Africa 41 . This study also highlighted the level of misidentification and misassignment that occurs when morphological keys are used for species identification.
For instance, only four species were identified using the standard morphological key 42 , while molecular assays revealed the presence of 21 species, suggesting that many Anopheles species might be morphologically indistinguishable. The level of misidentification was much lower in major species (4-7%) than in rare species (100%). A number of factors, such as level of training and mosquito sample quality, determine proper identification of mosquitoes using morphological keys. For instance, morphological identification of An. coustani is not complicated, owing to the unique and conspicuous white patch on its hind legs 42 . Yet our data indicated that a considerable number of specimens of this species were misidentified, suggesting the need for additional training to improve the skill of field entomologists. Overall, species identification using morphological keys is affected by factors such as: (1) the quality of the specimen (damaged or intact) when collecting mosquitoes using CDC-LT; (2) morphologically indistinguishable sibling mosquito species; and (3) limited skill of field entomologists in using morphological keys for species identification. The present study indicated that standard morphological identifications were more reliable for the major species than for rare species 21 .
In summary, the present study demonstrated the importance of new Anopheles rare species in malaria transmission in western Kenya. Molecular tools help accurately characterize the Anopheles vector species in malaria endemic areas. For the first time, eight out of 18 rare Anopheles species were reported with Plasmodium sporozoites and human blood meal sources in western Kenya, suggesting their secondary role in malaria transmission. The presence of diversified malaria vector species in the highlands might be the consequence of climate change that potentially created new ecological niches for rare vector species, although this requires further investigation. Future studies need to identify and investigate the breeding habitats and the resting and feeding behaviours of these rare vector species in order to devise appropriate vector control strategies that can target all malaria vector species in the region and advance malaria elimination in Africa.

Methods
Ethics statement. Ethical approval for the study was obtained from the institutional scientific ethical review boards of the University of California, Irvine, USA, and Maseno University, Kenya. Permission was sought from the chief of each study site. Written informed consent was obtained from heads of the households and from individuals who were willing to participate in the study. All methods used in this study were performed in accordance with the relevant guidelines and regulations.
Study sites and sample collections. Malaria vector surveillance was conducted in three highland sites (elevation 1500-2300 m) and two lowland sites (1050-1500 m) in western Kenya (Fig. 1). Malaria transmission in highland regions was traditionally regarded as mesoendemic-hyperendemic, unstable, and limited by low temperature, whereas malaria transmission in lowland regions is holoendemic and year-round 43,44 . [...] 22 . For specimens morphologically identified as An. funestus, a single PCR was conducted to confirm species using the species-specific primers (ITS2A/FUN) in the internal transcribed spacer region (ITS2) of the ribosomal DNA as described by Koekemoer et al. 49 . For the other species, and for specimens that failed to yield a clear species-specific band or produced nonspecific or weak amplification bands 50,51 with the species-specific primers, two additional PCR amplifications were performed to further identify morphologically misidentified specimens using the primers UN/GA/AR and ITS2A/FUN, respectively.
DNA sequence-based determination of the rare species. For the specimens that could not be identified by PCR, and a subset of 21 randomly selected individuals from each major species, additional PCR amplification and DNA sequencing of the ITS2 region of nuclear ribosomal DNA and the COX1 gene were performed using the primer pair ITS2A (TGT GAA CTG CAG GAC ACA T) and ITS2B (TAT GCT TAA ATT CAG GGG GT) for ITS2 25 [...]
Multiplexed PCR-based blood meal identification. Host blood meal identification of fed mosquitoes was conducted using the multiplexed PCR-based methods as described by Kent et al. 29 [...]
Multiplexed quantitative PCR (qPCR) assay for Plasmodium infections. The DNA extracted from the mosquito head-thorax portion was used for qPCR identification of Plasmodium sporozoite infection. A multiplexed real-time qPCR assay was performed using the published species-specific 18S ribosomal RNA probes and primers for Plasmodium falciparum and P. malariae 53 and P. ovale 54 [...] Beverly, MA), 0.5 µl of each probe (2 µM), 0.4 µl of each forward primer (10 µM), 0.4 µl of each reverse primer (10 µM) and 0.1 µl of double-distilled water. The following temperature profile was applied: a hold stage at 50 °C for 2 min and 95 °C for 2 min, followed by 45 cycles of PCR amplification at 95 °C for 3 s and 60 °C for 30 s. A standard curve of positive controls containing plasmid DNA (MRA-177 for P. falciparum, MRA-179 for P. malariae, and MRA-180 for P. ovale) from BEI Resources (https://www.beiresources.org) was included in each qPCR plate run, together with 3 negative controls.
Statistical analysis. CodonCode Aligner 9.0.1 (CodonCode Corporation, Centerville, MA) was used to check sequence quality and trim low-quality bases. BioEdit software 55 and the MView web-based tool 56 were used to align the sequences and to calculate pairwise sequence identity and similarity from multiple sequence alignments. A threshold of 98% sequence similarity for ITS2 was used to classify sequences into species groups 9 . The consensus sequences within each group were compared to the NCBI nr/nt database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Sequence groups were assigned to known taxa or voucher specimens based on similarity to a voucher specimen sequence at the 98% threshold value. The haplotypes of COX1 sequences were compared to both the NCBI nr/nt database and the BOLD database (https://www.barcodinglife.org) 29 . Phylogenetic analyses were performed in MEGA version 7.0 57 using Maximum Likelihood (ML) algorithms with the General Time Reversible model for ITS2 consensus sequences of species groups and UPGMA with the Kimura 2-parameter model for COX1 haplotypes. The tree nodes were evaluated by bootstrap analysis with 1000 replicates, and the COX1 haplotype tree was rooted using a sequence from Aedes aegypti (GenBank: AF390098). The Shannon and Simpson diversity indices, as well as Simpson's evenness and dominance indices, were used to assess Anopheles diversity within and between sites. All these indices were calculated using the PAST 4.0 software package 58 . To compare diversity between highland and lowland sites, a t-test was used to determine whether the indices were significantly different 59 .
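The diversity indices named above can be sketched in Python as follows. This is an illustration under assumptions, not a reproduction of the PAST 4.0 output: the function name `diversity_indices` is hypothetical, and the formulas are the common textbook forms (Shannon H = -Σ p ln p, dominance D = Σ p², Simpson index 1 - D, Simpson evenness (1/D)/S), which may differ in detail from the definitions PAST applies.

```python
import math


def diversity_indices(counts):
    """Compute common diversity indices from per-species specimen counts."""
    counts = [c for c in counts if c > 0]  # species with zero specimens are ignored
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one specimen is required")
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)  # natural-log Shannon H
    dominance = sum(p * p for p in props)           # Simpson dominance D
    return {
        "richness": len(props),
        "shannon": shannon,
        "dominance": dominance,
        "simpson": 1.0 - dominance,
        "simpson_evenness": (1.0 / dominance) / len(props),
    }


# Counts of the three major species reported in the Results
# (An. gambiae, An. arabiensis, An. funestus). Rare species are omitted
# here, so these values are NOT the site-level indices of the study.
major_species = diversity_indices([1440, 718, 941])
print(major_species)
```

A perfectly even community gives a Simpson evenness of 1, and a single-species sample gives a dominance of 1 and a Shannon index of 0, which is a quick sanity check on any implementation.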
Human blood index (HBI) is defined as the proportion of freshly fed mosquitoes containing human blood and was calculated as described by Garrett-Jones 60 . Mixed blood meals were counted toward each host species they contained when the HBI was calculated for each host separately. The sporozoite rate was estimated as the number of positive individuals divided by the total number of tested individuals, with a 95% confidence interval 61 . Statistical analyses were conducted using SAS JMP 14.0 software (SAS Inc., Cary, NC).
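The two proportions defined above can be written out as a short Python sketch. The function names are hypothetical, and the Wilson score interval is an assumption chosen for illustration because it behaves well for small counts; the interval method actually used in the study (ref. 61) is not specified in the text and may differ.

```python
import math


def human_blood_index(human_only: int, mixed_with_human: int, total_identified: int) -> float:
    """HBI: mixed meals containing human blood are counted as human-fed."""
    if total_identified <= 0:
        raise ValueError("total_identified must be positive")
    return (human_only + mixed_with_human) / total_identified


def sporozoite_rate(positive: int, tested: int, z: float = 1.96):
    """Return (rate, lower, upper) using a Wilson score interval (assumed method)."""
    if tested <= 0 or not 0 <= positive <= tested:
        raise ValueError("need tested > 0 and 0 <= positive <= tested")
    p = positive / tested
    denom = 1.0 + z * z / tested
    centre = (p + z * z / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z * z / (4 * tested * tested)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts taken from the Discussion: An. sp.1 (2/69) and An. sp.17 (3/56).
for name, pos, n in [("An. sp.1", 2, 69), ("An. sp.17", 3, 56)]:
    rate, low, high = sporozoite_rate(pos, n)
    print(f"{name}: {rate:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With so few positives per species, the intervals are wide, which is one reason the rare-species sporozoite rates are best read as evidence of vector potential rather than as precise estimates.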