Genomic differentiation and intercontinental population structure of mosquito vectors Culex pipiens pipiens and Culex pipiens molestus

Understanding the population structure and mechanisms of taxa diversification is important for organisms responsible for the transmission of human diseases. Two vectors of West Nile virus, Culex pipiens pipiens and Cx. p. molestus, exhibit epidemiologically important behavioral and physiological differences, but the whole-genome divergence between them was unexplored. The goal of this study is to better understand the level of genomic differentiation and population structures of Cx. p. pipiens and Cx. p. molestus from different continents. We sequenced and compared the whole genomes of 40 individual mosquitoes from two locations in Eurasia and two in North America. Principal Component, ADMIXTURE, and neighbor joining analyses of the nuclear genomes identified two major intercontinental, monophyletic clusters of Cx. p. pipiens and Cx. p. molestus. The level of genomic differentiation between the subspecies was uniform along chromosomes. The ADMIXTURE analysis determined signatures of admixture in Cx. p. pipens populations but not in Cx. p. molestus populations. Comparison of mitochondrial genomes among the specimens showed a paraphyletic origin of the major haplogroups between the subspecies but a monophyletic structure between the continents. Thus, our study identified that Cx. p. molestus and Cx. p. pipiens represent different evolutionary units with monophyletic origin that have undergone incipient ecological speciation.


Introduction
The advent of genomics has provided new insights into population divergence between incipient taxa, changing our vision about the mechanisms of adaptation and speciation 1 . Experimental data demonstrate highly heterogenous patterns of population differentiation in various groups of organisms 2 . In addition to the classical model of allopatric speciation, when incipient taxa are isolated geographically 3 , it becomes obvious that ecological speciation or the development of reproductive isolation between populations as a result of adaptation to different environments is feasible and common in nature [4][5][6] . Understanding the population structure and the mechanisms of taxa diversification in the changing environment is extremely important if the studied organisms are related to the transmission of human diseases 7 . Members of the Culex pipiens complex are globally distributed throughout Europe, Asia, America, Africa, and Australia and represent competent vectors of the lymphatic filariasis parasite and encephalitis viruses, including the widely spread West Nile virus [8][9][10][11] . However, despite the fact that Cx. pipiens was the first mosquito species described by C. Linnaeus in his "Systema Naturae" 12 , mosquitoes from the Cx. pipiens complex still represent "one of the major outstanding problems in mosquito taxonomy" because the members of the complex can mate and produce viable progeny in nature 13,14 . Thus, the mechanisms of genetic differentiation in the members of this complex remain poorly understood.
The "Catalog of the Mosquitoes of the World" (Knight, 1978), maintained by the Walter Reed Biosystematics Unit at the Smithsonian Institution (http://www.wrbu.si.edu), recognizes the following species as members of the Cx. pipiens complex: Cx. pipiens, Cx. quinquefasciatus, Cx. australicus, and Cx. globocoxitus. Among these species, Cx. pipiens and Cx. quinquefasciatus spread globally in temperate and tropical/subtropical regions, respectively.
Distribution of Cx. australicus and Cx. globocoxitus is restricted to Australia. In addition to these species, the Cx. pipiens complex includes a subspecies, Cx. p. pallens, found in Japan and the Far East of Eurasia, and two additional members, Cx. p. pipiens and Cx. p. molestus, that, according to this catalog, represent two physiological forms. Indeed, Cx. p. pipiens and Cx. p. molestus exhibit important physiological and ecological differences 8,15 . Cx. p. pipiens mates in open spaces, feeds on birds, and requires a blood meal for oviposition. During the winter, females of Cx. p. pipiens undergo diapause. In contrast, Cx. p. molestus mates in confined spaces, feeds on mammals, can lay eggs without a blood meal, and females of this subspecies cannot enter diapause. Cx. p. molestus is also known as an underground mosquito because it invades basements, sewers, and underground railways. This fact explains why Cx. p. molestus can survive in a cold climate without the ability to enter diapause 8,16 . Since Cx. p. pipiens has strong bias toward feeding on birds, while Cx. p. molestus readily bites humans and other mammals, hybrids between the two forms can act as a bridge vector between the bird reservoirs and susceptible mammalian hosts [17][18][19] . Thus, understanding the abilities of Cx. p. pipiens and Cx. p. molestus to produce viable progeny in natural populations is of medical importance.
Historically, P. Forskal described Cx. molestus as a separate species from Egyptian specimens in 1775, but, later, this species was synonymized with Cx. pipiens 13

Results
Mosquito collections. In this study, we resequenced whole genomes of 40 individual mosquitoes from 4 different locations-two from Eurasia and two from North America (Table 1, Fig. 1). All mosquitoes were collected from the urban environment. Cx. p. molestus mosquitoes were collected at the larval stage from an underground water source in two locations (the Kyrgyz Republic and the Republic of Belarus). Cx. p. pipiens were also collected at the larval stage in an open water reservoir in the Kyrgyz Republic, but at the adult stage in the basement of a multifloor building in the Republic of Belarus. All field collected specimens from Eurasia were identified as Cx. pipiens by larval morphology and then tested to subspecies level using the COI assay, which is based on SNPs in a specific position 25 . Mosquitoes from the Washington, D.C., USA samples were collected as egg rafts from two different urban environments-an underground parking lot and an open water reservoir. Both egg collections were used to establish mosquito colonies that were kept in the laboratory for several generations. At the larval stage, mosquitoes from both collections were identified morphologically as Cx. pipiens. However, molecular assays based on COI SNP 25 and polymorphism in the flanking region of a microsatellite locus 26 failed to identify them at the subspecies level. The latter method was considered unreliable for the diagnosis of Cx. p. molestus and Cx. p. pipiens in California 27 . In addition, we performed a physiological assay and identified that mosquitoes from the parking lot colony were able to lay eggs without a blood meal. Mosquitoes from the open water collection 5 needed a blood meal for egg development. Thus, we called our samples from Washington, D.C, USA Cx. pipiens autogenetic (WPA) and Cx. pipiens anautogenic (WPN). In addition, we used two colonies of Cx. p. molestus and Cx. p. pipiens that were derived from the Chicago area in the USA 23 .
Nuclear genome analysis. After the resequencing the 40 individual genomes, we performed neighbor joining analysis. The Cx. quinquefasciatus genome assembly 24,28 was used as a reference genome and an outgroup. The analysis identified two major monophyletic clusters that segregated in a subspecies-specific manner with high bootstrap support (Fig. 2). Cx. p. molestus samples, the admixture levels were higher in autogenic strains overall than in anautogenic strains. Interestingly, the signature of admixture in all Cx. p. molestus samples was very low or even absent, suggesting restricted gene flow from Cx. p. pipiens to Cx. p. molestus. At a higher level of clustering (K=6-9), the local populations of the subspecies formed distant well-defined clusters without high overlap, especially for Cx. p. molestus. Thus, we concluded that both autogenic and anautogenic samples from Washington, D.C. represent Cx. p. pipiens or Cx. p.
pipiens -Cx. quinqufasciatus hybrids with some admixture of Cx. p. molestus, which is more pronounced in autogenic samples.
To get more insight into the nature of genomic differentiation between subspecies/populations, we calculated genome-wide pairwise Fst values and found that they varied between 0.08 and 0.25 and were highly significant for all comparisons (Fig. 5). The most diverged group was Cx. p. molestus from the Chicago strain, which can be explained by the very low genetic diversity of this strain caused by its longtime cultivation and possible bottleneck, which accelerated the genetic drift (Fig. 7). Fst analysis along the chromosomes demonstrated relatively uniform patterns of genetic differentiation across the genome without clear signs of "islands of divergence" (Fig. 6), which can indicate infrequent hybridization in nature and/or prezygotic instead of postzygotic reproductive isolation as major mechanisms of the subspecies differentiation. The Fst levels dropped around centromeres probably because a small number of markers are present in these regions due to a highly repetitive sequence composition. Overall, Fst values were higher between the subspecies than between autogenic and anautogenic Cx. p. pipiens from Washington, D.C. The level of genomic diversity did not differ significantly between the subspecies demonstrating an overall trend of slightly lower genetic diversity in Cx.
p. molestus; however, this was not significant in pairwise comparisons except for the cultivated colonies from Chicago, IL (Fig. 7), which demonstrated a strong depletion of genetic diversity in Cx. p. molestus.

Mitochondrial genome analysis.
In addition, to nuclear genome comparisons, we performed neighboring joining tree analyses using almost complete mitochondrial genomes for all 40 specimens recovered from the whole genome sequencing data. The patterns of phylogenetic structure were very different from the nuclear counterpart (Fig. 8) and showed the paraphyletic 7 origin of the major haplogroups among the subspecies but the monophyletic structure between the continents. The mitochondrial genome of Cx. quinquefasciatus grouped inside the American haplogroup along with the samples from Washington, D.C. and did not demonstrate any significant divergence. Samples of Cx. p. molestus from the Republic of Belarus and the Kyrgyz Republic formed a diverged monophyletic haplogroup. Interestingly, two samples of Cx. p.
pipiens from the Republic of Belarus represented a very diverged haplogroup, which may be reminiscent of the past species diversity or has been introgressed from other diverged populations/species. We specifically checked the quality of alignment and calling for this haplogroup but did not identify any abnormalities. Overall, we concluded that mitochondrial genome comparison does not reflect the true evolutionary history of the subspecies/species, probably due to multiple introgression events.

Discussion
Independent monophyletic origin of Culex pipiens pipiens and Culex pipiens molestus. Two alternative hypotheses have been proposed to explain the differences in ecological and physiological strategies of Cx. p. pipiens and Cx. p. molestus 18 . One hypothesis explains such differentiation as a rapid shift in physiological and behavior traits as an adaptation to an underground environment that is associated with human activities 21 . In this scenario, the local populations of Cx. p. molestus must be closely related to the local populations of Cx. p. pipiens 29 (Fig. 2). The presence of an additional hybrid cluster formed by autogenic and anautogenic field collected specimens from Washington, D.C., USA more likely reflects the existence of a Cx. pipiens -Cx. quinquefasciatus hybrid zone in this area 15 . However, based on genetic distances, this cluster is much closer related to the Eurasian Cx. p.
pipiens rather than to the Cx. p. molestus from Eurasia and the USA (Chicago, IL). Additionally, PCA (Fig. 3) and ADMIXTURE (Fig. 4)  pairwise Fst values were highly significant for all the comparisons (Fig. 5). We did not identify any specific region of high divergence in the genome (Fig. 6). Overall, Fst values were much higher between Cx. p. molestus and Cx. p. pipiens than between autogenic and anautogenic samples of Cx. p. pipiens from Washington, D.C. The most diverged group, based on all conducted analyses, was the strain of Cx. p. molestus from Chicago, IL, which can be explained by the very low genetic diversity of this strain caused by its longtime colonization and possible bottleneck, which accelerated genetic drift (Fig. 7).
Similar observations about the independent monophyletic origin of Cx. p. molestus and Cx. p. pipiens were obtained using microsatellite analysis in ~600 samples of Cx. pipiens mosquitoes derived from different world-wide locations 18  Thus, the independent monophyletic origin and high level of genetic divergence between Cx. p. molestus and Cx. p. pipiens suggest that these two members of the Cx. pipiens complex represent distinct phylogenetic entities with independent evolutionary histories prior to humanmediated translocation.
Concepts of speciation and evolution of the Culex pipiens complex. Theoretical models of speciation in animals could be subdivided into two major groups: allopatric or geographical speciation and sympatric or ecological speciation. The first concept of speciation, which was intensively promoted by E. Mayr 3 , suggests that incipient taxa first become isolated geographically. This situation reduces the gene flow between the populations and may lead to accumulation of mutations that cause genetic incompatibility among the hybrids. An alternative concept of speciation emphasizes ecological barriers between the emerging taxa as major drivers of evolution 30 . This scenario considers the development of reproductive isolation between the populations as a result of adaptation to different environments without geographical isolation, which usually takes place in the face of gene flow. In this situation, the hybrids between incipient taxa become less fit to the environment that promotes natural selection of any traits that reduce mating between them. We think that overall diversification of the Cx. p. pipiens and Cx. p. molestus subspecies represents a striking example of speciation through isolation-by-ecology mechanisms. In our study, the Fst analysis determined significant levels of genomic divergence between Cx. p. pipiens and Cx. p. molestus ( Fig. 5 and 6) across the entire genome without clear islands of speciation. Surprisingly, the levels of differentiation were lower around the centromeres, probably due to the low number of reliable SNVs in these highly repetitive regions.
The differentiation was extremely high between the Chicago, IL, USA strains of Cx. p. pipiens and Cx. p. molestus, but was lower between the subspecies in the Eurasian mosquito collections.
Overall, these observations suggest significant restriction of gene flow between the subspecies.
Several mechanisms of reproductive isolation between Cx. p. pipiens and Cx. p. molestus have been described 16 . Two of them are prezygotic acting, before fertilization, and reduce the opportunities for mosquito mating. The first mechanism is related to habitat specialization of the larvae: Cx. p. molestus occupies basements or other underground environments, but Cx. p. pipiens prefers open aboveground water bodies for breeding sites. This reduces chances for the two subspecies to meet and mate at the adult stages. The second isolating mechanism relies on the differences in mating behavior between Cx. p. molestus and Cx. p. pipiens. Cx. p. molestus males usually form homogeneous swarms near the ground and require limited space for mating 16 . In contrast, Cx. p. pipiens males swarm near the foliage about 2-3 m above the ground.
Experimental studies of mating behavior in small cages indicate that in crosses of females and males of Cx. p. molestus copulation success was 90% but in Cx. p. pipiens it was only 3.3% 31 . In inter-subspecies crosses between Cx. p. molestus and Cx. p. pipiens, the copulation success was also low and varied between 6.6% to 10% depending on the sexes of the subspecies. This study demonstrated that females of both subspecies actively avoid copulation with males from an alternative subspecies. Moreover, Cx. p. pipiens females were unsuccessful in receiving sperm from Cx. p. molestus and, as a result, produced no eggs.
Two other reproductive isolating mechanisms described for Cx. p. pipiens and Cx. p. molestus are postzygotic, they act after the mating and result in decreased fitness of the hybrids.
One of the mechanisms is related to the inheritance of diapause in hybrids of Cx. p. molestus and Cx. p. pipiens as a recessive trait 8 . F1 hybrids and a significant portion of F2 hybrids are unable to develop diapause and cannot survive under winter conditions. This mechanism, perhaps, could explain the higher introgression levels in Cx. p. pipiens in southern locations 18,22,23 . Finally, the members of the Cx. pipiens complex are exposed to cytoplasmic incompatability of hybrids infected with different strains the rickettsial parasite Wolbachia pipientis. Despite cytoplasmic introgression of this parasite through hybridization between the members of the Cx. pipiens complex 32 , cytoplasmic incompatibility could significantly limit survival rates of the hybrids.
For example, W. pipientis infection significantly reduced hybridization between Cx. pipiens and

Cx. quinquefasciatus in South Africa 33 . Another study demonstrated that in Eurasian populations
Cx. p. molestus was only infected by one strain of W. pipientis but Cx. p. pipiens by two different strains 34 . Moreover, that the specimens of Cx. p. molestus and Cx. p. pipiens, which we used in our study, were infected with the same strains of W. pipientis in the south of the Kyrgyz Republic but by different strains in the north of the Republic of Belarus. Thus, it may explain the differences in introgression from Cx. p. molestus to Cx. p. pipiens that was more pronounced in the Republic of Belarus than in the Kyrgyz Republic. In fact, an interesting example of infectious speciation was described in the South American Drosophila paulistorum complex. In this complex, six semi-species with overlapping geographic distribution became reproductively isolated as a result of premating and postmating isolation triggered by the Wolbachiae infection 35 .
Recent genomic studies conducted on different organisms including Drosophila simulans 36 , Rhagoletis fruit flies 37,38 , and Heliconius butterflies 39,40 provide additional evidence that ecological speciation occurs in nature. The genomic patterns of speciation can be very different 2 ranging from small genomic islands of speciation 39 to significant levels of divergence across the entire genome. Widespread genomic divergence was identified between the incipient species Anopheles gambiae and An. coluzzii in the An. gambiae complex 41 . These species were originally identified by differences in the structure of their ribosomal DNA as S and M forms 42 , but later their taxonomic status was elevated to the species level 43 . An. gambiae and An. coluzzii are believed to develop reproductive barriers in sympatry as a result of the differences in their ecological preferences 44 . The An. gambiae larval stage is associated with small rain pools. In contrast, An. coluzzi exploits persistent water reservoirs associated with rice cultivation.
Although, premating barriers between the species are incomplete 45 , they developed differences in their swarming behavior 46,47 and divergent song types 48 .
Thus, a high level of genome-wide divergence, a striking difference in adaptation to distinct ecological environments and evidence of prezygotic and postzygotic barriers for mating suggest that Cx. p. pipiens and Cx. p. molestus represent distinct ecological units that undergo incipient ecological speciation.
Hybridization in the Culex pipiens complex. The most intriguing observation of the members of the Cx. pipiens complex is that, despite differences in ecology, physiology, behavior, and geographic distribution, they still can hybridize and produce viable progeny in nature, indicating that reproductive isolation between them is not complete. Our study also demonstrated significant hybridization events between the subspecies Cx. p. molestus and Cx. p. pipiens.
Whole genome comparison indicated that most of the Cx. p. pipiens samples represent individuals with some level of introgression from Cx. p. molestus (Fig. 4). We observed high discrepancy between the nuclear and mitochondrial phylogenies (Fig. 2, 8), which indicates that, historically, the transmission of mitochondrial genomes can happen between the subspecies. At the same time, we did not observe haplogroups shared between the local populations of the subspecies. In the continents, all the mitochondrial phylogenies were strongly monophyletic, which points to male-mediated dispersal and hybridization. The admixture with Cx. p. molestus where ~40% of all samples were identified as hybrids between Cx. p. molestus and Cx. p. pipiens 18 . Thus, overall hybridization rates between the members of Cx. pipiens were higher in North America than in the Old World. For comparison, hybridization between cryptic species in the An.
gambiae complex (An. gambiae and An. coluzzi) significantly varied between 1% in Mali 45 to >20% in Guinea Bissau 51 , which is comparable to overall hybridization levels between Cx. p. molestus and Cx. p. pipiens.
Intriguingly, in our study, we did not determine a Cx. p. pipiens admixture signature in any Cx. p. molestus samples from the Republic of Belarus, the Kyrgyz Republic, and Chicago, IL, USA (Fig. 4). These findings demonstrate very limited or no gene flow from Cx. p. pipiens to Cx. p. molestus. Similar findings of asymmetric introgression from Cx. p. molestus to Cx. p. pipiens was shown by previous studies 19, 50 . The mechanism of asymmetric introgression is currently unknown. One hypothesis suggests that males of Cx. p. molestus, which can mate in confined spaces, can hybridize with both Cx. p. molestus and Cx. p. pipiens females. In contrast, Cx. p. pipiens males, which require space for swarming, have a higher disposition to mate with Cx. p. pipiens females 50 . However, more population genetic and experimental studies are needed to explain this phenomenon.
Finally, our study demonstrated that Cx. p. pipiens can develop autogeny as a result of adaptive introgression of the genetic material from Cx. p. molestus. In the field collected samples 13 from aboveground and underground environments in Washington, D.C., USA, we selected mosquitoes for autogeny. The underground mosquitoes were autogenic but the aboveground mosquitoes were anautogenic. However, mosquitoes from both autogenic and anautogenic colonies formed a single cluster when neighbor joining analysis was applied (Fig. 2). Moreover, PCA (Fig. 3) and ADMIXTURE (Fig. 4) approaches cluster these samples together with Cx. p.
pipiens from other locations. Populations with mixed characteristics were found in Europe 50 and in the USA 52 . In Portugal, an unusual pattern of blood feeding behavior on birds was also found in Cx. p. molestus 53 .
Thus, the presence of ongoing hybridization between members of the Cx. pipiens complex suggests that the speciation process between them is not complete and postzygotic barriers of reproductive isolation are not fully formed. Overall, we believe that members of the Cx. pipiens complex represent a remarkable model for studying different aspects of geographical and ecological speciation in the face of ongoing gene flow between them and local adaptations to diverse environments.

Materials and Methods
Mosquito collections. In this study, we used field collected mosquitoes from Minsk, Republic of Belarus, Mailuu-suu, Kyrgyz Republic, and Washington, D.C., USA (Table 1 The raw VCF file was filtered with vcftools 57 to remove all variants with a quality less than 500 (--minQ 500), more than two alleles per position, and were monomorphic with missing genotypes in more than four individuals.
To call mitochondrial DNA (mtDNA) variants, the reads were aligned to the nearly complete mitochondrial genome of Cx. quinquefasciatus with BWA-MEM and only unique, properly paired alignments with the highest quality (60) were left for the sorting and deduplication steps. The mitochondrial variants were called using the bcftools consensus algorithm (bcftools call -c) 56 .

Population genetic analysis.
To reduce the influence of the linkage between neighboring loci on population genetic analysis, we removed all loci located closer than 10 Kbp from each other using vcftools (--thin 10000). The resulting pruned dataset was used to compute the Principal Component Analysis (PCA) using SNPrelate software 58  To construct neighbor-joining trees for the autosomal and mtDNA datasets, we used RapidNJ software 62 with K2P distance 63 and 1000 bootstrap replications.

Data Availability
Genomic sequencing data will be available via NCBI upon the publication of the manuscript.