Mosquitoes of the Maculipennis complex in Northern Italy

The correct identification of mosquito vectors is often hampered by the presence of morphologically indiscernible sibling species. The Maculipennis complex is one of these groups that include both malaria vectors of primary importance and species of low/negligible epidemiological relevance, of which distribution data in Italy are outdated. Our study was aimed at providing an updated distribution of Maculipennis complex in Northern Italy through the sampling and morphological/molecular identification of specimens from five regions. The most abundant species was Anopheles messeae (2032), followed by Anopheles maculipennis s.s. (418), Anopheles atroparvus (28) and Anopheles melanoon (13). Taking advantage of ITS2 barcoding, we were able to finely characterize tested mosquitoes, classifying all the Anopheles messeae specimens as Anopheles daciae, a taxon with debated rank to which we referred as species inquirenda (sp. inq.). The distribution of species was characterized by Ecological Niche Models (ENMs), fed by recorded points of presence. ENMs provided clues on the ecological preferences of the detected species, with An. daciae sp. inq. linked to stable breeding sites and An. maculipennis s.s. more associated to ephemeral breeding sites. We demonstrate that historical Anopheles malaria vectors are still present in Northern Italy.

, although the validity of some of these taxa is still debated. For instance, the species An. messeae was tentatively split into two species (An. messeae and An. daciae) 11 but the hierarchic status of these taxa within the complex is controversial. We overcame this critical issue by referring to these taxa as An. messeae s.s. and An. daciae species inquirenda (sp. inq.) and using An. messeae to mean both taxa. Studies, which started in the 1920s, characterized the distribution of the Italian species 14 : An. labranchiae was present in central and southern Italy, while An. atroparvus was reported in northern regions. Anopheles messeae and An. maculipennis s.s. were recorded along the Italian peninsula, as was An. melanoon that, however, was characterized by a fragmentary presence. Anopheles sacharovi was reported in Italy but it seems to have disappeared following malaria eradication campaigns. Its last recorded date in Northern Italy dates back to 1959 15,16 .
Malaria was endemic in several areas of Italy until the 1950s, when the disease was eradicated following a strong campaign against mosquitoes and changes in people's lifestyles 17 ; the WHO finally declared the country free from malaria in the late 1970s 18 . Where mosquito vectors are present, locally-acquired malaria cases still occur after the arrival of affected persons, as reported in Greece since 2009 19 . Moreover, areas with relevant residual malariogenic potential are still present in Italy 20 and several cases of cryptic malaria were recorded in this country from 1997 until 2017 [21][22][23] . In this scenario, the characterization of species distribution is strongly recommended to support a correct risk assessment 23,24 .
After the eradication of malaria, the interest in anopheline fauna characterization decreased progressively and little data were available to corroborate the historical distribution of malaria vectors in northern Italy. The first aim of this study is to fill this knowledge gap by an extensive field sampling and identifying the Maculipennis complex species present by means of the barcoding technique. Furthermore, we used obtained presence points to model the environmental suitability of the surveyed territory for the different species detected, providing a useful tool for assessing the possible health risk associated with these vectors. The Ecological Niche Models (ENM) obtained provide interesting insights about characterizing different ecological traits of the species complex.

Results
We collected more than 23,000 mosquitoes of the Maculipennis complex from the Po Valley, the widest plain in Italy. A large part of these mosquitoes was collected through attraction traps in fixed sites, which were sampled several times during the favourable season, i.e., between June and September. We used the geometric mean of Maculipennis complex mosquitoes collected per night of sampling in these sites in 2017-18 as a proxy for their abundance (Fig. 1). We molecularly characterized 2490 mosquitoes of the Maculipennis complex: 2031 An.  Barcoding. We obtained 2330 ITS2 sequences (Table 2) which allowed for 93.6% of the achieved identification, confirming the discriminatory power of this marker for species complexes. The remaining 160 specimens were identified by Cytochrome C Oxidase-I (COI) (33) sequencing or by real-time PCR (127). The ITS2 sequences obtained in this study were aligned with homologue sequences of Italian species of the complex available in GenBank and employed to obtain the phylogenetic tree in Fig. 3. The sequences were  Figure S1. Map created using QGIS 3.10 (www. qgis. org); raster layers: Globe DEM (www. ngdc. noaa. gov), EU-Hydro (www. coper nicus. eu). www.nature.com/scientificreports/ grouped into four well-supported clades with conspecific reference sequences, enabling the unambiguous classification of our specimens in the four taxa: An. maculipennis s.s., An. messeae, An. melanoon, An. atroparvus. All ITS2 sequences originated from An. messeae showed the same pattern in two diagnostic positions, with an A in position 362 and a C in position 382 (Table 2), allowing the classification of these specimens as An. daciae sp. inq. 10,25 . All ITS2 sequences were referred to a specific haplotype comprising sequences with ambiguous bases (Table 2), due to the presence of unequal double peaks systematically recorded in specific positions of electropherograms (Supplementary Figure S2). We obtained a univocal haplotype, 292 base pair (bp) long (excluding the 5.8 S and 28 S motifs in the obtained amplicon sequences) for the species An. maculipennis s.s., identical to the available public sequences. We observed polymorphisms among sequences ascribable to An. daciae sp. inq. (305 bp long) An. atroparvus (307 bp) and An. melanoon (302 bp). We observed this variable pattern in three sites of An. daciae sp. inq. sequences, as previously reported 10,25 , in three sites of An. atroparvus and one site in An. melanoon ( Table 2). The latter site was the same already reported as inter-individually polymorphic 26 . We did not observe other differences in obtained sequences falling into the same clade.
Ecological niche modelling. Following the screening of 148 available variables, a set of 34 were selected to be included in the Ecological Niche Models (ENM) ( Table 3). As the number of capture sites for An. atroparvus and An. melanoon were scarce (eight and nine for the two species respectively, Fig. 4a), ENMs were only built for An. daciae sp. inq. and An. maculipennis s.s. Overall performance for the ENMs was good, with Area Under Curve (AUC) values of 0.80 for An. daciae sp. inq., and 0.74 for An. maculipennis s.s. The environmental suitability map suggested higher probabilities of the presence of An. daciae sp. inq. in the proximity of large breeding sites ( Fig. 4): westward near the rice fields in the Piedmont and Lombardy Regions, eastward near the rice fields between the Emilia-Romagna and Veneto regions and close to wetlands. This is also confirmed by the relative contribution of the environmental variables to the ENM (Table 4), with distance to potential breeding sites (Proximity to rice fields, Proximity to water bodies < 1 km 2 , and Proximity to wetlands) providing a cumulative contribution of 41.4%. In addition, covariates related to elevation proved to be a relevant contribution to define the presence/absence of the species (Table 4, Altitude and Slope; cumulative contribution: 15.5%); more suitable areas had an average altitude of 49 m a.s.l. and ranged from 1 to 385 m a.s.l. Proximity to rice fields also resulted in being the covariate with the highest model gain when used alone; moreover, when the factor was omitted, the model had the lowest gain, according to jackknife test of variable importance. This indicates that the covariate provided the most important pieces of information for defining the presence of An. daciae sp. inq.
The most suitable area for An. maculipennis s.s. resulted located in the central and northern part of the Po Valley (Fig. 4). For this species, the contribution of proximity to apparent breeding sites (rice fields, wetlands and www.nature.com/scientificreports/ water bodies < 1 km 2 ) was less important (cumulative contribution: 13.3%), while covariates related to altitude and slope resulted more relevant (27.2% contribution in total) (Table 3). However, soil type also showed a marked contribution in itself (9.5%), with a higher relevance of moisty soils characterized by the likely presence of surface water and a negative influence of arid soils on the model (Supplementary Figure S3) Figure S4). On the contrary, the greater loss of gain in a multi-variable model was observed if Proximity to water bodies < 1 km 2 was omitted which, therefore, appeared to be the factor bearing the most important information that is not captured by the other covariates included in the ENM. The contribution to the models of all employed covariates is reported in Supplementary Table S3. www.nature.com/scientificreports/

Discussion
This study provides detailed data on the distribution of species of the Maculipennis complex in Northern Italy, filling the gap in the knowledge gathered over the last decades. The use of the ITS2 marker allowed the identification of An. daciae sp. inq., An. maculipennis s.s. and the rarer An. melanoon and An. atroparvus, suggesting the disappearance of An. sacharovi in the surveyed area.
While ITS2 sequences ascribable to An. maculipennis s.s. were all identical, we recorded an unexpected variability in some positions of the ITS2 sequences in other species. The ITS2 marker is widely used for low intraspecific variability, but it could be present in hundreds of copies in a single genome 4 , and these copies can differ at individual level for single mutations, insertions or deletions, as extensively reported in mosquitoes 29 . The presence of a single mutation in the same specimen systematically caused a double peak of different intensity in a particular position of the electropherogram. We recorded these multiple peaks in three positions of An. daciae sp. inq., three positions of An. atroparvus and one position of An. melanoon. These variable bases, caused by an intra-individual polymorphism, are not reliable as diagnostic markers for definition at species level. While these particular polymorphisms were already reported in An. messeae, we reported the polymorphic sites in another two species of the Maculipennis complex, An. melanoon and An. atroparvus.
The ITS2 polymorphic sites were used to define new species. The taxon An. daciae was earlier defined through the presence of five polymorphisms in An. messeae from Romania 11 . As observed in our data and demonstrated by cloning 25 , three of these polymorphisms were intra-individual, making only the last two sites reliable for biomolecular identification. In the last two polymorphic sites, our specimens show the same haplotype, strongly suggesting that we faced a single taxonomic entity (species or form) in the study area, namely An. daciae sp. inq.
The taxonomy of An. messeae is puzzling: the existence of two taxa, named A and B chromosomal forms, was already suggested 30 . The identity of form A with An. daciae sp. inq. was then postulated 31 , but the rank of the two taxa, An. daciae/An. messeae or A/B forms, is still debated. The described intragenomic heterogeneity raised doubts on the use of ITS2 as a univocal species marker for the definition of An. daciae sp. inq. 32 . Besides, sequences bearing both variants in the two above-mentioned diagnostic sites were reported 10,31,32 , strongly suggesting some extent of interbreeding between An. daciae sp. inq. and An. messeae s.s. 10 , and unequivocal diagnostic morphological traits were not identified between the two taxa 11 . On the other hand, significantly different relative proportions of chromosomal inversions were observed between A and B forms 33 . In addition, a fixed chromosomal inversion (X1) was observed in An. messeae s.s., in addition to a relevant genetic distance between An. messeae s.s. and An. daciae sp. inq., mainly on the X chromosome 10 . We suggest that a critical review of the An. messeae status, with more robust evidences based on strong genetic and ecological data, is needed before unambiguously splitting An. messeae into two species.
Moreover, the definition of the taxa of the Maculipennis complex is a difficult task, as highlighted by the absence of clear morphological differences and the incomplete reproductive barriers among members of the complex 34 . This is probably due to the recent, or incipient, process of speciation, which pose several taxa of the complex in the "grey zone" 35 that are difficult to characterize. Nevertheless, the definition of taxonomic relationship inside the complex is a stimulating task, because Anopheles complexes are an ideal context to define the boundaries of sibling species [36][37][38][39] .
The elucidation of relations inside this complex is not only a taxonomic curiosity, since the identification of a new species can have strong epidemiological implications. Ecological (as suitable breeding sites) and behavioural (as host preferences) characteristics of different species may affect the capacity to transmit pathogens, such as Plasmodium parasites 5,40 . The "anophelism without malaria" phenomenon demonstrated the epidemiological importance of finely characterizing species in the complex.
Part of the specimens used were obtained by exploiting existing surveillance plans, mainly targeting the WNV, also demonstrating the usefulness of these plans for tasks that are outside of the main scope. Although Table 3. List of screened covariates (Cov.) with reference to those selected (Se.) for implementing the models. www.nature.com/scientificreports/ traps used in WNV surveillance targeting Culex mosquitoes were not expressly studied for trapping Anopheles mosquitoes, they collected a relevant number of the latter. Moreover, we regularly monitored WNV sites for two years, fortnightly over the summer season, thus obtaining comparable data on the abundance of Anopheles from monitored sites. This data demonstrated that mosquitoes of the Maculipennis complex are more abundant in areas rich in large breeding sites (such as rice fields and wetlands).
Even if a comparison with historical data is not easy, due to the different sampling strategies and identification protocols, our data suggest that An. melanoon and An. atroparvus are less widespread than in the past. Only few specimens of An. atroparvus, a historical malaria vector in northern Europe, were collected in the eastern part of the surveyed area, supporting evidence of the scarce presence of the species that seems rarer than in the past, www.nature.com/scientificreports/ particularly in the Po Delta area. The same scenario is described for An. melanoon, recorded in the western part of the monitored area, which seems to have reduced its diffusion when compared to the past 14 .
The presence of the more abundant species An. daciae sp. inq. and An. maculipennis s.s. appeared to be more likely in plain areas, as demonstrated by the importance of altitude and slope in defining environmental suitability. In particular An. maculipennis s.s. seems to favour lower altitudes, although this does not prevent its finding even at higher altitudes 41 . Nevertheless, the two species showed a different geographical distribution. While An. daciae sp. inq. resulted in having a higher chance of occurrence in western and eastern parts of the monitored area where large breeding sites were also reported, An. maculipennis s.s. appeared more diffused in the central section of the study area where larger breeding sites are rare and many of these are transient (Fig. 2). In fact, the two species seem to show different preferences in breeding site typologies. Anopheles daciae sp. inq. was more related to consistent breeding sites such as rice fields, wetlands and sparse water bodies. Anopheles maculipennis s.s. seems to exploit ephemeral breeding sites, as indicated by the relevance in the ENM of slope and soil types characterized by the propensity to host surface waters, then linked to the presence of transient breeding habitats. Furthermore, the results of our ecological niche models suggests that An. maculipennis s.s. might have a higher propensity to occupy areas with denser vegetation, compared to An. daciae sp. inq. These results, obtained by modelling, are consistent with the two species' known preferences, with An. messeae preferring large water bodies with irregular water levels rich in vegetation (as ponds, ditches, natural and artificial pools), and An. maculipennis s.s. occupying small water bodies with scarce vegetation and a widely-fluctuating daily temperature 41,42 .
After the strong campaign of malaria eradication in the post-World War II, the anophelines not represented the target of insecticidal treatments in northern Italy. Anopheline larvae are often killed by mosquito control plans targeting annoying species exploiting the same habitats (e.g., larvicidal treatment with Bacillus thuringiensis var. israelensis in rice fields to limit Aedes caspius) 43 . Nevertheless, the acquired knowledge on ecological preferences could be useful for planning specific treatments. In particular characterization of breeding sites can permit to treat preferentially one species, thus offering the opportunity to select the correct active ingredient and formulation, or to use alternative approaches, e.g. monomolecular film in transient environment, water level regulation or biological control.
The presence of these potential malaria vectors highlights epidemiological importance for an understanding of their distribution, a fundamental prerequisite to evaluate the risk of malaria introduction via Plasmodium carriers. The distribution of potential vectors differed from the past and is constantly changing according to ecological mutations, such as land use modifications and climatic change. The presence of competent vectors, together with favourable conditions, can lead to the occurrence of locally-acquired malaria, as demonstrated in Italy in 1997 21 , and more recently and impressively in Greece with 109 cases classified as locally-acquired from 2009 to 2019 19 . A deadly malaria case that involved a child in Italy in 2017, and which successively proved to be nosocomial, aroused a great deal of public opinion 23,24 . Previous knowledge of the Anopheles fauna would have provided the basic data for a first rapid assessment of the case's origin and a prompt and clear risk communication to the people.
Species of the Maculipennis complex exhibit a different vector competence for Plasmodium species. Furthermore, Plasmodium strains that can be potentially imported are completely different to those that historically circulated in the country and are no longer present. Despite further information on this aspect are needed, experimental infections and xenodiagnoses demonstrated the ability of European An. messeae, An. atroparvus, and An. sacharovi to transmit Asian Plasmodium vivax 44 . While European An. atroparvus, and An. labranchiae seem refractory to African strains of Plasmodium falciparum 22,[45][46][47][48][49] .
In order to assess the risk of locally-acquired malaria cases, these fragmentary data must be corroborated by comprehensive studies to characterize the vectorial competence of local mosquitoes, particularly for new taxa, when they have been described with exotic and potentially importable strains of Plasmodium. All these data, although difficult to obtain, are necessary to improve our preparedness to face potential emergencies. Padano-Veneta), which includes parts of the Piedmont, Lombardy, Emilia-Romagna, Veneto and Friuli Venezia Giulia regions. This area, of about 46,000 km 2 , enumerates more than 20 million inhabitants and is the most densely populated in Italy, with some of the largest Italian cities (Milan, Turin, Bologna, and Venice). This territory is geared to agriculture, characterized by intensive farming and animal husbandry, with few hedges, rare scattered trees and a dense irrigation network. Industrial settlements and residential areas frequently interweave this agricultural environment. Natural areas are rare, mainly represented by river borders, characterized by riparian vegetation, or protected and re-naturalized areas.
The same section of the surveyed area presents widespread presence of breeding sites suitable for Anopheles mosquitoes, such as the rice field areas (2250 km 2 between eastern Piedmont and western Lombardy, 110 km 2 between the Ferrara and Rovigo provinces and 40 km 2 between the Mantua and Verona provinces, besides other smaller areas). These areas were endemic for malaria in the past, particularly in the eastern part bordering the Adriatic Sea, at the Po River Delta. Potential mosquito breeding sites such as restored natural areas, quarries, irrigation channels and natural and artificial wetlands are abundant and sparsely present in the Po Valley.
Entomological collection. Mosquito samples belonging to the Maculipennis complex were collected mainly in 2017 and 2018 by active sampling or attraction traps, in addition to stored samples from previous years (Table S2). A large part of tested mosquitoes was retrieved from entomological monitoring in the frame of the WNV surveillance, in which adult mosquitoes were sampled by traps (CDC-like) baited with dry ice as a source of carbon dioxide in fixed stations of the Po Valley 50 . Mosquitoes derived from entomological samplings performed for other purposes were identified to a lesser extent (e.g., Leishmania infantum surveillance, nuisance monitoring), sometimes sampled by CDC-light traps (15 specimens). Sites surveyed for WNV were sampled fortnightly from June to September for two seasons. The abundance of specimens belonging to the Maculipennis complex was expressed as the geometric mean of mosquitoes in the complex per sampling (Fig. 1). To complete the picture, we actively collected larvae and adult Anopheles mosquitoes in areas where WNV traps were unsuccessful. The survey was carried out by actively collecting larvae by dipping and adult mosquitoes by direct aspiration in resting sites, from May to October. These sites included animal shelters (bovines, equines, goats and poultry) in farms to collect engorged and host-seeking mosquitoes, and uninhabited buildings to collect mosquitoes ready for overwinter. We separated Maculipennis complex specimens from other mosquitoes according to morphological keys 3,42 . Sites and relative methods of sampling are depicted in Fig. 1.
Barcoding and sequence analysis. We selected part of the collected specimens and analysed a maximum of 13 specimens belonging to the Maculipennis complex per sample (namely a collection made in one site in one day). We used the entire bodies or just one leg of these mosquitoes for the biomolecular analysis; in this case, the rest of the body was individually stored in a cryotube at − 80 °C.
We extracted the DNA which was submitted to traditional PCR protocol for amplification of the internal transcribed sequences 2 (ITS2) according to Marinucci et al. 51 ; this PCR amplified the DNA encoding for ITS2 and part of ribosomal 5.8 S and 28 S rRNA subunits. The obtained amplicons were then sequenced and obtained electropherograms were screened with the Lasergene 10.0 software (DNAStar) with default setting to obtain sequences. Part of the samples were identified by sequencing of the Cytochrome C Oxidase-I (COI) marker 52 or real-time PCR using An. maculipennis s.s. MP and An. messeae DMP probes 53 .
We defined all obtained haplotypes, also comprising ambiguous bases, and ascribed to them all the sequences obtained in a Region (Table 2). In order to prevent the incorporation of possible sequencing errors, we set the arbitrary threshold of 1% of conspecific sequences to insert ambiguous bases in one haplotype. These sequences were used in a BLAST search to obtain a homologue sequence of vouchered specimens from different countries. Only the ITS2 part of obtained sequences was aligned with the MAFFT algorithm 54 . We used the alignment to infer a phylogenetic tree by means of the maximum likelihood method implemented in PhyML 3.1 software 55 with 1000 bootstrap replicates. The model TPM2 + G was selected according to the jModelTest2 software 56 . The tree was visualized with ITol software 57 . These representative sequences were deposited in the European Nucleotide Archive (EBI) database under accession numbers from LR898482 to LR898499.
Ecological niche modelling. We used the presence points of the detected species to model the environmental suitability of the different species in the surveyed area according to the maximum entropy approach, using MaxEnt software v3.4.1 58 . This software estimates environmental suitability for a species departing from a set of occurrence locations and gridded covariates, maximising the entropy in geographic space or, in other terms, minimizing relative entropy between covariates 59 . A series of meteorological (such as temperatures and precipitations) and ecological variables (such as soil type and indexes) or covariates, were obtained from different sources (Table 3). Proximity maps were obtained by utilizing vector files for different categories of breeding sites (rice fields, wetlands, sparse water bodies under 1 km 2 of surface and riverbeds).
All covariates were acquired as raster, then rescaled and aligned at 1 km 2 spatial resolution in ASCII format on the surveyed area extent (WGS84 projection) by using QGIS 3.10. We defined the reference raster area by enlarging the boundaries of the plain area by 15 km, represented as the vector file already described 50 .
An explorative analysis was run with all covariates. Those with no contribution in the models, and thus ecologically irrelevant, were excluded from the final analysis. We screened covariates for correlation, selecting those having a Pearson's correlation coefficient of less than 0.7. In order to avoid potential collinearity problems within the set of covariates used in the model, we assessed collinearity with the variance inflation factor (VIF) which is a measure of correlation between pairs of covariates. Correlation and collinearity analysis were performed in R www.nature.com/scientificreports/ version 3.5 and ENM Tools 60 . The set of covariates used in the models is reported in the supplemental material (Supplementary Table S3). Ten replicates were done with the cross-validation run and the cloglog model output grid format. In order to overcome the different sampling efforts in different areas, a bias file was utilized categorizing provinces into three groups (100%, 60% and 40%) according to densities of observations.
Model performance was evaluated using the AUC value, a measure of the model's sensitivity. AUC was used to test the model's performance with real observations in the training area. An AUC value of 0.5 shows that the model predicts randomly, while a value close to 1 indicates optimal model performance. AUC > 0.75 was considered as a good value 58 .
The better covariates' relative contributions to the MaxEnt model were estimated using percent contribution. Covariate contribution was tested by jackknife analysis in the MaxEnt model to get alternate estimates of the most important covariates in the model 61 . Altitude averages and ranges of areas with more suitable conditions were obtained overlapping areas with a host suitability index over 0.75 to the DEM raster. We performed a GIS analysis with QGIS 3.10 software.