Background & Summary

Culicoides imicola Kieffer (Diptera: Ceratopogonidae) is a globally widespread species that transmits the agents of several viral diseases of veterinary importance, including Bluetongue1,2,3, African Horse Sickness (AHS)4,5, and Schmallenberg6. Bluetongue (BT) is a viral disease that affects ruminants, and its etiological agent has at least 27 different serotypes7,8,9. Historically, BT was enzootic in tropical regions of the world, but in recent years its distribution has expanded markedly. The disease has become a concern in areas with a temperate climate, particularly in Europe. This expansion is mainly facilitated by the northward spread of infected Culicoides species, mainly C. imicola, and by the availability of competent and efficient vectors such as C. obsoletus and C. pulicaris7,10. The 1998 incursion and emergence of bluetongue virus in Southern and Eastern Europe was mainly associated with C. imicola, while the 2006 incursion into Northern and Western Europe7 was mainly associated with C. obsoletus and C. pulicaris.

AHS is native to sub-Saharan Africa4. It is an infectious disease considered to be the most lethal viral disease of equines, especially horses4,11. The recent emergence of two Culicoides-borne diseases (BT and Schmallenberg) in Europe has raised concern about the potential introduction and further spread of AHS virus in temperate parts of the world as well11.

Although C. obsoletus and C. pulicaris are considered the main vectors of Schmallenberg virus, experimental infection of field-collected C. imicola provided evidence that C. imicola is also highly efficient at Schmallenberg virus infection and transmission6. Schmallenberg virus is a very recently emerged virus, first identified during the summer of 2011 in North Rhine-Westphalia, Germany12. Since then it has spread across Europe, causing congenital deformities in the offspring of infected adult ruminants13.

The recent emergence of Culicoides-borne diseases highlights large knowledge gaps in the biology and ecology of the vectors. Since the emergence of BT and Schmallenberg virus, Culicoides surveillance efforts have doubled. Thus, it is important and timely to expand the effort of Guichard et al.14 and update the global Culicoides occurrence record. With these knowledge gaps in mind, this study compiled the global occurrences of C. imicola from the dataset provided by Guichard et al.14 and from literature published since 1st January 2014. The result is the largest standardized, up-to-date, georeferenced global dataset currently available for the vector, containing 1 039 occurrence records.


Literature search and data extraction

PubMed was searched using the terms ‘Culicoides imicola’ OR ‘Ceratopogonidae’. The Medical Subject Headings (MeSH) term technology of the PubMed citation archive ensured that all synonyms were automatically included in the searches. The literature search was last updated on 14th January 2019 and returned a collection of 1 920 articles. However, a geo-database of 649 occurrences of C. imicola, compiled by Guichard et al.14 from 65 articles5,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78 covering 1943 to 2010 (1959 to 2014 by publication year), was obtained from the authors. Thus, in this study a literature search4,6,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111 for the period 1st January 2014 to 14th January 2019 was combined with the existing data points obtained from Guichard et al.14 for the period 1959 to 2014.

The search retrieved a total of 380 articles published since 1st January 2014. The titles and abstracts of those articles were screened, and articles were excluded if they 1) did not mention the vector species or 2) reported data from experimental studies. After this initial selection, 150 eligible full-text articles were downloaded and examined in detail to identify those meeting the following criteria: 1) the coordinates of the field sites were reported or could be retrieved from Google Earth using the reported location information, and 2) occurrence of C. imicola was reported. Each entry was therefore checked for site coordinates, occurrence of the target species, and other information where available. Subsequently, the geo-location of the vector was extracted from the 35 articles meeting all the criteria (Fig. 1).

Each article was thoroughly reviewed and all relevant information was extracted (site location, site name, year of data collection and other details), and confirmed C. imicola occurrences within these articles were entered into the database. Occurrences were classified as confirmed when the article clearly stated the presence of the vector at a specific time in a specific location.
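As an illustration only, a date-restricted PubMed search of this kind could also be expressed programmatically through the NCBI E-utilities ESearch service. The sketch below merely builds the request URL and does not contact the server. The use of the API, the function name `build_pubmed_query` and the `retmax` value are assumptions for illustration and are not part of the workflow described above, which relied on the PubMed search interface.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint for querying PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term, start, end, retmax=500):
    """Return an ESearch URL limited to a publication-date window.

    Hypothetical helper: dates are given as YYYY/MM/DD strings.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": start,
        "maxdate": end,
        "retmax": retmax,    # assumed page size, not from the source
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Search terms and date window as reported in the text.
url = build_pubmed_query(
    '"Culicoides imicola" OR "Ceratopogonidae"',
    "2014/01/01",
    "2019/01/14",
)
print(url)
```

Fetching this URL would return a list of PubMed identifiers, which would still need the title, abstract and full-text screening described above.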

Fig. 1 Flow chart of literature search and data extraction.

Geo-coding of data

The occurrence coordinates (longitude and latitude) of C. imicola were extracted from each article. Whenever the coordinates were not provided in an article or its supporting information, the study site name, together with all contextual information and alternative spellings of the site name, was used to determine the coordinates using Google Earth. When two locations had the same name but different geolocations, both the location name and the occurrence coordinates were provided. All data points were then linked to the FAO Global Administrative Unit Layers (GAUL) system using the join attributes by location tool in QGIS version 3.4.
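The idea behind a join by location can be sketched in a few lines: each point receives the attributes of the polygon that contains it. The following is a minimal pure-Python illustration using a ray-casting point-in-polygon test. It is not the QGIS implementation, and the polygons, unit names and function names are hypothetical stand-ins for GAUL administrative units.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Check whether a horizontal ray from the point crosses this edge.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def join_by_location(points, units):
    """Attach the name of the first containing unit (or None) to each point."""
    joined = []
    for p in points:
        name = next(
            (u["name"] for u in units
             if point_in_polygon(p["lon"], p["lat"], u["polygon"])),
            None,
        )
        joined.append({**p, "admin_unit": name})
    return joined


# Hypothetical square "administrative units" and points, for illustration.
units = [
    {"name": "Unit A", "polygon": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"name": "Unit B", "polygon": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]
points = [{"lon": 1.0, "lat": 1.0}, {"lon": 3.0, "lat": 1.0}, {"lon": 9.0, "lat": 9.0}]
joined = join_by_location(points, units)
```

Points that fall outside every polygon keep `admin_unit` as `None`, which is a convenient flag for coordinates that need manual checking.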

Data Records

R, QGIS, Mendeley Desktop, and Microsoft Excel were the software packages used to manage, store and analyze the database. The dataset is saved in comma-delimited (.csv) format and can be imported into a variety of statistical and GIS software programs. The data records described in this paper are publicly and freely available on Figshare113. The dataset contains 2 589 entries before technical validation and 1 039 entries after technical validation, with information in 8 columns (Table 1). The spatial thinning procedure is described in the Technical Validation section. Each row represents a single occurrence record (one or more C. imicola occurrences in the same unique location within a single calendar year). The fields contained in the database are described in Table 1.
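As a minimal sketch of how such a comma-delimited file might be read, the example below parses a few rows and collapses repeated reports to one record per location and calendar year, mirroring the row definition above. The column names and the sample rows are invented for illustration; the actual field names are those listed in Table 1.

```python
import csv
import io

# Invented sample rows; real column names are given in Table 1.
SAMPLE_CSV = """longitude,latitude,year,country
-6.00,37.40,2015,Spain
-6.00,37.40,2015,Spain
-6.00,37.40,2016,Spain
28.20,-25.70,2014,South Africa
"""


def load_occurrences(handle):
    """Parse rows and convert coordinate and year fields to numbers."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "lon": float(raw["longitude"]),
            "lat": float(raw["latitude"]),
            "year": int(raw["year"]),
            "country": raw["country"],
        })
    return rows


def unique_location_years(rows):
    """Keep one record per (longitude, latitude, year) combination."""
    return sorted({(r["lon"], r["lat"], r["year"]) for r in rows})


rows = load_occurrences(io.StringIO(SAMPLE_CSV))
records = unique_location_years(rows)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, for example `open(path, newline="")`.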

Table 1 Description of attributes and columns in the dataset.

Technical Validation

To ensure the accuracy and validity of the occurrence records, a technical validation was performed. Firstly, a 5 km × 5 km resolution landcover raster was used to ensure that all occurrences were positioned on a valid land pixel. Based on the reported coordinates, some sites (n = 96) fell on water bodies. This was probably due to the limited precision of the longitude and latitude values, since these sites were all in peri-coastal locations. Thus, 96 of the 2 589 occurrence points were removed from the database, leaving 2 493.
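The land-pixel check can be illustrated with a toy raster. The sketch below converts a coordinate to a row and column index and looks up a binary land mask. The grid, its origin, the cell size in degrees and the function name are assumptions made for illustration; the validation described above used a 5 km × 5 km landcover raster.

```python
def is_on_land(lon, lat, mask, west, north, cell_deg):
    """Return True if the point falls on a cell flagged as land (1).

    mask is a list of rows ordered north to south; (west, north) is the
    upper-left corner of the grid and cell_deg is the cell size in degrees.
    """
    col = int((lon - west) // cell_deg)
    row = int((north - lat) // cell_deg)
    # Points outside the grid cannot be validated and are rejected.
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        return False
    return mask[row][col] == 1


# Toy 2 x 2 mask: 1 = land, 0 = water (hypothetical values).
mask = [
    [1, 0],
    [1, 1],
]
points = [(0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]
on_land = [p for p in points if is_on_land(p[0], p[1], mask, 0.0, 2.0, 1.0)]
dropped = [p for p in points if p not in on_land]
```

Keeping the dropped points in a separate list makes it possible to inspect them, which is useful for peri-coastal sites that may simply need a small coordinate correction.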

Further, as the database was compiled from different sources and over many years, it was important to standardize the data entries so that identical locations which may have been geo-positioned slightly differently were given the same unique identifier. The present dataset is heavily clustered in Europe and Southern Africa, with a high degree of aggregation in Spain, Portugal, Italy and South Africa compared to elsewhere. Consequently, it was important to spatially thin the occurrence records. Spatial thinning was performed using the R package spThin114 with the following parameters: “thin.par” (the distance between occurrence records in kilometers) and “reps” (the number of times to repeat the thinning process). The distance between occurrence records was set to 5 km (meaning that if occurrence records lay within the same 5 km × 5 km pixel of a global grid, only one record was retained) and the number of repetitions was set to 100. As a result, the 2 493 occurrence points were reduced to 1 039.
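The effect of distance-based thinning with repeated randomized runs can be sketched as follows. This is a simplified greedy procedure written for illustration and is not the spThin algorithm. It uses pairwise great-circle distances rather than grid pixels, and the function names, seed and sample points are assumptions.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def thin_points(points, min_km=5.0, reps=100, seed=0):
    """Greedy randomized thinning; returns the largest set found in reps runs.

    Every retained point is at least min_km away from all other retained
    points. Random ordering means different runs can keep different points.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        order = list(points)
        rng.shuffle(order)
        kept = []
        for p in order:
            if all(haversine_km(p, q) >= min_km for q in kept):
                kept.append(p)
        if len(kept) > len(best):
            best = kept
    return best


# Two points about 1.1 km apart and one distant point (hypothetical).
sample = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0)]
thinned = thin_points(sample, min_km=5.0, reps=100)
```

Repeating the procedure and keeping the largest result reduces the chance that an unlucky random ordering discards more records than necessary.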

The resulting database consists of 2 589 (before technical validation) and 1 039 (after technical validation) geo-positioned occurrences of C. imicola spanning 50 countries worldwide, disaggregated by continent, region, and country (Table 2). The pre-validation data also include the 96 occurrence points that fell on water bodies. Fig. 2 displays the global geographical distribution of C. imicola.

Table 2 Culicoides imicola occurrence records by UN region.
Fig. 2 Map of occurrence points for Culicoides imicola.

Usage Notes

The database described here can be used to investigate the spatial and temporal distribution of C. imicola. The data are most appropriate for applications at global and continental scales. C. imicola and the diseases it transmits were previously regarded as a problem of Africa. However, given the recent spread of the species to Europe and other parts of the world115,116,117, these data could support improved modelling of new locations at high risk of occurrence of the vector and of the diseases it transmits. The database after technical validation could be used to develop suitability and risk maps at global, continental, and regional scales. For local-scale suitability and risk mapping, on the other hand, the database before technical validation could be used.

There are differences in the number of published studies and the availability of occurrence data by continent and region. Continental and regional biases in the density of occurrence records are apparent, and likely reflect differences in the level of surveillance. Owing to the recent occurrence of Bluetongue and Schmallenberg viruses in Europe, substantial numbers of surveys have been conducted there, and thus large numbers of recent occurrence records come from Europe. Many occurrence records were also obtained from Southern Africa. Of the 1 550 points removed from the database during validation, 1 440 (92.9%) were from Southern Europe and Southern Africa. Thus, researchers using the technically unvalidated database need to take geographical sampling bias into account.
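The record counts reported in this paper can be cross-checked with simple arithmetic. The short sketch below uses only figures stated in the text. It shows that the 1 550 removed points comprise the 96 points on water bodies plus those removed by spatial thinning. The variable names are illustrative.

```python
# Figures as reported in the text.
before_validation = 2589
on_water = 96
after_validation = 1039
removed_europe_s_africa = 1440

# Points remaining after the land-pixel check.
after_land_check = before_validation - on_water

# Points removed by spatial thinning alone.
removed_by_thinning = after_land_check - after_validation

# All points removed during validation (water bodies plus thinning).
removed_total = before_validation - after_validation

# Share of removed points from Southern Europe and Southern Africa.
share_percent = round(100 * removed_europe_s_africa / removed_total, 1)

print(after_land_check, removed_by_thinning, removed_total, share_percent)
```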