Species delimitation and mitonuclear discordance within a species complex of biting midges

The inability to distinguish between species can be a serious problem in groups responsible for pathogen transmission. Culicoides biting midges transmit many pathogenic agents infecting wildlife and livestock. In North America, the C. variipennis species complex contains three currently recognized species, only one of which is a known vector, but limited species-specific characters have hindered vector surveillance. Here, genomic data were used to investigate population structure and genetic differentiation within this species complex. Single nucleotide polymorphism data were generated for 206 individuals originating from 17 locations throughout the United States and Canada. Clustering analyses suggest the occurrence of two additional cryptic species within this complex. All five species were significantly differentiated in both sympatry and allopatry. Evidence of hybridization was detected in three different species pairings indicating incomplete reproductive isolation. Additionally, COI sequences were used to identify the hybrid parentage of these individuals, which illuminated discordance between the divergence of the mitochondrial and nuclear datasets.

Speciation is a dynamic evolutionary process through which populations segregate into independently evolving lineages over time 1 . When gene flow is restricted either through geographic, behavioral, or ecological isolation, the accumulation of genetic changes, through selection or local genetic drift, may lead to divergence and potentially reproductive isolation [2][3][4][5][6] . Thus, the amount of genetic differentiation and level of gene flow between closely related lineages can be used to evaluate the strength of this isolation and determine species status 7 . Depending on the completeness of the speciation process between lineages, it can be challenging to unambiguously identify species 8 . Shallow divergence and hybridization can mask both morphological and genetic differences. While the most accurate assumptions about species delimitation are derived from a multifaceted approach 9,10 , genomic data has become a powerful tool to investigate species boundaries 11,12 . Both substantial and fine-scale genetic divergence is being uncovered across many study systems, even in the absence of morphological variation [13][14][15][16] . Species delimitation is especially important when working with organisms responsible for pathogen transmission, as misidentifications will lead to inaccurate vector surveillance data. Culicoides Latreille (Diptera: Ceratopogonidae) biting midges are responsible for the transmission of many pathogens worldwide 17,18 , including bluetongue virus (BTV) and epizootic hemorrhagic disease virus (EHDV). These viruses can cause severe symptoms and death in wild and domestic ungulates and are responsible for substantial economic losses globally 19,20 .
In North America, one of the main BTV and EHDV vectors is Culicoides sonorensis Wirth and Jones 20 , which belongs to the C. variipennis species complex. When originally described, this group consisted of five subspecies 21 ; though presently, three distinct species are recognized (C. occidentalis Wirth and Jones, C. sonorensis, and C. variipennis (Coquillet)) with C. albertensis Wirth and Jones and C. australis Wirth and Jones designated as synonyms of C. sonorensis 22 . Despite the current taxonomic arrangement, species identification remains difficult due to very subtle morphological differences and overall genetic similarities 23,24 . Additionally, the presence of cryptic species or hybridization within the complex could be further complicating proper species identification. Under laboratory conditions, C. sonorensis and C. occidentalis have been shown to hybridize 25 , and both C. occidentalis Inter-and intra-species population genetics. The geographic distributions of these clusters closely align with the distributions of the species (then subspecies) described in Wirth & Jones (1957) (Fig. 1), and recent morphological analyses of individuals from this study supports the species level designation of these clusters 41 . For the remainder of the manuscript, we will refer to each cluster by its corresponding species name. Culicoides occidentalis was located in Western North America, C. sonorensis in the Western and Southern United States (U.S.), C. albertensis in the Midwest U.S. and Ontario, C. variipennis in the Eastern U.S. and Ontario, and a fifth genetic group suggesting the occurrence of an additional, undescribed cryptic species in San Diego, CA. Notably, eight of the 17 sites had more than one species in sympatry, and one site had three species. At four sites, seven individuals were assigned to two genetic groups with an assignment score of ~ 50% for three individuals (scores = 45, 47 and 41%) and of ~ 25% for four individuals (scores = 34, 31, 25 and 24%), which suggests the occurrence of putative F1 or other types of hybrids (e.g., F2 or backcrosses). Interestingly, these hybrids were from three different species pairings (C. sonorensis X C. occidentalis; C. sonorensis X C. variipennis; and C. albertensis X C. variipennis). These hybrid individuals also stood out in the PCA, as they segregated between their parental clusters (Fig. 2a), as well as at the base of each parental branch in the phylogenetic tree (Fig. S3). In addition to these seven hybrids, 20 individuals had a secondary assignment score between 3 and 21%, signifying potential introgression between those pairings. However, we are less confident in STRU CTU RE's ability to identify this level of ancestry as other factors can also lead to mixed assignments.
The seven putative hybrids were excluded from the dataset used to calculate the intraspecies summary statistics (rearranged by cluster), which resulted in the isolation of 566 SNPs after more stringent filtering was applied. The mean F ST between species was 0.7147 (0.6541-0.7470), roughly 9 times higher than the mean F ST between the populations (i.e., localities) within each species (see below; Tables 2 & S1). Similarly, both the aR and LKC values of intra-individual genetic distance show a low level of divergence/high level of similarity within each species, including the San Diego population (Table S2). The four species-specific datasets were used to calculate the interspecies summary statistics as well as test for isolation by distance (IBD). These datasets contained 22 individuals of C. albertensis from four sites (3423 SNPs), 36 individuals of C. occidentalis from four sites (2714 SNPs), 97 individuals of C. sonorensis from seven sites (2357 SNPs), and 29 individuals of C. variipennis from four sites (2960 SNPs). The expected and observed heterozygosity, F IS , and number of private alleles for each species are reported in Table S2. No species-level dataset was created for the San Diego species, as this species was uncovered in a single locality. www.nature.com/scientificreports/ When examining each species individually, C. albertensis, had no evidence of population structure (K = 1), and had low genetic differentiation among locations (mean F ST = 0.054) ( Fig. 3a; Table 2). Although there does seem to be a pattern of IBD, this was found to not be significant in this species (Mantel test, P = 0.238; Partial Mantel test, P = 0.714). The low number of locations sampled potentially limits the statistical power of these correlations. The results obtained for C. occidentalis showed much more divergence compared to the other species, with populations being strongly differentiated from each other (mean F ST = 0.411) ( Table 2). Additionally, the fastSTRU CTU RE analysis suggested that each location of C. occidentalis sampled is distinct (K = 4) (Fig. 3b). While no IBD was found (Mantel test, P = 0.489; Partial Mantel test, P = 0.770), there seems to be a considerable amount of geographic isolation among populations of this species, with pairwise F ST values ranging from 0.14 to 0.70 (Table S4). Additionally, significant levels of dissimilarity were found between individuals from California and those from the other three populations (Fig. S4). In contrast, low genetic differentiation among locations were found for C. sonorensis (mean F ST = 0.029), with varying levels of support for IBD in this species (Mantel test, P = 0.039; Partial Mantel test, P = 0.082) ( Fig. 3c; Table 2). For this reason, the individuals from Colorado were combined into a single location, as were the individuals from Kansas. The fastSTRU CTU RE analysis suggested the occurrence of population structure in C. sonorensis (K = 2), with some individuals from Kansas belonging to a distinct group, though these were not highly divergent from any other C. sonorensis location (Table S4). Individuals of C. variipennis exhibited no evidence of population structure (K = 1) or of IBD (Mantel test, P = 0.587; Partial Mantel test, P = 0.125) (Fig. 3d). Consistently, almost no genetic differentiation was found among locations of this species (mean F ST = 0.026) ( Table 2).
Haplotype network. In total, 285 midges were included in the analysis of a 546 bp region of the COI gene. Four distinct haplogroups were identified with substantial genetic divergence between groups (p-distance = 2.99-3.30%) and little divergence within groups (p-distance = 0.25-0.86%; Fig. 4; Table 3). Consistent with the SNP datasets, C. occidentalis formed a divergent haplogroup, separated from the rest of its range. The mean percent divergence between the two C. occidentalis groups (2.99%) was similar to its divergence from the other species (3.01-3.30%). The San Diego population also clustered as a distinct group, with a similar level of Culicoides v. albertensis Culicoides v. occidentalis Culicoides v. sonorensis www.nature.com/scientificreports/ divergence from the other species (3.01-3.03%). Interestingly, C. albertensis, C. sonorensis, and C. variipennis were not separated from each other, and in some cases, C. albertensis and C. variipennis shared identical haplotypes ( Fig. 4, S5). Furthermore, these three species exhibit a mean percent divergence between individuals (0.80%) similar to the divergence observed among individuals within C. occidentalis (Table 3). Other than the grouping of C. occidentalis in California, there was no geographic clustering observed.

Discussion
Our study provides valuable insights into the population genetics of the C. variipennis species complex and highlights the presence of potential cryptic species. For most of the species examined, minimal genetic divergence was observed across locations, suggesting the maintenance of gene flow even over large geographic distances. The only exception was C. occidentalis, which showed a high level of geographic isolation, as well as two distinct COI haplogroups. We confirmed that mitochondrial data is not reliable to differentiate three out of the five species, due to the lack of segregation between the mitochondrial haplotypes of C. albertensis, C. sonorensis, and C. variipennis. This stands in stark contrast to their clear differentiation and high level of divergence inferred from the SNP data. Though a substantial amount of divergence exists between all five species, low levels of hybridization, and potentially introgression, are present in sympatric populations. While we do not know the fitness of these hybrids, but this could suggest that strong post-zygotic isolation barriers may have yet to evolve in this group. Thus, pre-zygotic isolation through either ecological or behavioral segregation is a possible mechanism maintaining divergence within this complex. With a considerable amount of geographic overlap between some species (Fig. 1), each sympatric population is potentially experiencing a set of unique selective pressures to maintain species boundaries. The high degree of genetic differentiation between clusters inferred by the SNP data supports the current species groupings of the C. variipennis complex (C. occidentalis, C. sonorensis, and C. variipennis), as well as raising C. albertensis and a cryptic species in San Diego, California to species status (Fig. S5). While this putative new species was only collected in San Diego, its true distribution could extend well into Mexico. While clear divergence was observed in the SNP data, the mitochondrial data showed a different pattern of divergence. Culicoides albertensis, C. sonorensis, and C. variipennis have a considerable amount of genome-wide differentiation (Fig. 1); however, there was no clear differentiation of the COI gene (Fig. 4). In fact, several individuals of C. albertensis and C. variipennis shared identical haplotypes. Multiple studies have shown a high degree of genetic similarity in mtDNA between C. sonorensis and C. variipennis 23,24,42 , though it was proposed that this was due to misidentifications. As all of the individuals included in our mitochondrial haplotype analysis from the current study were identified to species using the SNP data, this lack of mitochondrial separation must have an underlying biological cause. This finding can result from historical introgression with "leaky" pre-zygotic isolation, or semipermeable species boundaries, which have been shown to produce mitochondrial introgression without detectable nuclear DNA introgression in some taxa 43,44 . This is likely due to the fact that the mitochondrial genome is independent of the nuclear genome and thus unlinked to the genes contributing to reproductive isolation 45 . However, we cannot  Hybrids (h1-h7) are designated with a black circle and their inferred parental ancestry is depicted with pie charts. The geographic locations of the two C. occidentalis clusters are labeled next to each grouping (see Table 1 for abbreviation). (b) Unrooted maximum likelihood phylogenetic tree based on 199 individuals inferred from 3612 SNPs (the hybrids were removed here but are included in Fig. S3.). Clade colors also represent the clusters inferred from the structure analysis. Support values written on the branches: rapid bootstrap (%)/SH-aLRT support (%)/ultrafast bootstrap support (%). For clarity, the values within each cluster are not shown. www.nature.com/scientificreports/ rule out that other possibilities could have caused this discordance, such as recent speciation and incomplete lineage sorting or selection 13,46 . Regardless, it appears that the evolution of the mitochondrial genome is not congruent with the species tree of the C. variipennis complex. Notably, the SNP phylogenetic tree shows that C. occidentalis, and not C. sonorensis, is the sister taxa of C. albertensis and C. variipennis (Fig. S5). This suggests that the mtDNA similarity between C. albertensis, C. sonorensis and C. variipennis could stem from ongoing hybridization and introgression, rather than incomplete lineage sorting. Little to no IBD or structure was found within C. albertensis, C. sonorensis, and C. variipennis indicating a substantial amount of connectivity among localities of these species (Fig. 3a,c,d). The number of populations inferred by fastSTRU CTU RE for C. sonorensis was K = 2; however, a mean pairwise F ST of 0.0287 suggests that a high amount of gene flow exists between all locations. This could be an artifact of the propensity of delta K inferring two populations 47 or from a high level of relatedness among individuals from KS (Fig. S6). Interestingly, although no IBD was found in C. occidentalis, each location of this species clustered as a distinct population (Fig. 3b). The lack of IBD is therefore not indicative of a single, genetically homogeneous population, but rather stems from high levels of divergence between populations regardless of their geographic distances. Focusing sampling efforts on each of these species will surely permit robust landscape genetic approaches to gain understanding of the evolutionary forces driving population structure in this group. This will allow studying whether Culicoides population structure is characterized by uniform or discontinuous isolation by distance, as well as, isolation by adaptation (IBA) or by environment (IBE) 48 .
The strong genetic divergence between the C. occidentalis from California and the other populations was observed in both the SNP and mtDNA data (Tables 2, 3, Fig. 4). It is possible that this may represent a further cryptic species with a dispersal barrier created by the Sierra-Nevada mountain range (Fig. S4). This high level of differentiation within C. occidentalis could be due to geographic isolation alone; however, endosymbionts have also been shown to significantly increase mitochondrial diversity in the presence of geographic structure 49,50 . Naturally occurring endosymbionts have been found in Culicoides midges, including C. sonorensis 51,52 , and recently, a Cardinium sp. was linked to mitochondrial divergence in C. imicola 53 . Further screening is needed to determine the diversity and abundance of endosymbionts infecting Culicoides midges, though the possibility remains that they could be playing a role in the phylogeographical structure of C. occidentalis if they are causing incompatibility between populations. Additionally, patchiness of the specialized larval habitat of C. occidentalis, not present in the other members of the C. variipennis complex, could create isolation between populations, as well as reduce the number of individuals within each population. A small effective population size with little to no immigration would allow for a strong effect from drift 54 Table 2).
Similar to other species of Culicoides 30,32,33,55 , high values of the inbreeding coefficient (F IS ) were observed in all species investigated in this study (Table S2). Although these previous studies have suggested that the observed high F IS are an artifact from a large number of null alleles, the consistent reporting of these findings across various species using several types of molecular markers lends support to the hypothesis that high inbreeding has a biological origin in this genus. High levels of inbreeding and heterozygote deficiencies are common among mosquitoes [56][57][58] , even when using markers with a low level of null alleles 59,60 . Goubert et al. (2016) considered the typical Aedes albopictus population as "a network of interconnected breeding sites, each with a high level of inbreeding". Although we cannot rule out all other possibilities, our results strongly suggest that some aspects of the reproductive biology of Culicoides induce inbreeding within populations. High F IS and low F ST between populations can stem from high levels of migration between populations (i.e., homogenizing allele frequencies at large scale), followed by matings with close relatives within populations (i.e., increasing homozygosity without altering allelic frequencies). It is also possible that our sampling approaches (single night trapping) led to capturing cohorts of closely related individuals.
Low levels of hybridization were found in some sympatric populations involving several different species pairings. Under laboratory conditions, mating between C. sonorensis and C. occidentalis can produce viable offspring for at least six generations, though the hatch rate of the progeny is dependent on the species of the mother 25 . A cross of female C. sonorensis and male C. occidentalis only yields a 7% hatch rate whereas the reciprocal cross yields a 75% hatch rate. This asymmetrical hybrid viability is likely caused by cytonuclear incompatibility 61,62 , though endosymbionts have also been shown to cause reproductive incompatibility 63 . Upon secondary contact of closely related species, and in the absence of post-zygotic reproductive isolation, the production of unfit hybrids can induce the rapid evolution of premating barriers 2,64-66 . In most populations however, C. sonorensis females are unlikely to come across C. occidentalis males due to differences in mating behavior. Conversely, C. occidentalis females do come into contact with C. sonorensis males, who do not appear to have mate discrimination 67 , and will likely attempt to mate with these heterospecific females. As there are demographic disparities (population size and structure) between these two species, as well as viable offspring produced from this cross, rampant hybridization and asymmetric introgression would be detrimental to C. occidentalis 68 . Strong selection against hybridization can maintain species boundaries, but as two of the ten C. occidentalis collected from Borax Lake in California (CABL) appeared to be F1 hybrids (Fig. 1), another mechanism, potentially differences in the larval habitat or mating behavior, appears to be limiting directional introgression from C. sonorensis.
Culicoides occidentalis females lay their eggs in highly saline environments (up to 88.0 parts per thousand (ppt)) 69 , whereas C. sonorensis eggs will not hatch in water with salinity over 20.0 ppt 70 . However, ecological exclusion via the larval habitat would only limit introgression if the hybrids were inviable in highly saline environments, which does not appear to be the case. The difference in mating behavior between these two species may be a more likely mechanism by which the detrimental effects of hybridization are diminished. Culicoides occidentalis mates at the larval habitat while C. sonorensis mates at or near a host 22 www.nature.com/scientificreports/ with a C. sonorensis male, she would return to the high saline pools to lay her eggs and these hybrid offspring would have a high chance of only backcrossing within the C. occidentalis lineage. While only two C. occidentalis x C. sonorensis hybrids were tested in this study, both had C. occidentalis mothers (Fig. 4), providing evidence that this scenario takes place in nature. However, this type of isolation would not explain how C. sonorensis and C. variipennis maintain species boundaries in sympatry as they share a larval habitat. Further studies are needed to determine the mechanisms behind reproductive isolation within this group. The C. variipennis complex is one of many vector groups in which species delimitation can be challenging 46,[72][73][74][75][76] ; however, species identification is an integral part of vector surveillance. The species status of these group members has implications for vector surveillance, as any ambiguity in identification will lead to unreliable data. For example, while C. albertensis and C. sonorensis occur in sympatry, only C. sonorensis is reported as a vector species 77 . The addition of the non-vector species when conducting serological surveys could lead to a severe underestimation of the infection rate within the vector species. As BTV and EHDV are expanding northward into eastern Canada 78 , it has been suggested that the dispersal of C. sonorensis into new areas could be to blame for this incursion 42 . Specimens assigned to C. sonorensis by Jewiss-Gaines et al. (2017) were included in the present study and cluster instead with C. albertensis ("ON", Fig. 1). Thus, there are likely alternative reasons for the range expansion of these viruses, including an unidentified vector species outside of the C. variipennis complex, such as C. stellifer 79,80 . Molecular tools for accurate species-level delimitation within this complex is sorely needed for proper vector surveillance. Additionally, the detection of hybridization between a non-vector and vector species may be evidence of recent speciation, but it also highlights a potential path of introgression for genes controlling vector competency 81,82 .
Our study shows that using a population genomic approach to analyze sibling species can identify specieslevel divergence, fine-scale genetic structuring within species, and uncover the existence of hybrids and cryptic species in Culicoides. Radiation within the C. variipennis complex occurred despite the long-range dispersal capabilities of biting midges as well as hybridization between sympatric species. This does not preclude historical geographic isolation; however, we believe that behavioral and ecological isolation may have shaped evolution  Table 2. Mean pairwise F ST within and between species. The between species F ST values (below diagonal) were calculated using 566 SNPs and the within-species values (on diagonal) is the mean F ST calculated from individual species-specific datasets (see Table S4). www.nature.com/scientificreports/ within this group or is at least maintaining the current species boundaries. Significant geographic isolation was only found between populations of C. occidentalis, but more sampling is needed to determine if the lack of gene flow between California and the other populations represents an incipient speciation event or IBD. Additionally, focusing efforts in these various hybrid zones may provide a better understanding of the evolution of reproductive isolation in this group. Cryptic species have been reported in a number of other Culicoides species complexes 83 and the analyses presented here could help to identify these putative species. Delimiting the species in these complexes, will not only aid in vector surveillance efforts, but continued study of the speciation of closely related vector and non-vector species could produce valuable evolutionary insights into vector competency.

Materials and methods
Sample collection and sequencing. Culicoides midges were collected from 17 sites across the United States and Canada (Table 1). Specimens were collected either as pupae and reared to adulthood, or as adults using CDC light traps baited with CO 2 and UV light (Bioquip 2836BQ). Individuals morphologically assigned to the C. variipennis complex were sorted out from the by-catch and stored in 95% ethanol at -80 °C. Total DNA was extracted from individuals using a Puregene extraction protocol (Gentra Systems, Inc., D-5500A) with the addition of glycogen (ThermoFisher, R0561) to increase yields. DNA was only extracted only from females as their larger body size (compared to the males) produced sufficient amount of DNA for next-gen sequencing. The DNA quality was checked using gel electrophoresis and DNA concentration was measured using a Qubit 3.0 fluorometer and a Qubit dsDNA HS assay kit (Invitrogen, Q33230). A total of 300-400 ng of DNA per sample was sent to Floragenex, Inc. for library preparation using the protocol from Truong et al. (2012). DNA was digested using the restriction enzymes MseI and PstI. After PCR amplification, the samples in each plate were pooled and sequenced on a lane of single-end 100 bp sequencing on a HiSeq4000 at the University of Oregon Genomics Facility, Eugene, OR.
Raw sequence filtering and processing. Raw sequence quality was first assessed using FastQC v.0.11.9 and MultiQC v.1.7 84,85 , and then reads were filtered and processed using Stacks v.2.3 86 . Reads with a phred score below 25 were removed as well as individuals with a > 75.0% missing data. Next, reads were aligned to the C. sonorensis genome 87 (Accession: PRJEB19938) using the Burrows-Wheeler Aligner (BWA-mem) 88 . Finally, aligned reads were run through the reference-based pipeline of Stacks. Filtering options were set to only include loci found in at least half of the sampling locations (-p 8) and those occurring in at least 50% of individuals within those sites (-r 0.5) 89 . The minimum allele frequency was set to 0.05 to protect against potential sequencing errors 90 , and only the first SNP per locus was kept to minimize linkage disequilibrium between SNPs from influencing population structure and phylogenetic analyses. . The clustering of individuals into the distinct genetic groups was also visualized using a principal component analysis (PCA) and a discriminant analysis of principal components (DAPC). The most likely number of genetic groups was inferred by the find.clusters algorithm for the PCA and the optimal number of principal components to inform the DAPC was defined using the function optim.a.score. Both were performed in R 94 through the adegenet package 95 . Any individual with more than 25% of their loci grouping with a second cluster in the fastSTRU CTU RE analysis was marked as a hybrid and removed from the phylogenetic analysis, due to the uncertainty of assigning them to a given species. Maximum likelihood phylogeny among individuals was run using RAxML v.8.2.12 96 . An acquisition bias correction was applied to the likelihood calculations as alignments were solely composed of SNPs, with each invariant site removed through Phrynomics (https:// github. com/ bbanb ury/ phryn omics) 97 . The GTR + G nucleotide substitution model was used for each search. A rapid bootstrap analysis and search for the best-scoring maximum likelihood tree was executed using the extended majority rule-based bootstopping criterion to achieve a sufficient number of bootstrap replicates 98 . Additionally, to cross-validate our results, a second phylogeny was inferred in W-IQ-Tree version 1.6.12 99 , using the TVM + F + G4 substitution model determined by ModelFinder 100,101 . Branch support was calculated using 1000 ultrafast bootstraps 102 and a Shimodaira-Hasegawa like approximate likelihood-ratio test (SH-aRLT) 102,103 .
To measure the amount of divergence between genetic clusters, a new SNP dataset was generated with individuals grouped by cluster rather than locality. Additionally, to measure the amount of divergence within each genetic cluster, four cluster-specific datasets (grouped by site) were also generated. SNPs were obtained from these new datasets using the same processing methods above except with more stringent filtering parameters. Only SNPs that occurred in at least 75% of the clusters or collection sites and at least half of the individuals within those groups were included. Genetic diversity estimates (F IS , H E , and H O ) and population differentiation (pairwise F ST ) were calculated for each species dataset using Genepop v.4.7.0 104 . Population differentiation was not calculated for the new species found in San Diego, as only a single population of this species was uncovered. Rousset's distance aR 105 and Loiselle's kinship coefficient (LKC) 106 were calculated respectively with SPAGeDi v.1.5 107 . Geographic distances among localities were calculated as both Euclidean and anisometric distances and a Mantel test and a Partial Mantel test were preformed to test for isolation-by-distance (IBD) 108 . Tests for areas of significant genetic dissimilarity among individuals using aR were implemented in MAPI using 1000 replications 109 . Mitochondrial sequencing and haplotype network. Mitochondrial DNA haplotypes were obtained from a subset of 67 individuals from the five genetic clusters. PCR reactions were performed using a Taq-Pro COMPLETE kit (Denville Scientific, CB4065-4) targeting a partial region of the COI gene with the Lep50 primer set from Folmer et al. (1994) and the thermocycler profile from Herbert et al. (2003). PCR products were cleaned using an EXOSAP-IT kit (ThermoFisher, 78201.1.ML) and prepared for sequencing using a BigDye Terminator  , 4337454). Sanger sequencing was done using an Applied Biosystems 3500 Genetic Analyzer. Chromatograms were cleaned and aligned using the software Geneious v.9.1 110 . A haplotype network analysis was conducted using the 67 COI sequences obtained in this study combined with 218 C. variipennis complex sequences previously collected from 25 states across the U.S. 111 . Sequences were aligned in MEGA v.10.1.8 112 and trimmed to ensure all sequences contained identical lengths. A median-joining analysis was performed using NETWORK v.5.0.1.0 113 . Specimens collected in this study were assigned a color based on the results from the SNP clustering analyses while the remaining samples were left unassigned. All individuals were used to calculate the mean uncorrected p-divergence between and within the different groupings inferred from the haplotype network using MEGA.

Data availability
The data reported in this study will be deposited in the Open Science Framework database upon acceptance, https:// osf. io (https:// doi. org/ 10. 17605/ OSF. IO/ E3Z72). Mitochondrial sequences obtained in the current study have been deposited under Genbank Accession Numbers OL604713-OL604779.