Introduction

The target of conservation and restoration strategies for threatened taxa is customarily the species; thus, correct identification and determination of a species’ geographic range are fundamental to the success of implemented measures (Avise 1989; Frankham et al. 2002; Gaston and Fuller 2009). However, species delineation can be problematic when morphology is ambiguous because of phenotypic plasticity, convergence (e.g., cryptic species), and/or interspecific hybridization. Molecular genetics can aid species delineation in cases of phenotypic plasticity and/or convergence (Patel et al. 2015), but delineation can remain problematic in the presence of interspecific hybridization (Fitzpatrick et al. 2015).

Hybridization, defined here as reproduction between members of genetically distinct populations (Barton and Hewitt 1985), occurs in a large proportion of plant and animal species (reviewed in Mallet 2005; Schwenk et al. 2008; Taylor et al. 2015; Gompert et al. 2017). The proportion of hybrid individuals in sympatric populations is typically low (e.g., <1%) (Mallet 2005) but in rare instances can be high (e.g., >5%) (She et al. 1987; Seeb 1998; Roques et al. 2001). Hybridization can lead either to speciation or to homogenization (also referred to as reverse speciation) (reviewed in Abbott et al. 2013), and the description of hybrid zones and species interactions is of great interest in evolutionary biology (Barton and Hewitt 1985).

The shifting environments that occurred with the retreat of continental ice sheets at the end of the last glacial period provided opportunities for secondary contact and hybridization in many taxa. As the continental ice sheets began retreating, ~18–20 kya, they created new habitats which species colonized (Pielou 1991). In North America, the Laurentian Great Lakes were formed and species that survived in refugia during the glaciation followed the ice edges, colonizing the newly created habitat (Pielou 1991; Graf 1997, 2002). Post-glacial species range expansions led to secondary contact of formerly isolated populations and species, and if reproductive barriers were incomplete, gene flow may have occurred (hybridization with or without introgression).

Freshwater mussels of the order Unionida are among the many taxa whose distributions were affected by the retreating Laurentian ice sheet. Closely related freshwater mussel species that came into secondary contact in the lower Great Lakes after the last glaciation (reviewed in Strayer and Jirka 1997) may hybridize, but the evidence is limited (Clarke and Berg 1959; Kat 1986; Cyr et al. 2007; Doucet-Beaupré et al. 2012; Krebs et al. 2013; Hewitt et al. 2019; Beauchamp et al. 2020). In this study, we investigated genetic divergence and gene flow between two closely related unionid species: Lampsilis siliquoidea (Barnes, 1823), an interior basin species (Mississippi River, Ohio River, Great Lakes, and western Hudson Bay drainages), and Lampsilis radiata (Gmelin, 1792), an Atlantic slope species.

Unionids are one of the most threatened groups of organisms in North America (Bogan 1993; Williams et al. 1993; Master et al. 2000; Lydeard et al. 2004; Christian and Harris 2008) and in the world (Lopes-Lima et al. 2018) as a result of human impacts such as overharvesting, pollution, dams, and the introduction of exotic species. Correct identification of species and determination of geographic ranges are critical for developing and implementing measures to conserve and restore species and populations; however, these fundamental issues remain unresolved for many species. Traditionally, unionid species identification has been based on conchological characteristics. Classification of mussels based on these characters has been reliable for most species; however, these features vary geographically and with the environment, which has led to ambiguity in the species’ taxonomy and to conservation challenges (Williams and Mulvey 1997; Lydeard and Roe 1998; Shea et al. 2011). Hybridization among closely related species can be a cause of morphological ambiguity, but this possibility has never been tested in unionids.

Lampsilis radiata and L. siliquoidea are considered different species based on morphological characteristics (Kat 1986; Strayer and Jirka 1997; Turgeon et al. 1998; Williams et al. 2017), and typical L. radiata and L. siliquoidea specimens are not especially difficult to differentiate from each other. However, based on the presence of genetically (Kat 1986; Krebs et al. 2013) and morphologically (Clarke and Berg 1959) intermediate forms in the lower Great Lakes, St. Lawrence River, and Lake Champlain, some authors have suggested that hybridization is occurring (Clarke and Berg 1959; Kat 1986; Krebs et al. 2013). Furthermore, there has been a long history of name confusion and debate on their phylogenetic relationship. These taxa are currently considered full species (e.g., Kat 1986; Turgeon et al. 1998; Strayer and Jirka 1997; Williams et al. 2017), but some authors have treated them as subspecies, L. r. siliquoidea (for L. siliquoidea) and L. r. radiata (for L. radiata) (Clarke and Berg 1959), or have treated L. siliquoidea as a synonym of L. radiata luteola (Watters et al. 2009).

The goals of this study were to (1) determine the phylogenetic relationship between the two species, (2) quantify the levels and direction of genetic admixture (none vs. limited hybridization vs. introgression), and (3) determine the geographic extent of the hybrid zone. In this study, we consider species to be separately evolving metapopulation lineages (De Queiroz 1998, 2007). Species boundaries and potential hybridization were assessed using the maternally inherited (F-type) and paternally inherited (M-type) mitochondrial cytochrome oxidase subunit I gene (COI) and seven microsatellite loci previously developed for the closely related species L. abrupta (Eackles and King 2002). Unionids and other bivalves have a distinct form of mitochondrial DNA (mtDNA) inheritance, termed doubly uniparental inheritance (DUI) (Zouros et al. 1994; Passamonti and Ghiselli 2009), in which F-type mtDNA is inherited maternally and M-type mtDNA paternally, so that both can coexist in the same individual (Zouros et al. 1994; Hoeh et al. 1996; Breton et al. 2007; Guerra et al. 2017; but see Breton et al. 2017). The origin of DUI predates the divergence of the orders Trigoniida and Unionida, and the two mitogenomes do not recombine (Guerra et al. 2017). These characteristics make DUI well suited to detecting hybridization: the presence of the F-type of one species and the M-type of a second species within a single individual indicates mixed ancestry (i.e., a putative hybrid).

This study contributes to general knowledge of speciation in closely related species that were isolated in the past but whose geographic ranges came to overlap after the last glaciation, allowing the species to come into secondary contact and possibly re-establish gene flow. This knowledge is relevant for vulnerable species whose protection depends on their classification and on knowledge of their geographic ranges.

Materials and methods

Sample collection

To assess the phylogenetic relationship and levels of genetic admixture between Lampsilis siliquoidea and L. radiata, a total of 1428 tissue samples from 77 sites across much of their geographic ranges were collected by the authors or kindly provided by colleagues from various agencies and institutions in the USA and Canada (Tables 1 and S1, Fig. 1). The sampling was designed to cover much of the distribution of each species but concentrated on the putative hybrid zone in the lower Laurentian Great Lakes and St. Lawrence River drainage. The putative hybrid zone (Fig. 1, gray shaded area) is where the distributions of these species overlap and where previous studies have indicated that the two species may hybridize (Clarke and Berg 1959; Kat 1986; Strayer and Jirka 1997; Krebs et al. 2013).

Table 1 Sample acquisition: site abbreviation, waterbody, source, and catalog number.
Fig. 1: Sampling sites for Lampsilis siliquoidea (orange), L. radiata (dark blue), and putative hybrid zone (shaded gray).

White circles represent locations where samples failed to amplify. Map created in ArcGIS version 10.1. Data sources: CanadaRivers by ArcCanada3.1; North America State Province Boundaries and United States boundaries states, ESRI; Great Lakes shoreline Geomorphology.5, GLERL. Projected coordinate system: WGS_1984_UTM Zone17N.

Mussels collected in the field were identified to species and sexed when possible. Typical specimens of L. siliquoidea and L. radiata from outside the hybrid zone are easily identified to species based on shell characteristics (Fig. 2). Lampsilis siliquoidea has a straight or slightly rounded ventral margin, a glossy periostracum (outer shell layer), and nacre that is always white with a bluish tinge. Lampsilis radiata has a curved ventral margin, a periostracum roughened by fine wrinkles, and nacre that can be white, bluish-white, or pink (Fig. 2). There is no definitive hybrid morphology; some individuals may have characters of one or both species (Fig. 2); therefore, it is impossible to identify a hybrid based solely on morphological characters. Individuals from the putative hybrid zone were assigned to the species they most resembled, and species identification and recognition of hybrids were based on genetic data (see below).

Fig. 2: Shell images of Lampsilis siliquoidea, hybrids identified using mitochondrial DNA, Lampsilis radiata cluster 1, and L. radiata cluster 2.

Localities (left to right) for L. siliquoidea: St. Croix River (MN), French Creek (NY), Ellicott Creek (NY), Lac Chicobi (QC). Hybrids: Sodus Bay (NY), Moira River (ON), Rivière Batiscan (QC), Lac St. Pierre (QC). L. radiata cluster 1: Young Lake (NY), Hudson River (NY), Rivière aux Brochets (QC), Sandy Stream (ME). L. radiata cluster 2: Meherrin River (NC), Little Waccamaw River (NC), Tar River (NC), Lake Marion (SC). L. radiata cluster 2 images are by Jamie Smith (North Carolina Museum of Natural Sciences); the catalog number for each specimen is shown below each image. Black horizontal bars correspond to 1 cm.

These species are sexually dimorphic, and sex was identified based on shell shape: in females the posterior end is broader, whereas in males it is bluntly pointed (Fig. 2). Sexual dimorphism is pronounced in L. siliquoidea, and males are easily recognized. In L. radiata, sexual dimorphism is subtle, and only males with marked characteristics were selected. Gills were also visually inspected for signs of gravidity. Tissue and swab samples of males and females were collected by non-lethal mantle biopsies (Berg et al. 1995; Henley et al. 2006). Up to five males from each location were collected lethally to obtain gonad tissue. Mussels that were not retained were returned alive to the same location. All tissue samples and whole male specimens were fixed in 95% ethanol, and swab samples were stored in lysis buffer. Vouchers will be deposited at the Buffalo Museum of Science (BSNS).

Laboratory analysis

Genomic DNA was extracted from 0.10–0.25 cm³ of tissue or from 200 μl of lysis buffer from each sample using a modified alcohol extraction method following Wilson (1997). The phylogenetic relationship and levels of genetic admixture between L. siliquoidea and L. radiata were determined using the maternally (F-type) and paternally (M-type) inherited mitochondrial gene cytochrome oxidase I (COI, see below) and seven microsatellite loci developed for L. abrupta (Eackles and King 2002) that amplify across Lampsilis species (Eackles and King 2002; Rowe and Zanatta 2015).

F-type COI was amplified from mantle tissue using primers and amplification conditions from Folmer et al. (1994), and M-type COI was amplified from male gonad tissue using the primer pair LCO1490 (Folmer et al. 1994) and Lamp mHCO (Krebs et al. 2013). PCR conditions for F-type and M-type COI were as described in Krebs et al. (2013). All PCR products were screened on 2% agarose gels to confirm amplification and the expected fragment size. Sequences were obtained by Sanger sequencing. The forward strand was sequenced for all samples; because of limited funding, the reverse strand was sequenced only to verify unique haplotypes and for problematic sequences (e.g., poor sequence quality at one end of the sequence). Sequences with overlapping peaks or poor quality were re-sequenced.

Each microsatellite locus was amplified by polymerase chain reaction (PCR) in a 10 μl reaction containing 10.0–20.0 ng/μl of extracted genomic DNA, 0.3 mM dNTPs, 10 mM Tris-HCl buffer (pH 8.3), 2.5–3 mM MgCl₂, 0.2 μM of each fluorescently labeled primer, and 1 U Taq polymerase. The amplification conditions were as follows: initial denaturation at 94 °C for 2 min; 30 cycles of 94 °C for 40 s, annealing at 53–57 °C for 40 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. All PCR products were screened on 7% polyacrylamide gels in a LI-COR NEN® Global IR2 DNA Sequencer System using the fluorescently labeled primers. Allele sizes were determined by comparing amplified products to 50–350 bp size standards (LI-COR Biotechnology Division). Locus C2 (Eackles and King 2002) has a compound trinucleotide repeat motif and showed two alleles that appeared to differ in length by only one base pair (scored in this study as 157 and 158; these sizes are 19 bp longer than those reported by Eackles and King (2002) because of an M13 tail added with the fluorescent dye). These alleles were cloned using the TOPO TA Cloning kit (ThermoFisher Scientific) following the manufacturer’s instructions, and transformed clones were Sanger sequenced by TACGen (CA). Sequencing confirmed that they are two different alleles, which were scored as 157 and 160.
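The size bookkeeping for locus C2 can be illustrated with a minimal sketch. It assumes that the 19 bp M13 tail stated for C2 is the only offset between our scoring and the Eackles and King (2002) scale, and it uses a simple repeat-unit rule of thumb (differences between alleles of a trinucleotide locus are normally multiples of 3 bp, although compound motifs can violate this). The names `to_reference_scale` and `fits_repeat_unit` are hypothetical helpers, not part of any genotyping software.

```python
M13_TAIL_BP = 19     # tail length added with the fluorescent dye (stated for locus C2)
REPEAT_UNIT_BP = 3   # C2 has a (compound) trinucleotide repeat motif


def to_reference_scale(scored_bp: int, tail_bp: int = M13_TAIL_BP) -> int:
    """Convert a size scored with the M13 tail to the size without the tail."""
    return scored_bp - tail_bp


def fits_repeat_unit(allele_a: int, allele_b: int, unit: int = REPEAT_UNIT_BP) -> bool:
    """Rule of thumb: alleles of a repeat locus usually differ by whole repeat units."""
    return abs(allele_a - allele_b) % unit == 0


print(to_reference_scale(157), to_reference_scale(160))  # 138 141
print(fits_repeat_unit(157, 158))  # False: a 1 bp difference is suspicious
print(fits_repeat_unit(157, 160))  # True: one trinucleotide unit apart
```

A check like `fits_repeat_unit` only flags size differences worth verifying; it cannot replace the cloning and sequencing used here to resolve the C2 alleles.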

Phylogenetic relationship and levels of intermixing between L. radiata and L. siliquoidea

Mitochondrial

To determine the phylogenetic relationship of L. siliquoidea and L. radiata, the F-type and M-type mitochondrial COI genes were used. F-type and M-type mtDNA genomes diverged at least 200 mya (Curole and Kocher 2002; Hoeh et al. 2002); therefore, F-type and M-type COI can be considered independent markers, and all analyses were performed separately for each. Chromatogram files that exhibited overlapping peaks were discarded; this was observed mainly in gonad tissue, where F-type and M-type mtDNA can both be present (Breton et al. 2017). COI chromatograms were aligned and edited using GENEIOUS v.10 (Kearse et al. 2012), and sequences were translated using the invertebrate mitochondrial genetic code to confirm the absence of stop codons. Identical haplotypes were collapsed using DNASP v.6 (Rozas et al. 2017). For ease of comparison, the haplotype numbers presented here match those in Krebs et al. (2013); all new haplotypes were numbered sequentially and submitted to GenBank.
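The two sequence checks described above (screening for stop codons under the invertebrate mitochondrial code and collapsing identical sequences into haplotypes) can be sketched as follows. This is an illustration under stated assumptions, not the GENEIOUS or DNASP implementation: it assumes aligned, gap-free sequences, a known reading frame (`frame`), and that only TAA and TAG are stops, as in the invertebrate mitochondrial code (NCBI table 5, where TGA codes for tryptophan). The haplotype names `Hap_1`, `Hap_2`, ... are a hypothetical scheme; in this study new haplotypes continued the Krebs et al. (2013) numbering.

```python
STOP_CODONS_TABLE5 = {"TAA", "TAG"}  # TGA is Trp in the invertebrate mitochondrial code


def has_stop_codon(seq: str, frame: int = 0) -> bool:
    """True if any complete in-frame codon is a stop under NCBI table 5."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS_TABLE5 for codon in codons)


def collapse_haplotypes(sequences: dict[str, str]) -> dict[str, list[str]]:
    """Group sample IDs that share an identical sequence."""
    groups: dict[str, list[str]] = {}
    for sample_id, seq in sequences.items():
        groups.setdefault(seq.upper(), []).append(sample_id)
    # Hypothetical naming: number haplotypes in order of first appearance
    return {f"Hap_{i}": ids for i, ids in enumerate(groups.values(), start=1)}


samples = {"A1": "ATGTGAAAA", "A2": "ATGTGAAAA", "B1": "ATGTAGAAA"}
print(has_stop_codon(samples["A1"]))  # False (TGA = Trp)
print(has_stop_codon(samples["B1"]))  # True (TAG)
print(collapse_haplotypes(samples))   # {'Hap_1': ['A1', 'A2'], 'Hap_2': ['B1']}
```

Ambiguity codes and missing data are treated here as ordinary characters, so two sequences differing only at an ambiguous site would be kept as separate haplotypes; dedicated software offers options for handling such sites.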

Additional F-type and M-type COI sequences of representative species of each tribe of the subfamily Ambleminae were included in the analyses (Table S2). Phylogenetic trees were estimated using maximum likelihood (ML) in IQ-TREE v.1.6.12 (Nguyen et al. 2015) and Bayesian inference (BI) in MRBAYES v.3.2.6 (Ronquist and Huelsenbeck 2003). The best partition schemes for F-type and M-type COI were determined using PartitionFinder (Lanfear et al. 2012) as implemented in IQ-TREE. Based on the lowest Bayesian Information Criterion (BIC) score, a three-partition scheme (one partition per codon position) was used for the F-type ML tree, with the following selected models: TN+F+G4, F81+F, and K3Pu+F+R2. A two-partition scheme was used for the M-type ML tree, with TPM2u+F+G4 for first and second codon positions and HKY+F+I for third codon positions. The best-fit models of nucleotide substitution for each partition were determined using ModelFinder (Kalyaanamoorthy et al. 2017), and branch support was obtained with the ultrafast bootstrap (Hoang et al. 2018) using 1000 replicates. Bayesian inference was implemented in MRBAYES using Markov chain Monte Carlo (MCMC) simulations. The same partition schemes were used for the F-type and M-type trees, but with Nst=mixed, which allows sampling across the space of time-reversible substitution models in the Bayesian MCMC analysis (Huelsenbeck et al. 2004). Searches were run for 4 × 10⁶ generations for the maternal tree and 1 × 10⁶ generations for the paternal tree (until the mean standard deviation of split frequencies fell below 0.01), discarding the first 25% of samples from the cold chain as burn-in. Each run consisted of four chains, and one tree was saved every 500 generations. Shape, pinvar, statefreq, and revmat were unlinked across partitions, and other parameters were set to default values. Convergence of log likelihoods was examined using Tracer v.1.7 (Rambaut et al. 2018). A consensus tree was obtained from all post-burn-in sampled trees using the sumt command in MRBAYES.
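The BIC-based choice of partition scheme can be illustrated with a small worked example. BIC = −2 ln L + k ln n, where L is the maximized likelihood, k the number of free parameters, and n the sample size; the scheme with the lowest BIC is preferred, so extra partitions are accepted only if they raise the likelihood enough to pay for their parameters. The log-likelihoods and parameter counts below are invented for illustration, and taking n as the number of alignment sites (here the 625 bp F-type fragment) is an assumption about the convention used; this is not IQ-TREE code.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Hypothetical (lnL, number of free parameters) for two candidate schemes
schemes = {
    "one partition": (-3200.0, 10),
    "three partitions (by codon position)": (-3100.0, 30),
}
n_sites = 625  # assumed sample size: F-type COI alignment length

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in schemes.items()}
best = min(scores, key=scores.get)
print(best)  # three partitions (by codon position)
```

With these made-up numbers, the 100-unit gain in log-likelihood outweighs the penalty of 20 extra parameters (about 129 BIC units at n = 625).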

Hybrids can be detected when the species assignments of an individual’s F-type and M-type COI sequences do not match (Cyr et al. 2007; Doucet-Beaupré et al. 2012; Krebs et al. 2013). In other words, an individual was identified as a hybrid when the F-type mtDNA of one species and the M-type mtDNA of the other species were both found in the same individual.
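The mismatch rule can be written as a small classification function. This is a minimal sketch rather than the procedure actually used: it assumes each male's F-type and M-type sequence has already been assigned to a species ("Ls" for L. siliquoidea, "Lr" for L. radiata) from its placement in the COI trees, and that an unusable sequence is recorded as `None`. The function name and output labels are hypothetical.

```python
from typing import Optional


def classify_male(f_type: Optional[str], m_type: Optional[str]) -> str:
    """Classify a male from the species assignments of its F-type and M-type COI.

    f_type, m_type: "Ls" or "Lr", or None if the sequence was unusable.
    """
    if f_type is None or m_type is None:
        return "indeterminate"  # cannot compare without both genomes
    if f_type == m_type:
        return f"pure {f_type}"  # matching assignments
    return f"hybrid (F-type {f_type} / M-type {m_type})"  # mixed ancestry


print(classify_male("Ls", "Lr"))  # hybrid (F-type Ls / M-type Lr)
print(classify_male("Lr", "Lr"))  # pure Lr
print(classify_male(None, "Ls"))  # indeterminate
```

Keeping the direction (which species contributed the F-type) in the label makes it straightforward to tally the direction of hybridization as well as its frequency.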

Microsatellite analysis

Allele frequencies per locus and species were calculated in GENALEX v.6.5. Linkage disequilibrium (LD) between all pairs of loci in each population, and deviations from Hardy-Weinberg equilibrium (FIS fixation index) for each population and locus, were calculated using FSTAT v.2.9.4 (Goudet 1995). MICROCHECKER v.2.2.3 (Van Oosterhout et al. 2004) was used to identify possible genotyping errors, and FREENA (Chapuis and Estoup 2007) was used to estimate null allele frequencies. STRUCTURE analyses were also run with potentially problematic loci removed (e.g., loci with a large proportion of null alleles), and the results were compared with those obtained using all seven loci.
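The quantities behind the Hardy-Weinberg tests can be illustrated for a single locus in a single population. This sketch uses the simple estimator FIS = 1 − Ho/He, with observed heterozygosity Ho and expected heterozygosity He = 1 − Σp². It is a swapped-in simplification: FSTAT estimates FIS following Weir and Cockerham (1984), which includes sample-size corrections, so values will differ slightly for small samples. The genotypes (using the C2 allele sizes) and the function name `locus_summary` are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Allele frequencies, Ho, He, and a simple FIS (1 - Ho/He) for one locus.

    genotypes: list of (allele1, allele2) tuples for one population.
    """
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = {allele: c / (2 * n) for allele, c in counts.items()}
    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs.values())    # expected under HWE
    fis = 1.0 - ho / he if he > 0 else float("nan")  # >0 means heterozygote deficit
    return freqs, ho, he, fis


genotypes = [(157, 157), (157, 160), (160, 160), (157, 157)]
freqs, ho, he, fis = locus_summary(genotypes)
print(freqs)           # {157: 0.625, 160: 0.375}
print(ho, he)          # 0.25 0.46875
print(round(fis, 3))   # 0.467
```

A positive FIS at a locus, consistent across many populations, is the pattern that points towards null alleles rather than population-specific causes such as inbreeding.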

To estimate the most probable number of clusters (K) in the data set, Bayesian model-based clustering based on the seven microsatellite loci (Eackles and King 2002) was performed using STRUCTURE v.2.3.4 (Pritchard et al. 2000; Falush et al. 2003, 2007). We used the admixture model with correlated allele frequencies, and no prior population or species information was used. The burn-in length was 1.0 × 10⁵, followed by 5.0 × 10⁵ MCMC replications. The best estimate of K was obtained using STRUCTURE HARVESTER (Earl and vonHoldt 2012), both from the ad hoc statistic ∆K (Evanno et al. 2005) and by plotting the probability of the data, Ln Pr(X|K), against a range of K; the best estimate of K is the one at which Ln Pr(X|K) is maximal or after which the curve plateaus (Pritchard et al. 2000). The number of ancestral clusters K was determined by comparing likelihood values among five independent replicate runs for each K from one to ten, and the results were displayed using DISTRUCT v.1.1.2 (Rosenberg 2004). Levels of admixture (none vs. limited hybridization vs. introgression) were estimated using STRUCTURE (Pritchard et al. 2000; Falush et al. 2003; Falush et al. 2007; Hubisz et al. 2009) and NEWHYBRID v.1.1 (Anderson and Thompson 2002). STRUCTURE uses a clustering algorithm that calculates each individual’s ancestry coefficient (q), whereas NEWHYBRID calculates the posterior probability that an individual belongs to each of up to six predefined categories (i.e., the two purebred classes, F1, F2, and backcrosses to each parental species). NEWHYBRID was run four independent times with the Jeffreys-like prior, a burn-in of 2.0 × 10⁵, and 1.5 × 10⁶ iterations, and posterior probabilities were averaged over the four runs.
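The ∆K statistic can be illustrated with a short sketch of the Evanno et al. (2005) formula, ∆K = |Ln Pr(X|K+1) − 2 Ln Pr(X|K) + Ln Pr(X|K−1)| / sd[Ln Pr(X|K)], applied here to the mean Ln Pr(X|K) over replicate runs, with the standard deviation taken across replicates at each K. This is not STRUCTURE HARVESTER's code. It assumes consecutive K values with at least two replicates each, and the likelihood values below are invented so that the peak falls at K = 3.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """lnp_by_k: {K: [Ln Pr(X|K) from each replicate run]} for consecutive K."""
    ks = sorted(lnp_by_k)
    m = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:  # undefined for the smallest and largest K tested
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)|
        sd = stdev(lnp_by_k[k])                           # sd across replicates
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Invented replicate likelihoods (three runs per K)
runs = {
    1: [-10000, -10002, -9998],
    2: [-9000, -9010, -8990],
    3: [-8500, -8502, -8498],
    4: [-8480, -8490, -8470],
    5: [-8470, -8490, -8450],
}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 3
```

Because ∆K cannot be computed for K = 1, it can never favor a single cluster; this is one reason the plot of Ln Pr(X|K) against K is examined alongside it.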

Individuals arising from several generations of backcrossing are difficult to distinguish from purebred individuals, and recognizing them requires a large number of molecular markers (e.g., >48) (Boecklen and Howard 1997; Vähä and Primmer 2006). Because we used only seven microsatellite loci, robust assignment of each individual to a hybrid category was not possible; however, a coarse classification of individuals in hybrid zones is possible with four or five markers (Boecklen and Howard 1997). The first step in detecting hybrids was to assign individuals to a purebred category (either L. siliquoidea or L. radiata) or to an admixed category based on each individual’s ancestry coefficient (q) calculated in STRUCTURE. Assignments were compared under four combinations of the number of clusters (K = 2 and K = 3) and q thresholds (0.8 and 0.9). K = 2 was used on the assumption that two species contribute to the gene pool, and K = 3 was also used because it was the most likely number of clusters in the data set (see Results). For K = 3, the q values of the two clusters within L. siliquoidea were summed, whereas the q values for L. radiata were used unchanged because STRUCTURE found only one cluster for this species. An individual was considered purebred if its q was equal to or greater than the threshold of 0.8 or 0.9. Previous studies have shown that the efficiency and accuracy of assigning individuals to a purebred category are greatest at a threshold of 0.8 (Vähä and Primmer 2006; Patel et al. 2015; van Wyk et al. 2017), whereas a threshold of 0.9 decreases the chance of misclassifying a hybrid as a purebred (Beaumont et al. 2001; Grant et al. 2004; Vähä and Primmer 2006; van Wyk et al. 2017).
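The purebred/admixed rule, including the summing of the two L. siliquoidea clusters at K = 3, can be sketched as below. This is a minimal illustration, not STRUCTURE output handling: it assumes q is the vector of ancestry proportions per cluster for one individual, and the cluster order in `K3_LABELS` (two L. siliquoidea clusters, then L. radiata) is hypothetical, since STRUCTURE's cluster numbering is arbitrary and must be matched to species after each run.

```python
def assign_by_q(q, cluster_species, threshold=0.8):
    """Return the species label if its summed q reaches the threshold, else 'admixed'.

    q: ancestry proportions per STRUCTURE cluster for one individual.
    cluster_species: species label of each cluster (several clusters may share one).
    """
    totals = {}
    for qi, species in zip(q, cluster_species):
        totals[species] = totals.get(species, 0.0) + qi  # sum clusters per species
    species, q_sum = max(totals.items(), key=lambda item: item[1])
    return species if q_sum >= threshold else "admixed"


K3_LABELS = ["Ls", "Ls", "Lr"]  # hypothetical order: Lake Huron, lower Great Lakes, L. radiata
K2_LABELS = ["Ls", "Lr"]

print(assign_by_q([0.45, 0.40, 0.15], K3_LABELS, 0.8))  # Ls (0.45 + 0.40 = 0.85)
print(assign_by_q([0.45, 0.40, 0.15], K3_LABELS, 0.9))  # admixed
print(assign_by_q([0.05, 0.03, 0.92], K3_LABELS, 0.9))  # Lr
print(assign_by_q([0.85, 0.15], K2_LABELS, 0.8))        # Ls
```

The first two calls show how the same individual can switch between purebred and admixed depending only on the threshold, which is why assignments were compared across all four K and q combinations.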

The individuals of mixed ancestry identified by STRUCTURE were then assigned to a hybrid category in NEWHYBRID. For this analysis, a posterior probability threshold of 0.5 was used for assignment to a hybrid category and 0.8 for assignment to a purebred category.
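The averaging of posterior probabilities over the four NEWHYBRID runs and the class-specific thresholds can be sketched as follows. This is an illustrative sketch, not part of NEWHYBRID: the class labels (`Pure_Ls`, `Pure_Lr`, `F1`, `F2`, `BC_Ls`, `BC_Lr`) are hypothetical names for the six categories, each run is assumed to be a dictionary of posterior probabilities for one individual, and an individual whose best class does not reach its threshold is left unassigned.

```python
from statistics import mean

PUREBRED = {"Pure_Ls", "Pure_Lr"}  # hypothetical labels for the two purebred classes


def assign_category(runs, hybrid_threshold=0.5, pure_threshold=0.8):
    """Average posteriors over independent runs, then assign the most probable
    class if it reaches the threshold for that kind of class."""
    avg = {c: mean(run[c] for run in runs) for c in runs[0]}
    best = max(avg, key=avg.get)
    needed = pure_threshold if best in PUREBRED else hybrid_threshold
    return best if avg[best] >= needed else "unassigned"


mostly_f2 = {"Pure_Ls": 0.10, "Pure_Lr": 0.0, "F1": 0.05,
             "F2": 0.60, "BC_Ls": 0.25, "BC_Lr": 0.0}
likely_ls = {"Pure_Ls": 0.70, "Pure_Lr": 0.0, "F1": 0.0,
             "F2": 0.0, "BC_Ls": 0.30, "BC_Lr": 0.0}

print(assign_category([mostly_f2] * 4))                      # F2
print(assign_category([likely_ls] * 4))                      # unassigned (0.70 < 0.80)
print(assign_category([likely_ls] * 4, pure_threshold=0.5))  # Pure_Ls
```

The last two calls show how lowering the purebred threshold from 0.8 to 0.5 moves borderline individuals into a purebred class.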

Results

Phylogenetic relationship between L. radiata and L. siliquoidea and levels of intermixing

Mitochondrial analysis

A total of 525 F-type (625 bp) COI sequences from 66 sites and 116 M-type (617 bp) COI sequences from 31 sites were obtained (Table S3a, b). We found 28 F-type haplotypes (23 not previously reported) for L. siliquoidea and 19 (16 not previously reported) for L. radiata (GenBank accession numbers MN432615–MN432653), and 13 M-type haplotypes (seven not previously reported) for L. siliquoidea and five (three not previously reported) for L. radiata (GenBank accession numbers MN432654–MN432663). Twenty F-type and 22 M-type haplotypes reported by Krebs et al. (2013) were not found in this study. For both the maternally (F-type) and paternally (M-type) inherited markers, the ML and BI topologies were identical (Figs. 3 and 4, respectively); therefore, only the ML topologies are shown, with the BI support values added. Lampsilis siliquoidea formed a reciprocally monophyletic group in all analyses (Figs. 3 and 4). Lampsilis radiata formed a reciprocally monophyletic group in the M-type analysis (Fig. 4) but not in the F-type analysis (Fig. 3). Lampsilis radiata from the lower Great Lakes and St. Lawrence River, Lake Champlain, Susquehanna River, the north Atlantic slope (New York: HUR; Virginia: VAPR, VAOR, VAMR; abbreviations as in Table 1), and south Atlantic slope drainages (North Carolina: NCMR) formed a clade (henceforth “cluster 1”). Thirteen sequences (seven unique F-type haplotypes, Hap_69–Hap_75, GenBank accession numbers MW041231–MW041237) from individuals from the south Atlantic slope (North Carolina: NCTR, NCLR, NCMR, NCLW, NCPD, NCSC; South Carolina: SCBR, SCLM; abbreviations as in Table 1) grouped with L. fullerkati and with L. radiata from the Waccamaw and Yadkin/Pee Dee rivers (Fig. 3) (McCartney et al. 2016) (henceforth “cluster 2”). Note, however, that McCartney et al. (2016) suggested that L. fullerkati is a lake form of L. radiata. 
This could not be examined in the M-type topology because gonad sequences for the individuals that form cluster 2 were not available.

Fig. 3: Maternally-inherited COI (F-type) Bayesian Inference (BI) and Maximum Likelihood (ML) combined tree.

Support values are posterior probabilities for BI and bootstrap values for ML. Asterisks (*) indicate nodes with posterior probabilities ≥0.80 (above line) and bootstrap values >80 after 1000 replicates (below line). Posterior probabilities (percentage) or bootstrap values below 50 are not shown for clarity. The scale bar is for ML genetic distance.

Fig. 4: Paternally-inherited COI (M-type) Bayesian Inference (BI) and Maximum Likelihood (ML) combined tree. Support values are posterior probabilities for BI and bootstrap values for ML.

Asterisks (*) indicate nodes with posterior probabilities ≥0.80 (above line) and bootstrap values >80 after 1000 replicates (below line). Posterior probabilities (percentage) or bootstrap values below 50 are not shown for clarity. The scale bar is for ML genetic distance.

The genetic distances between the main L. radiata cluster 1 and L. siliquoidea, calculated using TN93+G as the best-fit model, were 2.50% for F-type mtDNA and 3.00% for M-type mtDNA. The F-type distance between L. radiata cluster 2 and L. siliquoidea was 2.54%, and that between the two L. radiata clusters was 1.12%.
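For readers unfamiliar with the TN93 model, the pairwise distance can be sketched in a few lines. This is the Tamura-Nei (1993) formula without the gamma (+G) correction used for the values above, so it is a simplified stand-in rather than a reimplementation of our analysis. It compares two aligned sequences, skips gaps and ambiguity codes, pools base frequencies over both sequences, and assumes all four bases are present. Averaging pairwise distances between groups of sequences is not shown.

```python
import math

AG = {("A", "G"), ("G", "A")}  # purine transitions
CT = {("C", "T"), ("T", "C")}  # pyrimidine transitions


def tn93_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned sequences (no gamma correction)."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps/ambiguities
    n = len(pairs)
    counts = {base: 0 for base in "ACGT"}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pi = {base: c / (2 * n) for base, c in counts.items()}  # pooled base frequencies
    pi_r, pi_y = pi["A"] + pi["G"], pi["C"] + pi["T"]
    ag, ct = pi["A"] * pi["G"], pi["C"] * pi["T"]
    p1 = sum(pair in AG for pair in pairs) / n  # proportion of A<->G differences
    p2 = sum(pair in CT for pair in pairs) / n  # proportion of C<->T differences
    q = sum(a != b and (a, b) not in AG and (a, b) not in CT
            for a, b in pairs) / n              # proportion of transversions
    return (-(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
            - (2 * ct / pi_y) * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
            - 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y)
            * math.log(1 - q / (2 * pi_r * pi_y)))


s1 = "ACGT" * 5
s2 = "G" + s1[1:]  # one A->G transition in 20 sites
print(round(tn93_distance(s1, s2), 4))
```

For this toy pair the corrected distance is slightly above the uncorrected p-distance of 0.05, reflecting the correction for multiple substitutions at the same site; with equal base frequencies the formula reduces to the Kimura two-parameter distance.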

Within the main L. siliquoidea group and L. radiata cluster 1, clades with high support (>80%) were observed, which may indicate distinct populations or even subspecies within each lineage (Fig. 3). Some of these clades corresponded to geographic location (Fig. 3, Table S3). For example, in L. siliquoidea, haplotypes 24, 25, and 27 were found in the Ohio River drainage but not in the Great Lakes, and haplotypes 41 and 42 were found only in Lake Pepin, Mississippi River.

F-type and M-type mtDNA haplotypes of both species were found at 11 of the 77 sites (Fig. 5), and evidence of hybridization was observed at six of these locations (Fig. 5): Lake Ontario basin: Sodus Bay, Moira River; Susquehanna basin: Tioughnioga River; Saint Lawrence River basin: Rivière Châteauguay, Lac Saint-Pierre, and Rivière Batiscan. The haplotype notation in Fig. 5 is as follows: “m/n LsLr” means that, of the n males for which both F-type and M-type mtDNA were analyzed, m had incongruent haplotype assignments and were therefore considered hybrids; “n Lr” or “n Ls” indicates that n individuals (males and females) had mtDNA haplotypes corresponding to either L. siliquoidea (Ls) or L. radiata (Lr). Of the 116 males sequenced, 12 (10.3%) were hybrids, 59 (50.9%) had matching F-type and M-type Ls sequences, 41 (35.3%) had matching F-type and M-type Lr sequences, and four (3.4%) were indeterminate because of poor-quality maternal COI sequences. Of the putative hybrids, nine (75%) had F-type Ls/M-type Lr sequences and three (25%) had F-type Lr/M-type Ls sequences. Detecting hybrids from incongruent F-type and M-type mtDNA assignments has limitations: only first-generation hybrids or certain backcrosses can be detected, and current and historical hybridization events cannot be distinguished.
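The percentages reported for the sequenced males follow directly from the counts. The short sketch below recomputes them (the dictionary keys are hypothetical labels) and rounds to one decimal place.

```python
# Counts of sequenced males by F-type/M-type outcome, as reported above
counts = {
    "hybrid": 12,
    "matching Ls": 59,
    "matching Lr": 41,
    "indeterminate": 4,
}
total = sum(counts.values())  # 116 males
percent = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Direction of hybridization among the 12 putative hybrids
direction = {"F-type Ls / M-type Lr": 9, "F-type Lr / M-type Ls": 3}
direction_pct = {k: round(100 * v / counts["hybrid"], 1) for k, v in direction.items()}

print(total, percent)
print(direction_pct)
```

Rounded percentages need not sum to exactly 100; here they sum to 99.9.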

Fig. 5: Geographic distribution of maternally and paternally-inherited COI haplotypes for Lampsilis siliquoidea (orange) and L. radiata (dark blue) in the putative hybrid zone (as described in Fig. 1).

Haplotypes of both species were found in some locations (orange-blue circles); hybrids are shown as LsLr. Division within a circle does not depict frequencies, but only the presence of both haplotypes. For ease of visualization rivers were not drawn in the figure, drainage information for each location and location codes are in Table 1.

Microsatellite analysis

For the microsatellite analysis, a subset of 30 of the 77 sites, comprising a total of 782 individuals, was included. The two species shared most alleles at all loci, but species-specific alleles were found at four of the seven loci (C2, C213, D111, C23) (Fig. S1). No linkage disequilibrium was detected between any pair of loci within populations (Table S4). All populations except MRF, SR, and PB (abbreviations as in Table 1) deviated from Hardy-Weinberg equilibrium, as did 38 of 168 locus/population combinations (FIS fixation index; Table S5). Significant deviations from Hardy-Weinberg equilibrium were tested using the proportion of randomizations that gave a larger FIS value than the observed one. There was no evidence of scoring errors due to large allele dropout, but there was evidence of null alleles, especially at loci D206 and D29, which explains the deviations from Hardy-Weinberg equilibrium across populations. Deviations from Hardy-Weinberg equilibrium can have other explanations, such as the Wahlund effect, inbreeding, and selection, but because the deviations at these two loci occurred across populations, null alleles are the most plausible explanation. After removing these two loci and re-running the STRUCTURE analyses, we found that the number of admixed individuals differed more with the choice of q threshold (0.8 or 0.9) or K (2 or 3) (see below) than with the effect of null alleles; therefore, we retained both loci in the analyses. Furthermore, unusually large proportions (>20%) of null alleles have been reported in bivalves (McGoldrick et al. 2000; Launey et al. 2002; Nantón et al. 2014; Chiesa et al. 2016; Rico et al. 2017), and high frequencies of null alleles do not appear to significantly affect the population genetic parameters estimated for microsatellite loci in STRUCTURE (Rico et al. 2017).
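The randomization test for FIS can be sketched for one locus in one population: alleles are shuffled among individuals (which destroys any within-individual association while keeping allele frequencies fixed), FIS is recomputed, and the P-value is the proportion of randomizations giving an FIS at least as large as the observed value. This follows the general idea of the test reported above but is not FSTAT's code. It uses the simple estimator FIS = 1 − Ho/He, counts ties as "at least as large", and its function names, number of randomizations, and seed are illustrative choices.

```python
import random


def fis(genotypes):
    """Simple FIS = 1 - Ho/He for one locus; genotypes are (allele1, allele2) tuples."""
    n = len(genotypes)
    alleles = [a for g in genotypes for a in g]
    freqs = [alleles.count(a) / (2 * n) for a in set(alleles)]
    he = 1.0 - sum(p * p for p in freqs)
    ho = sum(a != b for a, b in genotypes) / n
    return 1.0 - ho / he if he > 0 else 0.0


def fis_randomization_test(genotypes, n_rand=1000, seed=1):
    """Return observed FIS and the proportion of randomizations with FIS >= observed."""
    rng = random.Random(seed)
    observed = fis(genotypes)
    alleles = [a for g in genotypes for a in g]
    at_least_as_large = 0
    for _ in range(n_rand):
        rng.shuffle(alleles)  # permute alleles among individuals
        shuffled = list(zip(alleles[0::2], alleles[1::2]))
        if fis(shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_rand


# All homozygotes: a strong heterozygote deficit, as expected with null alleles
deficit = [(1, 1)] * 10 + [(2, 2)] * 10
print(fis_randomization_test(deficit))
```

A heterozygote deficit of this kind yields an observed FIS of 1 and a P-value near zero; a sample with an excess of heterozygotes would instead give a P-value near one.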

The number of ancestral clusters calculated in STRUCTURE for the entire data set was K = 3 (Fig. 6), based on the ∆K method and Ln Pr(X|K) (Fig. S2). Lampsilis radiata and L. siliquoidea formed two distinct groups, in agreement with the groupings from the ML and BI analyses (Figs. 3 and 4). However, L. siliquoidea was further subdivided into two clusters. One cluster was formed by populations from the Lake Huron drainage (MRF, SR, NR, MR) (henceforth “Lake Huron cluster”), and the other by populations from the Lake St. Clair (LSC), Lake Erie (TC, EC), and Lake Ontario (JC, HC, AC) drainages (henceforth “lower Great Lakes cluster”) (Fig. 6). These two clusters shared the common L. siliquoidea haplotypes 2 and 4 (Table S3a), indicating that they are L. siliquoidea individuals. Likewise, most populations in the L. radiata cluster shared the common haplotype 18 (Table S3a).

Fig. 6: Estimation of the number of clusters (K) using the program STRUCTURE for both species.

(a) K = 2, assuming that two species contribute to the gene pool; (b) K = 3, the most likely value of K. Collection location codes are as in Table 1, except LSC (Lake St. Clair, composed of SCBB and SCBM) and LR (Lampsilis radiata, composed of VAMR, VAOR, and SCSR).

Using different q thresholds (0.8 vs. 0.9) and K = 2 or K = 3 generated slightly different interpretations of hybridization (Table 2). Table 2a shows the number of individuals assigned by STRUCTURE to each category (purebred vs. admixed) under each combination of K value and q threshold. At both K = 3 and K = 2, more individuals were considered purebred at a q threshold of 0.8 (196 and 238, respectively) than at a q threshold of 0.9 (167 and 216, respectively), as expected. At both q thresholds, more individuals were considered of mixed ancestry or purebred L. radiata with K = 2 than with K = 3, whereas fewer individuals were considered purebred L. siliquoidea with K = 2 than with K = 3. Table 2b shows the assignment by NEWHYBRID of the admixed individuals identified by STRUCTURE to hybrid categories. All hybrids were assigned to the F2 category; no F1 or backcross individuals were identified, which reflects limitations of the small number of loci used here and of the software (see Discussion). Only a fraction of the admixed individuals from Table 2a were identified as hybrids (Table 2b). A large number of admixed individuals (66 and 99 for K = 2, and 29 and 56 for K = 3; Table 2b) were assigned to the L. siliquoidea purebred category when using a posterior probability threshold of 0.5, and fewer (23 and 49 for K = 2, and four and 18 for K = 3; Table 2b) when using a threshold of 0.8. STRUCTURE was successful at identifying purebred L. radiata, and no admixed individuals were reassigned to the L. radiata purebred category by NEWHYBRID.

Table 2 Number of Lampsilis siliquoidea and L. radiata individuals assigned by STRUCTURE to each category (purebred vs. admixed) under different criteria, K values and q thresholds (a), and number of admixed individuals from part (a) assigned to a hybrid category by NEWHYBRID using a threshold of 0.5 for hybrids and 0.5 (0.8) for purebreds (b).
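The two-threshold assignment summarized in Table 2b can be sketched as follows. The category names (pure_Ls, pure_Lr, F1, F2, Bx_Ls, Bx_Lr) and posterior probabilities are hypothetical, and the rule (take the highest-posterior category and accept it only if it meets that category's threshold, otherwise "unassigned") is an assumption about how the thresholds were applied, not a description of the NEWHYBRID software itself.

```python
# Hypothetical purebred category labels; every other label is treated as a
# hybrid category (F1, F2, backcrosses).
PUREBRED = {"pure_Ls", "pure_Lr"}


def assign(posteriors, pure_threshold=0.5, hybrid_threshold=0.5):
    """Return the highest-posterior category if it meets its threshold."""
    category = max(posteriors, key=posteriors.get)
    threshold = pure_threshold if category in PUREBRED else hybrid_threshold
    return category if posteriors[category] >= threshold else "unassigned"


# Invented posteriors for three admixed individuals (each sums to 1).
ind_a = {"pure_Ls": 0.62, "pure_Lr": 0.00, "F1": 0.03,
         "F2": 0.30, "Bx_Ls": 0.05, "Bx_Lr": 0.00}
ind_b = {"pure_Ls": 0.10, "pure_Lr": 0.00, "F1": 0.05,
         "F2": 0.80, "Bx_Ls": 0.05, "Bx_Lr": 0.00}
ind_c = {"pure_Ls": 0.40, "pure_Lr": 0.00, "F1": 0.10,
         "F2": 0.35, "Bx_Ls": 0.15, "Bx_Lr": 0.00}

for name, post in (("a", ind_a), ("b", ind_b), ("c", ind_c)):
    # Compare a purebred threshold of 0.5 with the stricter 0.8.
    print(name, assign(post, 0.5), assign(post, 0.8))
```

Under these assumptions, individual a counts as purebred L. siliquoidea at a purebred threshold of 0.5 but becomes unassigned at 0.8, which is why the stricter threshold moves fewer admixed individuals into the purebred category.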

Although the number of individuals classified as hybrids varied with K and the q threshold, the locations where hybrids were found were consistent, except for the middle Maitland River (MRF), the Hudson River (HUR), and the LR group (Table S6, Fig. 6).

Discussion

Phylogenetic relationship

Lampsilis siliquoidea and L. radiata were originally described as different species based on shell morphology (Gmelin 1791; Barnes 1823), and their distinction is supported by internal morphology (Kat 1986). Although both species occur in lakes and rivers, they occupy slightly different habitats. In this study, searches for specimens covered all riverine habitats where mussels are found (e.g., riffles, runs, pools, backwater areas). Lampsilis siliquoidea was found mostly in fine sediments (e.g., silt) and occasionally in coarser sediments (e.g., gravel), whereas L. radiata was found in sand, gravel, and rubble substrates (data not shown). Both are still recognized as distinct species (Kat 1986; Strayer and Jirka 1997; Turgeon et al. 1998; Williams et al. 2017). Doubts about their validity, and the proposal to reduce L. siliquoidea to L. radiata siliquoidea, were based on the width of the hybrid zone in central New York State (Clarke and Berg 1959).

The genetic analyses conducted in this study support the recognition of L. siliquoidea and L. radiata as distinct species rather than subspecies or populations of a single species. Although mtDNA sequence divergence between L. radiata and L. siliquoidea was low, it falls within the range observed between species in the family Unionidae. DNA sequence divergence between sister freshwater mussel species ranges from 1.7% to 10% (Lydeard et al. 1996; Mulvey et al. 1997; Roe and Lydeard 1998; Roe et al. 2001b; Jones et al. 2006; Doucet-Beaupré et al. 2012; Inoue et al. 2019), but can be <1% for recently diverged taxa (Krebs 2004; Doucet-Beaupré et al. 2012; Stanton et al. 2012; Inoue et al. 2014; Pieri et al. 2018).
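Sequence divergence of this kind is often reported as an uncorrected p-distance: the proportion of aligned sites that differ between two sequences. The sketch below assumes pre-aligned sequences and skips any site with a gap or ambiguity code in either sequence (pairwise deletion). The sequences are invented, not COI data from this study, and a published analysis may instead use a model-corrected distance.

```python
VALID = set("ACGT")  # unambiguous nucleotides; anything else is skipped


def p_distance(seq1, seq2):
    """Uncorrected p-distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Pairwise deletion: ignore sites with a gap or ambiguity in either.
        if a in VALID and b in VALID:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Invented 24-bp aligned fragments differing at two sites.
s1 = "ATGGCATTAACCCTTTATTTTATT"
s2 = "ATGGCGTTAACCCTATATTTTATT"
print(f"{100 * p_distance(s1, s2):.2f}%")
```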

Lampsilis siliquoidea and L. radiata occupy different geographic ranges; they evolved as separate lineages during the last glaciation and came into secondary contact in the lower Great Lakes and St. Lawrence River drainages after the glaciers retreated (reviewed in Strayer and Jirka 1997). Each lineage is characterized by a few haplotypes that are widely distributed across its geographic range, crossing drainage divides, a pattern more consistent with distinct species than with populations of a single species. Lampsilis siliquoidea and L. radiata haplotypes co-occur only in the secondary contact zone. If these two lineages were populations of the same species, common haplotypes would be found across both lineages' geographic ranges. Furthermore, within each species, well-supported groups corresponding to geographic location were observed (Fig. 3, Table S3), indicative of populations or even subspecies within each lineage, and all of these populations (e.g., Ohio River and Mississippi River) shared the common L. siliquoidea haplotype 2.
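The haplotype-sharing argument can be expressed as two set operations: haplotypes shared between the two allopatric ranges (expected to be empty if the lineages are distinct species) and sites where haplotypes of both lineages co-occur (expected only in the contact zone). The sites, haplotype labels, and lineage assignments below are hypothetical.

```python
# Hypothetical sites: site -> (sampling area, haplotypes observed there).
samples = {
    "site_A": ("L. siliquoidea range", {"S2", "S5"}),
    "site_B": ("L. siliquoidea range", {"S2", "S7"}),
    "site_C": ("L. radiata range", {"R1", "R3"}),
    "site_D": ("L. radiata range", {"R1"}),
    "site_E": ("contact zone", {"S2", "R1"}),
}
# Hypothetical lineage of each haplotype (e.g., from a phylogeny).
lineage = {"S2": "Ls", "S5": "Ls", "S7": "Ls", "R1": "Lr", "R3": "Lr"}


def haplotypes_in(area):
    """Union of haplotypes over all sites in one sampling area."""
    found = set()
    for site_area, haps in samples.values():
        if site_area == area:
            found |= haps
    return found


def sites_with_both_lineages():
    """Sites where haplotypes of both lineages co-occur."""
    return sorted(site for site, (_, haps) in samples.items()
                  if {lineage[h] for h in haps} == {"Ls", "Lr"})


shared = haplotypes_in("L. siliquoidea range") & haplotypes_in("L. radiata range")
print("shared outside contact zone:", shared or "none")
print("co-occurrence sites:", sites_with_both_lineages())
```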

The F-type topology showed that L. radiata consists of two clusters. The taxonomic relationship between these two L. radiata clusters needs to be resolved, but their presence further supports the distinctiveness of L. radiata and L. siliquoidea. The genetic distance between the L. radiata clusters was 1.12%, whereas that between L. radiata and L. siliquoidea was >2%. Thus, although a 2% genetic distance seems low for delineating species, it exceeds the genetic distance within Lampsilis species/clades.
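The comparison above rests on mean between-group distances. A minimal sketch, assuming aligned, gap-free sequences and averaging the p-distance over all cross-group pairs; the toy sequences are invented and exaggerate divergence so the arithmetic is easy to follow.

```python
from itertools import product


def p_distance(a, b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_between(group1, group2):
    """Mean p-distance over every pair with one sequence from each group."""
    pairs = list(product(group1, group2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Invented 10-bp sequences for two L. radiata clusters and L. siliquoidea.
radiata_1 = ["AAAAAAAAAA", "AAAAAAAAAC"]
radiata_2 = ["AAAAAAAACC"]
siliquoidea = ["AAAAAAGGGG", "AAAAAAGGGT"]

print("between L. radiata clusters:", mean_between(radiata_1, radiata_2))
print("L. radiata vs L. siliquoidea:", mean_between(radiata_1 + radiata_2, siliquoidea))
```

Here the two hypothetical L. radiata clusters differ by 15% on average and the two species by 40%, mirroring at an exaggerated scale the 1.12% versus >2% contrast reported above.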

Lampsilis belongs to the monophyletic tribe Lampsilini (Campbell et al. 2005; Lopes-Lima et al. 2017); however, the relationships among genera within this tribe, such as Lampsilis, Obovaria, and Ortmanniana (Actinonaias) (Graf and O’Foighil 2000; Lydeard et al. 2000; Roe et al. 2001a; Campbell et al. 2005; Zanatta and Murphy 2006; Kuehnl 2009; Williams et al. 2017; Porto-Hannes et al. 2019), remain problematic and need thorough investigation.

Hybridization

Admixed individuals, detected from incongruent COI assignments and from seven microsatellite loci, indicate that these species hybridize where their geographic ranges overlap in the lower Great Lakes, St. Lawrence River, and Lake Champlain basins. Overall, the proportion of hybrid individuals was high: ~10% from mtDNA and ~2–60% from microsatellite analyses (but see the discussion below of the limitations of these estimates given the low number of microsatellite loci). Introgression appears to occur mostly from L. radiata into L. siliquoidea, as 75% of hybrids (assessed using mtDNA) had an F-type Ls/M-type Lr combination, i.e., a maternally inherited L. siliquoidea F-type and a paternally inherited L. radiata M-type haplotype. This is an interesting result given that hybrids were found mostly in populations dominated by L. radiata. It suggests stronger pre- or post-zygotic barriers, or stronger selection, against hybrids produced by L. radiata females and L. siliquoidea males; however, this needs further investigation.
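Because F-type mtDNA is inherited maternally and M-type mtDNA paternally, a male whose F-type and M-type haplotypes are assigned to different species indicates hybrid ancestry, and the combination identifies the species of the maternal and paternal lineages. A minimal sketch of the tally, using an invented sample of 20 males (not data from this study):

```python
from collections import Counter


def mito_hybrids(males):
    """Return the proportion of mismatched males and counts per direction.

    males: iterable of (f_type_species, m_type_species) tuples.
    """
    males = list(males)
    mismatched = [(f, m) for f, m in males if f != m]
    proportion = len(mismatched) / len(males)
    # F-type = maternal lineage, M-type = paternal lineage.
    directions = Counter(f"F-type {f}/M-type {m}" for f, m in mismatched)
    return proportion, directions


# Invented sample: 16 concordant males and 4 mismatched males.
males = ([("Ls", "Ls")] * 8 + [("Lr", "Lr")] * 8
         + [("Ls", "Lr")] * 3 + [("Lr", "Ls")] * 1)

prop, dirs = mito_hybrids(males)
print(f"hybrids: {prop:.0%}")
for direction, n in dirs.items():
    print(direction, f"{n / sum(dirs.values()):.0%} of hybrids")
```

In this hypothetical sample, 4 of 20 males (20%) are mismatched and 3 of those 4 (75%) carry an L. siliquoidea F-type with an L. radiata M-type.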

A hybrid zone involving many mammal, bird, and plant species is recognized in the lower Great Lakes region (Remington 1968; Swenson and Howard 2005), as are hybrid zones among other freshwater mussel species in the St. Lawrence River basin (Cyr et al. 2007; Doucet-Beaupré et al. 2012) and among fish races in the eastern Great Lakes (April and Turgeon 2006; April et al. 2013). This hybrid zone formed after the last glaciation, ~18–20 kya, when the receding glaciers created the Great Lakes of North America and range expansions led to secondary contact between previously isolated species.

Our results corroborate previous studies suggesting that L. siliquoidea and L. radiata may hybridize in Lake Champlain (Kat 1986) and in the Lake Ontario drainage (Clarke and Berg 1959). Krebs et al. (2013) reported L. radiata haplotypes in the Lake Erie drainage (M-type haplotypes 10, 13, and 16); however, we did not recover these L. radiata mtDNA haplotypes in this study. Our analyses placed M-type haplotype 13 within L. radiata, but the phylogenetic relationship of M-type haplotypes 10 and 16 to L. radiata and L. siliquoidea is unclear (Fig. 4). Furthermore, our analyses found no evidence of L. radiata M-type haplotypes west of Lake Ontario, and the geographic division (although blurred, see below) between L. siliquoidea and L. radiata appears to lie in Lake Ontario between the Irondequoit River and Sodus Bay. On the northern (Canadian) side of Lake Ontario and Lake Erie, the dividing line between the two species is less clear because there is a gap between the Moira River and the Lake St. Clair drainage in which L. siliquoidea is rare. Hybridization rates in the Moira River were very high, and east of that locality most populations in the Canadian provinces of Ontario and Quebec are L. radiata with various degrees of hybridization (see below).

Contact or hybrid zones have been characterized as a continuum from unimodal to bimodal zones (Jiggins and Mallet 2000). In this study, by contrast, the proportion of hybrids appeared to vary geographically, forming a mosaic of hybrid swarms and purebred populations, although this pattern needs further testing. Some studies have reported similar geographic patterns of hybridization. Genetic analysis of the freshwater mussels Pyganodon grandis and P. lacustris using heteroplasmic F- and M-type mtDNA found different hybrid frequencies in two lakes on Beaver Island in northern Lake Michigan (Beauchamp et al. 2020). Hybrid frequencies also varied among lakes in studies of hybridization between "benthic" and "limnetic" species of three-spined sticklebacks (Gasterosteus aculeatus) (Gow et al. 2006; Taylor et al. 2006). In Enos Lake, Vancouver Island, BC, Canada, these species historically formed two distinct clusters but have recently collapsed into a single cluster because of extensive hybridization, suggesting reverse speciation (Gow et al. 2006). In the marine realm, a sharp geographical discontinuity between introgressed and non-introgressed redfish Sebastes fasciatus and S. mentella was reported (Roques et al. 2001). However, despite high levels of introgressive hybridization (~15% of all samples), sympatric populations maintained their morphological integrity, with individuals resembling one or the other parental species (Roques et al. 2001).
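One crude way to summarize such a mosaic is to compute, for each location, the fraction of individuals with intermediate ancestry. The sketch below assumes K = 2 q values for the L. radiata cluster, treats 0.1 < q < 0.9 as intermediate, and uses a majority rule to label each location. These cut-offs, site names, and q values are hypothetical, and the labels are not a formal test of unimodal versus bimodal zones in the sense of Jiggins and Mallet (2000).

```python
def classify_location(q_values, low=0.1, high=0.9):
    """Return (fraction intermediate, descriptive label) for one location."""
    intermediate = sum(low < q < high for q in q_values)
    frac = intermediate / len(q_values)
    if frac == 0:
        label = "no hybrids detected"
    elif frac > 0.5:
        label = "unimodal-like (hybrids predominate)"
    else:
        label = "bimodal-like (parental forms predominate)"
    return frac, label


# Invented per-location q values for the L. radiata cluster.
locations = {
    "site_1": [0.02, 0.05, 0.97, 0.98, 0.03],
    "site_2": [0.45, 0.55, 0.60, 0.30, 0.95],
    "site_3": [0.98, 0.96, 0.50, 0.99, 0.97],
}

for site, qs in locations.items():
    frac, label = classify_location(qs)
    print(site, f"{frac:.0%}", label)
```

Different labels across nearby sites, rather than a smooth cline, are what a mosaic of hybrid swarms and purebred populations would look like in this summary.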

The observed mosaic pattern of hybridization may result from historical dispersal and secondary contact combined with current ecological and selective pressures acting on hybrid and purebred individuals across the landscape. One indication that some hybridization events may be historical is the presence of a few hybrid individuals in populations outside the present secondary contact zone, such as the Hudson River, middle Maitland River, and Nottawasaga River, or in areas with no gene flow from other populations, such as above the waterfalls in Tonawanda Creek. The evolution of complete reproductive isolation may take hundreds to millions of generations (Hewitt 2011). During speciation, however, species can undergo changes in population size and/or geographic distribution (Hewitt 1996, 2011), and the processes that promote or break down barriers to gene flow can be altered (Abbott et al. 2013). During the Pleistocene, the arctic ice sheet advanced and receded, initially on a roughly 41 ky cycle (from 2.4 to 0.9 mya) and later on a 100 ky cycle, producing changes in species distributions (reviewed by Hewitt 2011). Many species, including freshwater mussels, experienced changes in their geographic distributions during this time (Ortmann 1913; Kat 1983; Bogan et al. 1989; Hewitt 1996, 2000, 2011; Watters 2001; Hewitt et al. 2019; Scott et al. 2019), which could have weakened and/or strengthened barriers to gene flow. There is no conclusive evidence that closely related freshwater mussel species hybridized during Pleistocene interglacials; however, the Fish House Clay fauna on the bank of the Delaware River in New Jersey (ca. 100 kya) contains fossils that resemble extant mussel species from the Atlantic slope and from the Great Lakes, Mississippi, and Ohio basins (Kat 1983; Bogan et al. 1989; but see Bogan et al. 1989 for an extensive discussion of opposing views on the resemblance of the Fish House Clay fossils to western extant species). This suggests that secondary contact between Interior Basin and Atlantic slope species was possible during previous interglacials (Kat 1983, 1985).

At present, habitat use (lentic versus lotic), substrate preference, differential tolerance to pollutants and sediment loads, and host attraction and infection may favor or act against hybrids. Empirical tests are needed to quantify differences in fitness (e.g., fecundity, survivorship, growth rate) between hybrids and purebreds in different habitats and in the presence of different host fish. Furthermore, because genomic regions may differ in their degree of introgression between species (Baack and Rieseberg 2007; Nosil 2008; Zheng and Ge 2010; Feder et al. 2014), molecular data from barrier loci or genes under divergent selection could shed further light on speciation in freshwater mussels.

The proportion of hybrids detected from mismatches in mitochondrial species assignments was lower than that detected with microsatellites, because detecting hybrids from incongruent F-type and M-type mtDNA assignments can reveal only first-generation hybrids or certain backcrosses. Only males were used in this study, which restricted the sample size because whole individuals had to be collected. DNA sequences from the F-type and M-type mitogenomes were useful for determining the phylogenetic relationships of L. siliquoidea and L. radiata and the broad-scale geographic extent of hybridization. However, the microsatellite results should be considered a first approximation of hybrid frequency, because the number of loci was limited and, although species-specific alleles were found, the two species shared most alleles at all loci. Estimating the exact number of hybrids and assigning them to hybrid categories (F1, F2, or backcrosses) will require a larger number of microsatellite loci (Boecklen and Howard 1997; Vähä and Primmer 2006) or other markers such as single nucleotide polymorphisms (SNPs).

Although the number of hybrids was estimated using a limited number of microsatellite loci, it is worth noting that intraspecific genetic structure affected the number of individuals identified as having mixed ancestry; this should be tested further using SNPs or more microsatellite loci. The STRUCTURE analysis found that L. siliquoidea is subdivided into two general clusters, one comprising locations in the Lake Huron drainage and the other locations in the lower Great Lakes drainage. Strong population subdivision within a species could result from different post-glacial colonization routes into the Great Lakes (Beaver et al. 2019; Hewitt et al. 2019). Individuals from both clusters share the common L. siliquoidea haplotypes, supporting their identification as L. siliquoidea. When K = 2, it is assumed that two species contribute to the gene pool, and STRUCTURE calculates the ancestry coefficient (q) accordingly. The problem arises when there is intraspecific genetic structure and an individual's genetic make-up can derive from three (or more) gene pools. The Johnson's Creek population (JC, Fig. 6) illustrates this issue. When K = 2, many individuals with similar ancestry coefficients (around 0.5) for either cluster (here, the two species) will be identified as hybrids. Comparing the JC individuals in the K = 2 plot with those in the K = 3 plot shows that, at K = 3, individuals with mixed ancestry draw it from the Lake Huron drainage cluster and the lower Great Lakes drainage cluster rather than from the L. radiata cluster; these individuals are therefore no longer identified as hybrids. Many of the individuals identified as hybrids by STRUCTURE, especially at K = 2, were reassigned by NEWHYBRID to the purebred L. siliquoidea category or left "unassigned". This demonstrates the importance of considering within-species genetic structure rather than always assuming K = 2 when determining an individual's ancestry. Furthermore, using both STRUCTURE and NEWHYBRID produces more robust results and increases confidence in the identification of hybrid specimens (Vähä and Primmer 2006).
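The K = 3 reasoning above can be made explicit by summing q across clusters that belong to the same species before applying the threshold. The sketch below assumes the cluster order (Lake Huron L. siliquoidea, lower Great Lakes L. siliquoidea, L. radiata), a q threshold of 0.9, and invented q vectors; it is one possible way to read K = 3 output, not necessarily the procedure used in this study.

```python
# Hypothetical mapping of the K = 3 clusters to species.
CLUSTER_SPECIES = ("L. siliquoidea", "L. siliquoidea", "L. radiata")


def species_ancestry(q, cluster_species=CLUSTER_SPECIES):
    """Sum cluster memberships that belong to the same species."""
    totals = {}
    for qi, species in zip(q, cluster_species):
        totals[species] = totals.get(species, 0.0) + qi
    return totals


def classify(q, threshold=0.9, cluster_species=CLUSTER_SPECIES):
    """Purebred if summed ancestry for one species meets the threshold."""
    totals = species_ancestry(q, cluster_species)
    species, best = max(totals.items(), key=lambda kv: kv[1])
    return species if best >= threshold else "admixed"


def classify_by_cluster(q, threshold=0.9):
    """Naive rule: purebred only if a single cluster meets the threshold."""
    return "purebred" if max(q) >= threshold else "admixed"


# Invented q vectors at K = 3.
jc_like = (0.48, 0.49, 0.03)      # mixed between the two L. siliquoidea clusters
true_hybrid = (0.30, 0.25, 0.45)  # substantial L. radiata ancestry

for name, q in (("jc_like", jc_like), ("true_hybrid", true_hybrid)):
    print(name, classify_by_cluster(q), classify(q))
```

A per-cluster rule labels the Johnson's Creek-like individual as admixed, whereas summing its two L. siliquoidea memberships (0.97) classifies it as purebred L. siliquoidea; only the individual with substantial L. radiata ancestry remains admixed.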

This study presents the first in-depth analysis of hybridization between freshwater mussel species that came into secondary contact in the Great Lakes after the last glaciation. It is also the first study on Unionida to use multiple genetic markers (mtDNA and microsatellites) to assess hybridization, and it contributes to general knowledge of speciation in unionids. Resolving the phylogenetic relationship of these species will allow better conservation strategies to be developed. Because conservation and restoration strategies for threatened taxa customarily target the species, correct identification and delineation of a species' geographic range are fundamental to their success (Avise 1989; Frankham et al. 2002; Gaston and Fuller 2009). The existence of hybrids does not complicate species delineation outside the hybrid zone. However, at the local scale (e.g., the provincial or state level), where species protection and biodiversity inventories depend on species delineation, the occurrence of hybrids poses the problem of how hybrids should be included in species diversity inventories and species management.

Furthermore, the effects of natural hybridization and introgression need to be considered when setting conservation policies (Rhymer and Simberloff 1996; Allendorf et al. 2001). Documenting hybridization and describing hybrid zones can inform managers and practitioners about which populations to use or avoid as sources when individuals from healthy populations are translocated to augment population size and genetic diversity.