Comparison between single and multi-locus approaches for specimen identification in Mytilus mussels

Mytilus mussels have been the object of much research given their sentinel role in coastal ecosystems and significant value as an aquaculture resource appreciated for both, its flavour and nutritional content. Some of the most-studied Mytilus species are M. edulis, M. galloprovincialis, M. chilensis and M. trossulus. As species identification based on morphological characteristics of Mytilus specimens is difficult, molecular markers are often used. Single-locus markers can give conflicting results when used independently; not all markers differentiate among all species, and the markers target genomic regions with different evolutionary histories. We evaluated the concordance between the PCR-RFLP markers most commonly-used for species identification in mussels within the Mytilus genus (Me15-16, ITS, mac-1, 16S rRNA and COI) when used alone (mono-locus approach) or together (multi-locus approach). In this study, multi-locus strategy outperformed the mono-locus methods, clearly identifying all four species and also showed similar specimen identification performance than a 49 SNPs panel. We hope that these findings will contribute to a better understanding of DNA marker-based analysis of Mytilus taxa. These results support the use of a multi-locus approach when studying this important marine resource, including research on food quality and safety, sustainable production and conservation.

. Locations and codes for the six sampling sites. Codes for locations can be found in Table S5. Color indicates species as determined using the PCR-RFLP Me15-16 AciI assay: red for Mytilus chilensis, orange for Mytilus galloprovincialis, blue for Mytilus edulis and black for Mytilus trossulus. Background topographic map from GeoMapApp (http://www.geomapapp.org).  33 and Toro 67 , this marker therefore only differentiated between M. trossulus and the other three species, although those authors obtained somewhat different fragment sizes using agarose gels.
Mitochondrial marker PCR-RFLP COI XbaI targets the cytochrome oxidase subunit I region with the primers COIXbaF and COIXbaIR, designed to distinguish M. chilensis from other mussels. The 233 bp amplicon was restricted with the XbaI enzyme, generating two fragments (134 and 99 bp) in M. chilensis 20 only (Fig. 2c).
The mitochondrial marker PCR-RFLP 16S rRNA produces a ~540 bp fragment with universal primers 16sar-L/16sbr-H 68 46 and Zardi et al. 69 described a Southern Hemisphere M. galloprovincialis haplotype with three fragments (342, 167 and 28 bp) (Fig. 2d). For clarity, we will conserve the name M. galloprovincialis to refer the former Northern Hemisphere haplotype and use M. chilensis for the former Southern Hemisphere haplotype (Fig. 2d).
DNA amplifications were performed in a Techne TC-412 (Bibby Scientific Ltd, UK) thermocycler and Palm-Cycler TM (Corbett Life Science, Australia) with recombinant Taq DNA polymerase (RBC Bioscience ® and Thermo Fisher ® ) using 40 ng of template DNA in a final volume of 25 μL. The PCR primer pairs described were used as reported by the above authors with no modifications. Details regarding the PCR reaction conditions and amplification profiles for each marker are shown in Table 1. All experiments included a negative control (with no template DNA added) and a positive control consisting of template DNA that was previously extracted and successfully amplified by conventional PCR with Me15-16 54 . Digestion with AciI (New England Biolabs), HhaI and XbaI (Thermo Scientific) was performed separately in a final volume of 20 μL, using 15 μL of PCR product and 4, 10 and 10 units of each enzyme with 1x NEB4 and Tango buffers, respectively. Triple digestion with EcoRV, NheI and SpeI (New England Biolabs) was performed in the same reaction using 10, 5 and 5 units of each enzyme, respectively, with 1x NEB2 buffer. All incubations were carried out overnight at 37 °C. The size of amplified fragments resolved in PAGE was obtained by log-linear interpolation of the 10 bp DNA ladder (Invitrogen ® ) or HyperLadder V (Bioline ® ) on the gel.
Genotyping. Genotypes were scored using polyacrylamide gel electrophoresis (PAGE) (8%) and silver staining for the markers 18S rDNA, Me15-16, mac-1, COI and 16S rRNA, except for MT-1 specimens. In these individuals, mac-1 and 16S rRNA were genotyped with a Fragment Analyzer TM instrument (Advanced Analytical Technologies, Ames, IA), using the dsDNA 905 Reagent Kit (35-500 bp) following manufacturer instructions. This kit resolves 2 bp differences in DNA fragments and alleles. The data were normalized to 35 bp and 500 bp lower and upper markers and calibrated to the 75 to 400 bp range using PRO Size 2.0 software (Advanced Analytical Technologies, Inc.). Correspondence between the allele sizes obtained using the two methods was established by constructing an allele ladder, sizing all alleles obtained from polyacrylamide-scored genotypes and genotyping this allele ladder with a Fragment Analyzer TM instrument. Finally, the ITS marker was genotyped with the Fragment Analyzer TM in all individuals. Quality was verified by including negative controls in each run and re-genotyping a randomly-selected 5% of individuals.

Data analysis.
Mono-locus approach. The mitochondrial markers COI and16S rRNA might show two haplotypes, due to the double uniparental mitochondrial inheritance (DUI) observed in mussels, thus somatic cells carry the female and, in less frequency, the male mitochondrial genome possibly giving different haplotypes 70 .
To avoid affecting the species identification, individuals with two haplotypes were excluded from further analysis. Mitochondrial markers with only one haplotype were analyzed as homozygous diploid genotypes following Narum et al. 71 Numbers and frequencies of alleles and haplotypes were estimated with the R package strataG 72 . www.nature.com/scientificreports www.nature.com/scientificreports/ Individuals were assigned to a species based on each marker separately (nuclear Me15-16, nuclear ITS and mitochondrial COI) according the allele sizes and patterns defined for each species, as described above (Fig. 2). These markers are considered diagnostic in allopatric populations 7 . In the case of multiallelic markers mac-1 and 16S rRNA, species was determined using a leave-one-out (LOO) algorithm with the Bayesian method described by Rannala & Mountain 73 using GeneClass2 software 74 . Individuals were allocated to a species with an assignment threshold of 0.05. This procedure avoids the subjectivity of pooling species-specific compound alleles into synthetic alleles 75,76 .

RFLP-PCR
Each marker was evaluated through re-allocation analysis with the software package GeneClass2 as described above. A re-assignment was considered correct if it matched the classification by PCR-RFLP Me15-16 AciI. Numbers and percentages of matching and mismatching assignments were determined for each marker.
The concordance between PCR-RFLP Me15-16 AciI and the other markers was evaluated graphically using a heatmap as well as classical diagnostic test performance statistics for each species. (i) Sensitivity (S), calculated as the number of individuals correctly assigned to the species divided by the total number of individuals sampled from that species, reflects how well the test correctly assigns individuals to a species 77 . (ii) Specificity (E), calculated as the number of individuals appropriately excluded from the species divided by the total number of individuals who do not belong to the species, reflects how well the test correctly excludes individuals from a species 78,79 . Sensitivity vs. specificity for each marker in each species was plotted. (iii) Positive likelihood ratio (LR+), calculated as S/(1-E), summarizes how many times more likely it is that individuals of a species will be assigned to the species as compared to specimens of other species 80 . When specificity is 1.0, LR+ will be undefined, therefore, we added 0.5 to all counts in the table to calculate an approximate LR+ value 81,82 . The 95% of confidence intervals (95% CI), to determine if the diagnostics statistics values (S, E and LR+) were significantly different from zero, were estimated with the R package epiR (https://cran.r-project.org/web/packages/epiR/).

Multi-locus approach.
To visualize the separation of species using four markers (excluding PCR-RFLP Me15-16 AciI) and all five markers simultaneously. Two-dimensional factorial correspondence analysis (2D-FCA) and principal component analysis (PCA) were performed with the R package adegenet 83 . As with the mono-locus approach, the performance of the four and five markers together was evaluated using re-allocation analysis with the software package GeneClass2 (described above). We use the non-parametric Wilcoxon signed rank test to compare the assignment performance of the panel composed by the five RFLP-PCR markers obtained in this study (Table S4f) with that obtained with a 49 SNP panel in 338 mussels: M trossulus (17), M. edulis (27), M. galloprovincialis (105) and M chilensis (189), from previous work (Larraín et al. 38 ) and summarized in Table S2. In both studies, PCR-RFLP Me15-16 Acil was used as reference marker to perform specimen identification.

Results
All 298 individuals were successfully genotyped with Me15-16 23,54 as pure M. trossulus (50), M. edulis (50), M. galloprovincialis (99) or M. chilensis (99), producing the expected allele size (Fig. 2a). ITS and COI amplified in all individuals with PCR-RFLP. The loci mac-1 and 16S rRNA could not be amplified after two attempts each in five individuals. The global genotyping success rate was 98.32% across populations, with a 100% match rate in re-tested individuals (n = 15). On the other hand, only 16S rRNA marker showed two mitochondrial haplotypes in seven M. chilensis from MCh-1, 30 and three M. galloprovincialis from MG-2 and MG-1 populations, respectively. These 40 individuals were excluded from subsequent analysis, as was described in the data analysis section.
Mono-locus approach. First intron in the Mytilus actin protein gene: nuclear locus mac-1. Locus mac-1 was polymorphic in all locations, with 27 alleles ranging from 164 to 494 bp in length among individuals (Table S3). Two alleles (255 and 266 bp) were present in all four species and all six locations. Two alleles (303 and 328 bp) were exclusive to M. galloprovincialis. The frequency of the 328 bp allele was ten-fold higher in the Southern (0.220) than the Northern Hemisphere (0.021), while the reverse was true for the 303 bp allele. The M. chilensis populations (MCh-1 and MCh-2) and M. edulis sample from Canada (ME-1) were less diverse, with a maximum of three alleles (255, 266 and 298). M. trossulus had numerous private alleles with low frequencies except for two higher-frequency alleles (487 and 494 bp).
When the mac-1 locus genotypes were used to assign individuals to the species determined by PCR-RFLP Me15- 16  Concordance between mac-1 and PCR-RFLP Me15-16 AciI is shown in Fig. 3. Sensitivity (number of individuals from each species correctly assigned by mac-1 divided by total number of individuals from that species) was high for M. trossulus and M. galloprovincialis (0.76 and 0.68) but only 0.37 for M. chilensis (Table S5). This marker accurately excluded individuals from M. trossulus and M. galloprovincialis, with specificities of 1.00 and 0.99, respectively. The positive likelihood ratio indicated that M. galloprovincialis individuals were 63.63 times more likely than other individuals to be assigned to the species.
Nuclear marker PCR-RFLP ITS HhaI. The RFLP assay clearly differentiated M. trossulus from M. chilensis, M. edulis and M. galloprovincialis but, as expected, did not distinguish among the latter three (Fig. 2b). All 50 M. trossulus individuals were correctly re-assigned and the other 208 individuals correctly excluded (Fig. 3, Table S4b). Sensitivity and specificity for M. trossulus were optimal, with full concordance between ITS and Me15-16 (Fig. 4, Table S5).  (Table S5). This mitochondrial marker did not distinguish among M. trossulus, M. edulis and M. galloprovincialis but correctly excluded all M. chilensis specimens (specificity = 1.00).  www.nature.com/scientificreports www.nature.com/scientificreports/ 16S rRNA gene: mitochondrial marker PCR-RFLP 16S rRNA. After triple enzymatic digestion, we found other two haplotypes not previously described for this locus (shown with asterisk in Fig. 2d). The first one was present in one MT-1 individual (0.022) that was missing the NheI site in contrast to the standard M. trossulus haplotype. The second was present in three MG-2 individuals (0.15), each with the EcoRV site present and NheI site missing in contrast to the standard M. galloprovincialis haplotype (Fig. 2d).
The M. chilensis haplotype (the species previously named Southern Hemisphere M. galloprovincialis) was fixed (1.0) in MCh-1 and MCh-2. The frequency of this haplotype was of 0.10 in the third Chilean population (MG-2), identified as M. galloprovincialis by PCR-RFLP Me15-16 AciI. In the Northern Hemisphere populations, this haplotype was present at very low frequencies (0.02) only in ME-1, and absent in MG-1 and MT-1 (Table S6).

Multi-locus approach. This analysis uses the genetic information provided by all five markers simultane-
ously. The 2D-FCA multi-locus approach separated all four species, as shown in Fig. 5. As expected, ME-1, MG-1 and MG-2 mapped very closely. The Northern Hemisphere populations (MG-1 and ME-1) overlapped in all of the PCA plots (Fig. 6). These plots also showed an overlap between the Chilean M. galloprovincialis population (MG-2) and the same species from Spain (MG-1) but no overlap with ME-1. On the other hand, the M. chilensis populations (MCh-1and MCh-2) were clearly separated from the other Mytilus species, including M. trossulus (MT-1).
The comparison of assignment performance obtained with the RFLP-PCR multi-locus panel and the SNP panel did not show significantly differences (p-value = 1.00), indicating the same performance of both kind of markers sets.

Discussion
The use of genetic methods to assign taxonomic names to unknown Mytilus individuals, called specimen identification 84 , is useful for: increasing genetic knowledge of mussels, studying populations and hybrid zones, establishing taxonomy and systematics, identifying evolutionary relationships and phylogeny within the genus, performing ecological studies and verifying food authenticity.
As the Mytilus genus contains several taxa, researchers may need to authenticate the target species using molecular markers. Species assignment is typically performed using a single locus independently, called the mono-locus approach, likely because this method is relatively fast and cheap 85 . However, not all markers can differentiate all of the species in the genus. Furthermore, because the various markers target different regions in the genome, they often produce nonequivalent classification results 38,86 .
The 65 pb intron length polymorphism in the actin gene mac-1, used for genotyping in Mytilus population studies 49 , systematically fails to amplify in some individuals (~2%), possibly due to mutations in priming sites producing null alleles, as in microsatellites 87 . This phenomenon hinders allele scoring, limiting the accuracy of the mono-locus approach with this marker. mac-1 correctly excluded M. trossulus and M. galloprovincialis from other species but showed a weak ability to identify the four Mytilus species, with only 56.9% of assignments matching the results produced using PCR-RFLP Me15-16 (Table S4a). The poor performance of mac-1 is attributable to its limited ability to discriminate between M. edulis and M. chilensis: 18 of 50 M. edulis individuals were assigned to M. chilensis and 57 of 92 M. chilensis to M. edulis. Therefore, several studies that have used this marker in Chilean blue mussels (M. chilensis) may have been affected by this bias 2,65 . This nuclear marker has multiple alleles, some of which are shared across Mytilus species, and is therefore not fully diagnostic (Table S3). This multiallelic characteristic led others to propose the use of "synthetic alleles, " in which several similarly-sized alleles are pooled for specimen identification 76 . Pooling improved performance, but several of the compound synthetic alleles were still not exclusive to any one species 76 . Because the polymorphism in mac-1 is located in an intronic region, is quite variable even within a given species, as well as technically difficult to score, limiting performance.
ITS targets the internally transcribed spacer sectors between the 18S and the 28S genes from the nuclear rDNA coding region. The genomic organization of rDNA consists of a variable number of tandem repeats that is sufficient to provide a DNA template for PCR 88 . ITS successfully amplified all individuals and allowed for definitive discrimination of M. trossulus. Although ITS could not separate the other three species, concordance with Me15-16 AciI was optimal for M. trossulus. This result is consistent with other works comparing ITS to other markers 58 . For example, Toro 67 discriminated M. trossulus from M. chilensis and M. edulis but could not distinguish between the latter two. In the same work, these three species were clearly differentiated using the Glu-5′ marker targeting the polyphenolic adhesive protein gene, confirming that ITS only can distinguish M. trossulus specimens. Due to its multicopy nature, authors have warned that ITS should not be considered a codominant single-copy Mendelian marker 56,59 . However, Heat et al. 33 observed a Mendelian-like inheritance pattern when using ITS to genotype progeny from two test crosses (M. edulis x hybrids M. edulis/M. trossulus).
The mitochondrial PCR-RFLP COI marker showed full concordance with Me15-16 AciI marker, identifying the Chilean mussel in all locations sampled. However, two individuals (B12 and B42) from the Dichato population (MG-2) in the Arauco Gulf, that were classified as M. galloprovincialis by Me15-16 AciI, were identified by COI as M. chilensis. The Arauco Gulf is a sympatric zone where the presence of the non-indigenous M. galloprovincialis has been described 29,63,64,89,90 . Also, in this zone, our group found a frequency of ~4-7% for hybrids of the two species as part of a routine analysis using the PAPM marker 55   www.nature.com/scientificreports www.nature.com/scientificreports/ The mitochondrial PCR-RFLP 16S rRNA showed full concordance with the nuclear PCR-RFLP Me15-16 AciI marker for identifying M. trossulus and M chilensis, but somewhat lower sensitivity for M. edulis (0.98). This statistic decreased to 0.08 for the Mediterranean mussel (Table S5, Fig. 4). Most M. galloprovincialis specimens were classified as M. edulis (59 of 66). This inconsistency between the markers is likely due to the fact that all Mediterranean mussel populations that have been tested in Europe 45 and the Southern Hemisphere 64,69 also carry the M. edulis haplotype. Therefore, PCR-RFLP 16S rRNA is only semi-diagnostic for M. galloprovincialis and M. edulis.
PCR-RFLP 16S rRNA exhibits an evident DUI of mitochondrial DNA 93 in 40 samples that present two mtDNA haplotypes, these individuals were not used in further species identification analysis. Of the remaining 20 individuals with one haplotype, 18 had M. galloprovincialis haplotypes and the other two carried the M. chilensis haplotype. These last two correspond to the same individuals classified as M. chilensis by COI (B12 and B42), as expected by the fact that the mitochondrial genome is considered one locus with 16S rRNA and COI variants inherited linked. This finding could indicate an asymmetric hybrid zone in the Arauco Gulf area, in which M. galloprovincialis being the predominant species, with a lower frequency of the native M. chilensis.
Current aquaculture practices in the Arauco Gulf zone involve production of Mediterranean and Chilean mussels in the same area. Interestingly, Westfall & Gardener 64 also found introgressed individuals with nuclear M. galloprovincialis genotypes and mitochondrial M. chilensis haplotypes in Cocholgue, a location 12 km away from our sampling point in Dichato, supporting the concept of a hybrid zone.
PCR-RFLP Me15-16 follows a Mendelian inheritance pattern 94 and is an extremely robust and reliable diagnostic marker for routine specimen identification 53,95,96 . Therefore, this method is the most common DNA-based technique for identifying mussel species. Of course, Me15-16 alone is not able to distinguish introgressed individuals 40 , and also, is not able to differentiate M. chilensis from the Southern Hemisphere lineage of M. galloprovincialis from New Zealand, because in both species, this fragment of genome is not cut by AciI due to the substitution of the allele "G" by "T" in the restriction site 97 .
The mono-locus approach offers some advantages: this method is fast, relatively easy to preform and simple to score with the fully diagnostic markers for the species analyzed here (Me15-16, ITS and COI). However, they show weakness, such as the fact that some markers have multiple alleles that are not fully fixed in each species (mac-1 and 16S rRNA), making them only semi-diagnostic. Also, the presence of multiple alleles sometimes hinders interpretation. Another problem with the mono-locus approach is that some markers cannot identify all of the species analyzed here when used alone, and when two or more are used simultaneously, they produce contradictory results 32,35,61 . Moreover, mitochondrial markers (COI and 16S rRNA) must be used simultaneously with a nuclear marker to detect introgression in hybrid zones.
The discrepancies observed among the PCR-RFLP markers were expected, as each Mytilus species diagnostic marker targets a single locus in distinct zones of the nuclear or mitochondrial genome, likely with different times to common ancestor or gene-genealogy 98 . Moreover, it is widely recognized that evolutionary forces act differently and in an uncoordinated way on nuclear and mitochondrial genomes, and even on different regions within the nuclear genome 11,41 , for example, in monocopy vs. multicopy genes or introns vs. exons. On the other hand, hybridization is also associated with conflicting results, as backcrossing with one or both parental taxa can lead to introgression of alleles from one taxon into the other 99 . In this case, analyses including different types of markers are preferable 41 .
Given that genome of smooth-shelled mussels is about 1.6 Gb 100 , species identification using a single locus or a very small number of loci is relatively straightforward. However, specimen identification based on simultaneous use of the information provided by each locus, known as the multi-locus approach, allows us to consider the evolutionary forces acting on different genomic regions. Therefore, this approach provides more coherent and reliable outcomes, especially when introgression has occurred 58 .
The development of genotyping-by-sequencing methods to affordably discover and genotype hundreds or even thousands of single-nucleotide polymorphism (SNP) markers makes it possible to apply a multi-locus approach to specimen identification 40,101 . New multi-locus SNP panels have been developed recently, allowing for species identification using only the most informative SNPs 38,40,96,[102][103][104][105] . As an example of the multi-locus approach in mixed populations, hybrids and introgressed individuals were detected by a new twelve-SNP diagnostic panel in European Mytilus samples previously analyzed with PCR-RFLP Me15-16 40 . The multi-locus approach can also be applied to specimen identification, simultaneously analyzing the mitochondrial and genomic markers traditionally used in mono-locus analysis 58 .
In this work, we used a multi-locus strategy with two mtDNA (PCR-RFLP COI and PCR-RFLP 16S rRNA) and three nuclear (PCR-RFLP ITS, mac-1 and PCR-RFLP Me15-16) markers. The re-assignment analyses using the multi-locus panel, excluding the reference marker PCR-RFLP Me15-16, incorrectly classified 31.8% of M. galloprovincialis as M. edulis. This finding is not wholly unexpected, as these species are closely related due to a long history of hybridization and introgression in Europe 36,44,76,106,107 . Therefore, discriminating between these two species poses steep analytical challenges. With this panel, the same two M. galloprovincialis individuals (B12 and B42) from MG-2 were again classified as M. chilensis, likely due to the absence of the PCR-RFLP Me15-16 marker. On the other hand, the full multi-locus panel including all five markers was fully concordant with PCR-RFLP Me15-16, clearly separating the four species, similar to results for a 49-SNP panel analyzed by Larraín et al. 38 .
These results indicate that a multi-locus approach, as described here, might improve the accuracy of PCR-RFLP marker-based specimen identification. Furthermore, using a single marker may result in poor performance, especially in mixed populations. The capabilities and limitations of each marker in a mono-locus and multi-locus approaches summarized in Table 2, can be useful when analyzing the results of previous studies in which these markers were applied.
We conclude that the PCR-RFLP markers Me15-16, ITS and COI produce largely equivalent results when applied using a mono-locus approach; however, the latter two are useful only for separating M. trossulus and www.nature.com/scientificreports www.nature.com/scientificreports/ M. chilensis from the three other species (Table 2). Me15-16 distinguished among the four species tested but, as expected form single locus data, was not able to detect introgression. Mono-locus results for the nuclear mac-1 and mitochondrial 16S rRNA markers, due to their semi-diagnostic status, were difficult to interpret and showed low concordance with the results derived from the multi-locus approach. All five markers used simultaneously in a multi-locus approach produced more reliable and robust identifications, outperforming each of the markers when used separately, and comparable performance of SNPs panels. These findings support the use of a multi-locus approach when studying this important marine resource, with implications for research on food quality and safety, sustainable production, biodiversity and conservation.