Genetic homogeneity of the invasive lionfish across the Northwestern Atlantic and the Gulf of Mexico based on Single Nucleotide Polymorphisms

Despite the devastating impact of the lionfish (Pterois volitans) invasion on NW Atlantic ecosystems, little genetic information about the invasion process is available. We applied Genotyping by Sequencing techniques to identify 1,220 single nucleotide polymorphic sites (SNPs) from 162 lionfish samples collected between 2013 and 2015 from two areas chronologically identified as the first and last invaded areas in US waters: the east coast of Florida and the Gulf of Mexico. We used population genomic analyses, including phylogenetic reconstruction, Bayesian clustering, genetic distances, Discriminant Analyses of Principal Components, and coalescence simulations for detection of outlier SNPs, to understand genetic trends relevant to the lionfish’s long-term persistence. We found no significant differences in genetic structure or diversity between the two areas (FST p-values > 0.01, and t-test p-values > 0.05). In fact, our genomic analyses showed genetic homogeneity, with enough gene flow between the east coast of Florida and Gulf of Mexico to erase previous signals of genetic divergence detected between these areas, secondary spreading, and bottlenecks in the Gulf of Mexico. These findings suggest rapid genetic changes over space and time during the invasion, resulting in one panmictic population with no signs of divergence between areas due to local adaptation.


Results
From the Genotyping by Sequencing (GBS) library of 229 lionfish samples, a total of 404,254 sequence tags were retained in TASSEL 32 . Filtering for individuals with at least 75% of the called loci and loci that were present in at least 84% of individuals yielded a total of 1,654 SNPs and 162 individuals from the NW Atlantic and Gulf of Mexico (see Tables 1 and 2 Detection of SNP outliers. From the dataset of 1,220 SNPs, 23 outlier SNPs were identified as candidate markers under positive selection with Arlequin after FDR correction (24 SNPs were identified when the uncorrected p-value ≤ 0.01 was applied). No marker was identified to be under balancing selection from Arlequin (see Fig. 2). Lositan identified a total of 213 outlier SNPs, 73 of them candidates under positive selection (Fig. 2) and 140 candidates under balancing selection. Among all outlier SNPs, only 13 were found in common between both methods. We consider that only these 13 outlier SNPs had strong statistical support to be considered as  21 , and (b) Map of the sampling sites in this study. Maps were created for this study with the "marmap" package 73  potentially under positive selection. The remaining 1,207 SNPs were assumed to be neutral, although their neutrality could not be directly proven.
Lack of divergence between the NW Atlantic and the northern Gulf of Mexico. The main genetic descriptors obtained for all 1,220 lionfish SNPs are listed in Table 2. Most populations showed lower values of observed (Ho) than expected heterozygosity (He), which translated into positive values of the fixation index F IS , and significant deviation from HWE (Table 2). Genetic diversity values varied among populations: populations with only few individuals had the lowest diversity. For those populations with 10 or fewer individuals, the mean number of alleles per locus was lower than the potential maximum of 2 (see the curve of accumulative number of alleles related to sample size in Supplementary Material FS2). Our analyses did not detect significant differences in genetic diversity (assessed as Ho and He) between the NW Atlantic and the Gulf of Mexico (t-tests: t = 0.89 and 0.42, p-values = 0.39 and 0.68, respectively). The Maximum Likelihood (ML) tree reconstructed from 1,220 SNPs of 162 specimens from 13 locations did not show geographical clustering of specimens related to the sampling site and/or geographical area where they were collected (Supplementary Material FS3) and displayed very low bootstrap value support on most nodes.
In the Bayesian clustering analysis performed in Structure for all 1,220 SNPs, the optimal numbers of homogeneous genetic clusters (K) for the whole data set 33 were three and five (K = 3 and K = 5) according to the ad hoc statistic ΔK (see values of Delta K-ΔKin Supplementary Material FS4), but the Log likelihood for K (LK) did not significantly increased from 1 to 5 suggesting lack of spatial genetic clustering (see LK in Supplementary Material FS4). The individual-based cluster memberships from K = 2 to K = 8 (see Supplementary Material FS4) showed no spatial genetic heterogeneity among sampling sites and mixed membership of individuals to all clusters, supporting the hypothesis of panmixia across the Florida Strait (see Fig. 3 for K = 3 and K = 5, and Supplementary Material FS4 for all K values). Only slight differences in terms of a higher probability of one specific cluster (the blue cluster) could be observed in two sites located at south Florida, BNP and Isl (Fig. 3). Genetic admixture across the invasive range was also observed when 1,207 neutral SNPs and 13 outlier SNPs were separately analysed. In Fig. 3, we compare Structure results for K = 5 from the three different datasets.  The Analyses of Molecular Variance (AMOVA) for all 1,220 SNPs, 1,207 neutral SNPs, and 13 outlier SNPs also did not detect genetic differentiation associated with the NW Atlantic and Gulf of Mexico ("Among groups"), regardless of whether the two populations from the Florida Keys (Dry Tortugas-DT and Pulley Ridge-PR) were pooled or removed from the analyses, and most genetic variation was retained within individuals (Table 3). No differences were observed from AMOVA results between the datasets including all 1,220 SNPs and only 1,207 neutral SNPs (see Table 3 and Supplementary Material TS1). Nevertheless, from the 13 outlier SNPs, we detected significant differences at two additional variance components: between populations within the NW Atlantic and Gulf of Mexico, and among individuals within populations, but still most genetic variation was retained within individuals (Table 3). F ST statistics gave us more details of the pairwise differences between populations within the NW Atlantic and the Gulf of Mexico. The F ST statistics were in general very low, and only two pairwise comparisons were significant (between AL and BNP, and Isl) after FDR correction of the p-values when all 1,220 SNPs (Table 4) or only 1,207 neutral SNPs (data not shown) were included in the analyses. F ST distances from the 13 outlier SNPs revealed significant genetic differentiation between other sites: Ap and FG seemed to be the most   genetically divergent sites (see Table 4), but this genetic divergence was not fully supported by Bayesian clustering analyses (see previous results from Structure) or discriminant analyses of principal components (see explanation below). The discriminant analyses of principal components (DAPC) also showed a general pattern of low genetic differentiation, as that observed from previous analyses. According to the Bayesian Information Criterion (BIC) that compares different DAPC clustering solutions, two clusters were the optimal number to describe our data. The DAPC plot, including all sampling sites and all 1,220 SNPs, showed no clear separation of populations or clusters between the NW Atlantic and Gulf of Mexico (Fig. 4a). Only FG seemed lightly isolated from all the other sampling sites. This pattern was maintained when only 1,207 neutral SNPs were included in the DAPC analysis (Fig. 4b). When 13 outlier SNPs were separately analysed, the divergence between FG and all the other populations decreased (Fig. 4c).

Discussion
Our genomic data of 1,220 SNPs from 162 P. volitans specimens across the NW Atlantic and the northern Gulf of Mexico represents the first study using nuclear loci to explore the genetic structure of this invasive predator and is among the few studies applying genome-wide scanning, based on Next Generation Sequencing technologies, to investigate population structure of a marine invader (see a review in ref. 34 , and examples in refs 35,36 ). Although 18 nuclear microsatellite loci were isolated for P. volitans and P. miles a few years ago 37 , to our knowledge, those markers have not yet been used for population analyses.
Our fine-scale population genomic analyses of lionfish demonstrate lack of a current genetic break between the first and the last invaded areas in US waters: the NW Atlantic and the Gulf of Mexico, respectively. The Bayesian clustering analysis and DAPC showed different clustering solutions due to a lack of clear genetic differentiation across the whole analysed area. From all 1,220 SNPs and 1,207 neutral SNPs, we only noticed significant genetic differences between three sites based on F ST distances; the 13 outlier SNPs, FG and Ap showed significant  differences with five additional sites, but that divergence was not mirrored by other analyses (e.g. Structure and DAPC). Only FG seemed to be genetically divergent for most analyses and databases. However, this location only includes four individuals, a sample size that cannot be considered representative of the genetic diversity and structure of this location, as demonstrated by the low number of accumulated alleles within these four individuals (see Table 2). Hence, independently of the database used (all 1,220 SNPs, 1,207 neutral SNPs, or 13 outlier SNPs), results show a picture of general genetic homogeneity with enough gene flow between the NW Atlantic and the Gulf of Mexico to erase signals of secondary spreading detected from d-loop analyses in previous years and studies 23 . Whether current gene flow occurs directly across the Florida Strait or indirectly throughout the Caribbean Sea cannot be assessed by the presented data, and further analyses including samples from the Caribbean Sea are necessary to investigate connectivity routes. Our findings, therefore, contrast with the discontinuity between the NW Atlantic and the Gulf of Mexico and the lower genetic diversity of the Gulf of Mexico observed in previous studies based on d-loop data 16,20,21,27 . Different mitochondrial and nuclear DNA patterns (mito-nuclear discordance), such as those noticed here for lionfish, are becoming more commonly reported as the number of nuclear multilocus datasets increases (see examples in refs 38,39 ) and can be explained by several non-exclusive causes 40 . Demographic asymmetry due to sex-biased dispersal can cause mito-nuclear discordance in motile animals, including marine fish species 40,41 . Although recent findings suggest that lionfish adults move more than initially thought 42 , they are not migratory, and buoyant eggs are the dispersal stage 11 , so different migratory behaviour between males and females cannot explain the pattern we found, and other hypotheses should be considered, including temporal genetic shifts and selection.
A plausible cause of discordance between lionfish studies using different markers is temporal changes in the genetic structure over the invasion process. Population genetics theory anticipates fast genetic changes in introduced populations characterized by bottlenecks, founder effects, strong genetic drift, and new selective pressures in the introduced environments 43-45 . Despite the importance of temporal genetic trends for the invasion dynamics, this point is overlooked in most studies of marine invaders, and it is assumed that genetic diversity remains stable over time 34 . The few studies investigating temporal trends of genetic structure in introduced marine species showed variable outcomes [46][47][48][49][50][51] . Whereas some invasive ascidians suffering massive seasonal die-off events maintained stable levels of genetic diversity and homogeneous structure over time due to the re-establishment of populations from the survivors or recolonization from nearby sites 48,51 , other introduced species exhibited changes in genetic architecture over short time periods. For instance, in the introduced colonial ascidian, Perophora japonica, a genetically isolated population from Plymouth (South England) displayed a linear reduction in mitochondrial genetic diversity and large haplotype frequency changes over a 9 year-monitoring period, due to either genetic drift and/or selection 46 . Rapid allele frequencies changes over time, high heterozygous deficiency, and inbreeding were also detected in isolated populations of the colonial ascidian Botryllus along the coast of Israel 49 , but genetic isolation was not always associated with genetic diversity loss in this species. Invasive Botryllus populations along the Californian coast, isolated from other genetic sources and highly influenced by genetic drift and selection, maintained stable levels of genetic diversity thanks to high mutation rates generating a complex pattern of allele gains and losses 47 . Nevertheless, marine invasive populations are, in many cases, characterised by high levels of genetic diversity due to multiple introductions from genetically distinct sources 28 . An outstanding example of multiple cryptic introductions and genetic admixture within the invaded range, which might be related to the high invasion success, is that of the European green crab, currently one of the most important aquatic invaders established across all temperate shores around the world 30,36 .
In lionfish, d-loop data revealed strong bottlenecks and scientists discarded the idea of multiple introductions into the NW Atlantic from the native range 16,20 , which would result in a small initial effective population size. Moreover, changes in genetic structure are expected to occur faster in mitochondrial than nuclear DNA because mitochondrial DNA, a haploid, maternally inherited molecule, has an effective population size of one-quarter that of nuclear DNA and therefore is more sensitive to diversity changes associated with genetic drift. For this reason, the comparison of the NW Atlantic d-loop data (collected between 2007 and 2009) and the Gulf of Mexico (collected between 2011 and 2013) 16,20,21,27 should be taken with caution since it assumes that NW Atlantic populations remained static over a six-year period with no changes in haplotype frequencies due simply to genetic drift. This assumption of temporal stability might obscure the most recent pattern of genetic diversity in lionfish. In this sense, the analyses presented here, based on 1,220 nuclear SNPs from samples collected during a brief period of 20 months seems to be a more reliable way to determine lionfish's current genetic structure.
Besides neutral temporal trends, differential selection can also promote discrepancy patterns between mitochondrial and nuclear DNA. Mitochondrial selection under divergent environmental conditions plays an important role for the distribution of mitochondrial variants in other marine fish species 52,53 . Different environmental pressures between the NW Atlantic and the Gulf of Mexico could also have favoured some lionfish mitochondrial haplotypes over others, thus shaping the spatial distribution of haplotypes during the invasion process. However, the sampling scheme used in our study and the lack of new mitochondrial sequences do not allow mitochondrial selection and/or adaptation hypotheses to be tested. None of the 1,220 tags containing the SNPs were identified as mitochondrial fragments, and the different analyses performed did not reveal evidence of local adaptation and/ or nuclear selection across the lionfish's invasive range. In some marine species with long-dispersal potential, outlier SNPs unravelled significantly finer genetic structure than neutral markers, suggesting the existence of local adaptation 36,[54][55][56][57] . For example, in the European hake, Atlantic and Mediterranean populations showed sharper divergence in analyses using outlier SNPs than neutral SNPs 54 , a pattern of higher resolution that was also found in other fish species at small geographical scales of a few hundred kilometres when analysing outlier SNPs 55,56 . In some marine invaders, local adaptation also seemed to play an important role in shaping populations' genetic structure, showing either latitudinal clines in outlier allele frequencies 36 or significant correlation with environmental variables such as salinity and water temperature 57 . Nevertheless, in lionfish, differential selection between Scientific RepORts | (2018) 8:5062 | DOI:10.1038/s41598-018-23339-w the NW Atlantic and the Gulf of Mexico, and mito-nuclear interactions remains as an open question because although we did not find strong evidence of local adaptation, 1,220 SNPs still represent a small proportion of the species' genome, and selection on non-explored genomic areas could be possible.
The SNP data here presented yield valuable information of the genetic trend in the lionfish invasion, which could potentially be affected by genetic changes over time and across space, although other hypotheses such as different selective pressures between mitochondrial and nuclear DNA cannot be completely discarded. Additionally, we perceive some limitations in our study that should be taken in consideration for further investigations. For instance, we noticed that population analyses based on SNPs should include: sizes over 10 individuals to retain the potential maximum genetic diversity within populations, representative populations from the native range to shed light on the current impact of bottlenecks in genetic diversity across the whole invasive range, and populations from the Caribbean Sea to clarify the most important connectivity routes within the invaded area.
As demonstrated by previous publications based on mitochondrial DNA, which detected strong bottlenecks during the first introduction steps and invasion progression 16,20,21,27,28 , the lionfish invasion is an example of how reductions in genetic diversity do not necessarily compromise population establishment and spreading 16,20,21,27,28 and points out the importance of primary (pre-border) and secondary (post-border) introductions, e.g., secondary introductions to the Caribbean Sea and later to the Gulf of Mexico 16,20,21,27 . The potential of lionfish to overcome these initial steps of the invasion with low genetic diversity at the mitochondrial DNA 16,20,21,27 , and to homogenize nuclear genetic structure across the invaded area (as shown in this study with SNPs), should be considered when developing theoretical models on the expected geographical spreading of this invasion 58 and implementing appropriate strategies for its management and control.
Finally, the lionfish invasion to the Wider Caribbean can be used as a lesson to anticipate the genetic trend and potential impacts of P. miles invasion in the Mediterranean Sea. As P. volitans across the Wider Caribbean, P. miles has quickly colonized wide areas of the eastern Mediterranean 59 , which adds an additionally threaten in a small sea that is at the same time a hotspot of marine biodiversity and one of the world's most impacted seas.  Table 1 and Fig. 1. Collections were often opportunistic by SCUBA divers, so collection depths could not always be recorded. Fin or gill clips were obtained from the collected specimens and preserved in absolute ethanol, frozen at −20 °C or stored in 320 µl of chaotropic buffer (4.5 M guanadinium thiocynate, 2% N-lauroylsarcosine, 50 mM EDTA, 25 mM Tris-HCL pH 7.5, 0.2% antifoam, 0.1 M β-mercaptoethanol) (see Table 1).

Ethics Statement.
No endangered or protected species were involved in this study. Lionfish were sampled opportunistically by the authors from lionfish derbies or state and federal collections (as stated below); only dead lionfish were obtained. Lionfish were collected by a number of organizations in areas open to fishing with a spear or permitted by methods utilized. These fish were collected as a result of other activities such as tournaments, commercial harvest, and general fisheries surveys, and were sampled opportunistically for this study. No permits were required to collect lionfish beyond a state saltwater fishing license, which was in possession of divers at each collection. In the case of lionfish collected from offshore Florida, no fishing license is required. The University of Miami Institutional Animal Care and Use Committee (IACUC) did not require a protocol for this study since only dead specimens were donated to the University. State and Federal government organizations, although exempt from IACUC requirements, follow best practices to minimize pain and suffering of specimens. These are approved Institutional Animal Care and Use Committee protocols via the American Veterinary Medical Association Guidelines for the Euthanasia of Animals and the American Society of Ichthyologists and Herpetologists Guidelines for Use of Fish in Research.

Library construction and SNP isolation.
Genomic DNA was extracted from tissue clips using silica columns. DNA quality was assessed via agarose gel electrophoresis, and DNA concentrations were quantified using Biotium AccuBlue TM Broad Range dsDNA Quantitative Solution according to the manufacturer's instructions. After quantification, 100 ng of DNA from each sample was dried down in 96-well plates in a SpeedVac concentrator. Samples were then rehydrated overnight with 5 µl of ultrapure milliQ water before further processing.
Genotyping by Sequencing libraries were constructed using the restriction enzyme ApeKI. A total of 50 ng of genomic DNA per sample was digested at 75 °C for 2 hours. Unique barcoded adapters were used for library construction as described in 60 . A total of 229 DNA samples were pooled together and fragments approximately 300 bp in length were selected with magnetic beads. Primers complementary to the adapters were then used for library amplification 60 . Before sequencing, library quality was checked in an Agilent 2100 Bioanalyzer. The GBS library including the 229 individuals was sequenced in two lanes of an Illumina Hi Seq. 2500 using 75 bp single end reads at Elim Biopharmaceuticals, Inc. Hayward, CA.
The UNEAK GBS analysis pipeline in TASSEL 32 for species without a reference genome was used to call SNPs using Bowtie 61 . The software identifies SNPs found on single non-overlapping "tags" (64 bp sequences) initiated at the restriction sites. Only SNPs that had a minimum of five reads across all samples were retained to reduce the impact of sequencing errors. Loci with significant linkage disequilibrium (D' p-value False Discovery Rate correction-FDR-adjusted to 0.01) identified in TASSEL and those with significantly greater observed than expected heterozygosity (p-value < 0.01) were removed from the database before performing further analyses. SNPs were then filtered to select individuals with at least 75% of the called loci and loci that were present in at Scientific RepORts | (2018) 8:5062 | DOI:10.1038/s41598-018-23339-w least 84% of individuals. Maximum heterozygosity during filtering was set at 0.5 to avoid excess of heterozygotes due to sequencing errors. All sequences containing selected SNPs were blasted (e-value < 10 −5 ) against the mitochondrial DNA of Salmo salar and the Genbank database to identify mitochondrial fragments. Detection of outlier SNPs. Two different software programs, Arlequin 62 and Lositan 63 , were used to identify non-neutral SNPs, as candidate markers under selection, based on an F ST -outlier detection method and coalescence simulations. Arlequin uses simulations based on observed heterozygosity (Ho) to create a null distribution of F ST values and associated p-values for each locus. We performed a total of 20,000 simulations, with 100 demes, under a finite island model. This model was chosen due to the general lack of genetic structure (see Results). FDR correction of the p-values was applied to detect significant outliers; we also considered a more conservative approach with significance at p < 0.01 since strong corrections can increase type II error thereby assuming neutrality in SNPs that are not neutral 55 (although both approaches showed similar results). Lositan, on the other hand, creates a distribution based on the relation between F ST values and expected heterozygosity (He). We performed a first run using all loci to estimate mean F ST values with 20,000 simulations, 99% confidence interval, infinite alleles mutation model and false discovery rate of 0.1%. After the first run, loci in the confidence interval were removed, and "neutral" F ST values were recalculated. A third run was finally performed using all loci, and the neutral F ST values previously calculated were implemented to detect outliers. Finally, outliers recovered from both software programs, Arlequin and Lositan, were considered as candidate SNPs under selection.
Genetic structure analyses. General descriptors of genetic diversity as mean number of alleles, observed heterozygosity (Ho), expected heterozygosity (He), fixation index F IS , and the Hardy Weinberg Equilibrium were calculated for all markers per population using Arlequin 3.5.1.2 62 and the "adegenet" package in R 64 .
A Maximum Likelihood (ML) tree, including all genotypes obtained, was reconstructed in RAxML with a GTR+ G model and 100 rapid bootstrap replicates 65 to explore potential clustering of individuals related to different geographical areas and/or sampling sites. The ML tree was then visualized and edited in Figtree 1.4.0 (http:// tree.bio.ed.ac.uk/software/figtree/).
A Bayesian clustering analysis, performed with the software Structure 2.3.4 66 , was used to investigate the optimal number of major homogeneous genetic clusters (K) found within our datasets under the null hypothesis of genetic homogeneity. Because Bayesian analysis can be computationally very intense and long, an initial fast run was performed with a K from 1 to 13 with five independent replicates, 20,000 Markov chain Monte Carlo (MCMC) per replicate, and a 2,000 burn-in period to get a general idea about the maximum number of clusters expected. Then, a definitive run was performed with K from 1 to 8 with five independent replicates, 100,000 MCMC per replicate, and a 10,000 burn-in period. We used an "admixture model" and correlated gene frequencies as implemented in Structure. The five independent runs were averaged using the clumpak server 67 (http:// clumpak.tau.ac.il). The K value was determined by comparing the rate of change in the likelihood of K, using the ad hoc statistic ΔK in Structure Harvester 0.6.94 68 .
Analyses of Molecular Variance (AMOVA), based on allele frequencies, were performed to specifically explore the potential genetic break between the two genetically different regions previously identified from mitochondrial DNA, the NW Atlantic and the Gulf of Mexico. The locations of PR and DT, at the Florida Strait, are rich mesophotic reefs and part of the Florida Keys reef complex but far inside the Gulf of Mexico. Since the genetic break between the NW Atlantic and the Gulf of Mexico shifts at different points of the Florida Strait depending on the species 69,70 , we could not a priori assign these two sites (PR and DT) to one or the other area. Therefore, we performed three different AMOVA analyses: the first analysis included PR and DT within the NW Atlantic pool, the second included them within the Gulf of Mexico pool, and the third one excluded these two sites from the analysis. After testing differences between major marine areas, pairwise F ST distances based on allele frequencies between all sampling sites were calculated with the same software. The significance of AMOVA and F ST values was assessed after 50,000 non-parametric permutations of individuals among populations and/or populations between geographical areas and under the null hypothesis of genetic homogeneity. FDR correction of these p-values was applied for F ST multiple testing 71 .
Significant difference in genetic diversity (Ho and He) between the two marine regions, the NW Atlantic and Florida Keys (CW, CC, FP, BNP, Isl, PR and DT, see Results section), and the Gulf of Mexico (TB, Ap, AL, MS, FG, GT) was evaluated with a t-test.
Additionally, discriminant analyses of principal components (DAPC) 72 were computed for the complete dataset. DAPC does not assume any underlying population genetic model and is not as affected by Hardy Weinberg disequilibrium as other methods based on genetic distances (e.g. F ST and AMOVA) and Bayesian clustering analyses. We used collection sites as populations with the "adegenet" package in R 64 . DAPC extracts multivariate information from genetic datasets by first performing a principal component analysis (PCA) on predefined groups (collection sites in this case) and then using the PCA factors as variables for a discriminant analysis (DA), which seeks to maximize the inter-site component of variation. Thus. DAPC allows the visual identification of genetic clusters and can outperform more computer-intensive approaches, such as Structure, in detecting genetic structure 72 . Since the number of principal components (PCs) retained may have large impact on the DAPC output, the optimal number of PCs to be retained was first explored by the cross-validation method implemented by this package.
Scientific RepORts | (2018) 8:5062 | DOI:10.1038/s41598-018-23339-w To understand whether selection and/or local adaptation within the lionfish's invasive range is an important driver of the genetic structure, the searching strategy explained before for the Bayesian clustering analysis, AMOVA, F ST and DAPC was comparatively applied to three different SNP datasets: for all isolated SNPs, for neutral SNPs, and for candidate SNPs under selection (outliers).