SNP genotyping reveals substructuring in weakly differentiated populations of Atlantic cod (Gadus morhua) from diverse environments in the Baltic Sea

Atlantic cod (Gadus morhua) is one of the most important fish species in northern Europe for several reasons including its predator status in marine ecosystems, its historical role in fisheries, its potential in aquaculture and its strong public profile. However, due to over-exploitation in the North Atlantic and changes in the ecosystem, many cod populations have been reduced in size and genetic diversity. Cod populations in the Baltic Proper, Kattegat and North Sea have been analyzed using a species specific single nucleotide polymorphism (SNP) array. Using a subset of 8,706 SNPs, moderate genetic differences were found between subdivisions in three traditionally delineated cod management stocks: Kattegat, western and eastern Baltic. However, an FST measure of population differentiation based on allele frequencies from 588 outlier loci for 2 population groups, one including 5 western and the other 4 eastern Baltic populations, indicated high genetic differentiation. In this paper, differentiation has been demonstrated not only between, but also within western and eastern Baltic cod stocks for the first time, with salinity appearing to be the most important environmental factor influencing the maintenance of cod population divergence between the western and eastern Baltic Sea.

significant levels of differentiation were found between pairs of samples from North Sea (EGR), Kattegat (KAT) and west Baltic (SCH). There was no statistically significant differentiation for the pair from the Belt Sea (SCH) and Øresund (ORE) in western Baltic. While the greatest divergence, above 0.08, was observed between Scotland (MRF) and four populations from the eastern Baltic (GDN, BOR, LAT and LIT) ( Fig. 1; Table 1). F ST values between western samples (SCH, ORE, KAT and EGR) and samples from the eastern Baltic were at a similar level: from 0.0510 for the pair LIT-SCH to 0.0721 for the pair GDN-KAT (Table 1). An Neighbour Joining tree showing corrected F ST distance was constructed within all nine cod populations from the 8076 SNP data set (Fig. 2). Western samples (KAT, MRF, EGR, SCH and ORE) were shown to form one branch of the tree, while four populations from the East Baltic (LAT, LIT, GDN, BOR) formed a separate clade; all clades had a high value of bootstrap reliability. The highest values of observed heterozygosity (Table 2) were in western samples (0.353 in MRF to 0.359 in EGR). The heterozygosity levels in the east Baltic samples (LAT, LIT, GDN and BOR) were lower and ranged from 0.332 in BOR to 0.337 in GDN. The vast majority of loci were in Hardy-Weinberg equilibrium (HWE) in all populations, the greatest fraction of SNPs with HWE departure (p < 0.05) were observed in populations from Egersund (EGR; 507 polymorphic sites) and the lowest in GDN and KAT (174 and 175 respectively) ( Table 2).
Genetic relationships among cod populations and possible genetic admixture was calculated using the Bayesian algorithm in STRUCTURE. When a full set of 8076 polymorphic SNPs was used for all nine cod samples, the most probable number of populations was 2 (ΔK = 1329.3), with four samples from the east Baltic (LAT, LIT, GDN, BOR) distinguishing themselves from the remaining populations. The F ST calculations based on 588 outlier loci (Supplementary information Table S1) for 2 groups, including 5 western (North Sea, Kattegat west Baltic) and 4 eastern Baltic samples, increased and indicated high genetic differentiation (0.187, p < 0.001) ( Table 3). F ST pairwise comparisons between the East Baltic samples (LIT, LAT, GDN and BOR) remained non-significant, while F ST for pair SCH-ORE became statistically significant. Pairwise F ST values between remaining populations increased significantly and generally reproduced mapping of F ST relations between samples.
The five samples from western Baltic, Kattegat and the North Sea were analyzed with a set of 175 outlier loci (Supplementary information Table S2). The maximum value of ΔK (279.4) was found for K = 2 and 2 clusters were identified capturing SCH + ORE + EGR, and KAT + MRF (Fig. 3), this distribution of samples does not coincide with their geographic origin. The sample from Kattegat was closely related to the Moray Firth sample while cod from the Egersund was grouping with samples from the Schlei (Belt Sea) and Øresund. The variation among groups was 10.04% while among individuals within populations only 0.14%. Pairwise differences were statistically significant (p < 0.001) and their value ranged from 0.012 for pair ORE -SCH to 0.162 for pair MRF -ORE. Pairwise differences between samples were similar to those observed in relations among the western     www.nature.com/scientificreports www.nature.com/scientificreports/ samples based on outlier loci for all samples ( Fig. 4a; Table 3). The four samples from the eastern Baltic stock (LAT, LIT, GDN, BOR) were analyzed with a set of 89 outlier loci (Supplementary information Table S3) and formed 4 clusters (ΔK = 90.98) (Fig. 5). Pairwise differences between samples correlated with geographic collocation of samples confirmed by Mantel test (Fig. 6). The lowest value of F ST was observed among LAT (Latvia) and LIT (Lithuania) samples (0.05614, P < 0.000). The highest difference was noted for the pair BOR (Bornholm) -LAT (0.09358, P < 0.000) and intermediate values for other pairs of samples. Pairwise differences (F ST ) values suggest some difference between the BOR sample and the remaining samples from the Baltic Sea (Fig. 4b). Three eastern most samples from the Baltic Sea formed 3 clusters (ΔK = 221.30) in the Structure analysis with the 76 outliers ( Supplementary Information Fig. S1; Table S4). The fixation index calculated for the 3 samples indicated moderate differentiation (F ST = 0.07552, p < 0.000). Pairwise F ST distances were generally close and reached 0.06912 for LAT -GDN pair, 0.07044 for LIT -GDN and 0.08387 between LAT -LIT.
Genetic distance and assignment test. Principal coordinates analysis (PcoA) performed for the full marker set (8076 SNPs) showed low values of percentage of variation between axes, however the potential clades are well separated (Fig. 7). Analysis performed for 588 outlier loci revealed higher genetic variation of the 1 st axis and low differentiation on the 2 nd and 3 rd axes. For the western samples PcoA revealed the highest variation on the 1 st axis while lower on the 2 nd , and, on the 3 rd axis only 3.70%. In the subset of eastern Baltic populations values were lower and for the 1 st axis reached 6.84% and respectively 5.34% and 4.79% of variability (Fig. 7).
To determine the most likely origin of all 240 cod individuals, assignment tests were conducted with the allele frequency based method that allowed the identification of potential migrants and estimated sample heterogeneity. In the west group of samples, the frequency of self-assignments varied from 30% for the KAT sample to 100% for the ORE. In the eastern group, values were much lower and ranged from 5% for BOR to 41% in LAT. Generally, only 48% of individuals were assigned to the population they were collected from with mean for the western group at 68% and 24% for the eastern. No genotypes of the individuals from the east Baltic group were represented in the Danish Straits/North Sea group and vice versa ( Table 4). The Mantel test was significant for all applied comparisons with p values 0.001 for F ST vs. geographic distance and geographic distance vs. bottom salinity and p = 0.01 for F ST vs. bottom salinity (Fig. 6). Analysis of pairwise linkage disequilibrium (LD) based on 588 outlier loci for all investigated populations show 11 114 highly significant pairwise LDs. From that number  www.nature.com/scientificreports www.nature.com/scientificreports/ 60.4% were inter-chromosomal LDs and respectively 39.6% intra-chromosomal. The largest block was revealed on linkage group LG2 and covered 22% of all LDs (Fig. 8). The smaller blocks were primarily on LG3, LG4 and LG5 and covered 9.6, 8.6 and 8.5 percent of all LDs. The share of found LDs decreased then to chromosome 23, where their share was only 1%, however they occurred on every LG. Calculations performed on the western dataset only for 175 detected outliers showed 389 highly significant LDs. The majority of them constituted intra-chromosomal LDs, 78.4%. The major and largest blocks were located on LG2 and 69% of all LDs belonged to him (Fig. 9). Next analysis, based on Baltic dataset and 89 outliers showed only two LDs, one inter and one intra-chromosomal, both related with LG13 (Fig. 8). The last analysed dataset containing three easternmost Baltic samples calculated with 76 detected outliers also show only two highly significant LDs, however what is important they were inter-chromosomal and located on LG16. Detailed description of LDs detected for datasets based on 89 outliers (BOR, GDN, LIT, LAT) revealed that they concerned loci ss1712298167 vs. ss1712303712 both located on scaffold 07407 (LG13) and ss1712298916 (scaffold 08672, LG12) vs. ss1712298167. LDs detected for easternmost samples (GDN, LIT, LAT) occurred for loci ss1712298846 vs. ss1712298845 both located on scaffold 08549 and for loci ss1712299176 (scaffold 09117) vs. ss1712297964 (scaffold 07099). All of them were on LG16. The distribution of the outlier loci across LGs are displayed on Manhattan plots constructed for same datasets (Fig. 9). The presented patterns of outlier loci locations are congruent with distribution of the detected LDs. Outlier subsets with detailed positions and gene annotations have been presented in Supplementary information Table S5.

Discussion
The status of Atlantic cod in the Baltic Sea has been reported as an example of a geographically and genetically separated marginal subpopulation 79,80 . Populations of cod inhabiting the Baltic Sea have evolved differently from Atlantic populations as a consequence of isolation and bottlenecks, as well as selection on adaptive traits 80 . www.nature.com/scientificreports www.nature.com/scientificreports/ Nonetheless, partial genetic separation might have occurred before formation of the Baltic Sea 61 . By analyzing 3 allozyme loci Moth-Poulsen 81 indicated a gradual transition/cline between North Sea and Baltic Sea cod, i.e. a potential intraspecific hybrid zone, this was later confirmed by the analysis of nine highly variable microsatellite loci 29 . Strong differentiation between the east and west Baltic stocks was indicated by SNPs 61,62,69,82 . The analysis of SNPs presented here showed a difference between west samples including North Sea, Kattegat and west Baltic Sea, and Baltic Proper (south-eastern) samples. This divergence was represented by a clear split between the analyzed 9 populations and clustering of samples from the west Baltic together with samples from North Sea, F ST values reduced 10-fold and showed a lack of haplotypes shared with samples from the East Baltic Sea. In this study, the isolation-by-distance (IBD) between samples tested by Mantel test for all 8076 SNP loci was significant and correlation between genetic diversity and geographic distance and bottom salinity were detected. The PCoA results suggested that the main differentiating factor could be explained by variable salinity represented by the 1 st axis what was further supported by results from outlier loci distribution and presence of different LDs associated with environmental factors. For outlier loci calculated for all 9 populations rapid change of maximum salinity level is the best explanation for the clear separation of groups from the North Sea and the Baltic Sea. Differences in salinity tolerance and subsequent low fitness of transplanted cod from the Baltic Sea and the Skagerrak/Kattegat 83 and eastern (Gdańsk) and western (Kiel Bight) Baltic may be the result of genetically based adaptive differences between populations 52 , which potentially explain transcriptomic differences of G. morhua from the Baltic Sea   Table 4. Results of the assignment test performed for 9 populations with 8076 loci, computed using GeneClass software. Individuals were assigned to the populations in which the genotype is most probable to occur. Values are given in percent. Self-assignment is indicated in bold. www.nature.com/scientificreports www.nature.com/scientificreports/ that have also been observed 84 . Johannesson and André 80 assumed that the cause of lost diversity of Atlantic cod was an efficient barrier to gene flow, which has evolved as a consequence of divergent selection on reproductive traits, such as egg buoyancy, sperm motility 73,85 and different time of spawning season. Local adaptation of these traits can be manifested by selection evidence related with presence of outlier loci and their relations can be detected across genome by analysis of linkage disequilibrium 4,86 . The observed patterns of detected LD distribution in western dataset are congruent with earlier studies which indicated the presence of large LD region located on LG2 in cod from North Atlantic 68,87 . It was suggested that they are associated with salinity and oxygen level at spawning depth 68 . It is important that we did not observe significant outlier loci from LG2 in Baltic dataset. Significant LDs for Baltic samples were located on LG12 and LG13 for analysis with Bornholm cod and only on LG16 for easternmost cod. In first case, observed LDs concerned loci ss1712298167 described as associated with surface temperature and loci ss1712298916 located on important scaffold 08672 associated with many environmental correlation including surface and bottom salinity, oxygenation and temperature 68 . LDs detected in easternmost dataset also concerned outliers associated with bottom salinity (ss1712298846 and ss1712298845, scaffold 08549) 68 but they were located on different LG and this may be related to the existence of adaptation to lowering salinity in the Baltic Sea from west to east. Furthermore, these loci were not indicated as outliers in western dataset and Bornholm cod.
The cod stocks in the North Sea and Kattegat were described as an indicator of the condition of Atlantic cod populations 88,89 . In present study with a large number of SNP loci, the Kattegat sample was closely related to the Scottish cod suggesting a high share of the North Sea cod. A low pairwise difference between North Sea and www.nature.com/scientificreports www.nature.com/scientificreports/ Kattegat samples has also been reported with microsatellites 29 , and may be explained by the significant transportation of cod larvae from the North Sea stocks into Kattegat 90 . High connectivity with offshore populations in Scandinavian fjords has been characterized recently using a large number of SNPs 38 . In the current study a sample of cod collected from the North Sea (Egersund fjord, Norway; EGR) was found to be slightly statistically different from the Kattegat and West Baltic, which coincides with presumably different local spawning areas in the western Baltic (Kattegat, Sound, Kiel and Mecklenburg Bays). However, analysis of the outlier loci for the group of samples from North Sea, Kattegat and western Baltic showed inconsistency between geographic origin and genetic distance of samples. The samples from the Egersund fjord (EGR) differed both from the Moray Firth and Kattegat samples. Genetic characteristics of the EGR samples could be potentially occurred due to a relatively closed coastal population breeding locally.
Low genetic distance among samples from EGR, Øresund (ORE), and Schlei fjord (SCH) suggests closer relationships with cod living under similar environmental conditions characterized by periodically reduced salinity 91,92 . Samples from EGR and SCH shared the same haplotypes (Table 4), which resulted in self-assignment of a significant percentage of individuals (40.74%) from Egersund fjord to the SCH sample. Despite high connectivity between populations caused by migration, cod populations could be characterized by adaptive differences influencing genetic differentiation. The association of genomic signatures and ecotypic divergence was noted by Hemmer-Hansen et al. 37 and results presented here seem to support this conclusion.
The most divergent sample in the Baltic Proper is Bornholm (BOR), which was collected in July, after the spawning season, in order to avoid the migrants from other sub-locations. This sample, when tested with outliers, showed a relatively high value of F ST , distinguishing this population from other Baltic samples (F ST = 0.06-0.09, P < 0.000). The genetic structure analysis showed that eastern Baltic stock samples formed four close clusters. This supports the assumption that the analyzed samples included individuals representing local populations, not the migrants. Genetic variability in the Baltic Proper samples was much lower than among western samples including also the west part of the Baltic Sea. The values of F ST were lower suggesting that some specimens from each sample share the most functional spawning area in the Bornholm Deep. This is reflected in the results of the assignment tests where no clean baselines were observed. This is also indicator of high gene flow and significant level of mixing within the stocks. Additionally, low self-assignment of samples from Bornholm area with high share of easternmost stock is another argument that eastern Baltic cod occur in SD 24 on west cost of Bornholm what was also clearly demonstrated in recently published study by Hemmer-Hansen et al. 2019 61 .
Since the mid-1980s, successful spawning of eastern Baltic cod stock has been generally restricted to the Bornholm Basin 72,93,94 . The data here suggest that, thanks to the salt water inflows, spawning areas like Gdańsk Deep might have retained limited functionality in supporting divergence between local subpopulations 95,96 . The salinity factor seems to support more the divergence between west and east Baltic Sea than local divergence 57 . The potential spawning area is also determined by oxygen availability 54 . Genetically divergent but geographically close subpopulations have been identified, for instance, in Icelandic waters 97 , in the North Sea 88 , and along the Skagerrak coast 57,98,99 . For such differentiation to be preserved -even for small genetic differences -reproductive isolation is implied. In the Baltic Sea, local niches settled by cod are characterized by a unique set of environmental features like the diurnal and seasonal exchange of water masses 100 , vertical distribution of salinity and www.nature.com/scientificreports www.nature.com/scientificreports/ temperature 101 . These features and homing behaviour affect the genetic profile of the local subpopulation and maintain the distribution of genes/alleles responsible for local adaptations 22,38,52,69,84,89. Population genetic analyses facilitate detection of mixed stocks within the management units, and catch quotas are estimated on the assumption that a management unit includes only one stock. If more than one stock occurs in a management unit, the less abundant component becomes overfished and may collapse 28 . Studies of the relationship between population units and ICES subareas for North Sea Atlantic cod (Gadus morhua L.) have revealed that the genetically derived population units did not map accurately enough onto the existing cod management units 35 . One strategy for compensating for this situation is extending the spawning closure areas 72 . In Norway, along the coastal area, cod stocks were divided into two: north and south of 62 o latitude. Finding genetic structures along the Norwegian coast line by sampling 55 locations and analysing microsatellites 28 and the pantophysin (Pan I) locus 102 provided strong evidence to support possible revision of cod management strategy. The resilience of cod populations in the Kattegat may also be different when considered on smaller spatial scales than those delineated by traditional stock management boundaries 99 . Cod populations in the Baltic and Danish Straits have been managed for fishery purposes as 3 stocks differing in morphometric and genetic structures: Kattegat (subdivision 21), western (22)(23)(24) and eastern (25-32) Baltic 60,[103][104][105] . Similarly to other studies, here we observed variation at the SNP loci between both Baltic stocks. Genetic differentiation between samples from the western and eastern Baltic stocks has been indicated as a tool for separation of western and eastern Baltic cod in mixed stock occupied SD24 61 . In addition, using outlier SNPs, this study was able to demonstrate genetic differences among populations from subdivisions. Genetic differences revealed between GDN and LIT (subdivision 3d 26) samples of cod collected from the eastern Baltic stock were supported by statistical analyses. Despite strong mixing, possible hindrances in connectivity between the Gdańsk Deep and Gotland Basin can be considered as explanation for the observed spatial differentiation of the eastern Baltic cod stock.

conclusions
This study demonstrated that genotyping with Norwegian cod SNP array constructed in CIGENE enables detection of genetic differentiation at a fine and local geographic scale in marine pelagic cod populations in the Baltic and adjacent waters. The sensitivity of the array towards identification of cod stocks can be enhanced by putting larger number of SNPs on the chip, including those polymorphic in the Baltic cod. Outlier SNPs are more informative markers in finding differences between Baltic cod populations in comparison with neutral SNPs. Here, with outlier SNPs, differentiation was identified between cod populations from subdivisions of existing management units in the Baltic. A tentative discrepancy between Lithuanian (LIT) and Polish (GDN) cod samples within one subdivision was also observed and can be related to possible isolation by environmental barriers between spawning areas in the Gdansk Deep and Gotland Basin. It is recommended to carry out further survey of Baltic cod populations using advanced genetic techniques on a larger number of specimens including larvae in order to further document the observed genetic pattern in the eastern Baltic cod population. Changes in time in genetic composition of Baltic cod stocks may be anticipated after periodic restrictions on fishing activities.

Materials and Methods
Sampling, DNA isolation and genotyping. A total of 240 cod individuals from 9 locations at 7 ICES (International Council for the Exploration of the Sea) subdivisions along a transect across the Baltic Sea, Kattegat and North Sea (Fig. 10, Table 5) were collected between October 2012 -August 2013. Fin clips were stored in 70% ethanol at −70 °C. Genomic DNA was isolated using the Qiagen DNeasy 96 blood and tissue kit according the manufacturer's instructions and stored at −20 °C. The concentration of DNA was determined by UV-vis spectroscopy using an Epoch Microplate Spectrophotometer (BioTek Instruments, Inc., Winooski, USA). After normalization, samples were genotyped on a custom Gadus mohua SNP-array (Illumina, USA) containing 10,923 SNP assays, and developed by a Norwegian consortium composed of four research organisations: Norwegian University of Life Sciences (NMBU), University of Oslo (UiO), NOFIMA AS, and the Institute for Marine Research (IMR) 38,68,69 . Samples were processed according manufacturers instructions and genotypes obtained from Genome Studio (V2011.1). After filtering to remove poorly clustering SNPs (failing assays, multisite variants), a total of 8221 diploid SNPs remained. This data set was further trimmed to remove: SNPs with relatively a high missing data level (over 20%; n = 15), monomorphic SNPs (n = 32), and SNPs with minor allele frequencies (MAF) < 0.01 (n = 98). The final data set included genotypes from 8076 loci.
All methods complied with EC Directive 2010/63/EU for animal experiments and were approved by the Local Ethics Committee on Animal Experimentation at Gdansk Medical University (decision no. 60/2012).

Statistical analysis.
Allele frequencies and MAFs in each sample were calculated from spreadsheet data using Arlequin v. 3.5.1.3 106 . Genetic structure was analyzed using the program STRUCTURE v2.3.4 107 which assigns individual genotypes to a specified number of groups, K, based on membership coefficients estimated from the genotype data. The analysis for 9 cod population samples was conducted from K = 1 to 12 using a burn-in period of 100,000 steps followed by 200,000 MCMC (Monte Carlo Markov Chain) replicates with 5 iterations, assuming an admixture model. The most probable number of clusters was defined by calculating the ΔK value 108  www.nature.com/scientificreports www.nature.com/scientificreports/ exact test using a Markov chain with chain length =1,000,000 and dememorization steps =200,000. To adjust P value for each pair in multiple tests, Bonferroni corrections were included. GenAlex 6.502 was applied to perform a principal coordinates analysis (PCoA) 112,113 . Assignment tests were conducted using GeneClass 114 with the allele frequency-based method. This enabled the identification of potential migrants or their descendants 115 . Relationships among 9 cod populations were examined using Poptree2 116 with neighbor-joining (NJ) method based on F ST distance with sample size correction 117 and the number of bootstrap replications at 1000. The Mantel test based on dissimilarity matrices 118 was applied to investigate the significance of relationships between genetic distance, geographic distance and bottom salinity with 999 permutations used to test the statistical significance of the values in GenAlex 6.502. Results were cross validated in Arlequin 3.5.1.3. Bottom salinity values were obtained from models GETM 119 , BALANCE 120 and INSPIRE 121 . The hierarchical island model, implemented in Arlequin, was used to detect outlier loci. Loci as candidates under selection exhibited F ST values out of the 99% quantile, based on coalescent simulations (50,000). Outlier loci were calculated with 50,000 simulations and number of demes at 100. Outlier loci were segregated and those with F ST ≤ 0 or with F ST > 0.01 were excluded. Separate structure investigation of outlier loci for west and east Baltic populations were carried out with increased burning (2,000,000) and MCMC (4,000,000). GenAlex was applied to perform a principal coordinates analysis (PCoA). Linkage disequilibrium (LD) was estimated for outlier loci by calculating the square value of correlation coefficient (r 2 ) between pairs of markers 122 using the TASSEL 5.2.58 software 123 . A threshold of r 2 > 0.8 was considered to indicate LD. The level of LD was estimated for the entire panel and for the specific subgroups identified with STRUCTURE v2.3.4. Within these subgroups, LD was calculated considering only the detected panel of candidate outlier loci. The p-values for each r 2 estimate were obtained with a two-tailed Fisher's exact probability test and a threshold of p < 0.0001 was considered as significant. LD visualization was done by heat maps based on P values for pairwise r 2 estimates to assess the overall view of LD patterns and evaluate LD blocks in various chromosomes at specific map locations. Additionally the distribution and clustering of detected outlier loci on linkage groups (LG) were indicated by Manhattan plots constructed for same subsets as for LD analysis   Table 5. Numbers of examined cod specimens, sampling sites and ICES subdivisions in the Baltic Sea and North Sea.