Sustainable exploitation of living marine resources by fishery, aquaculture and biotechnology, and monitoring and predicting the effects of climate changes require an understanding of taxonomy and population biology. Populations are sustainably exploited if the removal of individuals does not reduce the ability of a population to reproduce and maintain its phenotypic and genetic diversity. Such populations have been defined for conservation purposes as “evolutionary significant units”1, and traditionally have been defined using genetic methods such as analyses of allozymes, nuclear DNA loci, microsatellites and mitochondrial DNA2 and knowledge of fish biology and morphology3. The management units are defined for reporting on stock assessment and catches by different countries. The issue of inconsistency between existing management units and population biology and genetic differentiation has been reported for some marine fish species4,5,6. Presently, population genetic analysis using classical genetic markers is being replaced by more detailed genomic analysis which provides qualitatively new information on stock differentiation and identification7,8,9,10,11,12,13,14. Specific methods include genome-wide genotyping using a large number of single nucleotide polymorphisms (SNPs) and next-generation sequencing (NGS). NGS for population analyses is a powerful tool that arguably provides the greatest insight into population genomics but can be expensive and demands significant data analysis. In contrast, genotype data for relatively many SNPs can be generated quickly for large numbers of individuals using genotyping arrays and raw data requires little pre-processing before it can be analyzed15. A large number of SNP loci in comparison with few genetic markers are better able to report subtle differences in genomic variation and their robustness is an advantage in evolutionary and population biology studies including exploited species with high dispersal potential in the oceans16,17,18. Additionally SNPs may affect protein function and expression levels directly and hence are subject to evolutionary selective forces19. Genotyping a significant number of SNP loci also provides an opportunity to identify ‘outliers’ (i.e. loci under selection20,21), which can be more informative markers in defining conservation units in comparison with neutral SNP markers22,23,24,25,26.

Compared to many other harvested and aquacultured fish species, Atlantic cod has been subject to extensive population genetics analysis4,27,28, that have provided insight into the links between genomics, biology and life-history. For example, studies using neutral molecular markers, such as microsatellites, have detected very weak differentiation between populations of cod29, whereas loci under selection from environmental conditions as temperature, salinity and depth, often display much stronger differentiation30,31,32. SNP analysis revealed diversity between eastern and western Atlantic and Baltic cod populations33,34,35,36, and using the same technology divergence within a genomic region between migrating and stationary ecotypes has been found despite high connectivity37,38,39,40,41. Genomic rearrangements (e.g. inversions of chromosome fragments) in cod populations differing in ecological behaviour, such as migration routes have been found42, and a subset of the SNP loci analyzed in this study has been reported as significantly correlated with temperature in North Atlantic cod populations43,44. It has been shown that despite mixing and migrations between the northeast Arctic cod and the Norwegian coastal cod populations, genomic islands undergo selection and reduced recombination, which promote divergence of these populations despite habitats overlap45. Finally, frequencies of some SNPs have been shown to be correlated with salinity in Baltic herring populations46.

The Baltic was transformed from a freshwater lake receiving melting ice waters into today’s brackish water sea body over 7000 years ago. It was colonized by a variety of marine species populations47,48, which could tolerate salinity as low as 5–7 ppt, including Atlantic cod. This species adapted to living in low salinity waters of the Baltic Sea despite requiring salinity above 12–14 ppt13,49,50,51,52,53 for successful spawning54,55. For the fisheries purposes, the Baltic cod is assessed and managed as western and eastern stocks, located in ICES Subdivisions 22–24 and 24–32 respectively56.

Significant transport of cod larvae from the North Sea to Skagerrak and Kattegat57,58 and mixing of western and eastern stocks of Baltic cod in the Arkona Basin (ICES subdivision 24) have been reported22,59,60,61,62. The dynamic changes of salinity and oxygenation in the Danish Straits and the open waters of Baltic Sea undoubtedly affect the condition and genetic divergence of the cod stocks in this area. Western populations of the Baltic cod may hybridize with the North Sea cod, as was suggested by Nielsen et al.29. Besides the important physiological differences between western and eastern Baltic cod such as haemoglobin polymorphisms63,64, genetic structure differences were reported at the level of population markers, e.g. microsatellites, Pan I locus, mtDNA29,65,66,67 and using SNP analysis62,68,69. Implications of population genetics structure in Baltic populations for management, have been summarized by Östman et al.6 and Wennerström et al.70,71.

Two main cod spawning areas in the Kattegat are situated along the Swedish coast. Western Baltic cod spawning areas include Sound, Kiel and Mecklenburg Bays and Arkona Basin56,72. Eastern Baltic cod spawning regions were Bornholm Basin, Słupsk Furrow, Gdańsk Deep and Gotland Basin in Southern Baltic. The main area of spawning is the Bornholm Basin where the eastern Baltic cod migrated every season62,73,74,75. Due to reduced inflow of North Sea water into the Baltic Proper and other changes in environmental and ecological conditions (e.g. oxygen deficiency, low nutrition, infestation with parasites, increased water temperature, size selective fishing), eastern Baltic cod underwent changes in the biology; slower growth rate and maturation at a smaller size76,77. The anoxic or hypoxic conditions in the Baltic Sea, exceptionally pronounced in recent years inflicted contraction of southern populations77, and limited the size of the reproductive volume of eastern Baltic cod54. In recognition of a serious threat to the eastern Baltic cod stock, fishing for cod in ICES subdivisions 24, 25 and 26 has been banned by the European Commission in 2019, and restricted beginning on 1st January 2020.

Genetic differentiation at functionally important genes between cod stocks in the north-western Atlantic have been related to local adaptations caused by differential selection pressure among spawning aggregations14. In the Baltic, differences in adaptation of western and eastern cod stocks to low salinity have been described13,52,69. SNP array has been used for the first time to characterise different stocks in the Baltic. Earlier population genetic studies conducted with this technique included only limited number of individuals or few sampling locations inside Baltic. However, to date no genetic differentiation within sub-stocks of western and eastern Baltic cod populations have been reported. The aim of this research was to characterize for the first time differences in structure of sub-stocks within eastern and western Baltic cod populations and the transition zone with the North Sea using a large number of SNP loci. The possible existence of sub-structuring of the eastern stock within a management unit is indicated.


Genetic diversity

In total 8076 SNPs that passed quality and informativity control were used to estimate the genetic diversity of cod populations from the Baltic and North Sea. The overall inbreeding coefficient FIS obtained by AMOVA was very low (−0.0032) and statistically insignificant (p = 0.62). The global FST across all nine populations was 0.0396 (p < 0.01), which indicates a moderate78 level of differentiation. Low, but statistically significant levels of differentiation were found between pairs of samples from North Sea (EGR), Kattegat (KAT) and west Baltic (SCH). There was no statistically significant differentiation for the pair from the Belt Sea (SCH) and Øresund (ORE) in western Baltic. While the greatest divergence, above 0.08, was observed between Scotland (MRF) and four populations from the eastern Baltic (GDN, BOR, LAT and LIT) (Fig. 1; Table 1). FST values between western samples (SCH, ORE, KAT and EGR) and samples from the eastern Baltic were at a similar level: from 0.0510 for the pair LIT-SCH to 0.0721 for the pair GDN-KAT (Table 1). An Neighbour Joining tree showing corrected FST distance was constructed within all nine cod populations from the 8076 SNP data set (Fig. 2). Western samples (KAT, MRF, EGR, SCH and ORE) were shown to form one branch of the tree, while four populations from the East Baltic (LAT, LIT, GDN, BOR) formed a separate clade; all clades had a high value of bootstrap reliability. The highest values of observed heterozygosity (Table 2) were in western samples (0.353 in MRF to 0.359 in EGR). The heterozygosity levels in the east Baltic samples (LAT, LIT, GDN and BOR) were lower and ranged from 0.332 in BOR to 0.337 in GDN. The vast majority of loci were in Hardy-Weinberg equilibrium (HWE) in all populations, the greatest fraction of SNPs with HWE departure (p < 0.05) were observed in populations from Egersund (EGR; 507 polymorphic sites) and the lowest in GDN and KAT (174 and 175 respectively) (Table 2).

Figure 1
figure 1

Pairwise FST values calculated according to the westernmost sample (MRF, Scotland) for all 8076 loci (dark grey) and 588 outlier loci (light grey). Black line represents observed heterozygosity (Ho) across West-East transect.

Table 1 Below diagonal: pairwise FST values based upon 8076 polymorphic SNPs in 9 sampled populations of cod, calculated in Arlequin.
Figure 2
figure 2

A neighbor-joining tree constructed using Nei’s distances among the nine cod populations. Bootstrap probabilities are shown on the tree.

Table 2 Genetic parameters of the 9 cod Gadus morhua sampled populations.

Genetic relationships among cod populations and possible genetic admixture was calculated using the Bayesian algorithm in STRUCTURE. When a full set of 8076 polymorphic SNPs was used for all nine cod samples, the most probable number of populations was 2 (ΔK = 1329.3), with four samples from the east Baltic (LAT, LIT, GDN, BOR) distinguishing themselves from the remaining populations. The FST calculations based on 588 outlier loci (Supplementary information Table S1) for 2 groups, including 5 western (North Sea, Kattegat west Baltic) and 4 eastern Baltic samples, increased and indicated high genetic differentiation (0.187, p < 0.001) (Table 3). FST pairwise comparisons between the East Baltic samples (LIT, LAT, GDN and BOR) remained non-significant, while FST for pair SCH-ORE became statistically significant. Pairwise FST values between remaining populations increased significantly and generally reproduced mapping of FST relations between samples.

Table 3 Below diagonal: pairwise FST values based upon 588 outliers SNPs in 9 sampled populations of cod, calculated in Arlequin.

The five samples from western Baltic, Kattegat and the North Sea were analyzed with a set of 175 outlier loci (Supplementary information Table S2). The maximum value of ΔK (279.4) was found for K = 2 and 2 clusters were identified capturing SCH + ORE + EGR, and KAT + MRF (Fig. 3), this distribution of samples does not coincide with their geographic origin. The sample from Kattegat was closely related to the Moray Firth sample while cod from the Egersund was grouping with samples from the Schlei (Belt Sea) and Øresund. The variation among groups was 10.04% while among individuals within populations only 0.14%. Pairwise differences were statistically significant (p < 0.001) and their value ranged from 0.012 for pair ORE – SCH to 0.162 for pair MRF – ORE. Pairwise differences between samples were similar to those observed in relations among the western samples based on outlier loci for all samples (Fig. 4a; Table 3). The four samples from the eastern Baltic stock (LAT, LIT, GDN, BOR) were analyzed with a set of 89 outlier loci (Supplementary information Table S3) and formed 4 clusters (ΔK = 90.98) (Fig. 5). Pairwise differences between samples correlated with geographic collocation of samples confirmed by Mantel test (Fig. 6). The lowest value of FST was observed among LAT (Latvia) and LIT (Lithuania) samples (0.05614, P < 0.000). The highest difference was noted for the pair BOR (Bornholm) – LAT (0.09358, P < 0.000) and intermediate values for other pairs of samples. Pairwise differences (FST) values suggest some difference between the BOR sample and the remaining samples from the Baltic Sea (Fig. 4b). Three eastern most samples from the Baltic Sea formed 3 clusters (ΔK = 221.30) in the Structure analysis with the 76 outliers (Supplementary Information Fig. S1; Table S4). The fixation index calculated for the 3 samples indicated moderate differentiation (FST = 0.07552, p < 0.000). Pairwise FST distances were generally close and reached 0.06912 for LAT – GDN pair, 0.07044 for LIT – GDN and 0.08387 between LAT – LIT.

Figure 3
figure 3

Estimated Gadus morhua population structure by STRUCTURE software based on 175 outlier loci in the samples from North Sea, Kattegat and West Baltic.

Figure 4
figure 4

Pairwise differences (FST) between: (a) samples from North Sea, Kattegat and West Baltic based on 175 outlier loci and (b) samples from eastern Baltic stock based on 89 outlier loci.

Figure 5
figure 5

Graph represents the STRUCTURE results for the Baltic dataset, based on 89 outlier loci and best K = 4.

Figure 6
figure 6

Relationships between geographical and genetic distance FST (upper graph), bottom salinity and geographical distance (middle graph), genetic distance and bottom salinity (lower graph). P values from top: 0.001, 0.001 and 0.01.

Genetic distance and assignment test

Principal coordinates analysis (PcoA) performed for the full marker set (8076 SNPs) showed low values of percentage of variation between axes, however the potential clades are well separated (Fig. 7). Analysis performed for 588 outlier loci revealed higher genetic variation of the 1st axis and low differentiation on the 2nd and 3rd axes. For the western samples PcoA revealed the highest variation on the 1st axis while lower on the 2nd, and, on the 3rd axis only 3.70%. In the subset of eastern Baltic populations values were lower and for the 1st axis reached 6.84% and respectively 5.34% and 4.79% of variability (Fig. 7).

Figure 7
figure 7

Principal Coordinates Analysis (PCoA) 2D plots imaging variation between all and outlier loci calculated for: (a) all samples with full set 8076 and (b) 588 outlier SNPs, (c) western samples - 175 outlier SNPs, (d) eastern Baltic stock samples - 89 outlier SNPs.

To determine the most likely origin of all 240 cod individuals, assignment tests were conducted with the allele frequency based method that allowed the identification of potential migrants and estimated sample heterogeneity. In the west group of samples, the frequency of self-assignments varied from 30% for the KAT sample to 100% for the ORE. In the eastern group, values were much lower and ranged from 5% for BOR to 41% in LAT. Generally, only 48% of individuals were assigned to the population they were collected from with mean for the western group at 68% and 24% for the eastern. No genotypes of the individuals from the east Baltic group were represented in the Danish Straits/North Sea group and vice versa (Table 4). The Mantel test was significant for all applied comparisons with p values 0.001 for FST vs. geographic distance and geographic distance vs. bottom salinity and p = 0.01 for FST vs. bottom salinity (Fig. 6). Analysis of pairwise linkage disequilibrium (LD) based on 588 outlier loci for all investigated populations show 11 114 highly significant pairwise LDs. From that number 60.4% were inter-chromosomal LDs and respectively 39.6% intra-chromosomal. The largest block was revealed on linkage group LG2 and covered 22% of all LDs (Fig. 8). The smaller blocks were primarily on LG3, LG4 and LG5 and covered 9.6, 8.6 and 8.5 percent of all LDs. The share of found LDs decreased then to chromosome 23, where their share was only 1%, however they occurred on every LG. Calculations performed on the western dataset only for 175 detected outliers showed 389 highly significant LDs. The majority of them constituted intra-chromosomal LDs, 78.4%. The major and largest blocks were located on LG2 and 69% of all LDs belonged to him (Fig. 9). Next analysis, based on Baltic dataset and 89 outliers showed only two LDs, one inter and one intra-chromosomal, both related with LG13 (Fig. 8). The last analysed dataset containing three easternmost Baltic samples calculated with 76 detected outliers also show only two highly significant LDs, however what is important they were inter-chromosomal and located on LG16. Detailed description of LDs detected for datasets based on 89 outliers (BOR, GDN, LIT, LAT) revealed that they concerned loci ss1712298167 vs. ss1712303712 both located on scaffold 07407 (LG13) and ss1712298916 (scaffold 08672, LG12) vs. ss1712298167. LDs detected for easternmost samples (GDN, LIT, LAT) occurred for loci ss1712298846 vs. ss1712298845 both located on scaffold 08549 and for loci ss1712299176 (scaffold 09117) vs. ss1712297964 (scaffold 07099). All of them were on LG16. The distribution of the outlier loci across LGs are displayed on Manhattan plots constructed for same datasets (Fig. 9). The presented patterns of outlier loci locations are congruent with distribution of the detected LDs. Outlier subsets with detailed positions and gene annotations have been presented in Supplementary information Table S5.

Table 4 Results of the assignment test performed for 9 populations with 8076 loci, computed using GeneClass software.
Figure 8
figure 8

Heat maps of pairwise linkage disequilibrium (LD) values (r2) throughout the Atlantic cod genome constructed for four outlier loci subsets. Markers were ordered on the x and y axes based on genomic location so that each cell of the heat map represents a single marker pair located on particular chromosome. The r2 values for each marker pair are on the upper half of the heat map. The p values of each r2 estimate are on the bottom half of the heat map.

Figure 9
figure 9

Manhattan plots of outlier analyses based on median log10(p value). The SNPs are distributed according to LG and their position within the LGs along the X axis124,125. The solid line is standard p value cutoff124.


The status of Atlantic cod in the Baltic Sea has been reported as an example of a geographically and genetically separated marginal subpopulation79,80. Populations of cod inhabiting the Baltic Sea have evolved differently from Atlantic populations as a consequence of isolation and bottlenecks, as well as selection on adaptive traits80. Nonetheless, partial genetic separation might have occurred before formation of the Baltic Sea61. By analyzing 3 allozyme loci Moth-Poulsen81 indicated a gradual transition/cline between North Sea and Baltic Sea cod, i.e. a potential intraspecific hybrid zone, this was later confirmed by the analysis of nine highly variable microsatellite loci29. Strong differentiation between the east and west Baltic stocks was indicated by SNPs61,62,69,82. The analysis of SNPs presented here showed a difference between west samples including North Sea, Kattegat and west Baltic Sea, and Baltic Proper (south-eastern) samples. This divergence was represented by a clear split between the analyzed 9 populations and clustering of samples from the west Baltic together with samples from North Sea, FST values reduced 10-fold and showed a lack of haplotypes shared with samples from the East Baltic Sea. In this study, the isolation-by-distance (IBD) between samples tested by Mantel test for all 8076 SNP loci was significant and correlation between genetic diversity and geographic distance and bottom salinity were detected. The PCoA results suggested that the main differentiating factor could be explained by variable salinity represented by the 1st axis what was further supported by results from outlier loci distribution and presence of different LDs associated with environmental factors. For outlier loci calculated for all 9 populations rapid change of maximum salinity level is the best explanation for the clear separation of groups from the North Sea and the Baltic Sea. Differences in salinity tolerance and subsequent low fitness of transplanted cod from the Baltic Sea and the Skagerrak/Kattegat83 and eastern (Gdańsk) and western (Kiel Bight) Baltic may be the result of genetically based adaptive differences between populations52, which potentially explain transcriptomic differences of G. morhua from the Baltic Sea that have also been observed84. Johannesson and André80 assumed that the cause of lost diversity of Atlantic cod was an efficient barrier to gene flow, which has evolved as a consequence of divergent selection on reproductive traits, such as egg buoyancy, sperm motility73,85 and different time of spawning season. Local adaptation of these traits can be manifested by selection evidence related with presence of outlier loci and their relations can be detected across genome by analysis of linkage disequilibrium4,86. The observed patterns of detected LD distribution in western dataset are congruent with earlier studies which indicated the presence of large LD region located on LG2 in cod from North Atlantic68,87. It was suggested that they are associated with salinity and oxygen level at spawning depth68. It is important that we did not observe significant outlier loci from LG2 in Baltic dataset. Significant LDs for Baltic samples were located on LG12 and LG13 for analysis with Bornholm cod and only on LG16 for easternmost cod. In first case, observed LDs concerned loci ss1712298167 described as associated with surface temperature and loci ss1712298916 located on important scaffold 08672 associated with many environmental correlation including surface and bottom salinity, oxygenation and temperature68. LDs detected in easternmost dataset also concerned outliers associated with bottom salinity (ss1712298846 and ss1712298845, scaffold 08549)68 but they were located on different LG and this may be related to the existence of adaptation to lowering salinity in the Baltic Sea from west to east. Furthermore, these loci were not indicated as outliers in western dataset and Bornholm cod.

The cod stocks in the North Sea and Kattegat were described as an indicator of the condition of Atlantic cod populations88,89. In present study with a large number of SNP loci, the Kattegat sample was closely related to the Scottish cod suggesting a high share of the North Sea cod. A low pairwise difference between North Sea and Kattegat samples has also been reported with microsatellites29, and may be explained by the significant transportation of cod larvae from the North Sea stocks into Kattegat90. High connectivity with offshore populations in Scandinavian fjords has been characterized recently using a large number of SNPs38. In the current study a sample of cod collected from the North Sea (Egersund fjord, Norway; EGR) was found to be slightly statistically different from the Kattegat and West Baltic, which coincides with presumably different local spawning areas in the western Baltic (Kattegat, Sound, Kiel and Mecklenburg Bays). However, analysis of the outlier loci for the group of samples from North Sea, Kattegat and western Baltic showed inconsistency between geographic origin and genetic distance of samples. The samples from the Egersund fjord (EGR) differed both from the Moray Firth and Kattegat samples. Genetic characteristics of the EGR samples could be potentially occurred due to a relatively closed coastal population breeding locally.

Low genetic distance among samples from EGR, Øresund (ORE), and Schlei fjord (SCH) suggests closer relationships with cod living under similar environmental conditions characterized by periodically reduced salinity91,92. Samples from EGR and SCH shared the same haplotypes (Table 4), which resulted in self-assignment of a significant percentage of individuals (40.74%) from Egersund fjord to the SCH sample. Despite high connectivity between populations caused by migration, cod populations could be characterized by adaptive differences influencing genetic differentiation. The association of genomic signatures and ecotypic divergence was noted by Hemmer-Hansen et al.37 and results presented here seem to support this conclusion.

The most divergent sample in the Baltic Proper is Bornholm (BOR), which was collected in July, after the spawning season, in order to avoid the migrants from other sub-locations. This sample, when tested with outliers, showed a relatively high value of FST, distinguishing this population from other Baltic samples (FST = 0.06–0.09, P < 0.000). The genetic structure analysis showed that eastern Baltic stock samples formed four close clusters. This supports the assumption that the analyzed samples included individuals representing local populations, not the migrants. Genetic variability in the Baltic Proper samples was much lower than among western samples including also the west part of the Baltic Sea. The values of FST were lower suggesting that some specimens from each sample share the most functional spawning area in the Bornholm Deep. This is reflected in the results of the assignment tests where no clean baselines were observed. This is also indicator of high gene flow and significant level of mixing within the stocks. Additionally, low self-assignment of samples from Bornholm area with high share of easternmost stock is another argument that eastern Baltic cod occur in SD 24 on west cost of Bornholm what was also clearly demonstrated in recently published study by Hemmer-Hansen et al. 201961.

Since the mid-1980s, successful spawning of eastern Baltic cod stock has been generally restricted to the Bornholm Basin72,93,94. The data here suggest that, thanks to the salt water inflows, spawning areas like Gdańsk Deep might have retained limited functionality in supporting divergence between local subpopulations95,96. The salinity factor seems to support more the divergence between west and east Baltic Sea than local divergence57. The potential spawning area is also determined by oxygen availability54. Genetically divergent but geographically close subpopulations have been identified, for instance, in Icelandic waters97, in the North Sea88, and along the Skagerrak coast57,98,99. For such differentiation to be preserved - even for small genetic differences - reproductive isolation is implied. In the Baltic Sea, local niches settled by cod are characterized by a unique set of environmental features like the diurnal and seasonal exchange of water masses100, vertical distribution of salinity and temperature101. These features and homing behaviour affect the genetic profile of the local subpopulation and maintain the distribution of genes/alleles responsible for local adaptations22,38,52,69,84,89.

Population genetic analyses facilitate detection of mixed stocks within the management units, and catch quotas are estimated on the assumption that a management unit includes only one stock. If more than one stock occurs in a management unit, the less abundant component becomes overfished and may collapse28. Studies of the relationship between population units and ICES subareas for North Sea Atlantic cod (Gadus morhua L.) have revealed that the genetically derived population units did not map accurately enough onto the existing cod management units35. One strategy for compensating for this situation is extending the spawning closure areas72. In Norway, along the coastal area, cod stocks were divided into two: north and south of 62o latitude. Finding genetic structures along the Norwegian coast line by sampling 55 locations and analysing microsatellites28 and the pantophysin (Pan I) locus102 provided strong evidence to support possible revision of cod management strategy. The resilience of cod populations in the Kattegat may also be different when considered on smaller spatial scales than those delineated by traditional stock management boundaries99. Cod populations in the Baltic and Danish Straits have been managed for fishery purposes as 3 stocks differing in morphometric and genetic structures: Kattegat (subdivision 21), western (22–24) and eastern (25–32) Baltic60,103,104,105. Similarly to other studies, here we observed variation at the SNP loci between both Baltic stocks. Genetic differentiation between samples from the western and eastern Baltic stocks has been indicated as a tool for separation of western and eastern Baltic cod in mixed stock occupied SD2461. In addition, using outlier SNPs, this study was able to demonstrate genetic differences among populations from subdivisions. Genetic differences revealed between GDN and LIT (subdivision 3d 26) samples of cod collected from the eastern Baltic stock were supported by statistical analyses. Despite strong mixing, possible hindrances in connectivity between the Gdańsk Deep and Gotland Basin can be considered as explanation for the observed spatial differentiation of the eastern Baltic cod stock.


This study demonstrated that genotyping with Norwegian cod SNP array constructed in CIGENE enables detection of genetic differentiation at a fine and local geographic scale in marine pelagic cod populations in the Baltic and adjacent waters. The sensitivity of the array towards identification of cod stocks can be enhanced by putting larger number of SNPs on the chip, including those polymorphic in the Baltic cod. Outlier SNPs are more informative markers in finding differences between Baltic cod populations in comparison with neutral SNPs. Here, with outlier SNPs, differentiation was identified between cod populations from subdivisions of existing management units in the Baltic. A tentative discrepancy between Lithuanian (LIT) and Polish (GDN) cod samples within one subdivision was also observed and can be related to possible isolation by environmental barriers between spawning areas in the Gdansk Deep and Gotland Basin. It is recommended to carry out further survey of Baltic cod populations using advanced genetic techniques on a larger number of specimens including larvae in order to further document the observed genetic pattern in the eastern Baltic cod population. Changes in time in genetic composition of Baltic cod stocks may be anticipated after periodic restrictions on fishing activities.

Materials and Methods

Sampling, DNA isolation and genotyping

A total of 240 cod individuals from 9 locations at 7 ICES (International Council for the Exploration of the Sea) subdivisions along a transect across the Baltic Sea, Kattegat and North Sea (Fig. 10, Table 5) were collected between October 2012 - August 2013. Fin clips were stored in 70% ethanol at −70 °C. Genomic DNA was isolated using the Qiagen DNeasy 96 blood and tissue kit according the manufacturer’s instructions and stored at −20 °C. The concentration of DNA was determined by UV-vis spectroscopy using an Epoch Microplate Spectrophotometer (BioTek Instruments, Inc., Winooski, USA). After normalization, samples were genotyped on a custom Gadus mohua SNP-array (Illumina, USA) containing 10,923 SNP assays, and developed by a Norwegian consortium composed of four research organisations: Norwegian University of Life Sciences (NMBU), University of Oslo (UiO), NOFIMA AS, and the Institute for Marine Research (IMR)38,68,69. Samples were processed according manufacturers instructions and genotypes obtained from Genome Studio (V2011.1). After filtering to remove poorly clustering SNPs (failing assays, multisite variants), a total of 8221 diploid SNPs remained. This data set was further trimmed to remove: SNPs with relatively a high missing data level (over 20%; n = 15), monomorphic SNPs (n = 32), and SNPs with minor allele frequencies (MAF) < 0.01 (n = 98). The final data set included genotypes from 8076 loci.

Figure 10
figure 10

Map showing sampling sites and ICES subdivisions. Samples locations and codes are detailed in Table 5. Thin lines show borders between ICES subdivisions.

Table 5 Numbers of examined cod specimens, sampling sites and ICES subdivisions in the Baltic Sea and North Sea.

All methods complied with EC Directive 2010/63/EU for animal experiments and were approved by the Local Ethics Committee on Animal Experimentation at Gdansk Medical University (decision no. 60/2012).

Statistical analysis

Allele frequencies and MAFs in each sample were calculated from spreadsheet data using Arlequin v. Genetic structure was analyzed using the program STRUCTURE v2.3.4107 which assigns individual genotypes to a specified number of groups, K, based on membership coefficients estimated from the genotype data. The analysis for 9 cod population samples was conducted from K = 1 to 12 using a burn-in period of 100,000 steps followed by 200,000 MCMC (Monte Carlo Markov Chain) replicates with 5 iterations, assuming an admixture model. The most probable number of clusters was defined by calculating the ΔK value108 determined by Structure Harvester109. Clumpp v.1.1.1110 was applied to average cluster membership using the Large K Greedy algorithm. Output from Clumpp was visualized in Distruct v.1.1111.

Arlequin v. was used to perform an Analysis of Molecular Variance (AMOVA) with number of permutations = 90,000. Variance among the cod populations was detected by STRUCTURE, among samples, among individuals within populations and within individuals. The differentiation was tested amongst pairwise fixation index FST estimates and inbreeding coefficient FIS estimates. The number of polymorphic loci and genetic diversity was calculated by measuring observed and expected heterozygosity (Ho and He) with p < 0.05 and with exact test using a Markov chain with chain length =1,000,000 and dememorization steps =200,000. To adjust P value for each pair in multiple tests, Bonferroni corrections were included. GenAlex 6.502 was applied to perform a principal coordinates analysis (PCoA)112,113. Assignment tests were conducted using GeneClass114 with the allele frequency-based method. This enabled the identification of potential migrants or their descendants115. Relationships among 9 cod populations were examined using Poptree2116 with neighbor-joining (NJ) method based on FST distance with sample size correction117 and the number of bootstrap replications at 1000. The Mantel test based on dissimilarity matrices118 was applied to investigate the significance of relationships between genetic distance, geographic distance and bottom salinity with 999 permutations used to test the statistical significance of the values in GenAlex 6.502. Results were cross validated in Arlequin Bottom salinity values were obtained from models GETM119, BALANCE120 and INSPIRE121. The hierarchical island model, implemented in Arlequin, was used to detect outlier loci. Loci as candidates under selection exhibited FST values out of the 99% quantile, based on coalescent simulations (50,000). Outlier loci were calculated with 50,000 simulations and number of demes at 100. Outlier loci were segregated and those with FST ≤ 0 or with FST > 0.01 were excluded. Separate structure investigation of outlier loci for west and east Baltic populations were carried out with increased burning (2,000,000) and MCMC (4,000,000). GenAlex was applied to perform a principal coordinates analysis (PCoA). Linkage disequilibrium (LD) was estimated for outlier loci by calculating the square value of correlation coefficient (r2) between pairs of markers122 using the TASSEL 5.2.58 software123. A threshold of r2 > 0.8 was considered to indicate LD. The level of LD was estimated for the entire panel and for the specific subgroups identified with STRUCTURE v2.3.4. Within these subgroups, LD was calculated considering only the detected panel of candidate outlier loci. The p-values for each r2 estimate were obtained with a two-tailed Fisher’s exact probability test and a threshold of p < 0.0001 was considered as significant. LD visualization was done by heat maps based on P values for pairwise r2 estimates to assess the overall view of LD patterns and evaluate LD blocks in various chromosomes at specific map locations. Additionally the distribution and clustering of detected outlier loci on linkage groups (LG) were indicated by Manhattan plots constructed for same subsets as for LD analysis using the R package “qqman”124. Homology searching was done through BLAST search of the available flanking sequences125 for each detected outlier loci on the NCBI and Ensembl public databases126. Functions of annotated outlier loci were determined using UniProt database. These analyses were done for three selected ouliers datasets specific for all Baltic, west Baltic and east Baltic (Supplementary Information Figure S1, Tables S1S5).