Abstract
Non-synonymous variation (NSV) of protein coding genes represents raw material for selection to improve adaptation to the diverse environmental scenarios faced by wild and livestock populations. Many aquatic species face variation in temperature, salinity and biological factors throughout their distribution range, which is reflected in the presence of allelic clines or local adaptation. The turbot (Scophthalmus maximus) is a flatfish of great commercial value with a flourishing aquaculture, which has promoted the development of genomic resources. In this study, we developed the first atlas of NSVs in the turbot genome by resequencing 10 individuals from the Northeast Atlantic Ocean. More than 50,000 NSVs were detected in the ~ 21,500 coding genes of the turbot genome, and we selected 18 NSVs to be genotyped using a single MassARRAY multiplex on 13 wild populations and three turbot farms. We detected signals of divergent selection on several genes related to growth, circadian rhythms, osmoregulation and oxygen binding in the different scenarios evaluated. Furthermore, we explored the impact of the NSVs identified on the 3D structure and functional relationships of the corresponding proteins. In summary, our study provides a strategy to identify NSVs in species with consistently annotated and assembled genomes and to ascertain their role in adaptation.
Introduction
Marine fish species are often distributed across a variety of habitats differing in environmental conditions, particularly water temperature, salinity, dissolved oxygen and light intensity. These conditions in turn affect the distribution of infectious pathogens, which, along with predation, compromise the viability of these species. These environmental factors strongly influence somatic growth and reproduction, both of which are energetically costly metabolic activities for fish. Metabolism and evolution are closely connected1, and the many genes underlying these processes are targets of natural selection that may lead to adaptive divergence in organisms inhabiting heterogeneous environments2,3,4,5,6,7. Knowledge about adaptive genetic variation and its spatial structuring is crucial for the sustainable management of wild fish resources, and also for improving production traits through selective breeding of economically important aquaculture species. Genome sequencing of an increasing number of fish species is helping to unravel the broad genetic variation across genomes. This is achieved through the identification of thousands of single nucleotide polymorphisms (SNPs) and their possible association with various traits in both domestic and wild populations.
Turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish widely distributed along the European coast of the Northeast Atlantic Ocean from Morocco to the Arctic Circle, including the Baltic Sea. In the south, its range extends across the Mediterranean Sea up to the Black Sea8. The species experiences diverse physicochemical environments across its range. These include a north–south temperature cline from ≈7 °C up to ≈22 °C, and salinities spanning from ≈35 PSU in the North Atlantic Ocean to ≈2 PSU in the northern Baltic Sea9. Whereas juveniles and adults are relatively sedentary, the pelagic larvae have high dispersal potential mediated by ocean currents and enhanced by the high fecundity of the species10. Genetic diversity and population structure of turbot have been investigated with microsatellites and SNPs, mostly in the North Atlantic Ocean11,12,13,14 and to a lesser extent in the southern area15,16. Adaptive variation across its full distribution range was recently assessed using a set of SNPs covering the whole genome17. Four main genetic regions, Baltic, Atlantic, Mediterranean and Black Sea, were identified using neutral variation. Consistent signals of divergent selection attributed to salinity and temperature, and of stabilizing selection related to salinity, were detected across the turbot genome, including candidate genes at specific regions18. Moreover, the same set of markers was used to analyze genetic differentiation between wild and farm populations, as a baseline to evaluate the impact of restocking and farm escapees in the wild18. Besides the notable differentiation detected (wild vs farm FST ≈ 0.060), signals of selection mostly attributed to growth and resistance to pathologies were detected at specific genomic regions, including candidate genes.
Wild turbot populations have declined over the last decades, mainly due to overfishing, yet the species has become the main flatfish farmed worldwide owing to its high commercial value19. Intensive farming over six generations has been accompanied by a fast development of genomic resources. These resources have been used to identify quantitative trait loci (QTL) and candidate genes associated with growth19,20,21,22,23, temperature tolerance24, adaptation to salinity25, sex determination26,27 and resistance to various pathogens28,29. Runs of homozygosity and genetic diversity across the turbot genome were also analyzed to check for selective sweeps in farm and wild populations. This information was integrated with previously reported QTL-associated markers, candidate genes and outlier loci related to natural or artificial selection, yielding a robust framework of selection signatures across the turbot genome30. In addition, functional data on resistance to the main industrial pathogens, obtained from the main immune organs, have been comparatively assessed and integrated with previous signatures of selection across the turbot genome31. The broad information gathered in this species, from both wild and farm populations, makes it a suitable candidate for assessing the relevance of the different sources of genetic variation for turbot adaptation to different scenarios.
Studying the association between polymorphisms within candidate genes and traits of interest or environmental variables is a convenient approach to validate the putative adaptive role of these genes under natural or domestic selective pressures23,32. Such studies are carried out in populations or families with different genomic backgrounds. A significant association can be taken as evidence that the gene is either directly involved in the control of the trait or in linkage disequilibrium with the responsible variant owing to its proximity. If non-synonymous variation is considered, association could eventually lead to the identification of the causative mutation, as reported in various vertebrates, including teleost fish. In Atlantic salmon (Salmo salar), non-synonymous SNPs in two strong candidate genes have been suggested to be responsible for resistance to infectious pancreatic necrosis virus. These genes code for epithelial cadherin and NEDD8-activating enzyme 1 (NAE1)33,34. Differences in spawning time associated with functionally different protein variants have been documented in Atlantic salmon vestigial-like protein 3 (VGLL3) and in herring (Clupea harengus) thyrotropin receptor (TSHR)35,36. A hemoglobin polymorphism in turbot was reported to be associated with differences in juvenile growth rates37,38, and the underlying amino acid substitution was predicted to influence the stability of the oxygen-binding protein39.
Advances in modelling three-dimensional (3D) protein structures, together with the progressive enrichment of mutation databases, are making it feasible to interpret non-synonymous variation in terms of protein function40,41,42. This information is essential to understand the evolutionary significance of non-synonymous variation associated with environmental variables43,44,45,46. In silico approaches that predict protein 3D structure directly from sequence information play a key role in filling the gap between the numerous sequences available and the experimentally solved structures47,48. In the absence of sequence similarity to entries in the Protein Data Bank (PDB), the modelling strategy can rely on threading and ab initio modelling49,50, or on deep learning42,51, to predict protein structure.
The amount of genomic information on adaptive variation in wild and farmed turbot prompted us to ascertain the putative role of non-synonymous variants (NSV) of candidate genes in selection. We considered both selection related to environmental variation in nature and selection associated with target traits in turbot aquaculture breeding programs. Specifically, we aimed to: (i) call NSVs using resequencing data against the recently assembled chromosome-level turbot genome; (ii) filter the most consistent and relevant functional variants among the ~ 21,500 protein coding genes in the turbot genome; (iii) select NSVs in candidate genes putatively related to osmoregulation, growth and disease resistance; (iv) identify signals of selection across the whole distribution range of the species and in farms; and (v) validate functional differences of the most consistent variants using 3D structural protein modelling. Our results provide a broad map of NSVs across the turbot genome. They also support the role of several candidate variants in adaptation to osmotic changes or growth in wild and domestic populations.
Materials and methods
Calling non-synonymous variation in the turbot genome
DNA from ten adults (five males and five females) of commercial size (1.5 kg) from the breeding program of a turbot company was re-sequenced using 150 bp PE reads on an Illumina NovaSeq 6000 System to 20 × coverage. Reads were individually aligned against the turbot reference genome (GCA_013347765.1)27 to screen for SNP variation. Individuals had previously been checked for parentage using a set of 9 microsatellites, in order to choose unrelated individuals52 representative of the genetic diversity of the broodstock. The founders originated from the NE Atlantic Ocean. This population has been selectively bred for five generations with the support of the above-mentioned microsatellite tool, to avoid inbreeding while retaining as much genetic diversity as possible. Quality filtering and removal of residual adaptor sequences were conducted on read pairs using Fastp (v.0.20.0)53. Filtered reads were then mapped against the turbot genome with the BWA-MEM algorithm54 of the Burrows-Wheeler Aligner v.0.7.8. SNPs and indels were called using bcftools v1.55, discarding aligned reads with a mapping quality (MAPQ) < 30 and SNPs with a Phred quality score < 30. Variants were annotated using SnpEff (v5.1)56, taking as reference the updated turbot chromosome-level genome assembly (GCA_013347765.1)27.
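The calling and annotation steps above can be illustrated with a minimal Python sketch. It keeps bi-allelic SNPs with Phred quality ≥ 30 that SnpEff annotates as missense (non-synonymous) variants. The sketch assumes a plain-text VCF whose INFO column carries SnpEff's standard ANN field (Allele|Annotation|Impact|Gene_Name|...). It also assumes that the read-level MAPQ filter was already applied upstream during calling. The function names are hypothetical and do not belong to any tool used in the study.

```python
"""Sketch: keep high-quality missense SNPs from SnpEff-annotated VCF lines.
Assumes standard VCF columns and the SnpEff ANN field in INFO; read-level
MAPQ filtering is assumed to have happened upstream."""

MIN_QUAL = 30  # Phred-scaled variant quality threshold


def parse_info(info):
    """Split a VCF INFO column into a key -> value dictionary."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def missense_snps(vcf_lines):
    """Yield (chrom, pos, ref, alt, gene) for missense SNPs with QUAL >= MIN_QUAL."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _flt, info = cols[:8]
        if len(ref) != 1 or len(alt) != 1:
            continue  # single-nucleotide, bi-allelic sites only
        if qual == "." or float(qual) < MIN_QUAL:
            continue
        ann = parse_info(info).get("ANN", "")
        for entry in ann.split(","):
            parts = entry.split("|")
            # parts[1] may contain several '&'-joined effect terms
            if len(parts) > 3 and "missense_variant" in parts[1].split("&"):
                yield chrom, int(pos), ref, alt, parts[3]
                break  # report each site once
```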
Filtering of NSV: reliability and functional information
The thousands of NSVs detected were filtered following functional, technical and population genetics criteria to obtain a map of the most consistent NSVs across the turbot genome, following filtering pipelines previously reported for the species27,57. Functional criteria included: (i) dismissing putative pseudogenes using a conservative criterion, that is, discarding genes with 3 or more NSVs; (ii) removing nonsense variants producing truncated proteins; and (iii) discarding genes with low-quality annotation. Technical criteria included: (i) availability of ± 100 bp without additional variation that could compromise primer annealing and PCR amplification for further genotyping; (ii) compatibility of the adjacent regions selected for designing multiplex primer panels for genotyping; and (iii) validation of the in silico detected allelic variants with the MassARRAY technology58. Population genetics criteria included: (i) discarding SNPs deviating from Hardy–Weinberg proportions (P < 0.01); and (ii) removing tri-allelic SNPs. From this broad NSV map, we performed additional filtering to choose a final manageable set of SNPs for validation. This filtering focused on the main traits putatively associated with selection in wild or farm populations for which previous information was available: (i) selecting the most relevant candidate genes related to growth, osmoregulation and resistance to pathologies, by cross-referencing previous literature (mostly on fishes) with previous QTL and functional (differentially expressed genes, DEG) data in turbot28,29,31; (ii) identifying suggestive genes close to markers associated with signatures of selection (< 500 kb)13,14,30; (iii) discarding variants reported as deleterious for the same genes in other species in public repositories (PROVEAN software59); and (iv) selecting the most diverse SNP per locus (highest MAF: minor allele frequency).
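The gene-level part of this filtering can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually used:

- each NSV is a hypothetical record holding the gene name, the SnpEff effect term, the number of alleles and the MAF;
- nonsense variants are identified by the SnpEff term stop_gained and still count toward the per-gene total (our assumption);
- the annotation-quality, Hardy–Weinberg and PROVEAN filters are omitted.

```python
from collections import defaultdict


def filter_nsvs(nsvs, max_per_gene=2):
    """Return the highest-MAF NSV for each gene that passes the filters.

    nsvs: iterable of dicts with keys 'gene', 'effect', 'n_alleles', 'maf'
    (a hypothetical record layout used only for illustration).
    """
    by_gene = defaultdict(list)
    for v in nsvs:
        by_gene[v["gene"]].append(v)
    kept = {}
    for gene, variants in by_gene.items():
        # genes carrying >= 3 NSVs are treated as putative pseudogenes
        if len(variants) > max_per_gene:
            continue
        candidates = [
            v for v in variants
            if v["effect"] != "stop_gained"  # drop nonsense variants
            and v["n_alleles"] == 2          # drop tri-allelic SNPs
        ]
        if candidates:
            # keep the most diverse (highest minor allele frequency) SNP
            kept[gene] = max(candidates, key=lambda v: v["maf"])
    return kept
```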
The conservation of the substituted residues in the 18 selected turbot protein variants was examined by BLAST searches against the corresponding proteins of other teleost species available at NCBI (https://www.ncbi.nlm.nih.gov/).
Population genetics of selected NSVs across the turbot distribution range
Sampling
In our screening, we analyzed 13 wild populations covering the main genetic regions reported across the turbot distribution range17. These populations are also representative of the wide variation in temperature and salinity, the main drivers of selection in turbot17, which very likely also influence pathogen distribution60,61. We also included samples from the broodstock of the three turbot companies carrying out breeding programs, for comparison with wild samples to detect signals of selection related to the main target traits. The broodstocks of these three main turbot companies, located in NW Spain and France, were founded with individuals collected from the NE Atlantic Ocean18, where no significant genetic differentiation was reported with neutral markers17. A total of 355 individuals from 16 sampling locations were analyzed, most samples exceeding 20 individuals (AQUATRACE project; Fig. 1, Table 1). Wild samples included the four main regions of the turbot distribution: Baltic Sea (BAS), Atlantic Ocean (ATL), Mediterranean Sea (MED) and Black Sea (BLS)17. The Atlantic Ocean region was overrepresented because of the higher abundance of the species there. Farm samples included a representative sample of the broodstock of the three European turbot companies with ongoing breeding programs18.
SNP genotyping
To genotype the SNPs finally selected and validate their in silico allelic variants, we used the MassARRAY technology. Briefly, the protocol consists of a two-step reaction: i) PCR amplification of an amplicon of ~ 150 bp including the selected SNP; and ii) a mini-sequencing reaction. In this second step, an internal primer annealing adjacent to the SNP is extended by a single dideoxynucleotide complementary to the SNP variant58. Flanking regions of ± 100 nucleotides around the selected SNPs were obtained from the turbot reference genome (GCA_013347765.1). Primer multiplex design and MassARRAY genotyping were performed at the UCIM-Universitat de Valencia Genomics Platform.
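The technical requirement of ± 100 bp free of additional variation around each target SNP can be checked with a short Python sketch. It assumes that a single chromosome sequence is held in memory as a string and that variant positions are 1-based. The function name is hypothetical, and the actual primer design was carried out by the genotyping platform.

```python
def flanking_region(seq, pos, variant_positions, flank=100):
    """Return the +/- `flank` bp window around a 1-based SNP position,
    or None if the window runs off the sequence or contains other variants."""
    start, end = pos - flank, pos + flank  # 1-based, inclusive window
    if start < 1 or end > len(seq):
        return None
    others = [p for p in variant_positions if p != pos and start <= p <= end]
    if others:
        return None  # extra variation could hamper primer annealing
    return seq[start - 1:end]  # convert to 0-based slice
```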
Genetic diversity and differentiation
Mean number of alleles per locus (Na) and expected (HE) and observed (HO) heterozygosities were estimated to assess genetic diversity per locus. Departure from Hardy–Weinberg equilibrium (HWE) and the intrapopulation fixation index (FIS) were tested for each locus and population. Global FST across loci was estimated considering all samples, and also for the wild and farm groups separately. Analyses were performed using GENEPOP (v4.0)62.
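For readers less familiar with these statistics, the following Python sketch computes Na, HO, HE and FIS for one locus in one sample. It also computes Nei's GST as a simple measure of differentiation among samples. The sketch uses textbook uncorrected estimators (HE = 1 − Σp², FIS = 1 − HO/HE, GST = (HT − HS)/HT). GENEPOP relies on different estimators (e.g., Weir and Cockerham's for F-statistics), so its values will differ somewhat. Function names are hypothetical.

```python
from collections import Counter


def diversity_stats(genotypes):
    """genotypes: list of (allele1, allele2) tuples for one locus in one sample.
    Returns (Na, HO, HE, FIS) using simple uncorrected estimators."""
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in counts.values()]
    ho = sum(a != b for a, b in genotypes) / n      # observed heterozygosity
    he = 1 - sum(p * p for p in freqs)              # expected under HWE
    fis = 1 - ho / he if he > 0 else float("nan")  # inbreeding coefficient
    return len(counts), ho, he, fis


def nei_gst(subpop_freqs):
    """subpop_freqs: one allele-frequency list per sample (same allele order),
    equally weighted. Returns Nei's GST = (HT - HS) / HT."""
    k = len(subpop_freqs)
    hs = sum(1 - sum(p * p for p in f) for f in subpop_freqs) / k
    mean = [sum(col) / k for col in zip(*subpop_freqs)]
    ht = 1 - sum(p * p for p in mean)
    return (ht - hs) / ht if ht > 0 else 0.0
```

For example, two samples fixed at opposite frequencies of 0.9 and 0.1 give HS = 0.18 and HT = 0.5, hence GST = 0.64.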
Detection of outlier loci
We followed two different statistical approaches, implemented in BAYESCAN (v2.1)63 and ARLEQUIN (v3.5)64, to detect outlier loci showing signals of divergent or balancing selection. Outliers were investigated in: (i) all samples, (ii) wild samples, and (iii) wild vs farm samples. Additionally, a hierarchical approach was explored considering two hierarchical groups (wild vs farmed). In all cases, the neutral datasets previously reported for the same comparisons by do Prado et al.17,18 were used as background. The following BAYESCAN parameters were used: a burn-in length of 100,000, prior odds of 10 and 20 pilot runs. Outliers were identified using a q-value < 0.05. The FDIST FST method implemented in ARLEQUIN, which compares locus-specific FST against heterozygosity, was used to investigate loss of heterozygosity after selective sweeps. For this program we used the following parameters: 50,000 simulations, 100 demes per group and 20 groups when a hierarchical model was applied. In all ARLEQUIN analyses, outliers were identified using a P-value < 0.01, because this method is prone to a higher number of false positives65. The hierarchical scenario could only be implemented with ARLEQUIN, because this option is not available in BAYESCAN.
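BAYESCAN reports a posterior probability of selection and a q-value for each locus, and the q < 0.05 threshold above refers to the latter. As an illustration only, the sketch below computes q-values from posterior probabilities using the standard Bayesian false discovery rate definition. Under this definition, a locus's q-value is the mean posterior error probability of all loci at least as well supported. We assume, but have not verified here, that this matches BAYESCAN's internal computation. The function name is hypothetical.

```python
def bayesian_qvalues(posterior_probs):
    """For each locus, return the mean posterior error probability (1 - P)
    over all loci whose posterior probability is >= its own, i.e. the
    expected FDR if that locus's probability were used as the cutoff."""
    qvals = []
    for p in posterior_probs:
        errors = [1 - x for x in posterior_probs if x >= p]  # ties included
        qvals.append(sum(errors) / len(errors))
    return qvals
```

Loci would then be flagged as outliers when their q-value falls below 0.05. The quadratic loop is adequate for a panel of a few thousand loci.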
Protein 3D structure modelling of non-synonymous variants
To find potential template structures for homology modelling, a PSI-BLAST sequence search against the Protein Data Bank (PDB) was performed (https://blast.ncbi.nlm.nih.gov/Blast.cgi)66. The identified template structures showed large unresolved regions encompassing the point mutations analyzed in the present study. Two different modelling strategies were therefore undertaken: I-TASSER48 and RoseTTAFold51. I-TASSER is a metaserver that automatically employs ten threading algorithms in combination with ab initio modelling to build the tertiary structure of a protein. It then uses replica-exchange Monte Carlo simulations for atomic-level refinement. For comparison, a deep-learning-based method, RoseTTAFold (https://robetta.bakerlab.org), was also used. The presence of intrinsically disordered regions in the proteins was investigated with the following disorder predictors: PONDR67, DISOPRED68, IUPRED369 and PrDOS70.
Homology modelling was used to generate 3D model structures of the polymorphic turbot HbαD (see Results) together with the turbot Hbβ1 subunit (AWP17400.1) in the deoxy form (T-state). The structure of deoxyhemoglobin of the Antarctic notothenioid fish Pagothenia bernacchii (PDB code: 1HBH)71 was selected as the most appropriate template to generate the tetramer model (sequence identities of 82.3% and 76% for α and β chains, respectively). Twenty models of each Hb variant were built using MODELLER72 as implemented in Biovia Discovery Studio. The model with the lowest MODELLER objective function was selected for analysis.
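Two small computations in this step can be sketched in Python: the sequence identity between target and template, and the choice of the best of the twenty models. Identity is computed here over ungapped alignment columns. This is only one of several common conventions, so values may differ from those reported by other tools. The function names, model names and objective-function values are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two equal-length aligned sequences ('-' = gap),
    computed over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)


def best_model(objective_by_model):
    """Pick the model with the lowest objective function value.
    objective_by_model: dict mapping model name -> objective value."""
    return min(objective_by_model, key=objective_by_model.get)


# Hypothetical usage: best_model({"m1": 1500.2, "m2": 1320.7}) returns "m2"
```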
Results
Non-synonymous variants and filtering
Among the ~ 3.3 M SNPs detected in the ten 20 × re-sequenced turbot samples, 55,176 represented NSVs after quality control (discarding reads with MAPQ < 30 and SNPs with Phred < 30; Supplementary Table S1). Two filtering steps were the most decisive in selecting a consistent and manageable set of NSVs for validation (Fig. 2; Supplementary Table S1). The first was the removal of genes with ≥ 3 NSVs (82.9% drop relative to the previous step). The second was the functional criterion of selecting genes previously associated with growth, osmoregulation and resistance to pathogens (87.4% drop). In the last step, we used information on candidate genes related to growth and osmoregulation, either in turbot or in other fish species (see Introduction for citations). We also used information on resistance to the main turbot pathogens: Aeromonas salmonicida (AS, furunculosis), Philasterides dicentrarchi (PD, scuticociliatosis) and Enteromyxum scophthalmi (ES, enteromyxosis). This step retained 1179 SNPs in 876 genes. The SNP with the highest MAF for each gene was retained. For 84 turbot NSVs, information on equivalent variants in other species was available through PROVEAN. Among them, eight were categorized as deleterious and thus discarded from further analyses (Supplementary Table S2). The number of transitions was very similar to that of transversions in the 876 listed NSVs: 432 transitions (A/G = 227; C/T = 205) vs 444 transversions (A/C = 129; A/T = 81; C/G = 120; G/T = 115).
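The transition/transversion classification above follows from the chemistry of the bases. Transitions exchange a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other change is a transversion. A minimal Python sketch follows; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a SNP as 'transition' (within purines or within pyrimidines)
    or 'transversion' (purine <-> pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv(snps):
    """Return (transitions, transversions, Ts/Tv ratio) for (ref, alt) pairs."""
    classes = [substitution_class(r, a) for r, a in snps]
    ts = classes.count("transition")
    tv = classes.count("transversion")
    return ts, tv, (ts / tv if tv else float("inf"))
```

Applied to the totals reported above, 432 transitions and 444 transversions give Ts/Tv = 432/444 ≈ 0.97.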
Selection for genotyping and population screening
Our intention was to select a final set of ~ 25 NSVs from the consistent list of 868 candidates, to be genotyped in a single multiplex using the MassARRAY technology. This set would serve to validate the reliability of our pipeline and to search for signals of natural or artificial selection in turbot populations across the species' distribution range. Furthermore, to add functional support, 3D protein structure was evaluated, especially for genes showing significant signals of selection. Accordingly, we focused on genes previously associated with signals of selection in wild or farm populations related to growth, osmoregulation and resistance to pathogens. These associations had been detected either by functional assays (DEG: differentially expressed genes) or by QTL association, in turbot but also in other fish species (Table 2). Most of the genes included in the list matched more than one selection criterion. The exception was hbαD (hemoglobin subunit alpha-D), which was of particular interest regarding metabolism and growth39. The final list included 22 genes associated with growth (13 genes); resistance to ES (13), AS (7) and PD (9); osmoregulation (3); and signals of natural (7) or artificial (3) divergent selection, mostly in turbot but also in other fish species (10) (Table 2).
Multiplex design and genotyping on a MassARRAY platform
Among the 22 preselected SNPs, 18 could be included in a single multiplex for MassARRAY genotyping, using primers designed from the ± 100 bp flanking regions retrieved from the turbot genome (Supplementary Tables S3 and S4). In all cases, the allelic variants detected by MassARRAY genotyping matched the in silico SNP calls from the turbot re-sequencing data, and they were thus validated for further research. Genotypes for the 355 individuals of wild and farm origin were very consistent, and only one of the 6390 genotypes was missing (Supplementary Table S5).
Genetic diversity and differentiation across loci, populations and groups
Global genetic diversity in the wild for the set of 18 SNPs was significantly higher than previously reported using an anonymous SNP panel across the whole genome (Na: 1.77 vs 1.49; HE: 0.223 vs 0.090, respectively17). This can be explained by the filtering criterion followed for detecting NSVs in this study: at least two copies of the variant among the 10 individuals analyzed (20 alleles per locus), i.e. a minor allele frequency (MAF) of 0.1. Genetic diversity was also higher on average in farm than in wild samples (HE = 0.261 vs 0.214), even compared with the Atlantic region (HE = 0.227), suggesting good management of genetic diversity after five generations of selection. Average genetic diversity per locus ranged from aqp8b (aquaporin 8b) (Na = 1.19; HE = 0.0199) to vipr1b (vasoactive intestinal peptide receptor 1b) (Na = 2; HE = 0.4664). Other loci also showed high genetic diversity (Table 3): eya3 (eyes absent 3), hamp (hepcidin antimicrobial peptide), fga-like (fibrinogen alpha chain-like), ciart (circadian-associated transcription repressor) and tshr (thyroid stimulating hormone receptor). The remaining loci were polymorphic in most populations (MAF > 0.01). No deviation from Hardy–Weinberg proportions was detected, either per locus across populations or per population across loci. The only exception was Skagerak (SK), which showed a significant excess of heterozygotes for most of the polymorphic loci analyzed (P < 0.0023). Interestingly, this population is located in the transition between the Baltic Sea and the North Sea. There, two highly divergent salinity environments come into contact, constituting a rather complex hybridization area12.
Seven loci showed MAF < 0.1, among which aqp8b and sstr3 (somatostatin receptor 3) showed rare allelic variants (MAF < 0.01). In fact, the sstr3 locus was nearly fixed for one allelic variant across most populations, while the igfbp2 (insulin-like growth factor binding protein 2b) and aqp8b loci were polymorphic at MAF > 0.1 in only one population (Supplementary Table S5). At the other end, eya3, vipr1b, fga-like and ciart were highly polymorphic (MAF > 0.3). Abrupt changes in allele frequencies were observed in some genetic regions or in relation to the origin of samples (farm, wild). For instance, the decay of eya3 polymorphism in the Black Sea was remarkable. Similarly, polymorphism increased in the southern populations for slc12a3 (solute carrier family 12 member 3) and decreased for hmox (heme oxygenase) (Fig. 3). Sawtooth peaks reflecting the effects of genetic drift or sampling variance were also observed in the least polymorphic loci. Examples include igf1rb (insulin-like growth factor 1b receptor), myb (v-myb avian myeloblastosis viral oncogene homolog) and cmtm3 (CKLF-like MARVEL transmembrane domain containing 3). Finally, striking variation was also observed among farm samples and between farm and wild samples.
Genetic differentiation and signals of selection
We searched for signals of selection in the selected set of NSVs under three different scenarios: i) the 13 WILD populations; ii) ALL 16 populations (13 wild and 3 farm); and iii) wild vs farm populations using a hierarchical approach (HIER). In all cases, a previously reported set of neutral loci was used as the neutral background: that of do Prado et al.17 when analyzing wild populations, and that of do Prado et al.18 when comparing wild vs farm populations. A single locus, fga-like, which showed a significant decrease in genetic variation in the southern populations, was detected as an outlier with BAYESCAN (Supplementary Fig. S1), but not with ARLEQUIN. Using the latter software, two loci showed signals of divergent selection, either significant or suggestive (P < 0.01 and P < 0.05, respectively), in the three comparisons performed. eya3 was nearly monomorphic in the Black Sea but at intermediate frequencies in the remaining populations. tshr showed a progressive decrease in genetic diversity from the Baltic to the Black Sea, with an abrupt change in the Adriatic Sea, the only Mediterranean population studied. Another locus, paxbp1 (PAX3 and PAX7 binding protein 1), showed signals of stabilizing selection in two scenarios (WILD, ALL) and was close to significance in the third (HIER). The comparison of wild and farm populations (HIER, ALL) unveiled significant signals of divergent selection for igfbp2, aqp8b and hbαD. These loci showed a rather similar pattern of differentiation: they were monomorphic in nearly all wild populations, while the alternative allele increased in frequency in two of the farms analyzed. Finally, locus igf1rb showed notable differentiation (FST = 0.1057) between wild and farm samples (HIER), although it was not significant. It reached the highest frequencies of the alternative allele in the same two farms as igfbp2, aqp8b and hbαD.
3D structural analysis of the non-synonymous variants with signals of selection
The tetrameric deoxyhemoglobin model of the polymorphic turbot HbαD and Hbβ1 subunits revealed that the Ala44αThr replacement occurs at the α1β2 interface, which is involved in the allosteric transition of the protein (Fig. 4). The Ala44α variant shows a hydrophobic interaction with His98β that may stabilize the Hb tetramer and thereby lower its oxygen affinity, whereas this interaction is lost upon replacement of Ala with Thr. The conservative Val78αIle substitution does not affect the protein interfaces.
The absence of a suitable template in the PDB for homology modelling of TSHR, PAXBP1, EYA3 and IGFBP2 led us to generate 3D structures using I-TASSER and RoseTTAFold (Supplementary Table S6). The RoseTTAFold TSHR model showed a confidence score of 0.6. Its per-residue error estimate suggests that the Leu339Glu substitution lies in an unstructured region spanning positions 298 to 404 (Fig. 5). This region corresponds to the hinge between the extracellular leucine-rich repeats and the seven-helix transmembrane domain. RoseTTAFold models of PAXBP1 and EYA3 were of low confidence (scores of 0.39 and 0.42, respectively). The IGFBP2 model showed a good confidence score of 0.66, although the C-terminal region containing the Pro261Ser mutation was of low quality. Modelled structures and the corresponding per-residue error estimates are shown in Fig. 6.
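Locating a substitution within a low-confidence stretch of a model, as done for TSHR Leu339Glu, amounts to scanning the per-residue error estimates for contiguous runs above a cutoff. The Python sketch below assumes the estimates have already been extracted from the model output into a list ordered by residue number. The cutoff and minimum run length are hypothetical parameters, not values used in the study.

```python
def low_confidence_segments(errors, threshold, min_len=1, start_index=1):
    """Return (first, last) residue numbers of contiguous runs whose
    per-residue error estimate exceeds `threshold`; runs shorter than
    `min_len` residues are ignored."""
    segments, run_start = [], None
    for i, err in enumerate(errors, start=start_index):
        if err > threshold and run_start is None:
            run_start = i  # a low-confidence run begins
        elif err <= threshold and run_start is not None:
            if i - run_start >= min_len:
                segments.append((run_start, i - 1))
            run_start = None
    if run_start is not None:  # run extends to the last residue
        last = start_index + len(errors) - 1
        if last - run_start + 1 >= min_len:
            segments.append((run_start, last))
    return segments
```

A substitution such as residue 339 can then be tested for membership in any returned (first, last) interval.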
Discussion
Non-synonymous variation plays an important role in evolution and local adaptation to the diverse environments experienced by species with broad distribution ranges73,74. It has been extensively screened in humans, Drosophila and other model species75,76,77,78. The growth of genomic resources driven by falling sequencing costs makes it feasible to obtain, quickly and cheaply, a picture of existing NSVs that can then be used to investigate their adaptive role79,80,81. Other sources of variation, such as structural variants, have been associated with adaptation of fish species in the wild82, including flatfish83,84. However, the relative importance of NSVs and structural variants in adaptation is still a matter of debate, and further studies are needed85.
Here, we report the first genome-wide collection of NSVs in the turbot, a flatfish species distributed all along the European coasts, where it experiences gradual and abrupt changes in temperature and salinity17. Our study is based on genome resequencing of 10 farmed fish (5 males and 5 females). These fish were obtained after five generations of selection from breeders originating in the Northeast Atlantic Ocean, the most important region of turbot distribution, and thus represent a preliminary picture of NSV in the genome of the species. Notably, expected heterozygosity was higher in farm than in wild samples, even those from the Atlantic region, which reflects good management of genetic diversity in the breeding program. Since only 10 diploid genomes were sequenced, our capacity to detect low-frequency and rare variants is limited. This is especially so because filtering included a reliability step requiring MAF = 0.1 (at least two copies of the variant in the sample). Nonetheless, we identified more than 50,000 NSVs across the ~ 21,500 protein coding genes annotated in the turbot genome. This number will likely increase when a broader sample including the four main genetic regions identified across the distribution range17 is explored. However, most genetic diversity in the turbot is contained within populations (global NE Atlantic FST = 0.002, not significant; FST across the whole distribution range ≈ 0.090)17. Our small sample from the Northeast Atlantic Ocean should therefore include a significant fraction of the NSVs of the species. After removing NSVs from putative pseudogenes and those representing nonsense mutations, a set of ~ 10,000 NSVs was retained, constituting the most reliable set in our study.
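The limited power of a 10-individual sample to detect rare variants can be quantified with a simple binomial model. It gives the probability that an allele at population frequency p appears at least twice among 20 sampled chromosomes. This Python sketch assumes random sampling of unrelated individuals and error-free genotyping, which idealizes the real sample. The function name is hypothetical.

```python
from math import comb


def prob_min_allele_count(p, n_individuals=10, min_count=2):
    """Probability of sampling at least `min_count` copies of an allele with
    population frequency p among 2 * n_individuals chromosomes (binomial)."""
    n = 2 * n_individuals
    below = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min_count))
    return 1 - below
```

Under these assumptions, an allele at p = 0.1 would pass the two-copy threshold with probability ≈ 0.61, whereas one at p = 0.01 would do so with probability below 0.02.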
Consider the expected frequency of NSVs in our small sample (MAF = 0.1) and the large turbot effective population size (usually Ne > 10,000 in the Atlantic Ocean region17). It could be assumed that most of this variation is not strongly detrimental; indeed, only a very small proportion of variants were homologous to deleterious mutations in other species. Previous allozyme studies in the turbot reported much lower variation for this fraction of protein coding genes than in other flatfishes (~ fivefold lower). In contrast, diversity observed with microsatellites was very similar, a pattern interpreted as evidence of an ancient bottleneck in this species11. If this observation could be extrapolated to all protein coding genes, much higher NSV levels would be expected in other flatfish. This is supported by the ~ 10 million SNPs detected in Senegalese sole, obtained from the recent whole genome resequencing of 12 sole individuals86, vs ~ 3 million SNPs in turbot.
The broad NSV collection identified in the turbot was filtered using technical, population genetics and functional information to obtain a consistent database. This database could be further validated and eventually used for practical purposes in breeding programs and the management of wild fisheries. We applied a very conservative criterion to retain NSVs in functional genes: genes with ≥ 3 NSVs were dismissed, which dramatically reduced the number of NSVs to ~ 10,000. We are aware that this filtering is likely very strict and that a significant number of genes with ≥ 3 NSVs could be functional. The whole set of > 50,000 NSVs should therefore be considered a suggestive repository for future studies. Pseudogene identification in the turbot genome, using the vast functional information coming from the AQUAFAANG project87, will improve our ability to discriminate pseudogenes and yield a more refined list of NSVs. The second most important drop (from ~ 10,000 to ~ 1200) relied on three sources of previous information. These were functional information (differentially expressed genes in response to pathogen challenges or growth) and association studies (proximity to QTL for growth and resistance to pathologies) in farm populations28,31, as well as signals of selection related to environmental variables (temperature, salinity) in the wild across its distribution range30. The broad genomic information in turbot facilitated targeting this subset of NSVs in candidate genes under selection. However, the greater relevance of resistance to pathologies and growth for industry introduced a bias in the final selection. Our list includes other interesting genes potentially related to adaptation in the wild (e.g. eight opsin genes very relevant for adaptation to the sea bottom88) to be explored in future studies. From this collection, a small subset of NSVs was validated using MassARRAY genotyping on representative wild and farm samples. The aim was to obtain clues about their relevance for adaptation across the species' distribution range or in breeding programs.
All 18 SNPs finally genotyped in a single multiplex matched the in silico predictions, supporting the reliability of our pipeline, and their genotyping was very robust with almost no missing data, which makes the multiplex feasible as a cost-effective molecular tool for further applications.
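The two validation criteria mentioned above (concordance with the in silico predictions and low missing data) can be checked per locus with a few lines of code. The sketch below is a hedged illustration only: the input format (genotype strings with None for failed calls), the function names and the toy loci are assumptions, not the MassARRAY output format or study data.

```python
def call_rates(genotypes):
    """Per-locus call rate: fraction of individuals with a non-missing call.

    genotypes maps a locus name to a list of diploid genotype strings
    such as "CT"; None marks a failed call (hypothetical input format).
    """
    return {
        locus: sum(call is not None for call in calls) / len(calls)
        for locus, calls in genotypes.items()
    }


def alleles_match(calls, expected_alleles):
    """True if every allele observed at a locus is one of the alleles
    predicted in silico from the resequencing data."""
    observed = {allele for call in calls if call is not None for allele in call}
    return observed <= set(expected_alleles)


# Toy genotypes for two loci (hypothetical values, not study data).
toy = {
    "locus_1": ["CC", "CT", "CC", "TT"],
    "locus_2": ["AA", None, "AG", "GG"],
}
predicted = {"locus_1": "CT", "locus_2": "AG"}

rates = call_rates(toy)
concordant = {locus: alleles_match(calls, predicted[locus]) for locus, calls in toy.items()}
print(rates)       # {'locus_1': 1.0, 'locus_2': 0.75}
print(concordant)  # {'locus_1': True, 'locus_2': True}
```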
We aimed to identify signals of selection in this NSV set, either divergent or stabilizing, in the different scenarios studied, using wild populations covering the whole distribution range of the species and broodstock from companies with breeding programs. A joint analysis of loci under selection could blur or mask potential population structure, because different evolutionary forces89 may act on each locus, producing clinal, patchy or local patterns of variation under balancing or divergent selection models. However, locus-specific patterns of spatial variation were observed in the wild, as expected given the environmental variation (both biotic and abiotic) across the turbot distribution range. The most consistent patterns of turbot spatial structure were related to the differentiation of Southern populations at several loci (fga-like, slc12a3) or, more specifically, of the Black Sea (eya3, hamp) or Adriatic Sea (tshr) populations, but gradual changes in the Atlantic from the Baltic Sea east and southwards (ciart, LOC118312496) and very particular local patterns, such as that of vipr1, were also observed. At the other extreme, paxbp1 was remarkably uniform across the whole distribution area. Interestingly, some of these genes have been associated with strong spatial genetic differentiation in other fish species, such as slc12a3 and tshr, related to osmoregulation and variation in spawning time, respectively, between Atlantic and Baltic herring90,91. Signals of selection across geographical ranges have also been reported for some of these genes in other fish species, such as paxbp1, linked to myogenesis and thermogenesis92, or vipr1, associated with local adaptation to extreme environments93.
In addition to eya3 and tshr, outlined above, three other loci, aqp8b, igfbp2 and hbαD, showed consistent or suggestive signals of selection when comparing wild and farm populations. Of note, igfbp2 and hbαD showed very strong wild vs farm differentiation (FST > 0.3) due to the increased frequency, in farms 1 and 3, of an allelic variant that is rare in the wild. This increase was not observed in farm 2, which could suggest different selective pressures or, alternatively, a founder effect in the farm 2 broodstock. Interestingly, farms 1 and 3 appeared genetically closer to each other (average FST = 0.015) than to farm 2 (FST (1 vs 2) = 0.030; FST (2 vs 3) = 0.036), either because of a historical connection or because they follow similar management protocols or selection targets.
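The section reports FST values without restating which estimator was used (the reference list includes Genepop, Arlequin and BayeScan). As a hedged illustration of why a rise in a rare allele in the farms produces FST > 0.3, the sketch below uses the textbook Nei-style ratio FST = (HT − HS)/HT for one biallelic locus. It weights populations equally and applies no sample-size correction, so it is not necessarily the estimator behind the values above; the function name and the frequencies are hypothetical.

```python
def fst_nei(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs holds the frequency of the same allele in each population.
    Populations are weighted equally and no sample-size correction is
    applied, unlike Weir & Cockerham's theta.
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k                          # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical frequencies of a variant that is absent in a wild sample
# but has risen to 0.5 in a farm sample.
print(round(fst_nei([0.0, 0.5]), 3))  # 0.333
print(fst_nei([0.1, 0.1]))            # 0.0 (identical frequencies)
```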
We sought additional support for the detected signals of selection by analyzing the consequences of the NSVs for the 3D structure of the corresponding proteins, which could refine their functional interpretation in relation to environmental variation. For this purpose, complementary approaches, using protein models of related species and de novo models built with artificial intelligence tools, provided information on the putative action of selection on genes related to growth, circadian rhythm and osmoregulation. Furthermore, we explored the functional changes caused by other NSVs in light of previous information in turbot or other species, to ascertain their putative role in adaptation even when it was not evidenced by our population genomics analyses.
IGF-I and IGF-II are important regulators of vertebrate growth and development, and the turbot genes encoding them display distinct expression patterns during metamorphosis94. The present study revealed polymorphisms in both the IGF binding protein IGFBP2 and the receptor IGF1R. The binding proteins have a higher affinity for IGF than the receptors and can inhibit and/or enhance IGF actions depending on the physiological context95. Teleost fish possess multiple igfbp genes, of which igfbp2 encodes a growth-inhibitory protein96. A polymorphism in chicken igfbp2 has been found to be associated with growth and body composition97. The Pro264Ser polymorphism of turbot IGFBP2 is located in the C-terminal domain, which in human IGFBP2 contributes to IGF-1 binding98. The C-terminus, including Pro264, is highly conserved in teleost IGFBP2 and was monomorphic in all wild turbot populations examined except the Spanish west coast population, which, like farm1 and farm3, displayed the rare Ser264 variant. Most current turbot broodstock were originally recruited from the Spanish and French coasts18,99, which could explain the presence of the rare IGFBP2b variant in farmed turbot, but its presence could also be connected to selection for growth, the main target of breeding programs. Similarly, igf1rb showed the highest polymorphism in farm1 and farm3, while the alternative allele was absent in the Baltic Sea, Black Sea and Adriatic Sea. An igf1rb polymorphism was reported to be associated with growth traits in the freshwater goby Odontobutis potamophila100, and divergence and polymorphism analyses of igf1ra and igf1rb in the orange-spotted grouper (Epinephelus coioides) suggested their importance for growth regulation and breeding in this species101.
Moreover, the involvement of igf1rb in growth under hypoxia was recently reported in a genome-wide association analysis of adaptation to oxygen stress in farmed Nile tilapia (Oreochromis niloticus)102. Our study also revealed very strong wild vs farm differentiation of the polymorphic HbαD subunit. We predict that the Thr44 variant identified in farm1 and farm3 increases oxygen-binding affinity, similar to the high-affinity human hemoglobin Kawachi (α44 Pro → Arg) variant103, which would be important under hypoxic conditions.
PAXBP1 is involved in skeletal muscle formation by linking the transcription factors PAX3 and PAX7 on chromatin, thereby regulating the proliferation of muscle progenitor cells. The pathogenic human variant Arg538Cys underlies a syndrome of global developmental delay and myopathic hypotonia104, whereas the significance of the Pro47Leu substitution in turbot PAXBP1 is unknown.
Turbot is an active visual predator and shows circadian cycles of locomotor and food-anticipatory activity together with rhythmic expression of core circadian clock genes105. Among the polymorphic turbot genes displaying high allelic diversity, we identified tshr, eya3 and ciart, which are involved in the regulation of circadian and seasonal rhythms. TSHR plays an important role in seasonal reproduction through the conserved EYA3-TSH pathway106,107. Polymorphisms in herring TSHR were shown to contribute to the regulation of spring or autumn spawning36, whereas the Leu339Glu polymorphism in turbot TSHR is located in a flexible region. Such intrinsically disordered regions are common in eukaryotic proteins and have been associated with important biological functions, such as acting as flexible linkers, cellular signal transduction and protein phosphorylation108,109. In some cases, function arises directly from the disordered state, whereas in others it originates from binding-induced folding promoted by other proteins or by RNA or DNA molecules110. Evidence of EYA3 as an integrator of photoperiodic cues and nutritional regulation was recently found in Atlantic cod (Gadus morhua)111. The Ser230Gly substitution in turbot EYA3 is located in the PST (Pro-Ser-Thr)-rich domain necessary for the transcriptional activity of Drosophila EYA112,113, while both Ser and Gly were identified at the corresponding site in various teleosts. The circadian-associated transcriptional repressor CIART is involved in eye regression in the cave molly (Poecilia mexicana)114, whereas turbot ciart proved to be differentially expressed in freshwater- versus seawater-acclimated fish115. A missense polymorphism in pig ciart was reported to be associated with backfat thickness116. Both Asn and Ser residues at the polymorphic position 271 of turbot CIART are found in other teleosts.
The important role played by the kidney in the osmoregulatory response of turbot to low salinity has been examined by transcriptome analysis25,115. SLC12A3 (the NCC1 paralog of the Na+/Cl− cotransporter) is highly expressed in the kidney of freshwater-acclimated fish and is crucial for ion reabsorption in the collecting duct117. Turbot slc12a3 showed the highest polymorphic diversity in the Black Sea and Adriatic Sea, in contrast to the Baltic Sea. We noted that the Cys residue at position 938 in turbot SLC12A3 is novel among marine fish, except for Antarctic fish. aqp8b is highly expressed in fish kidney tubules, where aquaporins serve as important pathways for water reabsorption118. Turbot aqp8b was polymorphic only in the Spanish west coast population and in farm1 and farm3, as outlined above for igfbp2. The Gln residue at position 36 is invariant in teleost AQP8, and its replacement by the basic His residue, together with the novel Cys938 variant of SLC12A3, awaits further study.
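Statements such as the novelty of Cys938 in SLC12A3 among marine fish, or the invariance of the position-36 residue in teleost AQP8, rest on comparing residues at the homologous column of a multiple sequence alignment. A minimal sketch of that lookup follows, assuming a precomputed alignment with gaps written as '-'; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def column_for_position(aligned_seq, position):
    """Return the 0-based alignment column holding 1-based residue `position`
    of an aligned sequence; gap characters '-' are skipped when counting."""
    count = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            count += 1
            if count == position:
                return col
    raise ValueError("position beyond sequence length")


def residues_at(alignment, reference, position):
    """Residue of every aligned sequence at the column of `position` in `reference`."""
    col = column_for_position(alignment[reference], position)
    return {name: seq[col] for name, seq in alignment.items()}


# Toy alignment of equal-length sequences (hypothetical, not real proteins).
alignment = {
    "turbot": "MK-CLAS",
    "species_a": "MKTALAS",
    "species_b": "MR-SLAT",
}
print(residues_at(alignment, "turbot", 3))
# {'turbot': 'C', 'species_a': 'A', 'species_b': 'S'}
```

Counting residues rather than columns matters because protein positions such as 938 refer to the ungapped turbot sequence, while the comparison must be made at the gapped alignment column.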
Turbot vipr1b showed high polymorphic diversity in both the wild populations and the farms examined, except in the Baltic Sea and the Spanish west coast. A conserved role of the VIP neuropeptide in the immune system and inflammatory processes of olive flounder (Paralichthys olivaceus) was suggested by significant changes in vip mRNA levels in spleen and head kidney after experimental challenge with the bacterium Edwardsiella tarda119. VIP binds to the N-terminal end of the receptor, which in turbot contains an Asn2Gln polymorphism. VIPR1 polymorphism has been linked to gastrointestinal dysmotility disorders in humans120 and to reproductive traits in birds121. Two polymorphic hepcidins have been identified in turbot122, of which hep1 was highly polymorphic in all populations and farms examined, particularly in the Black Sea and farm 3. The Asn81Tyr substitution is located in the mature peptide, but it does not seem to affect the conserved Cys residues, as also shown for the polymorphic hep2122. Both hep1 and hep2 possess antimicrobial activity and promote resistance against bacterial and viral infection, but the antimicrobial activity of hep2 was significantly stronger than that of hep1 in vitro and in vivo123. However, only hep1 was upregulated after iron overload, consistent with the presence of a putative iron regulatory sequence that is lacking in hep2123.
Conclusions
We constructed the first atlas of NSVs in the turbot genome and designed a conservative pipeline to define a robust dataset that can be further validated for its involvement in adaptation in the wild or under farm conditions using population genomics or 3D functional approaches. This strategy enabled the identification of consistent or suggestive signals of selection related to growth, osmoregulation, hypoxia or immunity, which could be further explored in functional and association studies using a robust and cost-effective genotyping methodology. Our study not only provides a suitable strategy for turbot but could also be extended to other fish species, given the increasing genomic resources available in public databases.
Data availability
Resequencing data of five males and five females are available at NCBI BioProject PRJNA649485 (https://www.ncbi.nlm.nih.gov/bioproject/649485), accession number SRX8843737. Genotyping data used in this study are provided in Table S5, and the primer sets for SNP genotyping in Table S4.
References
Ilker, E. & Hinczewski, M. Modeling the growth of organisms validates a general relation between metabolic costs and natural selection. Phys. Rev. Lett. 122, 238101 (2019).
Boltaña, S. et al. Influences of thermal environment on fish growth. Ecol. Evol. 7, 6814–6825 (2017).
Rosenfeld, J., Richards, J., Allen, D., Van Leeuwen, T. & Monnet, G. Adaptive trade-offs in fish energetics and physiology: Insights from adaptive differentiation among juvenile salmonids. Can. J. Fish. Aquat. Sci. 77, 1243–1255 (2020).
Robertson, D. R. & Collin, R. Inter- and intra-specific variation in egg size among reef fishes across the isthmus of Panama. Front. Ecol. Evol. 2, 84 (2015).
Zueva, K. J., Lumme, J., Veselov, A. E., Kent, M. P. & Primmer, C. R. Genomic signatures of parasite-driven natural selection in north European Atlantic salmon (Salmo salar). Mar. Genom. 39, 26–38 (2018).
Rajkov, J., El Taher, A., Böhne, A., Salzburger, W. & Egger, B. Gene expression remodelling and immune response during adaptive divergence in an African cichlid fish. Mol. Ecol. 30, 274–296 (2021).
Verhille, C. E. et al. Inter-population differences in salinity tolerance and osmoregulation of juvenile wild and hatchery-born Sacramento splittail. Conserv. Physiol. 4, 1–12 (2016).
Froese, R. & Pauly, D. FishBase (version Feb 2018). In: Species 2000 & ITIS Catalogue of Life, 2019 Annual Checklist (Roskov Y. et al.). (2018). www.catalogueoflife.org/annual-checklist/2019. ISSN 2405–884X.
Karås, P. & Klingsheim, V. Effects of temperature and salinity on embryonic development of turbot (Scophthalmus maximus L.) from the North Sea, and comparisons with Baltic populations. Helgolander Meeresuntersuchungen 51, 241–247 (1997).
Barbut, L. et al. How larval traits of six flatfish species impact connectivity. Limnol. Oceanogr. 64, 1150–1171 (2019).
Bouza, C., Presa, P., Castro, J., Sánchez, L. & Martínez, P. Allozyme and microsatellite diversity in natural and domestic populations of turbot (Scophthalmus maximus) in comparison with other Pleuronectiformes. Can. J. Fish. Aquat. Sci. 59, 1460–1473 (2002).
Nielsen, E. E., Nielsen, P. H., Meldrup, D. & Hansen, M. M. Genetic population structure of turbot (Scophthalmus maximus L.) supports the presence of multiple hybrid zones for marine fishes in the transition zone between the Baltic Sea and the North Sea. Mol. Ecol. 13, 585–595 (2004).
Vandamme, S. G. et al. Regional environmental pressure influences population differentiation in turbot (Scophthalmus maximus). Mol. Ecol. 23, 618–636 (2014).
Vilas, R. et al. A genome scan for candidate genes involved in the adaptation of turbot (Scophthalmus maximus). Mar. Genom. 23, 77–86 (2015).
Turan, C. et al. Genetics structure analysis of turbot (Scophthalmus maximus, Linnaeus, 1758) in the Black and Mediterranean Seas for application of innovative Management Strategies. Front. Mar. Sci. 6, 740 (2019).
Ivanova, P. et al. Genetic diversity and morphological characterisation of three turbot (Scophthalmus maximus L., 1758) populations along the Bulgarian Black Sea coast. Nat. Conserv. 43, 123–146 (2021).
do Prado, F. D. et al. Parallel evolution and adaptation to environmental factors in a marine flatfish: Implications for fisheries and aquaculture management of the turbot (Scophthalmus maximus). Evol. Appl. 11, 1322–1341 (2018).
do Prado, F. D. et al. Tracing the genetic impact of farmed turbot Scophthalmus maximus on wild populations. Aquac. Environ. Interact. 10, 447–463 (2018).
Robledo, D. et al. Integrating genomic resources of flatfish (Pleuronectiformes) to boost aquaculture production. Comp. Biochem. Physiol. Part D Genom. Proteom. 21, 41–55 (2017).
Sánchez-Molano, E. et al. Detection of growth-related QTL in turbot (Scophthalmus maximus). BMC Genomics 12, 473 (2011).
Rodríguez-Ramilo, S. T. et al. QTL detection for Aeromonas salmonicida resistance related traits in turbot (Scophthalmus maximus). BMC Genom. 12, 541 (2011).
Robledo, D. et al. Integrative transcriptome, genome and quantitative trait loci resources identify single nucleotide polymorphisms in candidate genes for growth traits in turbot. Int. J. Mol. Sci. 17, 243 (2016).
Sciara, A. A. et al. Validation of growth-related quantitative trait loci markers in turbot (Scophthalmus maximus) families as a step toward marker assisted selection. Aquaculture 495, 602–610 (2018).
Ma, A., Huang, Z., Wang, X., Xu, Y. & Guo, X. Identification of quantitative trait loci associated with upper temperature tolerance in turbot, Scophthalmus maximus. Sci. Rep. 11, 1–12 (2021).
Cui, W. et al. Comparative transcriptomic analysis reveals mechanisms of divergence in osmotic regulation of the turbot Scophthalmus maximus. Fish Physiol. Biochem. 46, 1519–1536 (2020).
Martínez, P. et al. Identification of the major sex-determining region of turbot (Scophthalmus maximus). Genetics 183, 1443–1452 (2009).
Martínez, P. et al. A genome-wide association study, supported by a new chromosome-level genome assembly, suggests sox2 as a main driver of the undifferentiatiated ZZ/ZW sex determination of turbot (Scophthalmus maximus). Genomics 113, 1705–1718 (2021).
Martínez, P. et al. Turbot (Scophthalmus maximus) genomic resources: application for boosting aquaculture production. Genomics in Aquaculture (Elsevier Inc., 2016). https://doi.org/10.1016/B978-0-12-801418-9.00006-8.
Saura, M. et al. Disentangling genetic variation for resistance and endurance to scuticociliatosis in turbot using pedigree and genomic information. Front. Genet. 10, 539 (2019).
Aramburu, O. et al. Genomic signatures after five generations of intensive selective breeding: Runs of homozygosity and genetic diversity in representative domestic and wild populations of turbot (Scophthalmus maximus). Front. Genet. 11, 1–14 (2020).
Aramburu, O., Blanco, A., Bouza, C. & Martínez, P. Integration of host-pathogen functional genomics data into the chromosome-level genome assembly of turbot (Scophthalmus maximus). Aquaculture 564, 739067 (2023).
Saul, M. C., Philip, V. M., Reinholdt, L. G. & Chesler, E. J. High-diversity mouse populations for complex traits. Trends Genet. 35, 501–514 (2019).
Moen, T. et al. Epithelial cadherin determines resistance to infectious pancreatic necrosis virus in Atlantic salmon. Genetics 200, 1313–1326 (2015).
Pavelin, J. et al. The nedd-8 activating enzyme gene underlies genetic resistance to infectious pancreatic necrosis virus in Atlantic salmon. Genomics 113, 3842–3850 (2021).
Barson, N. J. et al. Sex-dependent dominance at a single locus maintains variation in age at maturity in salmon. Nature 528, 405–408 (2015).
Chen, J. et al. Functional differences between TSHR alleles associate with variation in spawning season in Atlantic herring. Commun. Biol. 4, 795 (2021).
Imsland, A. K., Brix, O., Nævdal, G. & Samuelsen, E. N. Hemoglobin genotypes in turbot (Scophthalmus maximus Rafinesque), their oxygen affinity properties and relation with growth. Comp. Biochem. Physiol. A Physiol. 116, 157–165 (1997).
Imsland, A. K., Foss, A., Stefansson, S. O. & Nævdal, G. Hemoglobin genotypes of turbot (Scophthalmus maximus): Consequences for growth and variations in optimal temperature for growth. Fish Physiol. Biochem. 23, 75–81 (2000).
Andersen, Ø., Rubiolo, J. A., De Rosa, M. C. & Martinez, P. The hemoglobin Gly16β1Asp polymorphism in turbot (Scophthalmus maximus) is differentially distributed across European populations. Fish Physiol. Biochem. 46, 2367–2376 (2020).
Torrisi, M., Pollastri, G. & Le, Q. Deep learning methods in protein structure prediction. Comput. Struct. Biotechnol. J. 18, 1301–1310 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
AlQuraishi, M. Machine learning in protein structure prediction. Curr. Opin. Chem. Biol. 65, 1–8 (2021).
Powder, K. E., Cousin, H., McLinden, G. P. & Craig Albertson, R. A nonsynonymous mutation in the transcriptional regulator lbh is associated with cichlid craniofacial adaptation and neural crest cell development. Mol. Biol. Evol. 31, 3113–3124 (2014).
Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518, 371–375 (2015).
Gupta, A. M., Chakrabarti, J. & Mandal, S. Non-synonymous mutations of SARS-CoV-2 leads epitope loss and segregates its variants. Microbes Infect. 22, 598–607 (2020).
Verde, C. et al. Structure, function and molecular adaptations of haemoglobins of the polar cartilaginous fish Bathyraja eatonii and Raja hyperborea. Biochem. J. 389, 297–306 (2005).
Pearce, R. & Zhang, Y. Toward the solution of the protein structure prediction problem. J. Biol. Chem. 297, 100870 (2021).
Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinf. 9, 40 (2008).
Pirolli, D. et al. Insights from molecular dynamics simulations: Structural basis for the V567D mutation-induced instability of zebrafish alpha-dystroglycan and comparison with the murine model. PLoS ONE 9, e103866 (2014).
Lee, J., Freddolino, P. L. & Zhang, Y. From Protein Structure to Function with Bioinformatics. In From Protein Structure to Function with Bioinformatics: Second Edition (ed. Rigden, D. J.) (2017). https://doi.org/10.1007/978-94-024-1069-3
Baek, M. et al. Accurate prediction of protein structures and interactions using a 3-track neural network. Science 373, 871–876 (2021).
Castro, J. et al. Potential sources of error in parentage assessment of turbot (Scophthalmus maximus) using microsatellite loci. Aquaculture 242, 119–135 (2004).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv:1303.3997v2 (2013).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Vera, M. et al. Development and validation of single nucleotide polymorphisms (SNPs) markers from two transcriptome 454-runs of turbot (Scophthalmus maximus) using high-throughput genotyping. Int. J. Mol. Sci. 14, 5694–5711 (2013).
Ellis, J. A. & Ong, B. The MassARRAY® system for targeted SNP genotyping. Methods Mol. Biol. 1492 (2017).
Choi, Y. & Chan, A. P. PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).
Costello, M. J. Ecology of sea lice parasitic on farmed and wild fish. Trends Parasitol. 22, 475–483 (2006).
Blanchet, S., Rey, O. & Loot, G. Evidence for host variation in parasite tolerance in a wild fish population. Evol. Ecol. 24, 1129–1139 (2010).
Rousset, F. GENEPOP’007: A complete re-implementation of the GENEPOP software for Windows and Linux. Mol. Ecol. Resour. 8, 103–106 (2008).
Foll, M. & Gaggiotti, O. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180, 977–993 (2008).
Excoffier, L. & Lischer, H. E. L. Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010).
Narum, S. R. & Hess, J. E. Comparison of FST outlier tests for SNP loci under selection. Mol. Ecol. Resour. 11, 184–194 (2011).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402 (1997).
Romero, P. et al. Sequence complexity of disordered protein. Prot. Struct. Funct. Genet. 42, 38–48 (2001).
Jones, D. T. & Cozzetto, D. DISOPRED3: Precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2015).
Mészáros, B., Erdös, G. & Dosztányi, Z. IUPred2A: Context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucl. Acids Res. 46, W329–W337 (2018).
Ishida, T. & Kinoshita, K. PrDOS: Prediction of disordered protein regions from amino acid sequence. Nucl. Acids Res. 35, W460-464 (2007).
Ito, N., Komiyama, N. H. & Fermi, G. Structure of deoxyhaemoglobin of the Antarctic fish Pagothenia bernacchii and structural basis of the root effect. J. Mol. Biol. https://doi.org/10.2210/pdb1hbh/pdb (1995).
Šali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
Gou, X. et al. Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia. Genome Res. 24, 1308–1315 (2014).
Grossman, S. R. et al. Identifying recent adaptations in large-scale genomic data. Cell 152, 703–713 (2013).
Macpherson, J. M., Sella, G., Davis, J. C. & Petrov, D. A. Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics 177, 2083–2099 (2007).
Howe, D. G. et al. ZFIN, the Zebrafish model organism database: Increased support for mutants and transgenics. Nucl. Acids Res. 41, 854–860 (2013).
Huber, C. D., Kim, B. Y., Marsden, C. D. & Lohmueller, K. E. Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl. Acad. Sci. USA 114, 4465–4470 (2017).
Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): Optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020).
Naruse, K., Hori, H., Shimizu, N., Kohara, Y. & Takeda, H. Medaka genomics: A bridge between mutant phenotype and gene function. Mech. Dev. 121, 619–628 (2004).
Chintalapati, M. & Moorjani, P. Evolution of the mutation rate across primates. Curr. Opin. Genet. Dev. 62, 58–64 (2020).
Rodin, R. E. et al. The landscape of somatic mutation in cerebral cortex of autistic and neurotypical individuals revealed by ultra-deep whole-genome sequencing. Nat. Neurosci. 24, 176–185 (2021).
Cayuela, H. et al. Thermal adaptation rather than demographic history drives genetic structure inferred by copy number variants in a marine fish. Mol. Ecol. 30, 1624–1641 (2021).
Kess, T. et al. A putative structural variant and environmental variation associated with genomic divergence across the Northwest Atlantic in Atlantic Halibut. ICES J. Mar. Sci. 78, 2371–2384 (2021).
Le Moan, A., Bekkevold, D. & Hemmer-Hansen, J. Evolution at two time frames: ancient structural variants involved in post-glacial divergence of the European plaice (Pleuronectes platessa). Heredity (Edinb). 126, 668–683 (2021).
Ruigrok, M. et al. The relative power of structural genomic variation versus SNPs in explaining the quantitative trait growth in the marine teleost Chrysophrys auratus. Genes (Basel). 13, 1129 (2022).
De la Herran, R. et al. A chromosome-level genome assembly enables the identification of the follicle stimulating hormone receptor as the master sex determining gene in Solea senegalensis. Mol. Ecol. Resour. 00, 1–19 (2023).
Harrison, P. W. et al. The FAANG data portal: Global, open-access, “FAIR”, and richly validated genotype to phenotype data for high-quality functional annotation of animal genomes. Front. Genet. 12, 639238 (2021).
Figueras, A. et al. Whole genome sequencing of turbot (Scophthalmus maximus; Pleuronectiformes): A fish adapted to demersal life. DNA Res. 23, 181–192 (2016).
Moore, J. S. et al. Conservation genomics of anadromous Atlantic salmon across its North American range: Outlier loci identify the same patterns of population structure as neutral loci. Mol. Ecol. 23, 5680–5697 (2014).
Barrio, A. M. et al. The genetic basis for ecological adaptation of the Atlantic herring revealed by genome sequencing. Elife 5, e12081 (2016).
Pettersson, M. E. et al. A chromosome-level assembly of the Atlantic herring genome-detection of a supergene and other signals of selection. Genome Res. 29, 1919–1928 (2019).
Bo, J. et al. Opah (Lampris megalopsis) genome sheds light on the evolution of aquatic endothermy. Zool. Res. 43, 26–29 (2022).
Wang, S. et al. Resequencing and SNP discovery of Amur ide (Leuciscus waleckii) provides insights into local adaptations to extreme environments. Sci. Rep. 11, 5064 (2021).
Meng, Z., Hu, P., Lei, J. & Jia, Y. Expression of insulin-like growth factors at mRNA levels during the metamorphic development of turbot (Scophthalmus maximus). Gen. Comp. Endocrinol. 235, 11–17 (2016).
Duan, C., Ren, H. & Gao, S. Insulin-like growth factors (IGFs), IGF receptors, and IGF-binding proteins: Roles in skeletal muscle growth and differentiation. Gen. Comp. Endocrinol. 167, 344–351 (2010).
Duan, C., Ding, J., Li, Q., Tsai, W. & Pozios, K. Insulin-like growth factor binding protein 2 is a growth inhibitory protein conserved in zebrafish. Proc. Natl. Acad. Sci. USA 96, 15274–15279 (1999).
Furqon, A., Gunawan, A., Ulupi, N., Suryati, T. & Sumantri, C. A Polymorphism of Insulin-like growth factor binding protein 2 gene associated with growth and body composition traits in Kampong Chickens. J. Vet. 19, 183 (2018).
Kibbey, M. M., Jameson, M. J., Eaton, E. M. & Rosenzweig, S. A. Insulin-like growth factor binding protein-2: Contributions of the C-terminal domain to insulin-like growth factor-1 binding. Mol. Pharmacol. 69, 833–845 (2006).
Coughlan, J. P. et al. Microsatellite DNA variation in wild populations and farmed strains of turbot from Ireland and Norway: A preliminary study. J. Fish Biol. 52, 916–922 (1998).
Zhang, H. et al. Characterization and Identification of Single Nucleotide Polymorphism within the IGF-1R gene associated with growth traits of Odontobutis potamophila. J. World Aquac. Soc. 49, 366–379 (2018).
Guo, L., Yang, S., Li, M. M., Meng, Z. N. & Lin, H. R. Divergence and polymorphism analysis of IGF1Ra and IGF1Rb from orange-spotted grouper, Epinephelus coioides (Hamilton). Genet. Mol. Res. 15, 1. https://doi.org/10.4238/gmr15048768 (2016).
Yu, X. et al. Genome-wide association analysis of adaptation to oxygen stress in Nile tilapia (Oreochromis niloticus). BMC Genomics 22, 426 (2021).
Harano, T. et al. Hemoglobin Kawachi [α44 (CE2) Pro → Arg]: A new hemoglobin variant of high oxygen affinity with amino acid substitution at α1β2 contact. Hemoglobin 6, 43–49 (1982).
Alharby, E. et al. A homozygous potentially pathogenic variant in the PAXBP1 gene in a large family with global developmental delay and myopathic hypotonia. Clin. Genet. 92, 579–586 (2017).
Ceinos, R. M. et al. Differential circadian and light-driven rhythmicity of clock gene expression and behaviour in the turbot, Scophthalmus maximus. PLoS ONE 14, e0219153 (2019).
Nishiwaki-Ohkawa, T. & Yoshimura, T. Molecular basis for regulating seasonal reproduction in vertebrates. J. Endocrinol. 229, R117–R127 (2016).
Wood, S. H. et al. Circadian clock mechanism driving mammalian photoperiodism. Nat. Commun. 11, 4291 (2020).
Piovesan, D. et al. DisProt 7.0: A major update of the database of disordered proteins. Nucl. Acids Res. 45, 219–227 (2017).
Pajkos, M. & Dosztányi, Z. Chapter Two - Functions of intrinsically disordered proteins through evolutionary lenses. In Progress in Molecular Biology and Translational Science: Dancing Protein Clouds: Intrinsically Disordered Proteins in the Norm and Pathology, Part C (ed. Uversky, V. N.) vol. 183 45–74 (Academic Press, 2021).
Malagrinò, F. et al. Understanding the binding induced folding of intrinsically disordered proteins by protein engineering: Caveats and pitfalls. Int. J. Mol. Sci. 21, 3484 (2020).
Doyle, A., Cowan, M. E., Migaud, H., Wright, P. J. & Davie, A. Neuroendocrine regulation of reproduction in Atlantic cod (Gadus morhua): Evidence of Eya3 as an integrator of photoperiodic cues and nutritional regulation to initiate sexual maturation. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 260, 111000 (2021).
Silver, S. J., Davies, E. L., Doyon, L. & Rebay, I. Functional dissection of eyes absent reveals new modes of regulation within the retinal determination gene network. Mol. Cell. Biol. 23, 5989–5999 (2003).
Jin, M. & Mardon, G. Distinct biochemical activities of eyes absent during drosophila eye development. Sci. Rep. 6, 23228 (2016).
McGowan, K. L., Passow, C. N., Arias-Rodriguez, L., Tobler, M. & Kelley, J. L. Expression analyses of cave mollies (Poecilia mexicana) reveal key genes involved in the early evolution of eye regression. Biol. Lett. 15, 20190554 (2019).
Cui, W. et al. Transcriptomic analysis reveals putative osmoregulation mechanisms in the kidney of euryhaline turbot Scophthalmus maximus responded to hypo-saline seawater. J. Oceanol. Limnol. 38, 467–479 (2020).
Mármol-Sánchez, E., Quintanilla, R., Cardoso, T. F., Jordana Vidal, J. & Amills, M. Polymorphisms of the cryptochrome 2 and mitoguardin 2 genes are associated with the variation of lipid-related traits in Duroc pigs. Sci. Rep. 9, 9025 (2019).
Takvam, M., Wood, C. M., Kryvi, H. & Nilsen, T. O. Ion transporters and osmoregulation in the kidney of teleost fishes as a function of salinity. Front. Physiol. 12, 664588 (2021).
Engelund, M. B. & Madsen, S. S. The role of aquaporins in the kidney of euryhaline teleosts. Front. Physiol. 2, 51 (2011).
Nam, B. H. et al. Identification and characterization of the prepro-vasoactive intestinal peptide gene from the teleost Paralichthys olivaceus. Vet. Immunol. Immunopathol. 127, 249–258 (2009).
Paladini, F. et al. Age-dependent association of idiopathic achalasia with vasoactive intestinal peptide receptor 1 gene. Neurogastroenterol. Motil. 21, 597–602 (2009).
Hosseinpour, L., Nikbin, S., Hedayat-Evrigh, N. & Elyasi-Zarringhabaie, G. Association of polymorphisms of vasoactive intestinal peptide and its receptor with reproductive traits of turkey hens. South Afr. J. Anim. Sci. 50, 345–352 (2020).
Pereiro, P., Figueras, A. & Novoa, B. A novel hepcidin-like in turbot (Scophthalmus maximus L.) highly expressed after pathogen challenge but not after iron overload. Fish Shellfish Immunol. 32, 879–889 (2012).
Zhang, J., Yu, L., Ping, L., Fei, M. & Sun, L. Turbot (Scophthalmus maximus) hepcidin-1 and hepcidin-2 possess antimicrobial activity and promote resistance against bacterial and viral infection. Fish Shellfish Immunol. 38, 127–134 (2014).
Acknowledgements
This study was supported by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia (Grant No. ED481A2020/119), which also funded Oscar Aramburu's PhD thesis through a fellowship (Grant No. ED481A-2020/119). The authors thank the EU AQUATRACE project (No. 311920) for providing the DNA samples and population information used in this study, and the Flanders Research Institute for Agriculture, Fisheries and Food (ILVO, Belgium).
Author information
Contributions
P.M. and Ø.A. designed the research. D.R., O.A., C.B. and J.A.R. performed the genomic analyses to construct the NSV atlas. M.P. and P.M. carried out the population genomics analyses. M.C.D.R., D.P. and B.R. performed protein modelling. P.M., Ø.A. and M.C.D.R. wrote the paper. All authors revised and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Andersen, Ø., Rubiolo, J.A., Pirolli, D. et al. Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus). Sci Rep 13, 3019 (2023). https://doi.org/10.1038/s41598-023-29826-z
DOI: https://doi.org/10.1038/s41598-023-29826-z