Detection of selection signatures in farmed coho salmon (Oncorhynchus kisutch) using dense genome-wide information

Animal domestication and artificial selection give rise to gradual changes at the genomic level in populations. Subsequent footprints of selection, known as selection signatures or selective sweeps, have been traced in the genomes of many animal livestock species by exploiting variation in linkage disequilibrium patterns and/or reduction of genetic diversity. Domestication of most aquatic species is recent in comparison with land animals, and salmonids are one of the most important fish species in aquaculture. Coho salmon (Oncorhynchus kisutch), cultivated primarily in Chile, has been subjected to breeding programs to improve growth, disease resistance traits, and flesh color. This study aimed to identify selection signatures that may be involved in adaptation to culture conditions and traits of productive interest. To do so, individuals of two domestic populations cultured in Chile were genotyped with 200 thousand SNPs, and analyses were conducted using iHS, XP-EHH and CLR. Several signatures of selection on different chromosomal regions were detected across both populations. Some of the identified regions under selection contained genes such anapc2, alad, chp2 and myn, which have been previously associated with body weight in Atlantic salmon, or sec24d and robo1, which have been associated with resistance to Piscirickettsia salmonis in coho salmon. Findings in our study can contribute to an integrated genome-wide map of selection signatures, to help identify the genetic mechanisms of phenotypic diversity in coho salmon.

The implementation of genetic improvement programs in domestic populations to enhance desirable production traits, combined with the natural adaptation to new environments, has led to a variety of phenotypic variation between domestic populations of plants and animals 1 as well as a strong differentiation between wild and domestic populations. Both natural and artificial selection in domestic species have left footprints across the genome, known as selection signatures, which can point to regions harboring functionally important sequences of DNA 2,3 . Theoretically, a beneficial variant that has been under selection can show long-range linkage disequilibrium (LD) and high frequency over a long period of time 4,5 . Therefore, these patterns allow us to detect such selection signatures in genomes. When successful, selection signature identification can provide novel insights into mechanisms that create diversity across populations and contribute to mapping of genomic regions underlying selected traits or phenotypes 6,7 . Approaches for detecting selection signatures rely on scanning variants across the genome of interest, searching for regions in which (i) the allele frequency spectrum is shifted towards extreme (high or low) frequencies; (ii) there is an excess of homozygous genotypes; (iii) there are long haplotypes with high frequencies and (iv) there is an extreme differentiation among populations. Several statistical methods has been developed to search for selection signatures, such as extended haplotype homozygosity (EHH) 8 , integrated haplotype score (iHS) 9 , hapFLK 10 , Cross Population Extended Haplotype Homozogysity (XP-EHH) 11 , the composite likelihood ratio (CLR); runs of homozygosity (ROH) 12 and F ST statistics 13 .
In fishes, domestication is recent in comparison with other land animals 14 , although some evidence of fish farming dates back approximately 3,500 years ago 15 . An exponential development of aquaculture has occurred since 1960, relying on the domestication of a handful aquatic species 14 . Salmonid species in particular have seen very intensive production increases over the last four decades. Currently, the most important farmed salmonids species in the word are Atlantic salmon (Salmo salar), rainbow trout (Oncorhynchus mykiss) and coho salmon Scientific Reports | (2021) 11:9685 | https://doi.org/10.1038/s41598-021-86154-w www.nature.com/scientificreports/ (Oncorhynchus kisutch). Coho salmon belong to the Oncorhynchus genus and are naturally distributed across Pacific North American and Asian watersheds 16 . About 90% of farmed coho salmon production is based in Chile 17 . Farmed populations of coho salmon were established in Chile at the end the 1970′s, with the importation of ova from the Kitimat river (British Columbia) and the US state of Oregon 18 . During the 1990´s, a variety of breeding programs have been implemented for this species in Chile 18 , mainly focused on improving growth, disease resistance traits and flesh color 18 . The implementation of such breeding programs have likely, shaped the genetic diversity and haplotype structure of these populations, and their present genomes may contain traceable signatures of selection. Genomic scans for detecting selection signatures have been successfully applied to several domestic animals, including aquaculture species. For instance, selection signature scans have been carried out for Atlantic salmon [19][20][21][22][23][24] , Nile tilapia 25,26 ; and channel catfish 27 . To date, there are no studies focusing on the identification of selection signatures in farmed coho salmon populations. To increase the knowledge of domestication and selection effects on the genome diversity in farmed populations of this species, we searched for selection signatures in two groups comprising farmed individuals of coho salmon using different statistical approaches based in haplotype structure and allele frequency spectrum. Tracing signatures of selection in this species could provide further insights on genomic regions responsible for important aquaculture features of domestic Pacific salmon.

Methods
Populations. In this study we used samples representing two different lines that come from a breeding program of coho salmon maintained in Chile, hereinafter pupulations A (Pop-A) and B (Pop-B): we used 89 individuals for A and 43 individuals for B. These breeding populations were established in 1997 and 1998, belonging to even and odd spawning year, respectively 28 . According to previous analyses (Ben Koop, unpublished data), it is very likely that these two populations correspond to a mixture of the two original broodstocks (Kitimat river and Oregon). Both populations have been selected to improve growth rate for around eight generations by using Best Linear Unbiased Prediction 29,30 with genetic gains ranging between 6 to 13% per generation 31  Genotyping. Genomic DNA was extracted from fin clips of each individual and genotyped with a 200 K Thermo Fisher Scientific Axiom myDesign Custom Array developed by the EPIC4 genome consortium 32 (http:// www. sfu. ca/ epic4/).
Genotype calling was performed using Axiom Analysis Suite v3.1 (Thermo Fisher Scientific) following the Axiom Analysis user guide. After filtering, including removing markers with no position on the coho salmon reference genome (GCF_002021735.1) and markers that were identified as problematic (OTV, Call Rate Below Threshold, Other), 167,486 SNP were kept 32 . In this study, we removed markers positioned in scaffolds, which were not placed within chromosomes; therefore 136,500 markers were kept for downstream analyses. All these markers and samples passed the quality criteria (missing call rate ≤ 0.1).
Genetic diversity, linkage disequilibrium (LD) and population structure. A common subset of 72,616 SNPs with MAF ≥ 0.05 and no deviation from Hardy-Weinberg equilibrium in populations A and B were explored to describe the genetic diversity of each population. Three parameters, including the number of SNPs with MAF ≥ 0.05; observed heterozygosity (H O ), and expected heterozygosity (H E ) were calculated using PLINK 33 .
We used the squared correlation coefficient between SNP pairs (r 2 ) to measure LD 34 , which was calculated for all syntenic marker pairs. To enable a clear presentation of results showing LD in relation to physical distance between markers, SNP pairs were put into bins of 100 kb, and mean values of r 2 were calculated for each bin. The mean r 2 for each of the distance bins was then plotted against the distance bin range (Mb). This analysis was carried out on a chromosome-by-chromosome basis.
To visualize genetic differentiation between populations a Principal component analysis (PCA) was conducted using PLINK. In addition, ADMIXTURE 35 was employed to analyze the population structure, which was run with 10,000 iterations using the correlated allele model with K value from 1 to 20 to choose the optimal number of clusters at the lowest cross-validation error. Results were plotted using R 36 .
Selection signatures. We combined three methods (XP-EHH, iHS and CLR) which have shown to have a power > 70% to detect selection signatures in comparison with other combinations of statistical tests 5 . Removing markers deviating from Hardy-Weinberg equilibrium might be obfuscating the important deviations such as those resulting from evolutionary selection 37 , therefore, we did not apply Hardy-Weinberg equilibrium filtering for selection signatures analyses. In addition, different MAF thresholds were used for each selection signatures test, which are described below.
(1) XP-EHH. The XP-EHH (Cross Population Extended Haplotype Homozygosity) statistic compares the integrated extended haplotype homozygosity (EHH) profiles between two populations at the same SNP 11 . Therefore, the computation of EHH is required for each population. www.nature.com/scientificreports/ are normally distributed, a Z-test is applied to identify significant SNPs under selection. We used -log 10 (p value) = 3 (p value ≤ 0.001) as the threshold for considering XP-EHH score as significant evidence of selection and at least two SNPs ≤ 500 kb apart. (2) iHS. The iHS statistics was used to detect footprints of selection within the studied populations. This test is based on the standardized log ratio of integrals of the decay of the EHH, computed for both ancestral and derived alleles of the focal SNP. Selection induces hitchhiking (genetic draft), that leads to extended haplotypes for the selected allele and a slower fall-off of EHH on either side of the selected locus. Thus, a high iHS scoring SNP is typically associated with longer haplotypes and lower neighborhood diversity compared to the other SNPs 39 . Phased haplotypes were obtained using Beagle 40 . The ancestral and derived alleles of each SNP were inferred in two ways: (i) the SNP probes were compared with the genome of Atlantic salmon, which was used as an outgroup species; (ii) for SNPs that could not be compared, the ancestral allele were inferred as the most common allele in the total dataset, as suggested in other studies (e.g. 41,42) . SNPs with MAF < 0.05 were excluded from iHS analysis in accordance with 9 , therefore, we used 95,217 and 90,101 SNPs for populations A and B, respectively. The iHS score was computed for each autosomal SNP using the R package rehh 3.1.1 38 , using default options. Similar to XP-EHH, iHS values are normally distributed, and a Z-test is applied to identify significant SNPs under selection. We also used -log 10 (p value) = 3 (p value ≤ 0.001) as the threshold for considering iHS score as significant evidence of selection and at least two SNPs ≤ 500 kb apart. (3) CLR: We implemented CLR test in the software SweeD V3.3.2 43 , downloaded from https:// cme.h-its. org/ exeli xis/ web/ softw are/ sweed/. SweeD evaluates the variation in the site-frequency spectrum along the chromosome and implements the composite likelihood ratio (CLR) statistics 44 . CLR computes the ratio of the likelihood of a selective sweep at a given position to the likelihood of a null model without a selective sweep. Following other studies (e.i. [45][46][47], SNPs with MAF < 0.05 were excluded from CLR analysis as well, therefore, the same subset of SNPs were used as for iHS. We calculated the CLR within each population and for each chromosome separately at grid points for every 20 kb. We used the 99.5th percentile of the distribution of CLR scores as threshold for the detection of outliers.

Candidate genes and functional analysis. A genomic region was considered as being under selection
if it matched the criteria described above for iHS, XP-EHH and CLR. For each identified selective sweep region, we extended the region containing outlier scores by adding 250 kb to each end, to account for potential blocks of high linkage disequilibrium. Gene annotation was performed using GCF_002021735.1 coho salmon genome assembly (Accession PRJNA378663, Okis_V1; https:// www. ncbi. nlm. nih. gov/ assem bly/ GCF_ 00202 1735.1) 48 from the NCBI Eukaryotic genome annotation pipeline. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 tool 49 was used to identify Gene Ontology (GO) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways using a list of genes within significant regions based on iHS, XP-EHH, CLR values and the zebrafish annotation file as a reference genome.
Ethics approval and consent to participate. Coho salmon individuals and sampling procedures were approved by the Comité de Bioética Animal from the Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile (certificate Nº No. 08-2015).  ,101). In addition LD presented a higher level in Pop-B (r 2 = 0.14) than Pop-A (r 2 = 0.10) (Fig. 1). LD decayed faster in Pop-A, reaching a level of 0.14 around 500 kb, while in Pop-B at the same distance LD reaches a value of 0.18. A similar pattern was observed for most of the chromosomes, except for, and Oki1, Oki21, Oki28, where LD decay was slightly higher in Pop-B and Oki-10, where both populations presented a similar pattern of LD decay ( Supplementary Fig. S1).

Results
Principal component analysis (PCA) was employed to explore the clustering of individuals from the two populations. The PCA revealed two distinct clusters corresponding to population A and B. Principal components (PCs) 1, 2 and 3 jointly accounted for 39.7% of total variance, with PC 1 (26.1%) separating A and B; PC 2 (7.6%) and PC 3 (6%) not clearly discriminating populations. However, Pop-B produced a much more heterogeneous group with individuals spread along both PC 2 and 3 (Fig. 2). ADMIXTURE analysis revealed that the lowest cross-validation error was reached at K = 10 ( Supplementary Fig. S2 and S3), showing 10 ancestral lineages for these two populations (   (Table S3) Table S4. We further investigated the functions associated with the putative genes undergoing positive selection by analyzing over-represented GO terms and pathways using DAVID. The identified GO terms and pathways are shown in Table S5. A total of 21 and 13 GO terms were observed for A and B, respectively. This includes 13 terms for biological process (BP), 3 for molecular function (MF), and 9 for cellular component (CC). The BP category is shown in Fig. 6; most of these terms are associated with basic metabolic processes, such as Developmental Process, Multicellular Organismal Process, Single-Organism Process, Response To Stimulus, Developmental Process, Presynaptic Process and Growth, among other. Additionally, 11 pathways were enriched (Table S5)

Discussion
Selective events are the major mechanism driving differentiation of populations. In this study, we assessed the genetic structure and detected selection signatures in two domestic populations of coho salmon by using three complementary statistical methods. First, we investigated basic population genetics statistics. Genetic diversity estimates, assessed by observed and expected heterozygosity, were in the upper limit of those reported in other salmon species introduced for farming purposes in Chile, such as Atlantic salmon, in which heterozygosity levels of these populations have been shown to range from 0.266 to 0.37 21,22 . In addition LD, which can be informative of population demography, decays similarly to what has been observed on previous studies in farmed coho salmon population from Chile 32 , and more rapidly than farmed Atlantic salmon 52 . Thus, the higher levels of genetic diversity and the lower levels of LD in farmed coho salmon populations suggest less impact of artificial selection on genome diversity, likely due to a shorter period of time of domestication in comparison with Atlantic salmon aquaculture populations from Chile. Genetic structure analyses revealed a high level of admixture in both populations. It is well known that demographic forces, such as admixture can mimic the patterns of genetic variation expected under selection 53,54 . Therefore, we cannot discard the presence of false positives in the detection of selection signatures in this study, due to a high degree of mixed ancestry of these populations. Several methods have been developed for detecting selection signatures in the genome. These methods can be grouped into three categories based on the type of genetic information exploited: population differentiation, site-frequency spectrum and linkage disequilibrium approaches 18 . In the present study we implemented three    www.nature.com/scientificreports/ different tests (iHS, XP-EHH and CLR) to comprehensively identify candidate regions of positive selection in coho salmon. As is shown in Fig. 7, different genomic regions were detected by these methods. CLR have the largest overall number of signals detected to be underlying selection, followed by iHS and XP-EHH. No overlap was observed among the three methods, however CLR and iHS (Fig. 7), showed five overlapping regions on Oki5, Oki6 and Oki14 spanning a total of 2.02 Mb. Low overlap among methods based on allele frequencies compared to those based on haplotype patterns has been observed in previous studies in Atlantic salmon 21,23 . Discrepancies between loci under putative selection detected by different methods are expected, given that they benefit from different sources of information from genome variation. For instance, iHS test has advantage in detecting selective sweep with variants at intermediate frequencies 11 , while XP-EHH is more powerful at detecting complete or nearly complete selective sweeps 11 . On the other hand, a low overlap between the two populations was observed (Fig. 7). Four chromosomes showed common regions harboring selection signatures: Oki5 (1.86 Mb); Oki14 (946 kb), Oki16 (41.5 kb) and Oki26 (57.3 kb). We suggest this low overlap among populations might be the effect of breeding program goals, which predominantly are focused on improving polygenic traits (e.g., growth), therefore, different genomics regions on different populations are affected by selection across the genome. The population-specific signals suggest that selection may have acted upon different genes, which has been already shown in other species and can be interpreted as a polygenic basis of parallel selection traits 55 . Several studies in fish have suggested that the same phenotype may arise from different genetic pathways among populations 21,23,[56][57][58][59] . Evolution of complex parallel phenotypes can indeed emerge from different evolutionary processes and this is likely to happen when inbreeding and genetic drift play a greater role than selection 21 . On the other hand, it is worth mentioning that a low coverage of the genome using a SNP array might have caused lacking of information of some important One of the goals of this study was to identify putative candidates genes involved in the domestication process and artificial selection of coho salmon. In accordance with the primary genetic improvement goal, where growth rate has been the main focus, we found a series of genes potentially relevant for this trait, including anapc2; alad; bdkrb2; fam98b; chp2; myn that were previously associated with body weight in a genome-wide association study in Atlantic salmon (GWAS) 62 and sytl5, txnrd1 genes associated with growth traits in juvenile farmed Atlantic salmon 63 . The chp2 gene was also found to be putatively under selection in an Atlantic salmon www.nature.com/scientificreports/ domestic population selected for fast growth 22 . We also identified the kdm6a gene, which has been involved in the regulation of anterior/posterior body axis development in zebrafish 50 and has been related to growth-related traits in pigs 64 . Furthermore, we identified the biological processes "Developmental Process" and "Growth", within the significant terms associated with the genes harbored in the identified genomic regions, suggesting that loci controlling growth and development are most likely under selection in these farmed coho salmon populations. On the other hand, host-pathogen interactions lead to strong selection in the genome of vertebrate species 65 . We suggest that adaptation to farming environment has imposed selection on genomic regions related to the immune system in coho salmon. Here we found the cnpy2, synrg and med10 genes, which were previously shown to be affected by parasite-driven selection in Atlantic salmon 66 . More importantly, we detected the genes sec24d and robo1 that have been associated with disease resistance in the face of a bacterial disease (Piscirickettsia salmonis) in coho salmon populations farmed in Chile 51 .
Artificial selection may be applied either inadvertently (unconsciously) or intentionally (consciously) 67 . For most livestock species, it is thought that the early stages of domestication involved unconscious selection for behavioral traits (e.g., for tameness and reduced aggression), followed by selection focused on breeding objectives 68 . In fish, behavioral traits such as swimming capacities, foraging, social interactions, reproduction, or personality and cognitive abilities, are also modified by domestication 69 . In the present study we identified the genes robo1 and dcdc2 which are associated to complex cognitive acquired skills, including spoken and written language in human 70 . We also identified the gtf2ird1 associated to aggression and social interaction in mice 71 and recently shown to be under selection in Nile tilapia 25 . We suggest that such genes could be associated with behavioral traits in coho salmon as well.
Sexual maturation and reproductive traits have profound importance from both evolutionary and economic perspective. In fish farming, maturation is frequently delayed by exposing fish to continuous light or light regimes, which are different to those, found in natural environment, affecting the perception of seasonality and circannual rhythms 72 . In this study, we identified the picalmb and tsku genes with evidence of selection. These genes were recently associated to maturation traits in Atlantic salmon 73 . Furthermore, we also detected the vgll4b, which has previously shown signs of selection in Atlantic salmon farmed populations 22 . It is important to mention that a gene from the VGLL (vgll3) family is associated with age to maturity in natural populations of Atlantic salmon 74,75 .

Conclusion
This study reported the identification of selection signatures in the genome of aquaculture lines of coho salmon. Our analyses of two domestic populations revealed putative selection signatures of genomic regions containing genes involved in growth, immune system, behavior and maturation traits. As expected, these findings are congruent with the breeding goal for the populations studied here, which includes the most highly selected trait, growth rate. In addition, the effect of domestication and adaptation to captive environment, most likely affecting loci controlling traits that have not been directly part of the breeding objectives, including immunity and, behavior. These results provide further insights into the genome diversity changes driven by domestication and selection mechanisms in farmed coho salmon populations.