Introduction

American mink (Neogale vison) is one of the most important animals in the global fur industry, yet requires highly efficient breeding programs to deal with challenges faced in mink production systems1. Several studies have been carried out to discern the genetics of complex traits affecting the sustainability of mink production, including growth2,3, feed efficiency4,5, disease resistance6,7,8,9, pelt quality10,11,12, and reproduction traits13,14,15. Understanding the genetic architecture underlying such traits using genome-wide association studies (GWAS) might contribute to expediting the genetic progress through selection, and therefore enhance the production efficiency of the mink industry.

Copy number variations (CNVs) refer to frequently observed structural variations in the form of deletions or duplications greater than 50 base pairs16, which cover more of the genome (total bases) and have a higher mutation rate than single nucleotide polymorphisms (SNPs)17. Similar to SNPs, CNVs can be applied to detect associations with traits of economic interest in livestock species, and therefore, are considered complementary sources to explain genetic variation contributing to differences in phenotypes18. Over the past decade, multiple studies have been carried out to examine the association between CNVs with several important phenotypes in livestock species, such as reproduction19,20, health21,22,23, feed efficiency and growth24,25,26, and performance traits27,28,29.

The availability of a high-quality chromosome-based genome assembly1 and a genome-wide SNP array for American mink facilitates the identification of genetic variations underlying economically important traits. Recently, Davoudi et al.4 characterized the CNVs in American mink using whole-genome sequencing data. However, up to now, the CNV-based GWAS with economically important traits has not been reported in mink. Therefore, this study aimed to identify CNV in a large sample of genotyped mink and perform CNV-based GWAS analyses for Aleutian disease tests, growth and feed efficiency, reproduction and pelt quality traits. In addition, we performed functional annotation of the associated CNV regions (CNVRs) to identify the potential candidate genes for these key traits.

Materials and methods

All procedures applied for this study were approved by the Dalhousie University Animal Care and Use Committee, and we adhered to the Code of Practice for the Care and Handling of Farmed Mink guidelines30 throughout all phases of the research.

Sampling, DNA extraction, and genotyping

The current study involved individuals from two separate farms: the Canadian Center for Fur Animal Research (CCFAR) at Dalhousie University, Faculty of Agriculture in Truro, Nova Scotia, Canada, comprising 1411 individuals, and Millbank Fur Farm (MFF) in Rockwood, Ontario, Canada, consisting of 1562 individuals. More details about the animals utilized in this study can be found in our previous work by Hu et al.31. Mink were raised in standard farming settings, receiving diets from the byproducts of human food production. These diets were adjusted to meet the specific needs of the animals in each production phase. Detailed information about the feed ingredients, chemical composition, and energy content of these diets during different periods can be found in our prior publication32.

DNA was extracted from tongue tissue of the animals utilizing the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Evaluation of DNA quantity and quality was conducted using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE), with consistent 260/280 nm readings falling between 1.7 and 2.0 across all samples. After reaching a final concentration of 500 ng, the samples underwent quality checks for DNA and were subsequently genotyped using the Axiom Affymetrix Mink 70K array (Neogen, Lincoln, Nebraska, United States)33.

CNV detection

Initially, the Axiom™ Analysis Suite (Affymetrix®) was applied to perform quality control of raw intensity files and filter genotypes based on dish QC (DQC) values less than 0.82, and a minimum call rate of 97%, following the ‘Best Genotyping Practices’ Workflow described in Axiom™ Genotyping Solution Data Analysis Guide34. The SNPs in sex chromosomes were excluded and only those that passed the quality control were kept for further analyses. The final dataset contained 47,644 SNPs that were located on autosomal chromosomes.

The CNV detection was performed with PennCNV v.1.0.5 software35, using the signal intensity ratios (Log R Ratio, LRR) and allelic frequencies (B Allele Frequency, BAF) obtained from the Axiom® CNV Summary Tool software (Affymetrix®). Then, the PFB (Population Frequency of B allele) file was compiled based on the BAF of each marker in the whole population, using the PennCNV ‘compile_pfb.pl’ function. The GC content around each SNP marker is known to affect the signal strength through the potential interference of genomic waves36. Therefore, we first estimated the percentage of GC content of 1-Mb genomic regions surrounding each marker (500 kb on each side) using faToTwoBit and hgGcPercent tools provided by UCSC Genome Browser37 and the FASTA information of the American mink genome assembly1. Next, the GC content file was implemented in PennCNV by ‘-gcmodel’ function, which applies a regression model for adjusting the high GC content and recovers samples affected by genomic waves38. To achieve high-confidence CNV calls, quality control was applied with the following criteria: standard deviations for LRR < 0.35, BAF drift < 0.01, and waviness factor value between − 0.05 and 0.05. We only retained those CNVs longer than 1 kb in length including at least three consecutive SNPs located on autosomal chromosomes28. Finally, 2063 high-quality samples were kept for subsequent analyses.

Deregressed EBV values

The deregressed estimated breeding values (dEBVs) were calculated for 27 economically important traits, including 11 growth and feed efficiency traits, eight reproduction traits, five pelt quality traits, and three Aleutian disease tests. The growth and feed efficiency traits include harvest weight (HW), harvest length (HL), final body weight (FBW), final body length (FBL), daily feed intake (DFI), average daily gain (ADG), feed conversion ratio (FCR), residual feed intake (RFI), residual gain (RG), residual intake and gain (RIG), and Kleiber ratio (KR). The reproduction traits include gestation length (GL), total number of kits born (TB), number of kits alive at birth (LB), number of kits alive at weaning (LW), survival rate at birth (SB), average kit weight per litter at birth (AWB), average kit weight per litter at weaning (AWW), and survival rate at weaning (SW). The pelt quality traits include dried pelt size (DPS), overall quality of dried pelt (DQU), dried pelt nap size (DNAP), live grading overall quality of fur (LQU), and live grading nap size (LNAP). The Aleutian disease tests include counterimmunoelectrophoresis (CIEP), the Aleutian mink disease virus (AMDV) capsid protein-based enzyme-linked immunosorbent assay (ELISA-P), and the AMDV antigen-based enzyme-linked immunosorbent assay (ELISA-G). Breeding values were estimated for all individuals using different animal models for growth and feed efficiency described in detail by4, for reproduction traits described in detail by13, for pelt quality traits described in detail by10, and for Aleutian disease tests described in detail by7.

The EBV reliabilities were calculated using the following formula:

$$EBV \,\,reliabilities =1- \frac{prediction \,\,error \,\,variance}{additive\,\, genetic \,\,variance \,\,of \,\,the \,\,trait}$$

Next, the EBV reliabilities were applied to calculate the dEBVs using the method proposed by39. The calculations were performed by the ‘wideDRP’ function in DRP package40 in the R environment41, by setting the estimated heritability and the default value of 0.5 for the c parameter, which indicates the proportion of genetic variance not explained by markers. The descriptive statistics of the dEBVs for all traits are summarized in Table 1. We removed the animals with dEBV reliability lower than 0.20. The dEBVs were used as the pseudo-phenotype for the association analyses.

Table 1 Descriptive statistics of the deregressed EBV (dEBVs) for growth, feed efficiency, Aleutian disease tests, pelt quality and reproduction traits in American mink.

CNV association analysis

ParseCNV2 software42, which integrates PLINK43 for association analyses, was used to detect the association between CNV and dEBVs of the studied traits. ParseCNV2 software converts the CNV events into probe-based statistics for individual CNVs42. Since CNV boundaries differ among individuals, it may be difficult to determine the exact start and end points of CNVs, which makes it challenging to classify different CNVs. Therefore, the CNV association tests were conducted for deletions or duplications separately at the probe level. The following model was applied for association testing:

$${\varvec{y}}={\varvec{X}}{\varvec{b}}+{\varvec{e}},$$

where \({\varvec{y}}\) is the vector of dEBVs, \({\varvec{X}}\) is the design matrix relating dEBVs to fixed effect of one CNV at a time, \({\varvec{b}}\) is the fixed effect of CNV, and \({\varvec{e}}\) is the vector of random residual effects. The association test output was used to merge neighboring SNPs in proximity (less than 1 Mb apart) with comparable association significance (± 1 log p-value) into CNV regions (CNVRs), which constitute a genomic span of at least two consecutive probes. The local lowest P-value for identified probes was chosen to indicate the significant level of the whole CNVRs. To consider multiple testing correction, a threshold less than 5 × 10−4 was applied to consider a CNVR significantly associated with the phenotypes, as proposed by the ParseCNV2 developers42,44.

Gene annotation

The list of genes in the latest American mink reference genome (ASM_NN_V1)1 was downloaded from the NCBI and the ‘intersect’ function in Bedtools45 was used to detect the genes that overlapped with significant CNV regions. Finally, an extensive review of the literature was performed to investigate the biological function of identified candidate genes.

Results

CNV identification and distribution

Using the PennCNV software based on the Hidden Markov Model method35, a total of 10,137 CNV events were identified from 2063 individuals that passed the quality control criteria (Supplementary Table S1). While PennCNV is extensively utilized for CNV detection in genotyping array data, it is essential to note its limitations. The internal HMM model applied in the software specifically considers successive SNPs at each step, making it particularly sensitive to local noise. This sensitivity often results in false positives, over-segmentation (where a true CNV is incorrectly divided into smaller segments), and generally imprecise boundaries in the PennCNV calls46. Among the total identified CNVs, 6968 (68.74%) were duplications and 3169 (31.26%) were deletions, with a deletions/duplications CNV ratio of 0.45. The length of the CNV events ranged from 1.05 to 6148.34 kb, with an average size of 109.69 kb. Table 2 presents the descriptive statistics of the identified CNVs in the American mink genome. Analysis of the distribution of CNV size showed that approximately half of the CNVs ranged from 1 to 50 kb, with relatively rare CNV events (3.97%) larger than 500 kb (Fig. 1a). The number of CNVs on each chromosome and the chromosome length showed a strong positive linear correlation (Fig. 1c, r = 0.78), such that 1282 CNVs were identified for the largest chromosome (Chromosome 1) and 225 CNVs for the smallest chromosome (chromosome 14; Fig. 1b).

Table 2 Descriptive statistics of CNVs detected in American mink genome.
Figure 1
figure 1

Graphical representation of identified CNVs: (A) Distribution of CNV sizes, (B) numbers of CNVs identified across autosomal chromosomes, (C) correlation between CNV numbers and chromosome length.

Association analyses

In order to explore the effect of CNVs on the complex phenotypes, CNV-based association analyses were carried out for the 27 economically important traits in American mink. Association analyses revealed that 250 CNVRs (71 deletions and 179 duplications) were significantly associated with at least one of the studied traits (P < 0.0005). Manhattan plots for significant CNVRs across the autosomes associated with all studied traits are shown in Figs. 2, 3, 4, 5. These significant regions were identified across all 14 autosomes, while chromosome one showed the largest number (n = 53).

Figure 2
figure 2

Manhattan plots for CNV regions across the 14 autosomal chromosomes associated with feed efficiency and growth traits. The horizontal line in each plot represents the threshold for significance (P < 0.0005) suggested by ParseCNV2 developers. HW: harvest weight; HL: harvest length; FBW: final body weight; FBL: final body length; ADG: average daily gain; DFI: daily feed intake; FCR: feed conversion ratio; KR: Kleiber ratio; RFI: residual feed intake; RG: residual gain; RIG: residual intake and gain.

Figure 3
figure 3

Manhattan plots for CNV regions across the 14 autosomal chromosomes associated with reproduction traits. The horizontal line in each plot represents the threshold for significance ( P < 0.0005) suggested by ParseCNV2 developers. GL: gestation length; TB: total number of kits born; LB: number of kits alive at birth; LW: number of kits alive at weaning; SB: survival rate at birth; AWB: average kit weight per litter at birth; AWW: average kit weight per litter at weaning; SW: survival rate at weaning.

Figure 4
figure 4

Manhattan plots for CNV regions across the 14 autosomal chromosomes associated with pelt quality traits. The horizontal line in each plot represents the threshold for significance (P < 0.0005) suggested by ParseCNV2 developers. DPS: dried pelt size; DQU: overall quality of dried pelt; DNAP: dried pelt nap size; LQU: live grading overall quality of fur; LNAP: live grading nap size.

Figure 5
figure 5

Manhattan plots for CNV regions across the 14 autosomal chromosomes associated with Aleutian disease tests. The horizontal line in each plot represents the threshold for significance (P < 0.0005) suggested by ParseCNV2 developers. CIEP: counterimmunoelectrophoresis test; ELISA-P: VP2 based enzyme-linked immunosorbent assay test; ELISA-G: AMDV-G based enzyme-linked immunosorbent assay test.

The overview of the top significant CNVRs associated with each studied trait is shown in Table 3. Detailed information for all significant CNVRs with their overlapped candidate genes is provided in Supplementary Table S2. The highest number of significant CNVRs (n = 27) were associated with TB, compromising the most significant region (ID: CNVR54) with a p-value of 3.58 × 10–14. In addition, the average length of significant CNVRs was 66.2 kb, ranging from 1.23 to 444.54 kb.

Table 3 Overview of the top significant CNVRs associated with all studied traits in American mink.

Candidate genes within the significant CNVR

We further investigated the candidate genes encompassing the significant CNVRs. The results revealed that a total of 320 potential candidate genes overlapped with significant CNVRs based on the annotation of the American mink genome (Supplementary Table S2). The duplication CNVR on chromosome 7 (ID: CNVR143) overlapped with the highest number of genes (n = 13) while no genes identified within 80 significant CNVRs.

Using the information from the GeneCards database and an extensive literature review, several candidate genes were found to be related to growth and feed efficiency traits (ARID1B, APPL1, TOX, and GPC5), reproduction traits (GRM1, RNASE10, WNT3, WNT3A, and WNT9B), pelt quality traits (MYO10, and LIMS1), and Aleutian disease tests (IFNGR2, APEX1, UBE3A, and STX11).

Discussion

Genome-wide association studies using SNP markers have been instrumental in unraveling the underpinning of complex traits47. In recent years, CNVs have gained widespread utilization as a supplementary tool in association studies, adding in the identification of genetic variants associated with economically important traits and shedding light on the elucidating the genetic basis of these traits across different livestock species28,48,49,50,51. To the best of our knowledge, there is no prior research had delved into the realm of CNV associations with diverse phenotypes in American mink.

We conducted the CNV-based association studies using Affymetrix Mink 70K SNP array to identify potential genetic variants associated with dEBVs of 27 different traits such as growth and feed efficiency, reproduction, Aleutian disease tests, and pelt quality traits. In total, 10,137 CNVs were identified, with an average number of five CNVs per sample. Although the average number of detected CNVs per individual is substantially less than our previous study using whole-genome sequencing data (average number of 1647.3), it is in agreement with the results of other studies that used SNP genotyping data with a similar marker density21,52,53,54. It is well-known that the SNP genotyping density affects the number and length of the identified CNVs21. The average length of identified CNVs (109.69 kb) is much longer than our previous study with an average size of 7.4 kb, showing differences in resolution and coverage of genome between SNP array and whole genome sequencing data, yet falls within the range of other studies using comparable SNP array datasets55,56. A total of 250 significant CNVRs were associated with at least one of the studied traits (P < 0.0005), overlapping with 320 potential candidate genes.

For growth and feed efficiency traits, we identified 86 CNVRs associated with eleven traits. Within these significant CNV segments, we identified ARID1B, APPL1, TOX, and GPC5 genes, which might have large impacts on growth rate and feed efficiency in American mink. The ARID1B gene is overlapped with the duplication CNVR1 (Chr1:10,430,780–10,527,641), which was significantly associated with traits such as FBW, ADG, FCR, KR, and RIG. The ARID1B gene, which plays a key role in controlling the maturation of neurons during brain development57, is the commonly mutated gene in Coffin-Siris syndrome, a genetic disorder characterized by intellectual disability, developmental delay, and growth impairment58,59. Yu et al.60 reported that the ARID1B gene overlapped with identified CNVs in patients with short stature and developmental disorder, indicating the critical function of ARID1B mutations in human height regulation. Interestingly, Bovo et al.61 detected a region being targeted by selection pressure, harboring the ARID1B gene in different pig breeds that grouped by their size, supporting the effect of this gene on body size. The APPL1 gene located on duplication CNVR125 (Chr6:198,790,330–198,804,380) interacts with several proteins such as adiponectin receptors, AMPK, and Rab5 (a small GTPase downstream of APPL1) to regulate apoptosis, cell proliferation, metabolism and insulin sensitivity in energy homeostasis, resulting in increased glucose uptake and fatty acid oxidation62. Schweer et al.63 reported that the APPL1 gene is associated with feed efficiency traits in beef cattle through the regulation of glucose.

The TOX gene, located in the duplication CNVR75 (Chr4:75,332,844,330–75,473,471), which overlapped with the identified CNVR in our previous study in American mink64, was significantly associated with FBW and ADG. The TOX gene is a family member of high‐mobility group box proteins and serves as a regulator of gene expression, mostly through modifying the density of the chromatin structure65. In cattle, numerous studies demonstrated that the TOX gene is associated with feed efficiency66, growth67, carcass traits68,69,70, and development of puberty71. Furthermore, it is shown that the TOX gene is associated with weight gain, obesity, and metabolic syndrome-related phenotypes in humans72. The GPC5 gene, which mediates several functions in the control of cell division and growth regulation73, is found within a deletion CNVR96 (Chr5:149,166,233–149,199,404), associating with ADG, RIG, and KR. Congruent with these findings, GPC5 was reported as a candidate gene located within a significant SNP with effects on RIG, RFI, and efficiency of intake in beef cattle74. Moreover, other GWAS indicated the association of the GPC5 gene with human height75, body mass index76, and body size/body weight in chicken77,78.

Regarding the female reproduction traits, we found 168 significant CNVRs, overlapping with several functional genes, among which GRM1, RNASE10, WNT3, WNT3A, and WNT9B might be the candidate genes related to female reproduction in mink. The GRM1 gene, which was previously identified in a CNV study in American mink64, was located within the duplication CNVR15 (Chr1: 56,870,177–56,925,961), associated with AWB trait. The GRM1 was a gene of interest reported in several studies to be associated with female reproduction in different livestock species, such as seasonal reproduction in sheep79, litter size in goats80, number of teats and litter traits in pig81, and fertility-related traits in cattle82. Interestingly, it was reported that the GRM1 gene located within structural variations and runs of homozygosity regions associated with litter traits in pigs83,84, highlighting the hypothesis that this gene might be a candidate gene for female reproduction in American mink. It was suggested that RNASE10 gene action in the proximal epididymis is vital for the acquisition of spermatozoa adhesiveness, eventually affecting the mode of sperm transport in the female reproductive tract85.

In the current study, several WNT family genes were identified to be associated with reproduction traits in American mink, including WNT3 and WNT9B (both found within the duplication CNVR118), and WNT3A (overlapped with the deletion CNVR23). It is well-documented that the expression of WNT3 during the early pregnancy mediates the stromal cell proliferation and trophoblast invasion, eventually affecting the embryonic development86. Comparably, the WNT9B gene has been reported as one of the key genes associated with inducing the gonadotropin-releasing hormone secretion during follicular development in sheep87. Another gene, WNT3A, is known as the main regulator of reproductive behavior and follicular activity associated with estrus, which in turn may contribute to the reproductive efficiency in cattle88.

The gene annotation within significant CNVRs for pelt quality traits identified some functional candidate genes affecting fur characteristics, such as MYO10 and LIMS1. The MYO10 gene is an integral member of the myosin family, which is involved in various cellular processes such as dynamic actin remodeling, cell migration and adhesion, and filopodia formation89,90. Our findings are in accordance with previous studies indicating that the MYO10 gene plays a key role in mediating skin pigmentation through regulating melanosome transportation in the skin91,92,93. It was demonstrated that the melanocytes present in the skin control the quantity and types of melanosomes, ultimately determining the coat color94. Notably, it was shown that the MYO10 gene mutation altered the coat color pigmentation pattern in mice, further supporting its role in facilitating melanoblast migration95. Interestingly, another member of the myosin superfamily, the MYO5A gene, has been widely documented for its impact on coat color phenotype in different species96,97,98, and specifically in American mink64,99.

The LIMS1 gene is located within the duplication CNVR170 (Chr8: 65,501,492–65,735,953), associated with the DQU trait. The LIMS1 gene involved in the control of cell signaling, adhesion, migration, proliferation, and survival100. Several studies demonstrated that the LIMS1 gene regulates cell adhesion and spreading through the ternary protein complex of integrin-linked kinase (ILK), PINCH, and parvin101. To this end, ILK has been reported as a crucial factor for hair morphogenesis102. Interestingly, Endo et al.103 reported that the LIMS1 gene was associated with hair morphology and density in East Asians. Furthermore, it was shown that the loss of LIMS1 gene expression from mouse keratinocytes resulted in impaired hair follicle growth104, supporting the importance of this gene on fur development in mink.

Aleutian mink disease virus causes autoimmune disorders in mink by stimulating their immune response to produce antibodies and form immune complexes8,105. For CIEP, ELISA-P, and ELISA-G, we identified 9, 12, and 22 significant CNVRs, respectively, which overlapped with several immune-related genes such as IFNGR2, APEX1, UBE3A, and STX11. The IFNGR2 gene encodes IFNγR2, which is part of the IFN-γ receptor complex that is overexpressed in an inflammatory environment106. It is well established that IFNGR2 is an important regulator for IFN-γ-STAT1 signaling in T cells107, in turn, the dysregulation of the IFNGR2 gene is associated with a variety of autoimmune diseases108. The APEX1 and UBE3A genes overlapped with the duplication CNVR236 (Chr13: 96,258,171–96,346,188) and the deletion CNVR237 (Chr13: 125,264,701–125,304,748), respectively, both associated with ELISA_G, GL and AWB traits. In agreement with our results, Hu et al.7 indicated a favorable genetic correlation between Aleutian disease test and reproduction traits in American mink, which suggested the potential for genetic selection of Aleutian disease test traits to alleviate the adverse impact caused by Aleutian disease in mink farms.

The APEX1 gene (also called APE1) encodes a multifunctional protein that regulates the DNA base excision repair and redox activities, the latter demonstrated to be involved in mediating the T helper cell 1 (Th1) response109. In addition, it is well-documented that the APEX1 gene plays a proinflammatory function in stimulating cytokine and chemokine expression, eventually contributing to innate and adaptive immunity processes110. It has been confirmed that the UBE3A gene, present in both glutamatergic and GABAergic neurons in the brain, functions as a transcriptional regulator of the immune system within the brain111. Recently, Zhang et al.112 revealed that the UBE3A gene within a deletion CNV is associated with the enrichment levels of immune signaling pathways, eventually enhancing antitumor immunity and immunogenicity. The STX11 gene, which is a member of the SNARE family, is highly expressed in immune tissues such as the thymus, spleen, and lymph nodes, regulating the IFN-γ secretion from natural killer cells, consequently mediating the immune cell function113,114.

Conclusion

For the first time in American mink, the CNV-based GWAS were applied for economically important traits using the Affymetrix Mink 70K SNP array. We identified 10,137 CNVs, including 6968 duplications and 3169 deletions, among which 250 CNVRs were significantly associated with at least one trait. From this, we identified several candidate genes contributing to the growth and feed efficiency (ARID1B, APPL1, TOX, and GPC5), reproduction (GRM1, RNASE10, WNT3, WNT3A, and WNT9B), pelt quality (MYO10, and LIMS1), and Aleutian disease tests (IFNGR2, APEX1, UBE3A, and STX11). Overall, the associated CNVRs and respective candidate genes in the current study supply additional information, complementary to GWAS analyses solely based on SNP markers, further helping reveal the genetic basis of traits of economic interest in American mink.