Comparative Transcriptome Analysis of Mink (Neovison vison) Skin Reveals the Key Genes Involved in the Melanogenesis of Black and White Coat Colour

Farmed mink (Neovison vison) is one of the most important fur-bearing species worldwide, and coat colour is a crucial qualitative characteristic that contributes to the economic value of the fur. To identify additional genes that may play important roles in coat colour regulation, Illumina/Solexa high-throughput sequencing technology was used to catalogue the global gene expression profiles in mink skin with two different coat colours (black and white). RNA-seq analysis indicated that a total of 12,557 genes were differentially expressed in black versus white minks, with 3,530 genes up-regulated and 9,027 genes down-regulated in black minks. Significant differences were not observed in the expression of MC1R and TYR between the two different coat colours, and the expression of ASIP was not detected in the mink skin of either coat colour. The expression levels of KITLG, LEF1, DCT, TYRP1, PMEL, Myo5a, Rab27a and SLC7A11 were validated by qRT-PCR, and the results were consistent with RNA-seq analysis. This study provides several candidate genes that may be associated with the development of two coat colours in mink skin. These results will expand our understanding of the complex molecular mechanisms underlying skin physiology and melanogenesis in mink and will provide a foundation for future studies.


Results
Illumina paired-end sequencing and de novo assembly. To determine the skin transcriptomes of minks, six sequencing libraries were prepared from skin with black and white coat colours and sequenced using the Illumina paired-end technique. In total, 173,485,532 and 183,877,224 raw reads were generated from the skin with black and white coat colours, respectively, and yielded 21.28 and 22.52G clean bases, respectively ( Table 1). The combined numbers of Illumina read pairs and clean bases for the two types of mink skin were 350,424,598 and 43.80G. Of the raw reads from black mink, more than 93.63% of the bases had a Q value > 20 (an error probability of 0.045%), and for white mink, 94.05% had a Q value > 20 (an error probability of 0.042%). The GC-contents were 49.60% and 49.79% for black and white mink skin, respectively. After removing adaptors and   (Table 3).

Functional classification by GO, COG and KEGG.
To classify the functions of the predicted mink unigenes, a GO analysis, which is an internationally standardized gene functional classification system, was performed. In total, 41,522 unigenes with BLAST matches to known proteins were classified into 3 functional categories: Biological process (GO: 0008150), Cellular component (GO: 0005575) and Molecular function (GO: 0003674) (Fig. 2). For the biological process terms, these unique sequences were grouped into 21 classifications, and most were 'cellular process' , 'metabolic process' and 'single-organism process' . Cell, cell part and organelle were the most represented categories in cellular components. The most common molecular functions were binding, catalytic activity and transporter activity (Fig. 2). COG is a classification system based on orthologous genes, which have the same function and a common ancestor. In total, 14,445 unigenes were annotated to 26 groups with the COG database (Fig. 3). The (T) signal transduction mechanism category, with 2,680 annotated unigenes, was the largest category. The second category, (R) General function, contained 2,471 unigenes. (O) Posttranslational modification, protein turnover and chaperones was the third most common category and contained 1,505 unigenes. (X) Unnamed protein contained the fewest unigenes with 3 (Fig. 3).
The assembled unigenes were assigned to the biochemical pathways described in KEGG (Fig. 4). The KEGG database contains a systematic analysis of inner-cell metabolic pathways and gene product functions. Pathway-based analyses help to further determine the biological functions of genes. In total, 19,330 unigenes were assigned to 5 KEGG biochemical pathways: Organismal Systems (5,644 unigenes), Metabolism Pathway (4,647), Environmental Information Processing (3,872), Cellular Processes (2,700) and Genetic Information Processing (2,467). The largest group, Organismal Systems, was well represented among the 5,644 mink unigenes, with the most genes associated with the immune system (1,269), the endocrine system (1,205), the nervous system (770), the digestive system (622), the circulatory system (479), development (410), the sensory system (311), environmental adaptation (310) and the excretory system (268). Pathways related to metabolic pathways were the second most common, including genes involved in carbohydrate metabolism (704), lipid metabolism (688), amino acid metabolism (573), energy metabolism (557), glycan biosynthesis and metabolism (354) and metabolism of cofactors and vitamins (314). The third largest group consisted of environmental information processing, with   Table 3. Gene annotation by searching against public databases.
a majority of the proteins involved in signal transduction (2,674), signalling molecules and interaction (1,068) and membrane transport (130). Pathways related to cellular processes and genetic information processing were also well represented by mink unigenes. These results provide a valuable resource for investigating the metabolic pathways in the skin of minks with different coat colours.

Differentially expressed genes (DEGs) in mink skin with black and white coat colours. M vs A
plots can be used to determine any systematic bias that may be present between conditions (A: log 2 Foldchange; M: −log 10 padj). Therefore, we used the log 2 Foldchange value combined with -log 10 padj to assess every gene expressed between BLM_S and WHM_S. Volcano plots exploring the relationship between the fold change and the significance are shown in Fig. 5. We identified 12,557 unigenes as DEGs, including 3,530 up-regulated and 9,027 down-regulated unigenes. To better investigate the biological mechanism of melanogenesis, it is important to identify the DEGs between the two mink skins with different coat colour. A gene expression comparison showed that a total of 3,887 unigenes were differentially expressed between BLM_S and WHM_S when |log 2 Fold-change| > 1 and padj < 0.05 were used as the cut-off values (Fig. 6). Of these, 2,327 unigenes were significantly differentially expressed in both BLM_S and WHM_S. Moreover, 555 and 1,005 unigenes were uniquely expressed in BLM_S and WHM_S, respectively. All the DEGs are illustrated in Fig. 6. In mice, approximately 171 genes that affect pigmentation in different pathways have been cloned and identified, and another 207 coat colour-associated   Table 4). The coat colour genes in the 'Eumelanin and Pheomelanin' functional category showed higher expression in white mink skin. Among the coat colour genes showing higher expression in black mink skin, the TYRP1 gene showed the highest expression in black mink skin versus white mink skin, and it was followed by the Pax3, dopachrome tautomerase (DCT), Sox10, LYST, KITLG and Rab27a genes. Interestingly, in white mink skin, the highest expression was for the Oca2 gene, which is associated with oculocutaneous albinism type 2 (Oca2), a pink-eyed dilution (p) locus.
Validation of the differentially expressed mRNAs in mink skin. We further validated eight DEGs using qRT-PCR with gene-specific primers to confirm the gene expression patterns. These eight genes were designated as likely to be involved in melanogenesis or the transcriptional regulation of the melanogenesis pathway based on previous publications or best blast matches to ferret. The qRT-PCR gene expression patterns were  Comparison of expression patterns of differential unigenes identified between two mink skins with black and white coat colour. The X axis indicates gene expression changes in different samples, and the Y axis indicates the significant degree of gene expression changes. Scattered points represent each gene, the red dots represent differentially up-regulated genes, the green dots represent differentially down-regulated genes, and the blue dots represent no significant difference gene. In total, 12,557 unigenes were identified as differentially expressed between skins with two coat colours, including 3,530 genes that were up-regulated and 9,027 downregulated genes.
compared with the data obtained from the comparative transcriptome analysis. The results showed that there was a strong correlation between the RNA-seq data and qRT-PCR results (Fig. 7).

Discussion
Mammalian coat colour exhibits a wide range of shades and is controlled by melanin production in melanocytes (melanogenesis). Melanogenesis involves intricate molecular regulation 27 , and a number of recent publications have summarized our current understanding of the process of melanin production in skin and coats [28][29][30] .
To clarify the complicated molecular mechanisms of coat colour formation, previous studies have reported the global gene expression profiles in the skin of chickens 31 , sheep 32 [37][38][39] . To further investigate the genes that may play important roles in mink skin, particularly in coat pigmentation, over 350 million clean paired-end transcriptome reads were generated from mink skin with black and white coat colours using an   Illumina HiSeq. 2500 system. Our study offers new information related to the gene expression profiles for mink skin with two different coat colours. From these clean reads, 403,725 known unigenes were identified as expressed in mink skin, and 12,557 were differentially expressed in black versus white mink skin. Although the study design was not optimal because single pooled samples (n = 3 per coat colour) were used in transcriptome sequencing analysis, with the same three samples from white and black mink skin used individually for quantitative real-time PCR validation of sequencing results, which led to limited biological replication, the results of this experiment have significantly enhanced our understanding of the composition of the mink skin transcriptome and the potential differences in gene expression associated with coat colour, which will provide a foundation for future studies. The GO and KEGG pathway analyses of DEGs revealed that most were associated with the metabolic process (MP), binding (B) and catalytic activity (CA) ontology categories. Of particular interest in our dataset were pathways related to catabolic, metabolic and biosynthetic pigment processes. Of the DEGs, the genes in the categories related to 'the components of melanosomes and their precursors' and 'melanocyte development' , including the genes TYRP1, DCT and KITLG, were up-regulated in the skin from minks with black coat colour. The function of genes in 'the components of melanosomes and their precursors' and 'melanocyte development' categories are melanin synthesis and differentiation of relevant cells 3 . The darker pigmentation of hair and skin is associated with a higher number of melanosomes, although the number of melanocytes remains constant 40 . Melanocytes in black hair follicles contain the highest number of melanosomes, whereas fewer melanosomes are found in white hair bulbs 41 . The relationship of less melanin with the lighter hair phenotype has been identified in several species, including mice 28,42,43 , horses 44 , sheep 45 , alpacas 46 and humans 43 . In this study, these results provide strong evidence that there is a significant difference in the level of pigmentation and melanogenesis between black and white coat colours of mink skin. However, further investigation is still needed to confirm the regulatory relationships of these genes.
The colour of skin, coat, feathers, and shells in mammals, birds, and bivalve molluscs is determined mainly by two melanins: eumelanin and pheomelanin [47][48][49] . For melanogenesis, TYR, TYRP1, and dopachrome tautomerase (DCT) were directly involved in the synthesis of melanins, and TYRP1 and DCT are involved in the distal eumelanic pathway and are active in the eumelanogenic pathway. In this study, Illumina HiSeq. 2500 pair-end sequencing indicated high expression of TYRP1 and DCT in mink with a black coat colour from skin with the pure black phenotype. These results demonstrated that a lack of TYRP1 and DCT expression led to a deficiency in the biosynthesis of melanin in mink skin with the white coat colour phenotype and may be the direct cause of white coat colour formation. This result is similar to observations in the American mink with the palomino phenotype, which is caused by a large insertion in intron 2 of the TYRP1 gene 8 , which is consistent with studies in domestic sheep, where the TYRP1 and DCT genes showed higher expression in black sheep skin 32 . KITLG, also known as stem cell factor (SCF) and mast cell growth factor, binds to and activates KIT and plays a crucial role in the development and maintenance of the melanocyte and melanin synthesis in fish, birds and mammals 50,51 , although its expression varies among species. Our RNA-seq and qRT-PCR results all showed that the TYRP1, DCT and KITLG genes were significantly up-regulated in the skin of mink with black hair, indicating that these genes may be responsible for black hair pigmentation in mink. However, the details of the interactions of the regulatory pathways in mink skin remain to be further investigated.
The premelanosome protein (PMEL, also known as SILV or PMEL17), a key component of mammalian melanosome biogenesis, is required for the generation of cylindrical melanosomes in zebrafish, which in turn is required for melanosome movement into the apical processes and maintenance of photoreceptor integrity 52 . Mutations in PMEL have previously been shown to regulate hypopigmented phenotypes in many vertebrate animals, such as in mice 53 , chickens 54 and dogs 55 . Lymphoid enhancer-binding factor 1 (LEF 1) is a member of the LEF/T-cell-specific factor (TCF) family of the high mobility group domain transcription factors, and it is a downstream nuclear Wnt signalling pathway mediator 56 . Wnt signalling plays an important role in melanocyte development and differentiation and suppresses melanin synthesis and TYR and MITF expression 57 . In our study, the PMEL and LEF1 genes were significantly up-regulated in the skin of mink with white coats. The results of our RNA-seq analysis and subsequent qRT-PCR validation were consistent. In a more recent paper, TYR showed higher expression in the skin of black-coated sheep versus white-coated sheep, which is similar to results observed in chickens. However, in our study, the expression of TYR was not significantly different between the two coat colours. Although the results of our RNA-seq analysis and subsequent qRT-PCR validation were consistent, these results require additional research. TYR may play an important role in the pigmentation of mink hair; thus, further investigations are required to confirm the mechanism of TYR.

Materials and Methods
Ethics statement. All  cDNA library construction and Illumina deep sequencing. A total of 3.0 μg RNA per skin sample per mink was used as the input material for the RNA sample preparations. Sequencing libraries were generated using the NEBNext ® Ultra ™ RNA Library Prep Kit for Illumina ® (NEB, USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was conducted using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5x). First-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNaseH − ). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase reactions. After adenylation of the 3′ ends of DNA fragments, NEBNext Adaptors with hairpin loop structures were ligated to prepare for hybridization. To preferentially select 150-200 bp cDNA fragments, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then, 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37 °C for 15 min followed by 5 min at 95 °C before PCR. The PCR assay was then performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and an Indexed (X) Primer. Finally, the PCR products were purified (AMPure XP system), and the library quality was assessed on an Agilent Bioanalyzer 2100. The library preparations were sequenced on an Illumina HiSeq. 2500 platform with 100 bp paired-end reads by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). Quality control. Raw data (raw reads) in fastq format were first processed through in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing low-quality reads and those containing adapters or poly-Ns. At the same time, Q20, Q30, GC-content and sequence duplication levels of the clean data were calculated. All downstream analyses were based on clean data with high quality. Trinity (v2012-10-05) was used for assembly 58 , in which clean reads of different isoforms derived from one gene were assembled into distinct transcripts but with the same subcomponent (which can be regarded as a gene), and the longest transcript of each subcomponent was defined as the 'unigene' for functional annotation.
Transcriptome annotation of unigenes. For functional annotation, all assembled unigenes that might putatively encode proteins were searched against the Nr (http://www.ncbi.nlm.nih.gov/), Swiss-Prot (a manually annotated and reviewed protein sequence database, http://www.expasy.ch/sprot/), KEGG (http://www.genome. jp/kegg/) and COG (http://www.ncbi.nlm.nih.gov/cog/) databases using the BLASTX algorithm. A typical cut-off value of E-value < 1e-5 was used. With Nr annotations, the Blast2GO program 59 was used to assign GO (Gene Ontology) annotations to the unigenes according to the component function, biological process and cellular component ontologies. After obtaining GO annotations for all unigenes, the software WEGO 60 was used to assign GO functional classifications to all unigenes and to understand the distribution of gene functions for the species at the macro level. Predictions of possible functional classifications and molecular pathway assignments were performed by sequence searches against the KEGG database 61 using the BLASTX algorithm with an E-value threshold of 10 −5 .
Differential expression, cluster analysis and Gene Ontology (GO) enrichment analysis. Different expression analysis of the two samples with different coat colours was performed using the DESeq R package (1.10.1). P values were adjusted using the Q value 62 . Genes found by DESeq with an adjusted P-value < 0.05 were assigned as differentially expressed. The identified DEGs were used for GO and KO enrichment analyses performed using the GOseq R package, which is based on the Wallenius non-central hypergeometric distribution 63 ; thus, it can adjust for gene length bias in DEGs.

Validation of DEGs by qRT-PCR.
To validate our Illumina sequencing data, eight DEGs involved in the melanogenesis pathway (KITLG, LEF1, DCT, TYRP1, PMEL, Myo5a, Rab27a and SLC7A11) that were identified by the above method were randomly chosen for quantitative RT-PCR analysis using the same RNA samples as for the transcriptome profiling. In all cases, the primers were designed for qRT-PCR using the Primer Express 3.0 that spanned exon-exon boundaries and the assembled β-actin unigene (c94862_g2) was selected as an internal control. qRT-PCR was performed using the SYBR select Master Mix kit (ABI, Life, America) on an ABI 7500 Real-Time System (Applied Biosystems, USA), with the first-strand cDNA serving as the template. The qRT-PCR assay was performed in 20.0 µL reaction mixtures containing 12.5 µL 2 x SYBR select Master Mix (ABI, Life, USA), 1.0 µL of cDNA template, 1.0 µL of each corresponding forward and reverse primer for the gene of interest, and 4.5 µL PCR-grade water. The reaction was performed using the following conditions: 50 °C for 2 min and 95 °C for 30 s, followed by 40 cycles of 95 °C for 15 s and 62 °C for 1 min, with a final extension at 72 °C for 5 min. All reactions were performed with three technical replicates using one biological sample and included negative controls with no template. The relative amount of mRNA expression of each gene (expressed as adjusted Ct value) was analysed using the Stratagene Iq5 system. Adjusted cycle threshold (C(t)) values were calculated using the following equation: Ct value = 2 −ΔΔCt64 . Differences in mRNA abundance for the genes were determined by an analysis of variance, which was performed with SPSS version 19.0. Differences were considered significant at P values < 0.05.
Descriptions of the genes mentioned above are shown in Supplementary Table S1.