Identification of candidate genes associated with porcine meat color traits by genome-wide transcriptome analysis

Meat color is considered to be the most important indicator of meat quality, however, the molecular mechanisms underlying traits related to meat color remain mostly unknown. In this study, to elucidate the molecular basis of meat color, we constructed six cDNA libraries from biceps femoris (Bf) and soleus (Sol), which exhibit obvious differences in meat color, and analyzed the whole-transcriptome differences between Bf (white muscle) and Sol (red muscle) using high-throughput sequencing technology. Using DEseq2 method, we identified 138 differentially expressed genes (DEGs) between Bf and Sol. Using DEGseq method, we identified 770, 810, and 476 DEGs in comparisons between Bf and Sol in three separate animals. Of these DEGs, 52 were overlapping DEGs. Using these data, we determined the enriched GO terms, metabolic pathways and candidate genes associated with meat color traits. Additionally, we mapped 114 non-redundant DEGs to the meat color QTLs via a comparative analysis with the porcine quantitative trait loci (QTL) database. Overall, our data serve as a valuable resource for identifying genes whose functions are critical for meat color traits and can accelerate studies of the molecular mechanisms of meat color formation.

Scientific RepoRts | 6:35224 | DOI: 10.1038/srep35224 linkage mapping methods [4][5][6][7] , and the fine mapping work for meat color QTLs were subsequently conducted 8,9 . Recently, with the development of high-density SNP chips and genome sequencing, genome-wide association studies (GWAS) have provided a more precise method for identifying the genomic regions and markers associated with meat color [10][11][12][13][14] . Previously, a handful of causative genes and quantitative trait nucleotides (QTN) were demonstrated to influence meat color. These include PRKAG3, a major gene responsible for acid meat [15][16][17] , and RYR1, a major gene responsible for Pale, Soft, Exudative (PSE) meat 18,19 . In addition, some novel genes, such as NUDT7 8 , EDN3, and PHACTR3 20 , were also associated with meat color. Despite this progress, our knowledge of the genetic factors underlying variation in meat color remains incomplete.
In this study, to identify the candidate genes that influence meat color traits, we performed a comparative analysis of the whole transcriptomes of biceps femoris (Bf; white muscle) and soleus (Sol; red muscle), which are characterized by obvious color differences, using RNA-seq technology. We identified a series of differentially expressed genes (DEGs) between these two tissue types, which represent potential candidate genes affecting meat color traits. Using these data, we identified several important GO terms and metabolic pathways associated with meat color. Overall, our data provide a solid foundation for identifying the critical genes whose functions affect meat color traits, and can facilitate studies of the molecular regulatory mechanisms underlying meat color.

Results and Discussion
Phenotypic confirmation. To avoid individual differences between animals to the greatest extent possible, the subjects for this study were three full-sib female pigs with similar performances (Supplementary Table S1). The source of tissues were derived from Bf and Sol in this study. Bf (light or pale) is a typical fast-twitch fiber characterized by more higher glycolytic potential, and Sol (dark or red) is a typical slow-twitch fiber characterized by more higher oxidative potential 21 . The differences in meat color between Bf and Sol muscles were obvious, and were further confirmed by quantitation of myoglobin, MyHC-I, and MyHC-IIb expression (Fig. 1a). The results revealed that these genes were expressed at significantly different levels in Bf and Sol muscles (P < 0.01) ( Fig. 1b-d). Taken together, these results suggest that Bf and Sol muscles are suited for candidate gene screening of meat color.
Overall statistics of sequencing data. We constructed six cDNA libraries (three from Bf muscles: Bf28, Bf35, and Bf36, and three from Sol muscles: Sol28, Sol35, and Sol36) and sequenced them on a HiSeq 4000 platform. A summary of sequencing data indicated that each library had highly consistent statistical parameters ( Table 1). The GC content of each library was more than 54%, and the ratio of Q20 bases (those with a base quality > 20 and error rate < 0.01) was higher than 95%, indicating that high-quality raw reads were obtained from each library. After strict filtering, more than 6 Gb clean bases were obtained for each library, and the clean base ratio of each library was approximately 99.6%. When the data were mapped onto the reference genome, the mapped reads ratio for each library was greater than 72%.
Identification of novel transcripts and refinement of gene structures. In total, 311 novel transcripts were predicted from six cDNA sequencing libraries (Supplementary Table S3), significantly fewer than the number of novel transcripts predicted from porcine muscle cDNA libraries by Zhu et al. 22 . These differences may be due to differences in software used in both studies, or the relatively stringent criteria used in this study. MyHC-IIb genes between Bf and Sol muscles, respectively. Realtime PCR is used to detect the expression level, and the gene expression levels are normalized by reference gene GAPDH. The unpaired Student's t-test is used to evaluate the statistical significance of differences, *P ≤ 0.05, **P ≤ 0.01. All data are presented as mean ± standard error (SE).
Protein-coding and non-coding transcripts were distinguished with CNCI software. Among the novel transcripts, 61 were protein-coding, 250 were non-coding; and the coding transcripts had a higher CNCI score than the non-coding transcripts, which is consistent with the results obtained from a previous study 22 . To improve gene annotation information in the current database, the 5′ and 3′ boundaries of known genes were refined by alignment of known transcripts with reconstructed transcripts from transcriptome sequencing data. In total, 1707 genes were thus refined, including 1252 genes refined at the 5′ region, 1195 genes refined at the 3′ region, and 740 genes refined at both end regions. Detailed annotation information regarding structurally refined genes is provided in Supplementary Table S4.
Alternative splicing analysis. Alternative splicing (AS) generates multiple transcripts from the same gene, some of which may perform different functions. We used the rMATS software to analyze the AS events in each library 23 . Seven types of AS events were detected: skipping exon (SE), alternative first exon (AFE), alternative 3′ splice site (A3SS), mutually exclusive exon (MXE), alternative last exon (ALE), intron retention (RI), and alternative 5′ splice site (A5SS). The most common of these was SE event, and consistent with the results of previous studies, although the ratio of SE to all AS events was very different from that obtained by Zhu et al. 22 . In the present study, SE accounted for more than 85% of all AS events in each library ( Fig. 2 and Supplementary Table S5).
Validation of DEGs and cluster analysis. To screen for critical candidate genes related to meat color traits, we identified DEGs between Bf and Sol muscles using two methods, DEseq2 and DEGseq. DEseq2, which is suitable for analyzing data from biological repeated experiments, identified 138 DEGs in this study ( Fig. 3  differences due to genetic background, we used the DEGseq method, which is suitable for analyzing data from the experiments with no biological repeats. In particular, DEGseq was used for the following comparisons: Bf28 vs Sol28, Bf35 vs Sol35, and Bf36 vs Sol36, yielding 770, 810 and 477 DEGs, respectively ( Fig. 3 and Supplementary  Table S8). Furthermore, 52 overlapped DEGs were filtered out from the DEGs identified by both methods ( Fig. 4 and Table 2). To experimentally validate the DEGs identified from the sequencing data, we analyzed 12 DEGs (six up-regulated and six down-regulated) using real-time PCR. The results confirmed that the expression patterns of these DEGs were consistent with those obtained from transcriptome sequencing data (Fig. 5). Furthermore, the fold_change values from the two methods were highly correlated (Pearson correlation coefficient R = 0.97) at a high level of statistical significance (P < 0.01). These results indicate that the DEGs identified in the genome-wide transcriptome sequencing data are reliable.
To obtain insight into the expression patterns of DEGs in the six libraries, we performed a hierarchical cluster analysis based on expression abundance using the Pheatmap software in R package (Supplementary Table S9). As shown in Fig. 6, Bf28, Bf35, and Bf36 were clustered into a group, in which the DEGs exhibited similar expression patterns, and Sol28, Sol35, and Sol36 clustered into another group. Thus the cluster analysis indicated a significant difference in gene expression between Bf and Sol tissue.  GO and KEGG pathway enrichment analysis of DEGs. The DEGs identified above represent candidate genes critical for the formation of meat color. To further elucidate the functional roles of DEGs in meat color, we performed GO enrichment analysis of DEGs obtained by the DEseq2 and DEGseq methods using the GOseq software. In total, 13 GO terms were significantly enriched in DEGs obtained by the DEseq2 method (P < 0.05), including three terms in cellular component (CC), nine terms in biological process (BP) and one term in molecular function (MP) (Fig. 7). The most abundant term was "extracellular vesicular exosome, " containing 28 DEGs. Other enriched terms, including "troponin complex, " "calcium ion-transporting ATPase complex, " "transition between fast and slow fiber, " "regulation of systemic arterial blood pressure by ischemic conditions, " "regulation of striated muscle contraction, " and "troponin T binding, " refer to properties and pathways potentially associated with meat color (Supplementary Table S10). The most notable term was related to "transition between fast and slow fiber". Myoglobin is the sarcoplasmic heme protein primarily responsible for the meat color, and the content of myoglobin is higher in type I muscle fiber than in type II b muscle fiber. Indeed, we validated this phenotypic variation in this study (Fig. 1). Accordingly, the difference between muscle fiber types is considered to be a critical factor determining meat color 1 . The Bf and Sol muscles, which are characterized by remarkable differences in contractile and metabolic properties, comprise different muscle fiber types. Several terms related to the contractile and metabolic properties of muscles were also enriched, including "calcium ion-transporting ATPase complex" and "regulation of striated muscle contraction".
We also performed GO enrichment analysis of DEGs from the DEGseq comparisons Bf28 vs Sol28, Bf35 vs Sol35, and Bf36 vs Sol36. Notably, the term "transition between fast and slow fiber" was significantly enriched in all three comparisons ( Supplementary Fig. S2, Supplementary Fig. S3 and Supplementary Fig. S4). The four redox states of myoglobin, namely deoxymyoglobin (DeoxyMb), oxymyoglobin (OxyMb), carboxymyoglobin (COMb), and metmyoglobin (MetMb), are critical for meat color, and the generation of these four compounds is closely related to the significantly enriched term "oxidation-reduction process, " (Supplementary Fig. S3 and Supplementary Fig. S4). Additionally, the states of myoglobin are largely determined by the concentration of O 2 , which is controlled via oxygen consumption 24 ; O 2 metabolism mainly occurs within the mitochondria. Consistent with this, the term "mitochondrion, " containing 48 DEGs, was significantly enriched in the present study ( Supplementary Fig. S3). The detailed results of GO enrichment analysis of DEGs from the Bf28 vs Sol28, Bf35 vs Sol35, and Bf36 vs Sol36 comparisons are provided in Supplementary Table S11.
Meat color is a complex trait; its formation may involve coreaction among many intracellular signaling pathways. To better understand the biological functions and interaction of genes, we conducted KEGG pathway enrichment analysis of DEGs identified by the DEseq2 and DEGseq methods. In total, 115 DEGs obtained from the DEseq2 method were mapped to 95 KEGG pathways, and 12 of these KEGG pathways were significantly enriched (q ≤ 0.1); "metabolic pathway (ko01100)" contained the most DEGs (20 unigenes) (Supplementary Table S12 and Fig. 8). Of these pathways, "vitamin B6 metabolism, " "glutathione metabolism, " "biosynthesis of unsaturated fatty acids, " and "fatty acid elongation" could all be related to the formation of meat color. The "vitamin B6" pathway (ko00750) plays an important role in amino acid, glucose, lipid metabolism, and hemoglobin synthesis 25 , and thus may induce several endogenous factors contributing to meat color, such as pH, and lipid oxidation 24 . Antioxidants and the reactive products of lipid oxidation are also known to influence meat color stability 26 . Glutathione is an important antioxidant in animals, and the "glutathione metabolism" pathway was also enriched in the present study. Moreover, "biosynthesis of unsaturated fatty acids" and "fatty acid elongation, " which are involved in lipid oxidation, also influence meat color.

DEGs involved in meat color formation.
To date, more than 500 QTLs in the Pig QTL Database have been shown to affect meat color traits, and the number continues to increase. To further identify candidate genes associated with meat color, we performed an integrated analysis of the DEGs and the information in the Pig QTL Database. In total, 114 non-redundant DEGs were mapped to the meat color QTLs (Supplementary  Table S14). Transferrin is critical in controlling the level of free iron in biological fluids, and iron is essential for heme synthesis; the transferring receptor (TfR) is a carrier protein for transferrin. Accordingly, the DEG TfR (ENSSSCG00000011848) may be an important gene influencing meat color. The anaerobic glycolysis reaction of glycogen takes place if the muscles are no longer supplied with oxygen, and the closely related indexes glycolytic potential and free glucose content can influence meat color 24 . Notably, glycogen synthase 2 (GYS2) (ENSSSCG00000000577), a limiting enzyme in glycogen synthesis, was previously reported to be expressed only in liver and adipose tissue 27 ; however, we detected it in all six skeletal muscle libraries, and it was significantly differentially expressed between Bf and Sol muscles. The function and regulatory mechanisms of GYS2 related to meat color remain to be elucidated. Moreover, DEGs involved in fat metabolism, including omega-3 fatty acid receptor 1 (O3FAR1) (ENSSSCG00000010478), leptin (LEP) (ENSSSCG00000016588) and ELOVL fatty acid elongase 6 (ENSSSCG00000025541), should be examined closely, because lipid oxidation can facilitate myoglobin oxidation, thereby enhancing meat discoloration 28 . Additionally, the glutathione peroxidase 2 gene (gastrointestinal) (GPX2) (ENSSSCG00000002279), a member of the glutathione peroxidase family, that plays a major role in protecting the organism from oxidative damage. Notably, the log 2 (fold_change) of GPX2 between Bf and Sol was − 3.5 in the present study. Several previous studies showed that antioxidants can efficiently protect lipid and RPKM values are used to calculate the gene expression in RNA-seq and normalize the expression of one group to "1". In real-time PCR, relative expression levels are calculated using ∆ ∆ Ct value method and normalized by reference gene GAPDH, and similarly normalize the expression of one group to "1". The data showed in Y axis represented the fold change. The unpaired Student's t-test is used to evaluate the statistical significance of differences between the two groups, *P ≤ 0.05, **P ≤ 0.01. All data are presented as mean ± standard error (SE).
Scientific RepoRts | 6:35224 | DOI: 10.1038/srep35224 myoglobin from oxidation in vitro and vivo (e.g., α -tocopherol) 28 , suggesting that the endogenous antioxidant factor GPX2 may play a critical role in the formation of meat color.
In conclusion, our data provide a comprehensive overview of the transcriptome of Bf and Sol muscles, which exhibit obvious meat color differences, and identify several potential metabolic pathways and many interesting candidate genes potentially involved in meat color traits. Overall, our results lay a solid foundation for elucidating the mechanisms of meat color formation, and the DEGs identified represent the potential valuable candidate genes for meat color traits.

Materials and Methods
Animal sources, tissues collection, and phenotypic characterization. The experimental population consisted of 48 Duroc × Meishan pigs derived from crossing offspring of a Duroc boar with eight Meishan sows. All pigs lived under the same condition and were raised under a standardized feeding regimen with free access to water. All procedures involving animals were carried out according to the Guide for the Care and Use of Laboratory Animals formulated by the Institutional Animal Care and Use Committee of Nanjing Agricultural University, Nanjing, China. All experiments were approved by the Institutional Animal Care and Use Committee of Nanjing Agricultural University. Main growth and carcass traits were recorded. To improve reliability and reduce inter-individual differences, three full-sib female pigs with similar growth and carcass traits were selected.
To identify candidate genes affecting meat color traits, Bf (samples Bf28, Bf35, and Bf36) and Sol (samples Sol28, Sol35, and Sol36) muscles, which exhibit obvious meat color differences, were dissected and snap frozen in liquid nitrogen prior to use. Furthermore, phenotypic differences were confirmed by quantitating the expression of myoglobin, and myosin heavy chain (MyHC) isoforms MyHC IIb and MyHC I. Sequences of quantitative primers are provided in Supplementary Table S2.
Library construction and sequencing. Total RNA was extracted using the Trizol reagent (Invitrogen, Life Technologies, CA, USA) from Bf and Sol of three female Duroc × Meishan pigs. The RNA samples were quantitated and subjected to quality inspection. Briefly, RNA degradation and contamination were monitored on 1% agarose gels, and RNA purity and concentration were measured using a NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA) and a Qubit ® 2.0 Fluorometer (Life Technologies, CA, USA), respectively; RNA integrity was assessed on an Agilent Bioanalyzer 2100 (Agilent Technologies, CA, USA).  In total, 1.5 μ g of RNA for each sample was used to construct sequencing libraries, which were generated using NEBNext ® Ultra TM RNA Library Prep Kit for Illumina ® (NEB, USA) according to the manufacturer's recommendations, and then subjected to quality assessment on an Agilent Bioanalyzer 2100. Each muscle from each of the three experimental pigs was used to prepare a sequencing library; thus six sequencing libraries were generated in total. The libraries were sequenced on an Illumina HiSeq 4000 platform after clustering and 150 bp paired-end reads were generated.
Raw data processing and alignment analysis. High-quality clean reads were obtained by trimming the adapter sequences and removing the invalid reads containing poly-N and low-quality reads from the raw data. Meanwhile, the Q20 and GC content were calculated. All downstream analyses were conducted based on the high-quality clean reads. The clean reads were mapped to the Sus scrofa genome (Sus scrofa 10.2, http:// asia.ensembl.org/Sus_scrofa/Info/Index) using HISAT (0.1.6) 29 with default parameters. Transcripts were reconstructed using the StringTie software (1.0.4) 30 ; novel transcripts were predicted by comparing reconstructed transcripts with known transcripts using the Cufflinks software (2.1.1) 31 ; and coding analysis of novel transcripts was performed using the CNCI software 32 . Gene structure refinement was conducted by comparing known transcripts with reconstructed transcripts from the six transcriptome sequencing datasets using the BLAST software. AS events were detected using the rMATS software 23 .

Identification of DEGs.
First, the expression level of each gene was calculated using HTseq software (0.6.1) 33 , and normalized using the reads per RPKM method 34 . Subsequently, DEGs were identified using the R packages DEseq2 (1.4.5) 35 and DEGseq (1.18.0) 36 , which are designed for use with biological and non-biological replicates, respectively. Supplementary Fig. S1 shows the screening strategy for DEGs. The corrected P-value (Q-value), False Discovery Rate (FDR), was used to screen the DEGs; "FDR ≤ 0.01 and absolute value of log 2 (fold_change) ≥ 1" was set as the threshold to judge significance of differential gene expression. In addition, overlapping DEGs obtained from both methods were identified.

Confirmation of DEGs.
To validate the transcriptome sequencing data, a subset of DEGs were confirmed by real-time PCR. For this purpose, RNA derived from the same muscle samples used for transcriptome sequencing were subjected to quantitative analysis of gene expression. In total, 12 DEGs were randomly selected for expression analysis. Gene-specific primers (Supplementary Table S2) were designed using Primer 3 (http://bioinfo.ut.ee/ primer3/). Real-time PCR assays were conducted using AceQ ® qPCR SYBR ® Green Master Mix (Vazyme, China) on a Step-One Plus Real-time PCR System (Applied Biosystems, CA, USA). All reactions were performed in triplicate and the reference gene GAPDH was used to normalize gene expression levels. Relative gene expression levels were calculated using the comparative Ct (∆ ∆ Ct) value method 37 . All statistical analyses were performed using SPSS v20.0. The unpaired Student's t-test was used to evaluate the statistical significance of differences between the two groups and correlation between the fold_change values of DEGs from transcriptome sequencing and real-time PCR was estimated with Pearson correlation coefficient; P < 0.05 was considered statistically significant difference. All data are presented as means ± standard error (SE).
Cluster analysis and functional annotation of DEGs. Cluster analysis was conducted using the Pheatmap software in R package. Gene Ontology (GO) enrichment analysis of DEGs obtained from DEseq2 and DEGseq methods were conducted using GOseq software 38 , respectively. Meanwhile, pathway enrichment analysis was performed using the KEGG database (http://www.genome.jp/kegg/), and the hypergeometric test was used to identify significantly enriched pathways. QTL location analysis of DEGs. We performed QTL mapping of DEGs by comparative analysis of DEGs and porcine QTL chromosome positions using BEDTools 39 . Furthermore, the DEGs mapping to meat color QTLs were refined.