Genome-wide association studies for fatty acid metabolic traits in five divergent pig populations

Fatty acid composition profiles are important indicators of meat quality and tasting flavor. Metabolic indices of fatty acids are more authentic to reflect meat nutrition and public acceptance. To investigate the genetic mechanism of fatty acid metabolic indices in pork, we conducted genome-wide association studies (GWAS) for 33 fatty acid metabolic traits in five pig populations. We identified a total of 865 single nucleotide polymorphisms (SNPs), corresponding to 11 genome-wide significant loci on nine chromosomes and 12 suggestive loci on nine chromosomes. Our findings not only confirmed seven previously reported QTL with stronger association strength, but also revealed four novel population-specific loci, showing that investigations on intermediate phenotypes like the metabolic traits of fatty acids can increase the statistical power of GWAS for end-point phenotypes. We proposed a list of candidate genes at the identified loci, including three novel genes (FADS2, SREBF1 and PLA2G7). Further, we constructed the functional networks involving these candidate genes and deduced the potential fatty acid metabolic pathway. These findings advance our understanding of the genetic basis of fatty acid composition in pigs. The results from European hybrid commercial pigs can be immediately transited into breeding practice for beneficial fatty acid composition.

a list of candidate genes for the identified loci. We further built a potential network and pathway in which these candidate genes were involved. Our findings provide insight into the genetic mechanism of fatty acid composition in pork.

Materials and Methods
Ethics statement. All the experiments that involved animals were carried out in accordance with the approved guidelines by the Ministry of Agriculture of China. Approval was obtained from the ethics committee of Jiangxi Agricultural University before this study.

Statistical analyses.
To investigate phenotypic correlations between the metabolic traits measured, we corrected phenotypes by treating sex and slaughter batch as fixed effects, and polygenic effects as random effects under a linear mixed model implemented in polygenic function in R package GenABEL 17 . The phenotypic correlation matrix was calculated by Pearson product-moment correlation coefficients using "corrplot" package in R. The polygenic function of GenABEL package was applied to estimate heritabilities of fatty acid metabolic traits 17 . The phenotypic variance explained by each top GWAS SNP was calculated by (V reduce − V full )/V reduce , where V reduce and V full are residual variances of ordinary linear models with and without including SNP genotypes as predictor variables, respectively.
For GWAS, single-marker association was conducted using polygenic and mmscore function of GenABEL. A generalized linear mixed model (see equation (6)) was explored to evaluate the association between qualified SNPs and phenotypic values. = + + + y Xb Sa Zu e (6) where y is the vector of phenotypes; b is the vector of fixed effects including sex and batch effects; a is the allelic substitution effects; u is the vector of random additive genetic effects following the multinormal distribution u ~ N (0, Gσ α 2 ), in which G is the genomic relationship matrix that was constructed based on qualified SNPs, σ α 2 is the polygenetic additive variance. X and Z are the incidence matrices for b and u, respectively; S is the incidence vector for a, and e is a vector of residual errors with a distribution of N (0, Iσ e 2 ); where I is the identity matrix and σ e 2 is the residual variance. Population stratification was accounted by adjusting the kinship covariance matrix estimated from SNP data using "ibs" function in GenABEL. Genomic inflation factors (λ ) were calculated as median (obsPvals)/median (expPvals), where "obsPvals" and "expPvals" represent the observed and expected P values in the association analysis, respectively. Sex and batch were fitted as fixed effects in the model. Bonferroni methods were used to correct the genome-wide significant (0.05/N) and suggestive (1/N) thresholds, where N is the number of SNPs used for analyses.
A meta-analysis was conducted using a Z-score method implemented in METAL software 18 , which combined the P-value and allelic effect of each SNP. A total of 28,124 common SNPs across the five populations were analyzed in the meta-analysis.
Gene function, pathway and network analyses. To highlight functionally plausible candidate genes at genome-wide significant loci, we operationally searched for annotated genes within a 1 Mb region centering each top SNP on the pig reference genome assembly (Build 10.2). The homologous segments in other mammals including mouse, cattle and human were investigated to characterize unannotated genes in target regions. LASTZ, a sequence alignment program 19 , was explored to align pairwise genomic sequences of different species. Candidate genes were determined by their functions related to fatty acid metabolism through literature mining. Enriched functional connections and networks involving the highlighted candidate genes were investigated by using GeneMANIA 20 . A potential fatty acid metabolic pathway was deduced from the known biochemical functions of the highlighted candidate genes. Table 1 shows phenotypic values of 33 fatty acid metabolic traits measured in the five populations, including number of individuals, mean, standard deviation and estimated heritabilities (h 2 ). Most (F 2 longssimus muscle, 70%; F 2 abdominal fat, 85%; Sutai, 42%; Erhualian, 82%; Laiwu, 82%; DLY, 55%) of the metabolic indices displayed moderate to high heritability estimates ranging from 0.25 to 0.65, indicating that genetic components significantly contribute to fatty acid metabolic traits. We estimated phenotypic correlations between the 33 analyzed traits ( Supplementary Fig. 1 Single-population GWAS. We conducted single-marker association studies on the 33 fatty acid metabolic traits measured in the five populations. In total, we detected 11 genome-wide significant loci on nine chromosomes (Table 1) and 12 suggestive regions on nine chromosomes ( Table 2). Of the 11 significant loci, four were population-specific and novel loci that were not identified in previous studies [5][6][7] , including the Sus scrofa chromosome (SSC) 2 locus (9.13-12.86 Mb) for C20:3n6/C18:2n6 and C20:4n6/C20:3n6 in Erhualian, the SSC12 locus (57.80-61.49 Mb) Scientific RepoRts | 6:24718 | DOI: 10.1038/srep24718 for C20:4n6/C20:3n6, PUFA/SFA and SFA in Laiwu, the SSC13 locus (28.52 Mb) for C16:1n7/C16:0 in DLY and the SSCX locus (124.40-125.87 Mb) for C20:2n6/C18:2n6 and C20:4n6/C20:2n6 in Laiwu. The other seven loci corresponded to previously identified loci for fatty acid composition [5][6][7]21,22 . These loci were distributed on SSC2, 7, 8, 12, 14, 16 and X and showed pleiotropic effects on more than one fatty acid metabolic trait ( Table 1). Three of these loci were also evidenced as pleiotropic QTL for fatty acid composition in both abdominal fat and longissimus dorsi tissues in the F 2 population in our previous studies 5 29 Mb on SSC14 affected C16:1n7/C16:0, apparently indicating the polygenic basis of these traits.

Summary of phenotype and genotype statistics. Supplementary
In the F 2 population, a total of 441 SNPs on seven chromosomal regions were significantly associated with 25 fatty acid metabolic traits. We detected a genome-wide significant locus for abdominal fat weight in the F 2 population 23 . Considering that fatty acid composition in abdominal fat tissues was correlated with abdominal fat weight in this population 24 , we herein treated abdominal fat weight as a fixed effect in the GWAS model and found no obviously different results (data not shown). This indicates that the loci affecting abdominal fat weight have no significant effect on fatty acid composition in abdominal fat tissues. In our previous study, we detected only one suggestive locus at 52.35 Mb on SSC16 for C16:0 content and did not find any significant loci for C18:0 content in abdominal fat tissues in the F 2 population 5 . When we used the ratio of C18:0/C16:0 as a phenotype, we herein detected a region at 126.83-129.56 Mb on SSC8 that showed genome-wide significant (P < 8.91E-7) association with the metabolic trait with the top SNP (ASGA0039796) at 129.44 Mb on this chromosome (Fig. 1a). We previously identified a significant (P = 5.07E-08) locus around 34.80 Mb on SSC7 for C18:1n9 but not for C16:1n7  19 , we found that the genomic segment containing the two significant SNPs was inaccurately assembled, and the most likely location of this segment was in the vicinity of the 121.35 Mb region on SSC14 that was also significantly associated with the four metabolic traits mentioned above ( Supplementary Fig. 2).
In the Erhualian population, we identified seven genome-wide significant loci for 17 fatty acid metabolic traits on five chromosomes. By applying GWAS for fatty acid contents in this population, we have previously detected only one significant SNP (DIAS0001596) at 32.15 Mb on SSC8 for C18:2n6 content and did not identify any significant signals for C20:2n6 or C20:3n6 content 7 . In this study, a novel locus at 9.13-12.86 Mb on SSC2 containing 11 significant SNPs was detected for C20:3n6/C18:2n6 and C20:4n6/C20:3n6 (Fig. 2a). The top SNP (ASGA0008884, P = 3.55E-11) locates in the fifth intron of the FADS2 (fatty acid desaturase 2) gene. This Erhualian-specific locus has not ever been reported in any association studies for fatty acids in pigs. Moreover, we identified two regions (36-56 Mb and 134.1-134.5 Mb) on SSC7 and one region (0.4-4 Mb) on SSC12 that were associated with C20:2n6/C18:2n6 at the genome-wide significant and suggestive levels, respectively (Fig. 2b). The 0.4-4 Mb region on SSC12 has been reported to be associated with multiple fatty acid contents, such as C14:0, C16:0, C16:1n7, C18:1n9, C20:1n9, but not with C18:2n6 or C20:2n6 content 7 . The locus at 134.1-134.5 Mb on SSC7 affects both fatty acid composition traits and metabolic traits in F 2 and Erhualian pigs.
In the Laiwu population, we detected seven significant loci affecting 10 traits, which confirmed five previously reported QTL and revealed two novel loci. One novel chromosomal region at 57.80-61.49 Mb on SSC12 was associated with multiple metabolic traits, where six genome-wide SNPs for C20:4n6/C20:3n6, FattyTI, SFA and seven suggestive SNPs for PUFA/SFA, UFA/SFA, DBI and UI were found (Fig. 2c and Table 1). This pleiotropic locus was observed only in the Laiwu population. At 124.40-125.87 Mb on SSCX, we identified another novel locus associated with C20:2n6/C18:2n6 and C20:4n6/C20:2n6 (Table 1). This locus exists only in Laiwu pigs and have not been reported to associate with any fatty acid traits before this study.

Meta-analysis of GWAS.
We conducted a meta-analysis of GWAS for each fatty acid metabolic trait across the five populations. Genomic inflation factors (λ ) in the meta-analysis were between 0.961 and 1.047, indicating that population stratification was properly adjusted. The meta-analysis integrates all association signals from the five populations. Therefore, when two or more populations show consistent association directions with target traits, the detection power for these traits will be improved by the meta-analysis of GWAS. By applying this analysis, we did not identify new loci but observed enhanced association strength at five loci, especially at the locus around 121 Mb on SSC14 and the locus around 43 Mb on SSC16 (Fig. 3). The most significant SNP for C18:1n9/   Table 2). This could be due to population heterogeneity across the F 2 , Sutai, DLY, Laiwu and Erhualian populations. Conditional GWAS for pleiotropic, linked or common QTL. In the single-population GWAS, the significant loci that we identified were usually associated with more than one fatty acid metabolic trait. To test if there is only one pleiotropic QTL or more linked QTL at these loci, we conducted GWAS by including the most significant SNPs (shown in Table 1 Fig. 3). This could be explained by the existence of multiple linked QTL in this region. Thus, higher-density markers or sequence data should be explored to clarify this issue in the near future. Of note, the association signals for C20:0/C18:0 on SSC16 vanished in Erhualian and Laiwu populations when corrected for the effect of the top GWAS SNPs (ASGA0072949 in Erhualian and ASGA0072968 in Laiwu). These SNPs were located in an adjacent region of ~500 kb (Table 1). This leads us to assume that a common causal variant at the SSC16 locus is responsible for C20:0/C18:0 in the two populations. Given short linkage disequilibrium (LD) extents across Chinese pig populations 25 , the lead SNPs are most likely very close to the causal variant.
Plausible candidate genes at significant loci. A list of strong candidate genes have been highlighted by our and other previous studies [4][5][6][7]21,22 , such as SCD on SSC14 for C18:0, ELOVL5 on SSC7 for C20:1n9, ELOVL6 on SSC8 for C16:1n7, ELVOL7 on SSC16 for C20:0, FASN on SSC12 for C16:0, ACSBG1 on SSC7 for C20:2n6 and MTTP on SSC8 for C14:1n5. By conducting GWAS on fatty acid metabolic traits, we herein proposed three novel candidate genes at the genome-wide significant loci. At 9.13-12.86 Mb on SSC2, we detected a significant locus comprising 11 consecutive SNPs for C20:3n6/ C18:2n6 and C20:4n6/C20:3n6 in Erhualian pigs. The top SNP (ASGA0008884) at this locus locates in the fifth intron of fatty acid desaturase 2 (FADS2) gene. FADS2 is a rate-limiting enzyme in the synthesis of long-chain polyunsaturated fatty acids through the introduction of double bond into delta-6 carbons in mammals. C20:3n6 and C18:2n6 are delta-6 unsaturated fatty acids. A nucleotide insertion in the transcriptional regulatory region of FADS2 causes human fatty acid delta-6-desaturase deficiency 26 . Variants of FADS2 are known to affect omega-6 and omega-3 milk fatty acids in cows 27 . Thus, FADS2 is a promising candidate gene for the SSC2 locus.
At 57.80-61.49 Mb on SSC12, a significant locus for C20:4n6/C20:3n6, PUFA/SFA and SFA was found in the Laiwu population. The top SNP (ALGA0067099) locates between the MYH13 and MYH1 genes that have no obvious function related to fatty acid metabolism. However, comparative genomic analysis between human and pig shows that an interesting gene, SREBF1 (sterol regulatory element binding transcription factor 1), is located in a homologous region (around 63.20 Mb) 1.8 Mb upstream of the locus. SREBF1 is a transcription factor involved in fatty acid synthesis regulation. It initiates the transcription of more than 33 target genes that participate in synthesis of cholesterol, fatty acids, triglycerides, and phospholipids in mice 28 . So further studies are worthwhile for the SREBF1 gene as a candidate for the SSC12 locus.
Gene networks and metabolic pathway. We explored GeneMANIA 20 to characterize enriched functional processes and gene networks involving the candidate genes highlighted for both significant (Table 1) and suggestive ( Table 2, also see below) loci. Most of candidate genes were enriched in biological processes related to fatty acid metabolism, such as fatty acid metabolic process (FDR = 1.17E-18), acylglycerol and neutral lipid metabolic process (FDR = 3.16E-16), triglyceride metabolic process and long-chain fatty-acyl-CoA metabolic process (FDR = 6.68E-15), fatty-acyl-CoA metabolic (FDR = 1.46E-14) and fatty-acyl-CoA biosynthesis process (FDR = 9.02E-14, Supplementary Table 3). A total of 404 links were observed in the gene networks related to fatty acid metabolism, which included additional 19 genes that were not identified in this study ( Supplementary Fig. 4, Supplementary Tables 3 and 4). The 19 genes are also functionally related to fatty acid metabolism. Moreover, many genes co-occurred in multiple networks. For example, the ELOVL family genes and FADS family genes appeared in many pathways, including fatty acid metabolic and biosynthetic process, carboxylic acid biosynthetic process and long-chain fatty acid metabolic process. This emphasizes the importance of the ELOVL and FADS family genes as key driver enzymes in fatty acid elongation and desaturation. Altogether, the interactive analysis of candidate genes gives us a visually understanding of metabolic pathway for fatty acid composition (Fig. 4).
Suggestive loci. The Bonferroni-corrected threshold is too stringent to detect a significant locus with moderate or small effects 30 . Thus, we defined 12 suggestive loci on nine chromosomes (Table 2)

Discussion
GWAS on metabolic traits identifies novel loci for fatty acid composition. To our knowledge, although QTL for fatty acid indices have been reported in previous studies 4,6,21,35,36 , this is the first GWAS on fatty acid metabolic traits in multiple diverse pig populations. In total, we identified 867 genome-wide significant SNPs, 97.25% of which locate in noncoding regions, indicative of complex regulatory mechanisms for fatty acid metabolism. Of note, our findings not only replicated the previously reported loci for fatty acid composition, but also detected four novel significant loci, including the SSC2 locus (9.13-12.86 Mb) for C20:3n6/C18:2n6 in Erhualian, the SSC12 (57.80-61.49 Mb) locus for C20:4n6/C20:3n6 in Laiwu, the SSC13 locus (28.52-31.47 Mb) for C16:1n7/C16:0 in DLY and the SSCX locus (124. 40-125.87 Mb) for C20:4n6/C20:2n6 in Laiwu. The four loci are population specific loci (Figs 1 and 2). This supports the assumption that GWAS on metabolic ratios can increase the power of GWAS compared to that of association of individual metabolites. As pointed by Petersen et al. 37 , if two metabolites are a product-substrate of an enzymatic reaction, then ratios between their concentrations are potential proxies of the enzymatic reaction rate. When the SNPs in an enzyme-coding gene show much The illustration shows the step-by-step synthesis of fatty acids. Significant loci have been identified for the ratios of substrates to products linked by red arrows. Plausible candidate genes that we highlighted for significant loci in this study are indicated alongside red arrows. For fatty acids linked by blue arrows, the loci for the metabolic ratios between these fatty acids remain unexplored. stronger (ten times or more) association with the metabolic ratio than with two individual metabolites in GWAS, then the enzyme is likely the one responsible for conversion of substrate into product, even if the molecular function of that enzyme was not already known. So using metabolic ratios as quantitative traits for GWAS may allow us to deduce new enzymatic activities. For example, variants proximal to the FADS2 gene are significantly associated with the ratio of C20:4n6/C20:3n6. The findings enable us to deduce that FADS2 may catalyze the conversion of C20:4n6 from C20:3n6, which remained unknown before this study.

GWAS on metabolic traits increases association strength at several significant loci.
In this study, GWAS on fatty acid metabolic traits increased association strength at several significant loci for fatty acid contents. For example, we have previously identified a significant locus on SSC16 for C20:0 content across the five populations. The top SNPs at 43.53 Mb (DRGA0016155, P = 4.32E-31), 34.71 Mb (ASGA0072949, P = 6.97E-13) and 45.31 Mb (DRGA0016169, P = 7.41E-14) on SSC16 explained 31.79%, 12.07% and 25.50% phenotypic variance in the DLY, Erhualian and Laiwu populations, respectively. Interestingly, by conducting the GWAS for C20:0/ C18:0, more SNPs on SSC16 (N = 62, 51 and 61) were detected in the three populations. The top SNPs at 43.53 Mb (DRGA0016155, P = 7.69E-37), 34.71 Mb (ASGA0072949, P = 3.92E-18) and 35.19 Mb (ASGA0072968, P = 3.50E-17) explained 39.97%, 20.21% and 31.16% of phenotypic variance in these populations, respectively. This again illustrates that GWAS on fatty acid metabolic traits could result in more associated SNPs with stronger association strength explaining more phenotypic variance. This is consistent with previous reports that GWAS on metabolic ratios as intermediate phenotypes can reduce the variance of the dataset and yield enhanced association statistics 10,37 . The deduced metabolic pathway of fatty acids advances our understanding of the genetic mechanism of fatty acid composition. GWAS on fatty acid metabolic traits provide us important clues to deduce the fatty acid metabolic pathway (Fig. 4), in which different genes are responsible for the step-by-step synthesis of fatty acids. We noted that several loci were population specific, indicating that variants in some genes in the pathway may exert their effects on fatty acid metabolism in a population-specific manner. For example, the locus at 134.54 Mb on SSC7 containing ELOVL5 was detected for C20:1n9/C18:1n9 and C20:2n6/C18:2n6 in F 2 and Erhulian pigs. So the ELVOL5 is more likely to elongate C18:1n9 and C18:2n6 acyl-CoAs to C20:1n9 and C20:2n6 acyl-CoAs in the two populations. In Erhualian population, the same locus was also associated with C20:3n3/C18:3n3; thus, ELOVL5 could specifically elongate C18:3n3 acyl-CoA to C20:3n3 acyl-CoA in this population. Some genes appeared to exhibit pleiotropic effects in multiple metabolic steps. For example, the ELOVL6 gene could not only catalyzes two carbons to C16:0 acyl-CoA to synthesize C18:0 acyl-CoA, but also mediates the synthesis of C18:1n9 acyl-CoA from C16:1n7 acyl-CoA. This is in conflict with the previous assumption that ELOVL6 is only involved in the elongation of C12-16 saturated fatty acid to C18 saturated fatty acid in mammals 38 . Although the deduced metabolic pathway allows us to better understand substrate-product relationships and the genetic mechanism of fatty acid compositions, more experimental assays are needed to validate the deduced pathway.
Biological, mediate and spurious pleiotropy. Pleiotropy widely exists in species. In the human genome, 16.9% of genes and 4.6% of SNPs show pleiotropic effects 39 . Generally, there are three kinds of pleiotropy: biological, mediate and spurious pleiotropy 40 . We detected a significant locus at 121.35 Mb on SSC14 for C16:1n7/ C16:0 and C18:1n9/C18:0 in the F 2 and DLY populations. Interestingly, the candidate gene SCD within this region catalyzes conversion of both C16:0 and C18:0 to C16:1n7 and C18:1n9. Thus, it is likely that a common causative mutation with the SCD gene underlie the pleiotropic locus on SSC14. The locus at 27.45-80.45 Mb on SSC7 containing 210 SNPs influences 12 metabolic traits in abdominal fat and longissimus dorsi muscle in the F 2 population (Table 1). We found high LD extents within the chromosome region harboring these 210 SNPs in this population ( Supplementary Fig. 5a). Thus, it is conceivable to observe that the association signals of all these 210 SNPs disappeared when conditional on the effect of the top SNP (ASGA0033714) at this locus by treating it as a fixed effect in the GWAS model ( Supplementary Fig. 3). This observation support the locus as a biological pleiotropy. However, we cannot rule out the possibility that different causative mutations having strong LD with the top SNP may co-localize at this locus. We note that the top SNP has moderate LD (r 2 < 0.5) with these 210 SNPs ( Supplementary Fig. 5a), indicating that the causative mutation underlying the SSC7 locus is likely located in the vicinity of the top SNP. The pleiotropic QTL with a beneficial allelic effect on more than one trait warrant further investigations to characterize the underlying mutations, which would enable us to develop novel molecular breeding tools for optimizing fatty acid compositions in pork.
The 0.4-8.29 Mb region on SSC12 was significantly associated with seven metabolic traits in the Erhualian population. When the top SNP (MARC0063090) was fixed in the GWAS model, the association signal for C18:1n9/C16:1n7 still surpassed the genome-wide significant threshold with the top SNP (ASGA0052511) at 2.62 Mb on this chromosome ( Supplementary Fig. 3). We found that the LD between the two top SNPs (MARC0063090 and ASGA0052511) is relatively low (r 2 = 0.38, Supplementary Fig. 5b). This supports our assumption that the locus is a spurious pleiotropic one and different causative variants are in LD with the lead SNPs.