Copy number variation-based genome wide association study reveals additional variants contributing to meat quality in Swine

Pork quality is important both to the meat processing industry and consumers’ purchasing attitude. Copy number variation (CNV) is a burgeoning kind of variants that may influence meat quality. In this study, a genome-wide association study (GWAS) was performed between CNVs and meat quality traits in swine. After false discovery rate (FDR) correction, a total of 8 CNVs on 6 chromosomes were identified to be significantly associated with at least one meat quality trait. All of the 8 CNVs were verified by next generation sequencing and six of them were verified by qPCR. Only the haplotype block containing CNV12 is adjacent to significant SNPs associated with meat quality, suggesting the effects of those CNVs were not likely captured by tag SNPs. The DNA dosage and EST expression of CNV12, which overlap with an obesity related gene Netrin-1 (Ntn1), were consistent with Ntn1 RNA expression, suggesting the CNV12 might be involved in the expression regulation of Ntn1 and finally influence meat quality. We concluded that CNVs may contribute to the genetic variations of meat quality beyond SNPs, and several candidate CNVs were worth further exploration.

As a global food source, pork has unquestionable significant implications on the livelihood of human beings. Pork quality is important to the meat processing industry and consumers' purchasing attitude 1 , thus, it is one of the main selection objective in breeding plans for most of the pig breeding organizations and enterprises 2 . Meat quality traits are complex and usually influenced by multiple genes or QTLs, therefore, the genetic improvement of these traits is rather slow. However, genome-wide association studies (GWAS), linkage mapping, and selective sweep analysis have been successfully employed to investigate significant gene markers for pH, tenderness, meat color, intramuscular fat (IMF) content [1][2][3][4][5][6][7][8][9] .
To date, a total of 12,618 QTLs from 461 publications have been recorded in PigQTLdb (http://www. animalgenome.org/cgi-bin/QTLdb/SS/index, released at Feb 11, 2015) 10 . Among these QTLs, more than half (7,014) were associated with meat and carcass quality traits. However, the limited density of microsatellite (SSR) markers resulted in inaccurate QTL mapping (Some segments are larger than 20 cM 11 , even cover a whole chromosome 12 ). Long-term fine mapping experiments are needed to refine their locations and investigate causative variants 13 . Additionally, although genome wide association studies identified significant SNPs associated with meat quality traits, these SNPs explain a small portion of the genetic variance 2,4 . Alternative variances, which could explain the "missing heritability" of meat quality traits, were urgently needed 14 .
Copy number variations (CNVs) are currently accepted as a common source of genetic variation, and the "missing heritability" could partially be explained by CNVs as reported in several human studies [15][16][17][18][19] . In animals, there is strong evidence of the effects CNVs may have on disease resistance and economically important traits, such as milk production 13 , gastrointestinal nematodes resistance 20 , residual feed intake 21 , Marek's disease-resistance 22 , late-feathering 23 , and squamous cell carcinoma of the digit 24 . Min pig is well known in China for its excellent meat quality (with 5% IMF in LM at 240-day-old) and good cold resistance capacity, and Large-white pig is a normal lean meat type breed with normal meat quality. The intercross of Min × Large-white is a good model to analyze the genetics of meat quality 2 . The objective of this work was to perform systematic CNV association analysis with meat quality traits using the Porcine SNP60 Genotyping BeadChip (Illumina, San Diego, CA, USA), analyze the joint or independent effects of CNVs and SNPs, and provide some helpful information to identify genetic markers that may be suitable for inclusion in genetic improvement program.

Results
Trait properties and correlations. Two IMF traits (IMF content and marbling), 6 meat color traits (color L6*, color a6*, color b6*, color L24*, color a24*, and color b24*), 2 pH traits (pH24 and pH6), moisture, and shearing force (SFN) were analyzed in this paper. The summary of means, standard deviations (SD), genetic and phenotype correlations of the traits are provided in Table 1. From the result, there was a small difference between genetic and phenotype correlations. As respect, the genetic correlations between marbling and IMF, between each trait for meat color, between 2 pH values were high. The correlations between different kinds of traits such as color with IMF, pH with IMF, were low except the moisture and IMF.
CNV segmentation and genotyping. In the 678 samples of the three-generation pigs, a total of 32,544 distinct segments were detected using the multivariate method of CNAM in SVS. After merging across samples, 48 nonredundant CNVs were left for subsequent association test (Table S1). Within these 48 segments, each sample was genotyped (i.e., called as a loss, neutral or gain event) according to a three state model with strict threshold levels of marker mean ± 0.5. Since the multivariate CNAM method was developed to identify common CNVs, only those segments with frequencies above 0.4% were retained for further analysis in order to filter away false positive calls. A total of 15 CNVs ranging in size from

Quality assessment of CNVs by using qPCR and NGS data.
We first compared all of the 15 CNVs with the results from 7 previously published reports [25][26][27][28][29][30][31] . Twelve out the 15 CNVs (80%) were found previously reported ( Table 2 and Table S2), and the remaining 3 non-overlapping CNVs (CNV2, 6 and 13) had high frequencies (ranging from 26 to 83%) and large marker mean changes (ranging from − 0.39 to − 1.24, deviated from 0), suggesting they are probably real CNVs (Table 2). Two complementary methods, quantitative PCR (qPCR) and next generation sequencing (NGS), were also performed to confirm the existence of detected CNVs. In the qPCR validation, we systematically assessed the overall agreement rate of detected CNVs with qPCR results. All the primers and results of qPCR are listed in Table S3 and Table S4. Totally, the detection power for the qPCR validation is 73.3% (11/15). In the NGS validation, we chose 12 F0 pigs for CNV calling, the sequencing depth of coverage for each animal varied from 4.7× to 8.4× . All of the NGS-based CNVs which were overlapped with significantly associated CNVs are listed in Table S5. All of the CNVs were confirmed with the Read depth (RD) method ranged from 0 to 4.53 32 . Detailed NGS analyses will be presented in a separate manuscript.

CNV association analysis.
All of the 602 F2 generation pigs were employed to test association between CNVs and rEBV data. We identified a total of 8 CNVs that were significantly associated with at least one trait using a linear regression model ( Fig. 1 & S1 and  was significantly associated with pH6, IMF and Marbling, three QTLs for CIE-color a* (QTL 21403) 35 and Percentage type IIb fibers (QTL 7036 and 7021) 36 were overlapped. The CNV9 (chr10: 9,369,752-9,462,206) was also near the previously reported meat quality QTL regions (pH 24 hr post mortem (ham QTL 18702) 9 , pH 24 h post mortem (ham QTL 18681) 9 and CIE-color a* (QTL 3067) 37 ). Based on the Sscrofa10.2 sequence assembly, pig gene annotations within the 8 CNVs and flanking regions (500 Kb in both downstream and upstream directions) were summarized in Table S8. Among the 80 genes retrieved from the 8 regions, 22 cases were uncharacterized protein coding genes, 7 cases were RNA coding genes, and 51 cases were known protein-coding genes. Since only a limited number of genes in the pig genome have been annotated, we converted the pig official gene symbols to orthologous human genes by BioMart before Gene Ontology (GO) and pathway analysis. Ten statistically significant GO terms (P < 0.05, Table S9) and none significant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were identified. A total of 12 detected genes were associated with GO terms consisting of secretion by cell, G-protein signaling, coupled to cAMP nucleotide second messenger, cAMP-mediated signaling, secretion, G-protein signaling, coupled to cyclic nucleotide second messenger, cyclic-nucleotide-mediated signaling, neuromuscular junction development, positive regulation of peptide secretion, positive regulation of cell proliferation, cell death, and death.
All of the 51 protein coding genes were searched on GenBank for their other functional information. NOD1 (the nucleotide-binding oligomerization domain-containing protein 1 gene) was found overlapped with CNV15 (chr18: 46,776,812-46,983,072), and was expressed increasingly in the adipose tissue of women with gestational diabetes 38 . Approximately 25 Kb downstream of the same CNV, pleckstrin homology domain containing, family A (phosphoinositide binding specific) member 8 gene (PLEKHA8) encodes the protein of the four-phosphate adaptor protein 2 (FAPP2) which could connect vesicular transport with lipid synthesis 39 . Netrin-1 (Ntn1) gene which covers CNV12 (chr12: 56,893,678-57,020,468), has been reported to be highly expressed in obese but not lean adipose tissue of humans and mice 40 . Table S10. A total of 24 SNPs were genome-wide significantly associated with IMF, Marbling, Moisture, SFN, Ph6, a6* and a24*. Among the 24 significant SNPs (Table 3 and Fig. S2), 18 SNPs were located on Chromosome 12, one SNP located on Chromosomes 2 and 3 respectively. Eight SNPs were significantly associated with more than one trait. There were many overlapping significant SNPs for IMF and moisture (5/5 of Relationship between associated CNVs and associated SNPs. For all of the 8 possible combinations of CNVs and traits, we found no significant SNPs directly overlapping with them. Haplotype analysis for the CNV regions which included 25 SNPs both downstream and upstream of associated CNVs were used to detected the linkage relationship between CNVs and neighboring SNPs. And the results (Figs 2 and 3) showed four cases (CNV1, CNV12, CNV13, and CNV15) were enclosed in a haplotype block with other SNPs, none cases where CNVs directly overlapped with significantly associated SNPs, and only the block containing CNV12 is adjacent to IMF and marbling significantly associated SNPs (about downstreams of 300 kb).

Discussions
In conventional CNV discovery studies, researchers usually try to identify as many CNV regions as possible. But in this CNV-based GWAS, the algorithm intended to identify the common CNVs shared among samples in order to detect associations with meat quality traits. Thus, only 48 CNVs were detected and only 15 CNVs were retained after quality control. However, the validation results indicate that all the 15 CNVs may be real. Our results indicated that there is a small discrepancy (27.6%) between qPCR and SVS 8.2 CNV callings. As we know, small variations such as SNPs, small indels may influence the hybridization of the qPCR primers and finally influence the amplification efficiency. Moreover, in this study, the overlapping rate between NGS RD-based CNV and SVS 8.2 CNV callings (100%) is better than previous NGS reports, and this is probably because we use the F0 generation pigs.
Illumina Porcine SNP60 BeadChip is designed using Duroc, Landrace, Pietran, and Large White SNP information. GWAS research using this chip have been successfully carried out in intercross population such as Large white × Min, Large white × Erhualian, Iberian × Landrace 2,4,41 . In the results of CNV-based association, we found 8 CNVs were significantly associated with at least one meat quality trait. These  results indicate that CNVs may contribute to the difference of meat quality. The results reveal alternative variations can explain the missing heritability of complex traits. Among these 8 CNVs, one of the interesting CNVs is CNV15. When analyzing genes within and near CNVs, we found two genes, one gene (NOD1) within the CNV and one gene (PLEKHA8) near the CNV. The PLEKHA8 gene encodes the protein of the four-phosphate adaptor protein 2 (FAPP2) which could connect vesicular transport with lipid synthesis 39 . FAPP2 has a pleckstrin homology (PH) domain, which can bind selectively phosphatidylinositol-4-phosphate 42 . And as CNV15 was significantly associated with pH24, we proposed that it might have some relationship with FAPP2. Another interesting CNV was CNV12. After retrieving the sequence of CNV12 and blast the sequence with NCBI database, we found that CNV12 was located in the predicted gene of Ntn1. As the Blast results match one 853 bp pig EST (BW980235 full-length enriched swine cDNA library, adult intestine Sus scrofa cDNA clone ITT010048D02 5′ , mRNA sequence), we inferred that there may be some problems in the genome assembling. In order to explore the relationship between CNV12 and Ntn1, we first carried out qPCR to investigate the relationship between copy number and EST expression of CNV12 and then investigate the correlation between the expressions of CNV12-EST and Ntn1-RNA. The primers were shown in Table S11, and the results were shown in Table 4 and Fig. 4. The expression pattern was similar between the DNA dosage and EST expression, and similar pattern appeared in the expressions of CNV12-EST and Ntn1-RNA, suggesting CNV12 might be involved in the regulation of Ntn1 directly. Moreover, 20 individuals from high (> 5.3) and low (< 1.4) IMF groups were used to explore the relationship between Ntn1-RNA expression and IMF. Figure 4 also showed that the fold changes of Ntn1-RNA expression is significant different between the two groups (with means of 1.49 and 0.76, P values < 0.05). In previous reports, Ntn1 was highly expressed in obese but not lean adipose tissue of humans and mice 40 which is consistent with our results, and one of Ntn1 receptors, adenosine a2b receptor (a2bR) could inhibit adipogenesis 43 . Thus, we inferred that increased Ntn1 may bind with a2bR, and the decreasing of a2bR may lead to less inhibition effects on adipogenesis.
In current study, we carried out the first genome wide CNV association analysis using CNVs in pig population. However, as the Multi-variate algorithm are limited to detect small, common CNVs, some CNVs beyond the detection of Multi-variate method and the rare CNVs involved in complex traits are remain largely unknown 44 . Further, the estimation effect of CNV contributing to complex trait by integrating both SNPs, CNVs and other genomic variants will further help comprehensively understand the mechanism underline complex quantitative trait 45 . In the analysis of relationship between associated CNVs and associated SNPs, we found only one block is adjacent to tagged SNPs. Our results revealed a relatively lower CNV tagging rate comparing to cattle study 13 , this finding may be due to the limited sample size for common CNVs detection in CNV analysis. Thus, further analyses with a large sample size are needed to explore the precise relationship between CNVs and neighboring SNPs in pig.
In summary, previous SNP-based GWAS have been successfully used to identify significant genes or loci for complex traits. In this CNV-based GWAS study, our results indicate that meat quality traits were probably influenced by CNVs, and some of the CNVs may independently play their roles in determining   the traits. These results provide us with an added source of variation to explain the missing heritability of complex traits, and the joint analysis of CNVs and SNPs could be a powerful way to potentially identify the causes of complex traits. The several candidate CNVs associated with meat quality such as CNV12 in SSC12 reported in this paper are worth further exploration.

Methods
The methods were carried out in accordance with the approved guidelines.
Ethics statements. All  Association studies based on SNPs. Mix Model and Regression-Genomic Control (GRAMMAR-GC) method was used for the associated analysis between SNPs and meat quality traits 46,47 . The genome-wide significance threshold was decided using Bonferroni correction, in which P-value (0.05) was divided by the number of effective SNPs (12,039) estimated using simple M method 48 .  51 . And as the average distance between SNPs was about 40 K in the PorcineSNP60 chip, 25 SNPs on the both upstream and downstream were selected directions of each CNV. All of the significant combinative CNVs (P-values < 0.05 after FDR correction) and SNPs (P-values < 4.15E-6) were used to analyze the relationship between significantly combinative CNVs and SNPs following the method described by Xu et al. 13 .

Haplotype block analysis and relationship between significantly combinative CNVs and
Validation of qPCR and next generation sequencing. The qPCR amplification was performed using an ABI 7900HT instrument (Applied Biosystems, Inc., Foster City, CA) in 384-well optical PCR plates. SYBR ® Green primers were designed to query CNVs using the Primer 6 software. A 15-mlsystem containing 15 ng of genomic DNA, 150 nM each for the primers, and SYBR ® Select Master Mix (ABI part number 4472908) were used for the reaction. The glucagon gene (GCG) 52 was used as control of single copy control. Copy number was calculated by the method of 2 −ΔΔCT 53,54 , where Δ CT is the differential value of target region cycle threshold (CT) of and the control region CT. And moreover, 2 −ΔΔCT stand for the comparison of the Δ CT value of samples with CNV to those without CNV. The PCR cycle was: 2 min at 50 °C, 10 min at 95 °C, 40 cycles of 15 sec at 95 °C and 1 min at 60 °C. A list of these 16 pair of primer sequences (CNVs and GCG) is shown in Table S3. Twelve pigs (8 Min pigs and 4 Large White pigs) in the F0 generation were sequenced using Illumina HiSeq 2000. Every pig was sequenced with paired-end reads (100 bp) in two 500 bp insert size genomic DNA libraries. All paired-end reads were mapped to the genome Sscrofa10.2 (ftp://ftp.ensembl.org/ pub/release-77/fasta/sus_scrofa/dna/Sus_scrofa.Sscrofa10.2.dna.toplevel.fa.gz) using BWA v0.5.9 55 with the parameter(− t 4 − k 32 − M and − R), and the BAM data were merged and sorted using Samtools v0.1.18 56 . The CNVs were detected using CNVnator v0.3 software according to the previous studies with the parameter (-call 100) 32 . CNV calls were filtered by the criteria of P-value < 0.01and size > 1 Kb. Calls overlapping with gaps which are larger than or equal to 5 bp in the reference genome were also excluded.
Gene content and functional analysis. The pig annotated genes were downloaded from BioMart (http://www.biomart.org/). Genes overlapping with detected CNVs and near the detected CNVs (< 100 Kb) were picked out for further analysis. Annotation analysis were performed with the DAVID (http://david.abcc.ncifcrf.gov/) 57 for Gene Ontology (GO) terms 58 and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis 59 to provide insight into the functional enrichment of copy number variable genes.