Genomic analysis reveals selection in Chinese native black pig

Identification of genomic signatures that help reveal mechanisms underlying desirable traits in domesticated pigs is of significant biological, agricultural and medical importance. To identify the genomic footprints left by selection during domestication of the Enshi black pig, a typical native and meat-lard breed in China, we generated about 72-fold coverage of the pig genome using pools of genomic DNA representing three different populations of Enshi black pigs from three different locations. Combining this data with the available whole genomes of 13 Chinese wild boars, we identified 417 protein-coding genes embedded in the selected regions of Enshi black pigs. These genes are mainly involved in developmental and metabolic processes, response to stimulus, and other biological processes. Signatures of selection were detected in genes involved in body size and immunity (RPS10 and VASN), lipid metabolism (GSK3), male fertility (INSL6) and developmental processes (TBX19). These findings provide a window into the potential genetic mechanism underlying development of desirable phenotypes in Enshi black pigs during domestication and subsequent artificial selection. Thus, our results illustrate how domestication has shaped patterns of genetic variation in Enshi black pigs and provide valuable genetic resources that enable effective use of pigs in agricultural production.

From early domestication to modern breeding practices, artificial selection for agriculturally important traits has shaped the genomes of domestic pigs. Guided by the reference genome of the domestic Duroc pig, considerable attempts have been made to unveil the genetic bases of phenotypes by using whole-genome wide SNP chip and resequencing approaches [1][2][3][4]  Enshi black pigs, which comprise a typical native black breed in China, are best known for their fat storage ability and cold-wet tolerance. They are mainly raised in mountainous regions with an average altitude of 800 m above sea level in southwest China 5 (Enshi Tujia minority and Miao minority autonomous prefecture) (Fig. 1). Enshi black pigs have undergone fewer systematic genetic improvement programs compared to other breeds since the 17 th century and are characterized by their average-sized head, concave and wrinkled face, well-developed limbs, concave back, tilted haunch and big belly 6 . Although they have adapted to the harsh local environment, Enshi black pigs are markedly different from wild boars, especially in terms of fat storage ability. Historically, Enshi black pigs have been intensively used in dry-cured ham production, leading to preferential selection based on their meat and fat quality traits, such as lean percentage, fat content, and eating quality 5,6 (juiciness, flavor, tenderness, pink hue and heavy marbling) (Fig. 2).
To identify signatures of selection resulting from domestication, we performed whole-genome pooled resequencing of three representative populations of Enshi black pigs (~202.21 Gb in total; ~24.07× coverage per population). Together with 96 publicly available genomes of pig breeds found in Asia (13 Chinese wild boars, 10 Korean wild boars and 73 Chinese domestic pigs) and using a total of ~4.00 trillion bases of resequencing data [7][8][9][10][11] (Supplementary Table S1), we conducted a comprehensive analysis of genetic diversity and sought to identify genomic regions under selection in the Enshi black pig.

Results and Discussion
Sequencing, SNP calling and annotation. Pooled paired-end sequencing of the three populations of Enshi black pigs generated 70.64, 67.86 and 63.71 Gb data for the genomes of the Lvcongpo (LCP), Yetinglu (YTL) and Zhongbaozhen (ZBZ) populations, respectively. To accurately detect genomic footprints left by selection, we downloaded the publicly available genome data of 13 Chinese wild boars (Supplementary Table S1). To reduce possible bias resulting from differences in sequencing coverage between the Enahi black pigs (~24.07× coverage per population) and the Chinese wild boars (~13.19× coverage per individual) and to accurately detect genomic footprints left by selection, we randomly selected ~4.80 Gb of high-quality sequencing data from each individual of 13 Chinese wild boars 7,9,10 , which simulated pooled sequencing of a Chinese wild boar population, yielding a total of 61.82 Gb of sequencing data.
A total of 248.91 Gb high-quality sequencing data from three Enshi black pig populations and a Chinese wild boar population were aligned against the reference pig genome (Sscrofa10.2) using BWA (v0.7.8) 12 . For each population, ~94.29% of the high-quality reads were mapped to the reference pig genome, of which ~81.82% were uniquely mapped, with comparable genome coverage (~13.32× to 15.70× ) between the three populations of Enshi black pig and one population of Chinese wild boar (Supplementary Table S2).
We identified ~14.09 M SNPs in four populations that were concurrently cataloged by two currently dominant algorithms (i.e. SAMtools 13 and GATK 14 ), which accounted for 88.12% and 86.76% of SNPs that were identified by SAMtools and GATK, respectively, and were used for subsequent analyses (Supplementary Figure S1, Supplementary  Figure S2), which indicated substantial genomic similarity between Enshi black pigs from the three major distributed locations (Fig. 3). Furthermore, the neighbor-joining tree of the four populations revealed two clusters (three Enshi black pig populations and a Chinese wild boar population)   Table 1). The top 1,053 genes containing the highest number (n ≥ 7) of nonsynonymous SNPs were mainly over-represented in the highly variable sensory perception categories (Supplementary Table S4), which reflects the strong reliance of pigs on their sense of smell while scavenging for food and other odor-driven behavior.  Figure S6 -8). The large overlap between the H P and F ST regions indicated that the two statistical methods detected the same events. SNPs from these CDRs formed two distinct clusters (i.e. three Enshi black pig populations and a Chinese wild boar population) (Supplementary Figure S3b).

Selective sweep analysis.
We detected 417 candidate selected genes (CSGs) in 185 CDRs (30.91 M, Supplementary Data S1), which were shared among the three populations. Most of the selected genes presented a lower degree of haplotype sharing between the Enshi black pigs and the Chinese wild boar breeds, and highly similarity of haplotypes among three Enshi black pig populations (Fig. 4). These CSGs were mainly overrepresented in developmental processes (hemopoiesis, P = 1.39 × 10 −11 ; cell differentiation, P = 2.27 × 10 −9 ; system development, P = 3.36 × 10 −2 ), metabolic processes (regulation of phosphate metabolic process, P = 7.68 × 10 −10 ; protein phosphorylation, P = 3.47 × 10 −7 ; biosynthetic process, P = 4.04 × 10 −4 ; phosphate-containing compound metabolic process, P = 1.44 × 10 −4 ; and cellular protein modification, P = 8.34 × 10 −3 ), and response to stimulus (natural killer cell activation, P = 4.76 × 10 −11 ; response to stress, P = 1.63 × 10 −4 ) ( Table 2). These rapidly evolved genes in Enshi black pigs may be responsible for the dramatic phenotypic changes that are of economic value, such as growth rate, fat storage ability and disease resistance. Six CSGs were found to be related to lipid transport and metabolism   Figure S9). The central node genes were those involved primarily in cellular and immune-related processes.
Domestication genes. To identify the key genes that play an important role in shaping the domestication of Enshi black pigs, we performed statistical analysis (Student's t test) using the identity score (IS) of the two conditions (IS between the Enshi black pigs and the Chinese wild boar breeds or IS among three Enshi black pig populations). We then selected top 10 genes (ITPR3, RPS10, ERF, INSL6, ENSSSCG00000014230, TBX19,  SFT2D2, VASN, GSK3A, and ORMDL1) with the lowest P value for further study.
To analyze the signs of selection in detail, we detected SNPs from the regions spanning these genes with highly significant effects (SNPs located in untranslated regions (UTRs), exon, and downstream/upstream of the gene). We found 34 SNPs with significantly different mutation frequency between Enshi black pigs and Chinese wild boars (Fig. 5) in RPS10, GSK3A, INSL6, TBX19, and VASN. These SNPs may affect protein coding, gene splicing, transcription factor binding, and regulation of gene expression directly or indirectly. Furthermore, we genotyped the 34 SNPs in 15 pig breeds that represent a wide range of Chinese domestic pig populations (73 individuals), 13 Chinese wild boars, and 10 Korean wild boars 11 (Fig. 6). The results demonstrated strong signatures of selection at these loci across Chinese domestic pigs that are used for pork production (i.e. muscle growth and adipose deposition). By combining the mutation frequency and genotyping results, we found that five genes were extremely different between the domestic pig breeds and Chinese wild boars. These genes may be responsible for the marked phenotypic changes produced by domestication of Enshi black pigs.
Among these five genes, two genes may be associated with body size and immunity. The first gene is ribosomal protein S10 (RPS10), which harbors three mutations in the 3′ UTR, and has also been reported to be commonly mutated in Diamond-Blackfan Anemia 16,17 . This gene provides instructions for producing ~80 different ribosomal proteins, which are vital components of cellular structures called ribosomes 18 . RPS10 may be associated with body height, body length, and longissimus muscle weight in pigs 19,20 . Interestingly, a previous genome-wide association study (GWAS) also showed that RPS10 may be associated with limb bone length (which is associated with body height and body length) 21 . Our study confirmed this conclusion. The second gene is Vasorin (VASN), which has three nonsynonymous mutations. This gene may be involved in modulating arterial response to injury by inhibiting the TGF-β signaling pathway [22][23][24] . VASN is highly expressed in vascular smooth muscle cells (hence the name) and the developing skeletal system 25 . This expression pattern indicates that VASN may indirectly influence the body size of pigs during embryonic development.
As a typical meat-lard pig breed, Enshi pigs exhibit strong selection signals in the obesity-related gene glycogen synthase kinase 3 (GSK3). GSK3 is a constitutively active, proline-directed serine/threonine kinase that   participates in a number of physiological processes that range from glycogen metabolism to gene transcription 26 .
The gene exists in two isoforms, GSK3A and GSK3B 27 . GSK3A knockout experiment in mice displayed improved glucose tolerance in response to glucose load and elevated hepatic glycogen storage and insulin sensitivity 28 , which may result in obesity 29,30 . Association analysis revealed that the GSK3A Hin1I and MspI polymorphisms were significantly associated with loin muscle area, average back fat thickness, thorax-waist fat thickness, and buttock fat thickness 31 . In addition, the gene could influence the muscle-to-meat process via glycolysis, reduced pH, and pale color 32,33 . The selection of the GSK3 gene may contribute to enhanced fat storage ability and relatively high intramuscular adipose content. The remaining two genes were involved in developmental processes and male fertility. TBX19 (known as TPIT) is a member of the T-box family of transcription factors, which are involved in the regulation of developmental processes 34 . Numerous reports demonstrated that mutation in this gene may isolate deficiency of pituitary POMC-derived ACTH [35][36][37][38] . ACTH deficiency is characterized by adrenal insufficiency symptoms, such as weight loss, lack of appetite, obesity, hypocortisolism, and low blood pressure 39,40 . Nonsynonymous mutations of this gene may exist in Tongcheng pigs 3 , and a strong selection signal was detected in all Chinese native pig in our analysis. These findings suggested that the selection of this gene exerted a comprehensive influence on all Chinese indigenous pigs. The final gene, insulin-like 6 (INSL6), is a member of the insulin superfamily, which comprises a group of structurally related proteins 41 . INSL6 is predominantly expressed in the testis, and it is necessary for the progression of spermatogenesis. Deficiency in INSL6 will cause varying levels of male infertility [42][43][44][45][46] .
These positively selected genes may serve as important genetic foundation of the evolutionary scenarios triggered by artificial selection for agricultural production of Enshi black pigs. Further studies are required to define the signaling cascades associated with these genes and their regulatory mechanisms in order to facilitate a better understanding of their roles in the formation of economically important native breeds.

Conclusions
This study detected the genomic signatures that may have shaped the domestication of Enshi black pigs in China. The genes found to be positively selected in Enshi pigs are involved in crucial biological processes such as body size and immunity (RPS10 and VASN), obesity (GSK3), male fertility (INSL6), and early development (TBX19). In addition, important mutations within these genes were also identified to enrich the pool of markers that can be used to further refine selection in these pigs in the future. Our research methods and findings should also be helpful in deciphering genomic footprints left by selection and domestication in other livestocks [47][48][49] .

Materials and Methods
Ethics statement. Animals care and all the experimentation in this study were carried out in accordance with the pre-approved guidelines from Regulation Proclamation No.5 of the Standing Committee of Hubei People's Congress, and all experimental procedures and sample collection methods were approved by the Institutional Animal Care and Use Committee of Huazhong Agricultural University, Wuhan, China (permit HZAUSW2015-0003). Animals and tissue collection. Three representative populations of Enshi black pig were raised in counties of Enshi Tujia and Miao Autonomous Prefecture in Hubei Province, China (LCP, YTL, and ZBZ; Fig. 1). Blood samples were obtained from 75 female individuals (31 from the LCP, 32 from the YTL, and 12 from the ZBZ).
Sequencing data. For each population, DNA samples from 12-32 individuals were pooled in equimolar quantities that were used to construct pair-end sequencing libraries with an insert size of 300 bp. The libraries were sequenced by Illumina Hiseq 4000 (Illumina, San Diego, CA, USA) with 150 bp paired-end reads according to the manufacturer's instructions. We also downloaded 96 publicly available pig genomes in Asia (Supplementary  Table S1), of which 13 Chinese wild boars were used to test for differentiation and possibly selection, and 10 Korean wild boars and 73 Chinese domestic pigs were used to investigate the patterns of selected loci.
To avoid reads with low-quality, we removed the following types of reads: (a) reads with ≥ 10% unidentified nucleotides (N); (b) reads with > 10 nt aligned to the adapter, allowing ≤ 10% mismatches; and (c) reads with > 50% bases having phred quality < 5; and (d) putative PCR duplicates generated by PCR amplification in the library construction process. After the low-quality reads were excluded, the remaining high-quality reads were aligned against the Sscrofa10.2 reference sequence by using Burrows-Wheeler Aligner (BWA, v0.7.8), with the BWA command of "mem -t 4 -k 19 -M -w 200". The uniquely aligned reads with ≤ 5 mismatches were used for SNP calling.

SNP calling and variation annotation.
To obtain highly confident SNPs, we employed both SAMtools (v0. 1.19) and GATK tool (v3.3) variant calling pipelines to process each pool of samples respectively. In SAMtools, base calling was conducted by using the "mpileup" command and the "-q 1 -C 50 -S -D -m 2 -F 0.002" parameters of SAMtools. The "view" command of BCFtools was used to convert the BCF files to VCF files. The VCF files were then filtered by the "vcfutils.pl" script with the use of the "varFilter -Q 20 -d 4 -D 1000" option and vcftools using the "-thin 4" option, and high-quality SNPs (coverage depth ≥ 4 and ≤ 1000, RMS mapping quality > 20, the distance between adjacent SNPs ≥ 5 bp) were retained for subsequent analysis. For GATK, the settings were used according to the GATK best practice online documentation. Results that were obtained by the two pipelines were compared using the BEDTools (v2.21.0) "intersectBed" module 50 , and only concordant variations were processed further.
To determine novel variants in our sequence data, we compared the identified SNPs with the dbSNP (Build 143) data using BEDTools and annotated the detected genetic variants using ANNOVAR 51 . Gene ontology analysis was performed by the web-based software PANTHER 52 . Functional interaction network of CSGs was performed by the Reactome FI Plugin in the Cytoscape software environment 53 .
Selective-sweep analysis. For each population, we used allele counts at variable sites to identify signals of selection in 100 kb windows (with a step size of 50 kb) through two approaches: for each window, we calculated (1) the average pooled heterozygosity, H P , and (2) the average fixation index, F ST , between Enshi black pig and Chinese wild boar. We calculated F ST using PoPoolation2 54 and H P as follows: where nMAJ is the most frequently observed allele, and nMAJ is the least frequently observed allele. Putatively selected regions in each group were located by extracting windows from the tails of the Z-transformed H P and F ST distributions by applying a threshold of two standard deviations (P ≤ 0.05, Z(H P ) < − 2, Z(F ST ) > 2). We disregarded the selection signals on chromosome X because of the extremely low rate of recombination 55 and ancient interspecies introgression 7 .
Calculation of identity score (IS). We calculated ISs to visualize haplotype sharing in pairwise comparisons at the selected genes. For each identified SNP, we determined the fraction of reads that corresponded to the reference genome allele, F, in each pig population. The IS values of individual SNPs were then calculated as IS = 1 − (|F Population1 − F Population2 |) (2), with SNPs assessed only if a minimum of one read was obtained in each population. The IS value for a gene was the mean of all IS values observed in the gene for a specific comparison.
Sequencing of the complete mitochondrial DNA (mtDNA) sequences. Ear tissue samples of five individuals were collected from the three counties (1 from LCP, 2 from YTL, and 2 from ZBZ; Supplementary Figure S4). DNA was isolated using a MicroElute Genomic DNA kit (OMEGA, USA). Nineteen pairs of primers (Supplementary Table S5) were used to amplify and sequence the complete mitochondrial DNA sequences. PCR reactions were performed using LA taq (TaKaRa, Dalian, China). PCR products were purified following agarose gel electrophoresis and then sequenced using the ABI 3730 DNA sequencer (Applied Biosystems, Foster City, CA, USA). Sequences of other representative domestic pigs and wild boars were downloaded from GenBank. The detailed information on the pig breeds are shown in Supplementary Table S6. Phylogenetic analysis. To explore the genetic relationship of the three populations of Enshi black pigs and other pig breeds, we used the five newly generated complete mtDNA sequences, together with 18 downloaded complete mitochondrial DNA sequences, to perform phylogenetic analysis; the neighbor-joining tree was constructed using MEGA (v5.0) 56 .
To explore the genetic relationship of the 4 pooled group, we also constructed another two phylogenetic trees by using SNPs in whole genome and SNPs in selected regions respectively. Homologous regions among different pig populations were identified and extracted by SNPhylo (v20160204) 57 . The corresponding SNPs in homologous regions were utilized to construct a neighbor-joining tree using MEGA.