Scanning of selection signature provides a glimpse into important economic traits in goats (Capra hircus)

Goats (Capra hircus) are one of the oldest livestock domesticated species, and have been used for their milk, meat, hair and skins over much of the world. Detection of selection footprints in genomic regions can provide potential insights for understanding the genetic mechanism of specific phenotypic traits and better guide in animal breeding. The study presented here has generated 192.747G raw data and identified more than 5.03 million single-nucleotide polymorphisms (SNPs) and 334,151 Indels (insertions and deletions). In addition, we identified 155 and 294 candidate regions harboring 86 and 97 genes based on allele frequency differences in Dazu black goats (DBG) and Inner Mongolia cashmere goats (IMCG), respectively. Populations differentiation reflected by Fst values detected 368 putative selective sweep regions including 164 genes. The top 1% regions of both low heterozygosity and high genetic differentiation contained 239 (135 genes) and 176 (106 genes) candidate regions in DBG and IMCG, respectively. These genes were related to reproductive and productive traits, such as “neurohypophyseal hormone activity” and “adipocytokine signaling pathway”. These findings may be conducive to molecular breeding and the long-term preservation of the valuable genetic resources for this species.

The differences in allele frequencies were correlated with the effects of those SNPs on various production traits. For instance, Xu et al. identified some of less-known genes under positive selection correlated with milk production traits such as LAP3 and SAR1B, associated with hair length, namely, FGF5 6 , and related to reproduction traits such as TSHR, GHR and BMP15 7,8 . However, studies of selection signature on goats are still incipient compared with studies of other livestock. The goal of the present study was to identify a number of candidate genes for available to study based on allele frequency differences and population genetic differentiation.

Results
Sequencing. The genomes of 12 goats from two genetically diverse and geographically distinct indigenous breeds consists of DBG and IMCG ( Fig. 1) were sequenced to average 6.02-fold coverage. A total of 192.74 G raw data generated 191.57 G clean data after a quality check and filtering by removing the adapter of paired reads or low quality reads. The effective rate and the ratio of clean data to raw data were 98.79% to 99.64%, respectively. We also measured the GC content distribution (45.06% of our data versus 41.75% of the reference sequence). Effective sequencing reads were aligned to CHIR_1.0 reference genome, resulting in ~100 million mapped reads. The mapping rate in different individuals were varies from 98.11% to 98.69%. The percentages of reads sequenced once or four times per bp were > 97.8% and 62.72%, respectively (Supplementary Table 1).
Variation discovery. After assembly and mapping of clean reads, we identified 5.03 million SNPs and 334,151 Indels (insertions and deletions) as average. Furthermore, the distributions of the SNPs were determined in the goat genome that there were 37,684 (0.75%) and 38,976 (0.77%) SNPs in the exon in DBG and IMCG, respectively, while the intronic regions contained 28.39% and 28.04% of the SNPs in DBG and IMCG, respectively. SNPs within intergenic regions accounted for up to 69.51% and 68.09% of the SNPs in DBG and IMCG, respectively ( Table 1). The results indicated that similar heterozygosity rates existed between the two populations (1.1622 of the DBG population versus 1.1605 of the IMCG population). The number of SNPs in each category of mutations in the goat genome are shown in Table 2. As shown in Table 2, the ratio of transition (ts) to transversion (tv) was estimated to be 2.3347 in the DBG goats, slightly lower than that in the IMCG goats (2.3552). The detection of variation type showed that the classes of T: A > C: G or C: G > T: A were the primary types of variation (Supplementary Figure 1). In addition, the results showed that intergenic regions contained 66.44% of the Indels, intronic regions contained 30.31% of the Indels, and exons contained only 0.21% of the Indels ( Table 1). The heterozygosity rate of Indels between the two goat populations was remarkably consistent (0.0602 of the DBG population versus 0.0600 of the IMCG population).
Population structure analysis. Population substructure was investigated using Clustering, MEGA and ADMIXTURE softwares based on using genomic SNPs. We ran Admixture 1.22 for determining genetic backgrounds of samples and indicated two breeds been almost diverged but there is a little gene introgression in DBG when K = 2 (Fig. 1b). The results of principal component analysis (PCA) analysis showed that three principal components (PC1, PC2 and PC3) differentiated DBG and IMCG individuals (Fig. 1c). The neighbor-joining (NJ) tree confirmed these results and successfully divided into two populations displaying genetically distinct clusters (Fig. 1d). The clear genetic divergence between DBG and IMCG showed that the individuals chosen could be used for further exploring their genomic features.
Analysis of Selective sweeps. Those genomic regions with ZHp value exceeding the empirical threshold level of − 5 across the two populations were obtained, that contain 155 and 294 candidate regions in the DBG and IMCG populations, respectively. The top regions were located on chromosome 8 in DBG goats (41890001-41950001 bp) and on chromosome 2 in IMCG goats (113850001-113900001 bp), respectively ( Fig. 2a,   the first generation goat genome annotation information to identify candidate genes that underwent sweeps. A total of 86 and 97 genes in DBG goats and IMCG goats respectively were identified (Supplementary Table 5). Next, Fst values were measured, which a special statistic was used to detect the selection signature based on genetic differentiation that resulted from genetic drift and selective pressure. The result showed that a total of 368 putative selective sweep regions containing 164 candidate genes exceeding the empirical threshold level of 4.5 were obtained (Fig. 3, Supplementary Tables 3 and 5). To further explore the sweep regions, those regions that contained both low heterozygosity and high genetic differentiation were scanned and contained a total of 239 regions that ranked in the top 1% based on the |ZHp| and ZFst value in both groups. There were 239 and 176 outliers detected in the DBG and IMCG, respectively ( Fig. 4a,b, Supplementary Table 4). There are 135 and 106 gene candidates in DBG and IMCG, respectively (Supplementary Table 5). Gene ontology (GO) and pathway analysis (KEGG) for candidate genes under selection were performed in order to further explore the functions of the selective sweep gene candidates in detail. All P < 0.05 GO terms and KEGG pathways were deposited in Supplementary Tables 6 and 7, respectively. In total, we annotated 386 GO entries (Supplementary Table 6) and 31 KEGG pathways (Supplementary Table 7).

Discussion
Selection, which appear to have left detectable signatures of selection within the animal genomes is a vital driving force of evolution. Since the dawn of agriculture, artificial selection has continuously added to the existing pool of phenotypic variation as same as natural selection fuels the generation of biodiversity on earth. The programs of selection signature have identified hundreds of regions or genes targeted by important traits that formed by natural or artificial selection inchicken 8 , cattle 6 , swine 11 and sheep 12 . It's essential to identify candidate genes of important economic traits used for further studies in goats.
Hair length is a characteristic trait in cashmere goats in comparison to meat and milk goats. The hair length on IMCG (14.92 ± 4.41 cm) used in this study is significantly longer than DBG (3.92 ± 0.76 cm). Hair growth was considered to be regulated by hairless gene, FOXI3 13 , hypertrichosis gene, FGF13, Trps1, Sox9 14,15 , and hair overgrowth gene, ABCA5 16 . However, natural long hair, well described angora phenotype, results from a regulator gene, FGF5, in many kinds of mammals [17][18][19][20][21] . Here FGF5 have been included in the candidate genes sets based on low heterozygosity and high genetic differentiation in IMCG population (Fig. 4, Supplementary Table 5). We detected three synonymous mutations: g. 92103757C > T (rs653636435), g. 92111741A > T (rs672199363), g. 92124227A > T (rs665580959), and four mutations that were absent from goat dbSNP database, namely g. 92124086T > G, g. 92124131A > G, g. 92124134G > C and g. 92124199C > T within the FGF5 exon sequence. Only g. 92124199G > C candidate mutation could lead to a serine-to-leucine transitionat the 224 th site in FGF5 amino acid sequence and need to further verification, although He et al. demonstrated that the expression of FGF5s, a mRNA alternative splicing of FGF5, probably act as main cause of increasing the length of cashmere fiber 22 .
The seasonal reproduction is one of the most conspicuous characters in goats, which is archived by alternation between long day and short day. Light information received by photoreceptor organ uniquely positioned to eye in mammals is transmitted to the pineal gland through the suprachiasmatic nucleus and then give rise to hormone secretion, especially melatonin 23 . Consequently, light is one of the important factors affecting goat reproduction, which is important component of efficiency in goat production system 24 . BIRC6 gene acted as "photoreceptor activity" or "blue light photoreceptor activity" was obtained based on heterozygosity as a result of differences of annual insolation duration between two populations (IMCG, 3000-3400 h; DBG, 1279 h) 1 (Supplementary  Table 5). Previous research has reported polymorphisms in the BIRC6 gene were associated with hereditary eye diseases, pseudoexfoliative glaucoma 25 . And the mutations of BIRC6 would result in abnormal embryonic development 26,27 . Walsh et al. determined blue light was indeed associated with melatonin expression in horses 28 . Goat reproductive traits are the result of the combined effects of multiple organ, such as, the above-mentioned photoreceptor uniquely positioned to eye, pituitary and even olfactory. The male goats secret pheromone, which is received by the pheromone receptor located at the olfactory epithelium or the vomeronasal organ. The pheromone signal is integrated and then induce hypothalamic-pituitary hormone cascade, finally induce an out-of-seasonal ovulation in anoestrous females, so-called "male effect" 29,30 . Two terms, "neurohypophyseal hormone activity" and "pheromone activity" (Supplementary Table 6) may probably illustrate the mechanism of this effect. The complement of these analyses may therefore accelerate the pace of improvements of goat genetics.
Goats are browsing animals, digesting crude fiber in a four-chambered stomach compared with monogastric livestock, for example swine. The rumen engaged in lipid metabolism implemented by numerous symbiotic microbialfloras 31 . The level of lipid metabolism is closely related to fat deposition, such as IKBKG 9 , LOC102190823 (Supplementary Table 5); muscle mass for example PLD2 10 (Supplementary Table 5); and milk production 32 . IDH1 gene (Supplementary Table 5), including in the term, "glutathione metabolism" (Supplementary Table 7) is significantly associated with increased milk citrate content 33 . Dissecting major-genes of production traits are difficult only depending on sequencing data. The results provided by this study can provide some routes for future researches.
In summary, the study presented here has generated 192.747 G raw data based on whole-genome re-sequencing. A total of more than 5.03 million SNPs and 334,151 Indels were identified by resequencing 12 goat samples. The selective sweep analysis revealed some interesting candidate genes and pathways affected by natural or artificial selection involving reproductive or productive traits. The works performed here provided an important resource for future quantitative trait loci mapping of the goat breeding.

Materials and Methods
Experimental samples. This study was carried out in strict accordance with the recommendations in the Guide for the International Cooperation Committee of Animal Welfare (ICCAW), which is responsible for animal care and use in China. The experimental conditions were approved by the Committee on the Ethics of Animal Experiments of Southwest University (No. [2007] 3). All experimental goats were fed a daily ration of 300-500 gconcentrate and were allowed to an unrestricted access to straw, mineral saltlick and water.
We collected a total of 12 blood samples of unrelated individuals from 6 Dazu black female goats (DBG) and 6 Inner Mongolia cashmere female goats (IMCG) (Fig. 1a). All the goats were housed in Dazu Black Goat Farm at Southwest University, Chongqing, China. Genomic DNA was extracted using a Tiangen DNA isolation kit (Tiangen Biotech, Beijing, China). In addition, we calculated the average hair length of 20 IMCG and 5 DBG, respectively with software SPSS 19.0.
Sequencing and variation calling. Each DNA sample was used to construct 350 bp paired-end sequencing libraries. Each library was sequenced with Illumina HiSeq PE150 platform according to the manufacturer's instructions at Novogene (Tianjin, China). Raw data were processed to filter out the adaptors and low quality reads resulted in clean reads using BWA software 34 . After alignment, variation calling was performed per individual using SAMtools package 35 . All the called variants were annotated using the ANNOVAR package with the gene-based, or region-based, or filter-based optionsfor further analysis 36 . Population structure analysis. We used the Cluster 3.0 software for performing PCA analysis with the population scale SNPs 37,38 . Genetic structure was estimated using the program ADMIXTURE Version 1.22 39 , which was run at K = 2 on the basis of the whole SNP dataset. The neighbor-joining (NJ) tree was constructed using MEGA 5.2 procedure based on genome-wide SNPs 40,41 . Analysis of selective sweep. Selective sweep analyses were performed by calculating heterozygosity (Hp) and population differentiation (Fst). The Hp and Fst values, respectively were converted to a standard normal distribution, denoted as ZHp and ZFst as described 42,43 . In addition, those regions that have low ZHp value and high ZFst value were scanned as candidates. To know the biological function of genes within candidate regions, the analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were performed using Goseq (Bioconductor 2.12) and KOBAS (kobas2.0-20120208), respectively.