Introduction

There are over 300 distinct breeds of goat (Capra hircus) in the world, which are distributed over all types of ecological areas with more concentrated in the tropics, dry zones and developing countries. Goats have been used for their milk, meat, hair, or cashmere and skins over much of the world. In 2011, there were more than 1.4 billion live goats around the globe, according to the UN Food and Agriculture Organization (http://kids.fao.org/glipha/). China is rich in goat genetic resources, and there are about 69 goat breeds in this country, including 58 local breeds, 8 improved breeds and 3 introduced breeds. Especially, Inner Mongolia cashmere goat (IMCG), mainly lived in Alashan, Inner Mongolia, in the north of China (~1304 m) coated white long hair, which produced primarily cashmere and had relatively low fecundity (~105%) compared with Duzu black goat (DBG, ~272%), black short hair, which are raised in Dazu County of Chongqing in the southwest of China (267~934 m) and are primarily used for meat production1 (Fig. 1a).

Figure 1
figure 1

Samples distribution and population structure.

(a) Distribution of goats sampled. The IMCG, which are white in color and are used in cashmere production, were from Alashan, Inner Mongolia, in the north of China (~1304 meters’ altitude). IMCG had relatively low fecundity (~105%) compared with DBG (~272%), which are primarily used for producing meat and are raised in Chongqing in the southwest of China (267~934 meters altitude). The geographic map was drawn by R software (https://www.r-project.org/). (b) PCA plot of two goat breeds. (c) Geneticstructure of two populations. Each color represents a cluster and each bar represents an individual. (d) NJ phylogenetic tree of 12 individuals. All pictures were modified by Adobe Photoshop CS5 (http://www.adobe.com/).

Goat is one of the earliest animals domesticated by humans. The most recent genetic analysis confirms the archaeological evidence that the wild Bezoar ibex of the Zagros Mountains are the likely origin of almost all domestic goats today. But the morphologicaland behavioral characteristics of modern domestic goats have greatly changed compared with wild progenitor, the Bezoar goat (Capra aegagrus)2. Compared with the Bezoar goat, modern goats exhibit a reduction in body size, a more docile demeanor, a series of coat color variants and the ability to adapt after domestication and breed formation, which have left detectable signatures of selection within the genomes3,4. Hence, scanning of selection signature can’t only provide straightforward insights into the mechanism of evolution events but also use for identifying functional genes of important economic traits or causative variations of phenotypic diversity.

Based on populations differentiation reflected by Fst statistic, Gibbs et al. detected some genomic regions embracing some SNPs in genes associated with meat quality (SPOCK1) and feed efficiency (ZRANB3, R3HDM1)5. The differences in allele frequencies were correlated with the effects of those SNPs on various production traits. For instance, Xu et al. identified some of less-known genes under positive selection correlated with milk production traits such as LAP3 and SAR1B, associated with hair length, namely, FGF56, and related to reproduction traits such as TSHR, GHR and BMP157,8. However, studies of selection signature on goats are still incipient compared with studies of other livestock. The goal of the present study was to identify a number of candidate genes for available to study based on allele frequency differences and population genetic differentiation.

Results

Sequencing

The genomes of 12 goats from two genetically diverse and geographically distinct indigenous breeds consists of DBG and IMCG (Fig. 1) were sequenced to average 6.02-fold coverage. A total of 192.74 G raw data generated 191.57 G clean data after a quality check and filtering by removing the adapter of paired reads or low quality reads. The effective rate and the ratio of clean data to raw data were 98.79% to 99.64%, respectively. We also measured the GC content distribution (45.06% of our data versus 41.75% of the reference sequence). Effective sequencing reads were aligned to CHIR_1.0 reference genome, resulting in ~100 million mapped reads. The mapping rate in different individuals were varies from 98.11% to 98.69%. The percentages of reads sequenced once or four times per bp were >97.8% and 62.72%, respectively (Supplementary Table 1).

Variation discovery

After assembly and mapping of clean reads, we identified 5.03 million SNPs and 334,151 Indels (insertions and deletions) as average. Furthermore, the distributions of the SNPs were determined in the goat genome that there were 37,684 (0.75%) and 38,976 (0.77%) SNPs in the exon in DBG and IMCG, respectively, while the intronic regions contained 28.39% and 28.04% of the SNPs in DBG and IMCG, respectively. SNPs within intergenic regions accounted for up to 69.51% and 68.09% of the SNPs in DBG and IMCG, respectively (Table 1). The results indicated that similar heterozygosity rates existed between the two populations (1.1622 of the DBG population versus 1.1605 of the IMCG population). The number of SNPs in each category of mutations in the goat genome are shown in Table 2. As shown in Table 2, the ratio of transition (ts) to transversion (tv) was estimated to be 2.3347 in the DBG goats, slightly lower than that in the IMCG goats (2.3552). The detection of variation type showed that the classes of T: A > C: G or C: G > T: A were the primary types of variation (Supplementary Figure 1). In addition, the results showed that intergenic regions contained 66.44% of the Indels, intronic regions contained 30.31% of the Indels, and exons contained only 0.21% of the Indels (Table 1). The heterozygosity rate of Indels between the two goat populations was remarkably consistent (0.0602 of the DBG population versus 0.0600 of the IMCG population).

Table 1 Characteristics of SNPs and indels identified in the two study cohorts.
Table 2 The number of SNPs in the genomes of the two goat breeds.

Population structure analysis

Population substructure was investigated using Clustering, MEGA and ADMIXTURE softwares based on using genomic SNPs. We ran Admixture 1.22 for determining genetic backgrounds of samples and indicated two breeds been almost diverged but there is a little gene introgression in DBG when K = 2 (Fig. 1b). The results of principal component analysis (PCA) analysis showed that three principal components (PC1, PC2 and PC3) differentiated DBG and IMCG individuals (Fig. 1c). The neighbor-joining (NJ) tree confirmed these results and successfully divided into two populations displaying genetically distinct clusters (Fig. 1d). The clear genetic divergence between DBG and IMCG showed that the individuals chosen could be used for further exploring their genomic features.

Analysis of Selective sweeps

Those genomic regions with ZHp value exceeding the empirical threshold level of −5 across the two populations were obtained, that contain 155 and 294 candidate regions in the DBG and IMCG populations, respectively. The top regions were located on chromosome 8 in DBG goats (41890001–41950001 bp) and on chromosome 2 in IMCG goats (113850001–113900001 bp), respectively (Fig. 2a,b, Supplementary Table 2). To obtain the available candidate genes, all candidate regions were annotated using the first generation goat genome annotation information to identify candidate genes that underwent sweeps. A total of 86 and 97 genes in DBG goats and IMCG goats respectively were identified (Supplementary Table 5). Next, Fst values were measured, which a special statistic was used to detect the selection signature based on genetic differentiation that resulted from genetic drift and selective pressure. The result showed that a total of 368 putative selective sweep regions containing 164 candidate genes exceeding the empirical threshold level of 4.5 were obtained (Fig. 3, Supplementary Tables 3 and 5). To further explore the sweep regions, those regions that contained both low heterozygosity and high genetic differentiation were scanned and contained a total of 239 regions that ranked in the top 1% based on the |ZHp| and ZFst value in both groups. There were 239 and 176 outliers detected in the DBG and IMCG, respectively (Fig. 4a,b, Supplementary Table 4). There are 135 and 106 gene candidates in DBG and IMCG, respectively (Supplementary Table 5). Gene ontology (GO) and pathway analysis (KEGG) for candidate genes under selection were performed in order to further explore the functions of the selective sweep gene candidates in detail. All P<0.05 GO terms and KEGG pathways were deposited in Supplementary Tables 6 and 7, respectively. In total, we annotated 386 GO entries (Supplementary Table 6) and 31 KEGG pathways (Supplementary Table 7).

Figure 2
figure 2

The Manhattan plot bsed on allele frequency differences in DBG breeds (a) and IMCG breeds (b).

Figure 3
figure 3

The distribution of Fst value across the genomes of the two goat breeds.

Figure 4
figure 4

Identification of genomic regions with low heterozygosity and high genetic differentiation in DBG (a) and IMCG (b).

Candidates affecting goat reproduction

Reproduction trait is an important component of efficiency in goat production system24. A plurality of reproduction-related categories were identified, such as, (1) Hp values: “neurohypophyseal hormone activity” (3 genes, PAIP2B, CCDC64, EPB41L5), “photoreceptor activity” and “blue light photoreceptor activity” (BIRC6) in DBG goats; “meiosis” and “meiotic cell cycle” (4 genes, TAOK1, C6H4orf22, SGOL1, SLC33A1), “meiosis I” (3 genes, SLC33A1, C6H4orf22, TAOK1) in IMCG population; (2) Fst values: “sex determination” and “mating type determination” (2 genes, PAIP2, CLEC16A), “pheromone activity” (4 genes, ZNF280D, CLEC16A, ARID1B, PAIP2). (3) Combination |ZHp| with Fst values: “gamete generation” (8 genes, SBF1, PRTG, PARD3B, KDM4C, FAT1, DMD, TPPP3, DACH2), “neurohypophyseal hormone activity” (4 genes, KDM4C, RYBP, FARP1, CELF2),“spermatid development”(3 genes, PARD3B,FAT1,KDM4C), “spermatid nucleus differentiation”,“sperm chromatin condensation”, “spermatid differentiation” and “spermatid differentiation” (3 genes, PARD3B, FAT1, KDM4C), “spermatogenesis” and “male gamete generation” (5 genes, FAT1, KDM4C, TPPP3, SBF1, PARD3B) (Supplementary Tables 6 and 7).

Candidates related to productive traits

The rumen, which is the largest compartment, encompasses numerous symbiotic microbial floras to ferment the feed for lipid metabolism10. We identified a total of 3 pathways, as followings: “adipocytokine signaling pathway” (2 genes, IKBKG, LOC102190823), “ether lipid metabolism” (2 genes, PLD2, PLA2G1B), “Glycosphingolipid biosynthesis-ganglio series” (SLC33A1), and 8 GO terms, such as, glycosphingolipid/galactolipid metabolic process (LOC102171901), galactolipid/glycolipid/sphingolipid/membrane lipid/glycosylceramide/glycosphingolipid catabolic process (LOC102171901) associated with putative lipid metabolic pathways. In addition, There were “Ras signaling pathway” (5 genes, RASGRP2, RRAS, PLA2G1B, KDR, IKBKG), “ubiquitin mediated proteolysis” (4 genes, STUB1, MID2, LOC102174728, BIRC6, UBE3B), “VEGF signaling pathway” (2 genes, PXN, KDR), “MAPK signaling pathway” (4 genes, RASGRP2, RRAS, PTPN7, IKBKG), “glutathione metabolism” (2 genes, LOC102188087, IDH1), “ribosome” (10 genes, LOC102178382, RPLP0, LOC102181054, LOC102173128, MROH8, LOC102190542, LOC102175291, LOC102176839, LOC102176407, LOC102181499), “RNA polymerase” (2 genes, POLR3F, POLR2C), “purine metabolism” (3 genes, AK3, POLR3F, POLR2C), “biosynthesis of amino acids” (2 genes, PYCR1, LOC102171894), “pyrimidine metabolism” (2 genes, POLR3F, POLR2C) and “metabolic pathways” (19 genes, LOC102179840, PANK2, LOC102178836, COX2, COX3, CYTB, COX1, ATP6, NFS1, ND4L, ATP8, LOC102171894, GLDC, ND1, ND3, ND2, ND5, ND4, ND6) (Supplementary Tables 6 and 7). Our study suggests that the genes included these terms or pathways may be most likely related to productive traits.

Discussion

Selection, which appear to have left detectable signatures of selection within the animal genomes is a vital driving force of evolution. Since the dawn of agriculture, artificial selection has continuously added to the existing pool of phenotypic variation as same as natural selection fuels the generation of biodiversity on earth. The programs of selection signature have identified hundreds of regions or genes targeted by important traits that formed by natural or artificial selection inchicken8, cattle6, swine11 and sheep12. It’s essential to identify candidate genes of important economic traits used for further studies in goats.

Hair length is a characteristic trait in cashmere goats in comparison to meat and milk goats. The hair length on IMCG (14.92 ± 4.41 cm) used in this study is significantly longer than DBG (3.92 ± 0.76 cm). Hair growth was considered to be regulated by hairless gene, FOXI313, hypertrichosis gene, FGF13, Trps1, Sox914,15, and hair overgrowth gene, ABCA516. However, natural long hair, well described angora phenotype, results from a regulator gene, FGF5, in many kinds of mammals17,18,19,20,21. Here FGF5 have been included in the candidate genes sets based on low heterozygosity and high genetic differentiation in IMCG population (Fig. 4, Supplementary Table 5). We detected three synonymous mutations: g. 92103757C > T (rs653636435), g. 92111741A > T (rs672199363), g. 92124227A > T (rs665580959), and four mutations that were absent from goat dbSNP database, namely g. 92124086T > G, g. 92124131A > G, g. 92124134G > C and g. 92124199C > T within the FGF5 exon sequence. Only g. 92124199G > C candidate mutation could lead to a serine-to-leucine transitionat the 224th site in FGF5 amino acid sequence and need to further verification, although He et al. demonstrated that the expression of FGF5s, a mRNA alternative splicing of FGF5, probably act as main cause of increasing the length of cashmere fiber22.

The seasonal reproduction is one of the most conspicuous characters in goats, which is archived by alternation between long day and short day. Light information received by photoreceptor organ uniquely positioned to eye in mammals is transmitted to the pineal gland through the suprachiasmatic nucleus and then give rise to hormone secretion, especially melatonin23. Consequently, light is one of the important factors affecting goat reproduction, which is important component of efficiency in goat production system24. BIRC6 gene acted as “photoreceptor activity” or “blue light photoreceptor activity” was obtained based on heterozygosity as a result of differences of annual insolation duration between two populations (IMCG, 3000–3400 h; DBG, 1279 h)1 (Supplementary Table 5). Previous research has reported polymorphisms in the BIRC6 gene were associated with hereditary eye diseases, pseudoexfoliative glaucoma25. And the mutations of BIRC6 would result in abnormal embryonic development26,27. Walsh et al. determined blue light was indeed associated with melatonin expression in horses28. Goat reproductive traits are the result of the combined effects of multiple organ, such as, the above-mentioned photoreceptor uniquely positioned to eye, pituitary and even olfactory. The male goats secret pheromone, which is received by the pheromone receptor located at the olfactory epithelium or the vomeronasal organ. The pheromone signal is integrated and then induce hypothalamic-pituitary hormone cascade, finally induce an out-of-seasonal ovulation in anoestrous females, so-called “male effect”29,30. Two terms, “neurohypophyseal hormone activity” and “pheromone activity” (Supplementary Table 6) may probably illustrate the mechanism of this effect. The complement of these analyses may therefore accelerate the pace of improvements of goat genetics.

Goats are browsing animals, digesting crude fiber in a four-chambered stomach compared with monogastric livestock, for example swine. The rumen engaged in lipid metabolism implemented by numerous symbiotic microbialfloras31. The level of lipid metabolism is closely related to fat deposition, such as IKBKG9, LOC102190823 (Supplementary Table 5); muscle mass for example PLD210 (Supplementary Table 5); and milk production32. IDH1 gene (Supplementary Table 5), including in the term, “glutathione metabolism” (Supplementary Table 7) is significantly associated with increased milk citrate content33. Dissecting major-genes of production traits are difficult only depending on sequencing data. The results provided by this study can provide some routes for future researches.

In summary, the study presented here has generated 192.747 G raw data based on whole-genome re-sequencing. A total of more than 5.03 million SNPs and 334,151 Indels were identified by resequencing 12 goat samples. The selective sweep analysis revealed some interesting candidate genes and pathways affected by natural or artificial selection involving reproductive or productive traits. The works performed here provided an important resource for future quantitative trait loci mapping of the goat breeding.

Materials and Methods

Experimental samples

This study was carried out in strict accordance with the recommendations in the Guide for the International Cooperation Committee of Animal Welfare (ICCAW), which is responsible for animal care and use in China. The experimental conditions were approved by the Committee on the Ethics of Animal Experiments of Southwest University (No. [2007] 3). All experimental goats were fed a daily ration of 300–500 gconcentrate and were allowed to an unrestricted access to straw, mineral saltlick and water.

We collected a total of 12 blood samples of unrelated individuals from 6 Dazu black female goats (DBG) and 6 Inner Mongolia cashmere female goats (IMCG) (Fig. 1a). All the goats were housed in Dazu Black Goat Farm at Southwest University, Chongqing, China. Genomic DNA was extracted using a Tiangen DNA isolation kit (Tiangen Biotech, Beijing, China). In addition, we calculated the average hair length of 20 IMCG and 5 DBG, respectively with software SPSS 19.0.

Sequencing and variation calling

Each DNA sample was used to construct 350 bp paired-end sequencing libraries. Each library was sequenced with Illumina HiSeq PE150 platform according to the manufacturer’s instructions at Novogene (Tianjin, China). Raw data were processed to filter out the adaptors and low quality reads resulted in clean reads using BWA software34. After alignment, variation calling was performed per individual using SAMtools package35. All the called variants were annotated using the ANNOVAR package with the gene-based, or region-based, or filter-based optionsfor further analysis36.

Population structure analysis

We used the Cluster 3.0 software for performing PCA analysis with the population scale SNPs37,38. Genetic structure was estimated using the program ADMIXTURE Version 1.2239, which was run at K = 2 on the basis of the whole SNP dataset. The neighbor-joining (NJ) tree was constructed using MEGA 5.2 procedure based on genome-wide SNPs40,41.

Analysis of selective sweep

Selective sweep analyses were performed by calculating heterozygosity (Hp) and population differentiation (Fst). The Hp and Fst values, respectively were converted to a standard normal distribution, denoted as ZHp and ZFst as described42,43. In addition, those regions that have low ZHp value and high ZFst value were scanned as candidates. To know the biological function of genes within candidate regions, the analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were performed using Goseq (Bioconductor 2.12) and KOBAS (kobas2.0–20120208), respectively.

Additional Information

How to cite this article: Guan, D. et al. Scanning of selection signature provides a glimpse into important economic traits in goats (Capra hircus). Sci. Rep. 6, 36372; doi: 10.1038/srep36372 (2016).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.