Genome diversity of Chinese indigenous chicken and the selective signatures in Chinese gamecock chicken

Gamecock chickens are one of the earliest recorded birds in China, and have accumulated some unique morphological and behavioral signatures such as large body size, muscularity and aggressive behavior, whereby being excellent breeding materials and a good model for studying bird muscular development and behavior. In this study, we sequenced 126 chicken genomes from 19 populations, including four commercial chicken breeds that are commonly farmed in China, 13 nationwide Chinese typical indigenous chicken breeds (including two Chinese gamecock breeds), one red jungle fowl from Guangxi Province of China and three gamecock chickens from Laos. Combined with 31 published chicken genomes from three populations, a comparative genomics analysis was performed across 157 chickens. We found a severe confounding effect on potential cold adaptation exerted by introgression from commercial chickens into Chinese indigenous chickens, and argued that the genetic introgression from commercial chickens into indigenous chickens should be seriously considered for identifying selection footprint in indigenous chickens. LX gamecock chickens might have played a core role in recent breeding and conservation of other Chinese gamecock chickens. Importantly, AGMO (Alkylglycerol monooxygenase) and CPZ (Carboxypeptidase Z) might be crucial for determining the behavioral pattern of gamecock chickens, while ISPD (Isoprenoid synthase domain containing) might be essential for the muscularity of gamecock chickens. Our results can further the understanding of the evolution of Chinese gamecock chickens, especially the genetic basis of gamecock chickens revealed here was valuable for us to better understand the mechanisms underlying the behavioral pattern and the muscular development in chicken.


Results and discussion
Sequencing and genomic variant. In the present study, we whole-genome re-sequenced a total of 126 chicken samples, generating a panel of clean data ranging from 9.5 to 60.2 billion base pairs (bp), corresponding to genome coverage ranging from 7.0× to 48.9× (Table S2). Except for the Sample Laos03 (mapping rate = 94.85%), the mapping rate of the other 125 individuals was greater than 97%. The ratio of the genome covered with at least one sequencing base ranged from 89.1 to 93.5%, while covered with at least four sequencing bases ranged from 75.0 to 91.60%.
After incorporating the sequencing data of another public 31 Chinese indigenous chickens, we identified a total of 17,375,012 raw SNPs and 1,726,022 raw InDels via SAMtools, and a total of 43,643,339 raw SNPs and 4,326,184 InDels via GATK pipeline, in which a total of 17,349,501 SNPs and 1,673,029 InDels shared by both SAMtools and GATK pipelines were further identified. Following the filtration criteria in "Materials and methods" (2.2), a total of 10,119,242 genome-wide population SNPs (Table S3; Fig. S1) and 837,787 genomewide population InDels (Table S4; Fig. S2) were obtained. For this set of 10,119,242 genome-wide population SNPs, which composed of 9,463,354 known and 655,888 novel SNPs, and included 9,794,983 autosomal SNPs. Compared with previous whole-genome resequencing studies in Chinese chickens 4,6 , the number of novelSNPs and InDels identified here is relatively smaller, and this is probably because we employed two pipelines together to call the variants and large non-uniformity in terms of sequencing depth existed in 157 samples. For the SNPs abundance in all 22 populations, RJF harbored the highest in terms of both total and novel SNPs. Apart from RJF, in the populations sequenced in this study, TLF gamecock chickens exhibited the highest abundance in terms of both total and novel SNPs, while LH exhibited the lowest. For the 837,787 genome-wide population InDels, it had 382,059 insertions and 455,728 deletions. The abundance pattern across 22 populations in terms of InDels was similar to that in SNPs.

Genome-wide nucleotide diversity and heterozygosity, linkage disequilibrium. Among all 22
populations, we observed the lowest genome-wide π in three commercial populations, lowest in LH chickens ( pop_π = 0.00199), followed by RIR chickens ( pop_π = 0.00216) and WRR chickens ( pop_π = 0.00223) (Fig. 1B). While among the Chinese indigenous chickens, the populations with Muffs and Beard phenotype, including BC chicken ( pop_π = 0.00225), SK chickens ( pop_π = 0.00234) and YOU chickens ( pop_π = 0.00246) harbored the lowest genome-wide π . Across the three Chinese gamecock populations, we observed the highest Scientific RepoRtS | (2020) 10:14532 | https://doi.org/10.1038/s41598-020-71421-z www.nature.com/scientificreports/ genome-wide π in TLF chickens ( pop_π = 0.00333), followed by BN chickens ( pop_π = 0.00316) and LX chickens ( pop_π = 0.00276). Similar to the results in genome-wide nucleotide diversity, we also observed the lowest population heterozygosity (pop_He) in three commercial populations, lowest in LH chickens (pop_He = 0.2114), followed by RIR chickens (pop_Hp = 0.2257) and WRR chickens (pop_He = 0.2337) (Fig. 1C). While the population with muffs and beard phenotype, including BC chickens (pop-He = 0.2376), SK chickens (pop-He = 0.2929) and YOU chickens (pop-He = 0.3010), harbored the lowest pop-He in Chinese indigenous chickens. Among gamecock chickens, there were distinct results of pop_π , BN chickens (pop_Hp = 0.3555) but not TLF chickens (pop_Hp = 0.3299), harbored the highest pop_Hp. Also, we observed the highest level of LD in BC chickens, followed by four commercial populations and SK chickens, while the lowest was recorded in RJF chickens. Besides, the three Chinese gamecock chickens showed a low level in LD, highest by TLF chickens, followed by LX and BN chickens (Fig. 1D). Except three breeds (BC, SK, and YOU) with muffs and beard phenotype, the other Chinese indigenous chickens have undergone much less intensive artificial selection compared with commercial chickens, which are basically consistent with previous findings 4,7,8 . Surprisingly, as one of the earliest recorded chicken populations in China 3 , Chinese gamecock chickens have been supposed to be under strong artificial selection since they have been consistently selected for cockfighting. But in the present study, similar to most Chinese indigenous chickens, the gamecock chickens also harbored relatively low levels in terms of nucleotide diversity and LD, and high level of heterozygosity, suggestive to a limited number of genes possibly underlying the signatures observed in gamecock chickens.
Genome-wide genetic differentiation. We observed the highest levels (mean weighted Fst > 0.2) of genetic differentiation between each commercial population and Chinese indigenous chickens, especially between LH chickens and BC chickens (Weighted Fst = 0.4365) ( Table 1). Between the Chinese indigenous chicken populations, a higher level of genetic differentiation occurred between the three populations with Muffs and Beard phenotype (BC, SK, and YOU chickens) and others (Table 1), which were all higher than that of RJF chickens against other Chinese indigenous chickens (P < 0.01), suggestive of the driving force of this unique   1A). This does not totally support the argument given by Nie et al. 7,8 , that a critical role in shaping the genomic variation within Eurasia continent chicken breeds might have been played by isolation based on distance. We propose strong artificial selection, together with isolation by distance are the main driving forces in shaping chicken genomic variation.
Population genetic structure analysis unveiled a high level of admixture across Chinese gamecock chickens. According to the neighbor-joining tree, all the 157 chickens from 22 populations could be separated into three clusters ( Fig. 2A). Of them, Cluster 1 included four commercial populations, seven ZJ, and eight RJF individuals; Cluster 2 included the populations of four gamecock populations and YNVC chickens, and the individuals of 3 ZJ and 2 RJF, suggestive of a possible same origin of the Chinese gamecock chickens included here; while the remaining 11 Chinese indigenous populations composed the Cluster 3. Consistent with the previous study 4 , RJF and ZJ chickens could be separated into two different clusters in the present study. Except for the populations of RJF, ZJ, YNVC and Laos gamecock chickens, all the left populations could be separated into its own clade. However, the PCA results could not completely reproduce the phylogenetic relationships. The top two PCs (4.31% and 3.45% variances explained totally, respectively) could separate the four commercial populations from non-commercial populations (Fig. 2B). Especially, the 10 RJF chickens here were genetically classified very near the Chinese indigenous chickens but away from commercial chickens, which is consistent with the study of Wang et al. 2 , and probably due to that these RJF chickens belong to G. g. spadiceus. LH as a population harboring the highest level of genetic differentiation with others (Table 1), its unique genetic variation pattern could also be evidenced in the top two PCs being an independent cluster from others. But the four gamecock populations, together with YNVC, 3 ZJ, and 2 RJF chickens could not be well grouped into one cluster as indicated by the above phylogenetic analysis, suggestive of a complex genetic structure of them. Within each population, we observed 3 and 2 outlier samples in ZJ and RJF chickens, which were also revealed in the above phylogenetic analysis. Besides, it was hard to separate the populations of Laos, BN, ZJ and, YNVC from each other, suggestive of potential admixture among them.
To infer the admixture degree across 157 samples, we further performed an unsupervised Admixture analysis, with K run from 2 to 16. We found at K = 2, consistent with the above PCA result (Fig. 2B), genetic divergency first occurred between commercial populations and non-commercial ones. Except for BC and SK (except SK06), a potential widespread genetic introgression from commercial populations to other Chinese indigenous ones was observed across K = 2 to K = 5 ( Figure S3), inclusively. As suggested by the cross-validation errors ( Figure S4), K = 6 was the best assumed genetic groups in this study. At K = 6 ( Fig. 2C), LH, WRR, SK (except SK06) and BC chickens, these four populations could still keep distinct; RS and RIR chickens formed another group; While www.nature.com/scientificreports/ the last group was mainly represented by RJF chickens. For the gamecock chickens, they genetically appeared to be the admixture of RJF, Chinese indigenous and commercial chickens. Clearly, except for SK (excluding SK06) and BC chickens, we could still observe a potential widespread genetic introgression from commercial chickens to most Chinese indigenous chickens, at K = 6, which agrees with the previous study 8 .

TreeMix analysis revealed evidence of gene flows from LX gamecock chickens into other gamecock chickens. Given that a potential widespread introgression from commercial chickens to most
Chinese indigenous breeds has been suggested by above Admixture analysis, and to better understand the historical relationship within the 22 populations, we further employed TreeMix to reconstruct a maximum likelihood (ML) tree, in which it allows both populations split and migration events. We found the inferred migration edges at seven could return the smallest residuals ( Figure S5), thus being the best fit for our data. In this ML tree (Fig. 3), two gene flows from LH chickens into two Chinese indigenous breeds, including ZJ and LD chickens could be evidenced, which conformed with the Admixture results of that a potential widespread introgression from commercial chickens into Chinese indigenous chickens. Noticeably, among the three gene flows between Chinese indigenous breeds, two of them both indicated the gene flows from LX gamecock chickens into the other three gamecock breeds. LX gamecock chicken can be dated back to 700 bc 3 , and is one of the earliest documented Chinese indigenous chicken breeds, conferring it more advantages in the utilization of cockfighting. This may together suggest a core role played by LX gamecock chickens in recent breeding and conservation of Chinese indigenous gamecock chickens.
Severe confounding effect on selective signatures of cold adaptation exerted by genetic introgression from commercial chickens to Chinese indigenous chickens. We observed strong artificial selection that has been undergone in commercial populations here, which exhibited lower nucleotide diversity, lower heterozygosity and higher LD level within populations, and strikingly high genetic differentiation with Chinese indigenous chickens, at the genome-wide level. More importantly, Admixture analysis inferred a potential widespread genetic introgression from commercial chickens into Chinese indigenous chickens except for BC and SK, which can be also partially evidenced. This means any selective signatures especially presented by natural selection to be identified in Chinese indigenous populations will be probably extremely confounded by www.nature.com/scientificreports/ strong artificial selection undergone in commercial chickens. We argued here that the genetic introgression from commercial chickens into indigenous chickens should be quite seriously considered when identifying selective signatures presented in indigenous chickens, in terms of natural and artificial selections.
Given an example concerning to potential cold adaptation in Chinese high-latitude indigenous chickens, this confounding effect exerted by genetic introgression from commercial chickens could be observed by performing correlation analysis between the average eigenvalues (population eigenvalues) of each Chinese indigenous population from PC1 to PC10 (Raw data was from above PCA; Table S5) and the corresponding inhibiting extreme temperatures of each population in winter (Table S1), population eigenvalues of the Chinese indigenous in PC2 (3.45% variance explained totally) was found to be strongly positively (Correlation = 0.643; P = 0.005) correlated with temperature index (Fig. 4A), whereas in this scenario it suggested RIR (Eigenvalue = − 0.1728), WRR (Eigenvalue = − 0.1542) and RS (Eigenvalue = − 0.1192) chickens should be best-adapted to cold. Besides, population eigenvalues of the Chinese indigenous in PC7 (1.98% variance totally explained) and PC6 (2.07% variance totally explained) were found to be moderately positively (Correlation = 0.511; P = 0.036) and negatively (Correlation = − 0.445; P = 0.076) correlated with temperature index (Fig. 4B,C) respectively. Similarly, WRR (Eigenvalue = − 0.0282) and RS (Eigenvalue = 0.2955) chickens would separately be the best-adapted to cold in these two scenarios. Considering that, a potential widespread genetic introgression from commercial chickens to Chinese indigenous high-latitude chickens has been observed and it will be hard to conclude that the coldrelated variation to be identified from the Chinese indigenous chickens inhabiting in extreme temperature in winter is not because of genetic introgression from commercial chickens.

Selective signatures in gamecock chickens.
After removing 198 and 1,326 windows with SNP number < 5, 92,010 and 91,940 were retained in subsequent statistics of Fst values and Hp scores respectively. With the threshold of top 1% outliers of windows being the putatively selective genomic regions, we identified 920 genomic regions in both ZFst (threshold score > 3.925) and ZHp (threshold score < − 2.251) analyses. This threshold proved to be robust enough to detect the genomic regions under selection in gamecock chickens after checking the distributions of ZFst value and ZHp score of each window along the autosomes ( Figure S6). We further annotated the candidate genomic regions above, allowing us to identify 169 and 165 candidate genes in terms of ZFst and ZHp analyses respectively (Tables S6 & S7). However, only 31 genes were shared by both ZFst and ZHp analyses ( Figure S7). In a previous report by Guo et al. concerning selective signatures in BN gamecock chickens 11 , which was also based on Fst (BN_vs_RJF) and Hp (within BN population) analyses (threshold: top 5% outliers), 343 candidate genes were identified (Table S8). While in the present study, we could just re-identify only 53 genes out of the earlier reported 343 genes ( Figure S7), indicating that most of the selective genes previously identified were possibly the common ones by artificial selection during chicken domestication or biased by genetic drift. For instance, CBFB, GRHL3, Gli3, BDNF, NTS, GNAO1 and SDHB as seven highlighted autosomal candidate selective genes were previously identified. Here, we just detected strong selective signals in Gli3 (Table S9). Especially for BDNF, a gene involving the nervous system and aggressive behavior 21,22 , its selective signals in our study were very weak (Fig. 5). Further gene function annotation on the putatively selected genes from ZFst showed no biological processes (BPs) or KEGG pathways could be significantly enriched in, while for  (Table S10). In particular, candidate selective genes, including APP, EGFR, MAP3K7, TCIM, CALR, could be significantly (Adjusted P value = 0.049) enriched in positive regulation of NIK/ NF-kappaB signaling, which concerns any process that activates or increases the frequency, rate or extent of NIK/NF-kappaB signaling. Importantly, NIK/NF-kappaB signaling is closely associated with immunity, and its loss function can induce a primary immunodeficiency with multifaceted aberrant lymphoid immunity 23 . These candidate selective genes may be conducive to the inflammation control of gamecock chickens in the context of their frequent fighting. In this study, the sweeping loci with the highest ZFst values and much lower ZHp scores were observed from Chromosome 2:27,910,001-28,410,000 (Fig. 5A,B), within which AGMO (Alkylglycerol monooxygenase), MEOX2 (Mesenchyme homeobox 2), and ISPD (Isoprenoid synthase domain containing) could be further identified (Fig. 6A). Only AGMO and ISPD exhibited a moderate level of linkage disequilibrium between each other (Figure S8), and two shared long-range haplotypes across gamecock chickens could be observed in AGMO (Fig. 6B) and ISPD (Fig. 6C), respectively, suggestive of strong sweeps of these two genes in gamecock chickens. Among those SNPs detected across the genomic regions of AGMO and ISPD, there were two possibly damaging and three probably benign missense mutations (Table S11). Particularly, the possibly damaging missense mutation (p.Ala312Thr) in exon 8 of AGMO was nearly fixed in non-gamecock chickens (fixation degree = 92.2%), but much less fixed in gamecock chickens (fixation degree = 30.2%) (Fig. 6D). While, the probably benign missense mutation (p.Arg84Lys) in exon 2 of ISPD was likely to be highly selected and favored in gamecock chickens, with a fixation degree reaching 91.0% (Fig. 6E). Further, conservativeness analysis of ISPD amino acid sequence across all 38 available avian species showed that the missense mutation (p.Arg84Lys) was conservative in birds and  www.nature.com/scientificreports/ the missense Lys (K) detected in gamecock chickens could be just detected in Common Ostrich and American Crow ( Figure S9). Interestingly, ostrich is the fastest living bipedal runner and possesses a muscular pelvic limb 24 .
As the only enzyme known to cleave the O-alkyl bonds of ether lipids (alkylglycerols), the missense variants of AGMO can induce microcephaly and neurodevelopmental disabilities in human beings 25,26 . For ISPD, it encodes a 2-C-methyl-d-erythritol 4-phosphate cytidylyltransferase-like protein, and it is essential for the glycosylation of α-dystroglycan in fibroblasts 27,28 . ISPD overexpression can independently or act synergistically with ribitol to improve dystrophic phenotype 29 . Its loss-of-function mutations can disrupt dystroglycan O-mannosylation, causing Walker-Warburg syndrome, which is defined as congenital muscular dystrophy and accompanied by a variety of brain and eye malformations 30 . Considering reasonably the above results together allow us to propose the variations in AGMO, and ISPD may play important roles in shaping the behavioral and muscular signatures of gamecock chickens observed respectively. Especially, the selective ISPD missense mutation of Arg84Lys (ENSGALT00000017557.5) in gamecock chickens, is possibly advantageous for the muscular development of gamecock chickens.
To further identify the genomic regions under selection in gamecock chickens concerning aggressive behavior, we mapped the candidate selective genomic regions in gamecock chickens to the chicken aggressive behavior quantitative trait loci (QTL) database (https ://www.anima lgeno me.org/cgi-bin/QTLdb /GG/trait map?trait _ID=2402). Thus, we discovered that the genomic region covering CPZ (Carboxypeptidase Z) was the only common one, for which it has been identified to be significantly associated with chicken fighting times 31 , and missense mutation within this gene can induce neuroblastoma in human beings 32 . Further haplotype homozygosity pattern analysis of genomic region covering CPZ across 157 chickens showed that a long-range haplotype was shared by gamecock chickens compared with RJF, Chinese indigenous and commercial chickens, suggestive of a strong selective sweep of this region (Fig. 7A,B). Additionally, we also identified three missense mutations from CPZ genomic region across 157 chickens, one from exon 2 (p.Ala34Thr) and two from exon 11 (p.Thr610Ala; p.Gln616Arg), with fixation degrees reaching 72.9% and 69.6% from exon 1 and exon 2 in gamecock chickens respectively (Fig. 7C). However, we could not exclude the hitchhiking effect on CPZ selection exerted by a downstream genomic region of Chr4:81,840,001-81,873,000. This downstream genomic region of CPZ harbored the highest genetic differentiation between gamecock chickens and others, within 500-kb upstream and downstream genomic regions of CPZ (Fig. 7A). Further LD analysis on those SNPs between CPZ (n = 247) and it is a downstream highly differentiated genomic region (n = 98) revealed some SNPs from these two genomic regions were at a high level of linkage disequilibrium (Fig. 7D). Collectively, these results above indicate that the variations in CPZ probably involved the aggressive behavior observed in gamecock chickens.
Furthermore, we could also identify SOX5 (SRY-box 5), NELL1 (Neural EGFL like 1), KCNMA1 (Potassium calcium-activated channel subfamily M alpha 1), IGF-I (Insulin like growth factor 1) and IGF2BP1 (Insulin like growth factor 2 mRNA binding protein 1), harbored strikingly higher ZFst values and/or lower ZHp scores ( Fig. 4; Table 2), suggestive of strong selective sweeps of these genes in gamecock chickens. Except for IGF2BP1, another above five genes were previously reported by Guo et al. as well 11 , highlighting their selective sweeping consistency in gamecock chickens. Among them, SOX5 has proved to be the causative gene underlying pea-comb in chicken 33 , probably explaining the pea-comb phenotype commonly observed in Chinese gamecock chickens. IGF2BP1, a gene closely associated with body size and growth in ducks 34 , together with the previously reported IGF-I, they probably have played important roles in determining the large body size of gamecock chickens.

conclusions
In conclusion, we here characterized the genome diversity, linkage disequilibrium pattern, genetic differentiation, population structure and migration events, across the 157 chickens (126 ones sequenced here) from 22 populations, and re-identified the selective signatures in gamecock chickens with potential confounding effects exerted by introgression and genetic drift fully considered. Our results showed that the Chinese indigenous chickens except those breeds having muffs and beard phenotype were less intensively selected, and a widespread introgression from commercial chickens into them might have occurred, for which it could have severely confounded the selection footprints in indigenous chickens, such as cold adaptation. Importantly, we identified AGMO and CPZ might be crucial for determining the behavioral pattern, while ISPD might be essential for the muscularity observed in gamecock chickens. These results together can facilitate conservation of the 13 canonical Chinese indigenous breeds, and the genetic basis of gamecock chickens revealed here is valuable for us to better understand the mechanisms underlying the behavioral pattern and the muscular development in chicken.

Materials and methods
Sampling and genome sequencing. A total of 126 blood samples from 19 chicken populations were collected from 19 populations which composed of 13 Chinese nationwide indigenous chicken breeds, including six Huiyang Bearded chickens (BC), nine Xinghua chickens (XH), six Hetian chickens (HT), six Baier Yellow chickens (BEH), 11 Silkies (SK), six Xianju chickens (XJ), six Liyang chickens (LY), six Jining Bairi chickens (BR), six Yunyang Da chickens (YY), ten Beijing You chickens (YOU), six Lindian chickens (LD), ten Luxi gamecock chickens (LX) and six Tulufan gamecock chickens (TLF) (Fig. 1A); four typical commercial populations, including six White Leghorn chickens (LH), six White Recessive Rocks (WRR), six Cobb RS308 chickens (RS) and six Rhode Island Reds (RIR); one Red jungle fowl population from Guangxi Province (five individuals, RJF) and one gamecock population from Laos (three individuals, Laos). Genomic DNA was further extracted from the collected blood samples using NRBC Blood DNA Kit (Omega Bio-Tek, Norcross, GA, USA) following the manufacturer's instruction, and the quality of the extracted Genomic DNA was tested using Nanodrop 2000 spectrophotometer at 260/280 nm ratio (NanoDrop Inc., Wilmington, DE, USA). To provide a more comprehensive understanding and profound insight into the genome diversity of Chinese indigenous chickens and  Table S1). More than 3 μg of genomic DNA from the above samples were used to construct a paired-end sequencing library with an insert size of approximately 350 bp following the manufacturer's instructions, thereby being sequenced on Illumina HiSeq X Ten and HiSeq 2000 platforms (Illumina, San Diego, CA, USA) at Novogene Co., Ltd (Beijing, China) and Beijing Institute of Genomics (Beijing, China). After removing the sequencing paired-end reads with adaptors, N content ratio > 10% and low-quality base ratio (Q ≤ 5) > 50%, clean reads were retained for subsequent genome mapping and variant calling.
Genome mapping, variant calling and annotation. Firstly, we used Burrows-Wheeler Aligner (BWA) version 0.7.15 to map the clean sequencing reads to the Gallus gallus 5.0 reference genome (https ://www.ncbi. nlm.nih.gov/assem bly/GCF_00000 2315.4/) 41 , generating the Sam file for each sample. SAMtools version 1.3 42 , was then used to filter out the unmapped and non-unique reads from the above Sam files following the command "rmdup" and generate the corresponding BAM format files. Meanwhile, Picard version 2.9.0 (https :// broad insti tute.githu b.io/picar d/) was employed to sort the SAM files into coordinate order and further saved as binary alignment map files (BAM files), followed with duplicate reads marked and BAM files indexed. We here utilized SAMtools version 1.3 and GATK version 3.7.0 43 , simultaneously to detect SNP and InDel at a population level, only with the SNPs and InDels detected by both pipelines kept for further analysis. For SAMtools calling, raw SNPs and raw InDels were called using the SAMtools mpileup package with default parameters. Before GATK calling, we performed a step of base quality score recalibration to get more accurate base qualities, in which a set of over 14 million known chicken SNP data from Ensembl database (ftp://ftp.ensem bl.org/ pub/relea se-94/varia tion/gvf/gallu s_gallu s/) was used together with GATK version 3.7.0 "BaseRecalibrator" to generate the recalibrated BAM files. Further, the engine "Unifiedgenotyper" in GATK (default settings) was employed to call the raw SNPs and InDels. Finally, the common sites of SNP/InDel identified by both SAMtools and GATK were retained, and the SNPs were further submitted to VCFtools version 0.1.14 44 , for quality control using the following filtration criteria: (1) max-missing 0.1; (2)-maf 0.05; (3)-minQ 20; (4)-min-meanDP 5; (5) max-meanDP 1,000; (6)-minGQ 5, in which the SNPs and InDels sites with missing data < 0.1, minor allele frequency (MAF) > 0.05, quality value > 20, mean depth values between 5 and 1,000, and genotype quality above 5 were kept for subsequent analyses. To annotate the SNPs and InDels identified here, ANNOVAR (Version: 2013-05-20) was employed 45 . Considering our samples which consisted of males and females, we further extracted the autosomal SNPs for genetic differentiation, pooled-heterozygosity, LD, population genetic structure and selective sweep analyses at the genome-wide level to avoid non-stochastic effects.  Step size: 20 Kb) following the formula given by Rubin et al. 46 : Haploview version 4.2 47 , was used to evaluate the genome-wide linkage disequilibrium pattern within each population, with arguments "-maxdistance 500;-minMAF 0.05;-binsize 100" employed. Also, it was utilized to infer the square of the correlation coefficient (r 2 ), haplotype structure and frequency for some specific genomic regions presented in this study.
Considering the Laos gamecock population that had only three individuals here, we didn't consider its related results in terms of the above analyses. The genome-wide nucleotide diversity ( pop_π ) and the heterozygosity of each population ( pop_He ) were measured with the mean values of all windows' π and Hp. Genetic differentiation. For the genetic differentiation between each population, VCFtools version 0.1.14 was used to calculate the pairwise Fst values between each population 48 , with a window size of 40 Kb and a step size of 20 Kb.
Population genetics analysis. We used TreeBeST version 1.9.2 software 49 , to calculate the distance matrix and thus constructed a neighbor-joining tree (bootstrap values = 1,000) with all identified autosomal SNPs. Before performing the Principal component analysis (PCA) and Admixture analysis, all population autosomal SNPs were firstly LD-based pruned using Plink version 1.9 50 , (https ://pngu.mgh.harva rd.edu/purce ll/ plink /) with the option "indep-pairwise 50 5 0.5" employed. Based on the pruned population SNP data, we then performed Principal component analysis (PCA) and unsupervised Admixture analysis to assess the population's genetic structure. For the PCA, smartpca program in Eigenstrat version 6.1.4 51 , was adopted with the explained variance given in according to its corresponding eigenvalue proportion in the sum of eigenvalues. Meanwhile, Admixture version 1.3.0 52 , was run with K = 2 to K = 16, along with their corresponding cross-validation errors (default setting used) calculated, respectively.
To estimate the potential impact exerted by extreme temperature in winter on Chinese indigenous chickens, mean eigenvalues of each population (population eigenvalue) at each principal component (PC) were calculated. Pearson correlation analysis was then performed between the extreme temperature of each Chinese indigenous population and its corresponding population eigenvalue at each PC.
TreeMix analysis. We used TreeMix software 53 , to infer the historical relationships of the 22 chicken populations included here. We ran TreeMix with migration events given from 1 to 10, and generated their corresponding residual matrix, with options "-noss" and "-k 500" used. A tree with the smallest residuals was to be the best fit for the data. Considering the wild population (RJF chickens) included here could not be grouped into the same cluster in the phylogenetic analysis, we did not root the maximum likelihood tree.
Genome-wide selective sweep analysis. We employed two methods here, including calculating the genetic differentiation (Fst) between gamecock chickens (TLF, LX, Laos, and BN chickens) and non-gamecocks (chickens except RJF and gamecock chickens) and the pooled-heterozygosity score (Hp) within gamecock chick- www.nature.com/scientificreports/ ens upon sliding windows, to identify the genomic regions under selection in gamecocks population. Considering the gamecock populations harbor relatively high genome nucleotide diversity and heterozygosity in chickens, we narrowed down the window size and step size to 20 Kb and 10 Kb when calculating both Weir-Fst value and Hp score of each window. We eliminated the windows with SNPs less than 5 to ensure detective accuracy. The top 1% outliers of bins were regarded as the putative genomic regions under selection, and further annotated using Ensembl BioMart tool (https ://oct20 18.archi ve.ensem bl.org/bioma rt/martv iew/fcee6 700cd e0db9 59bc3 0ef4f c9d83 9a). Those putatively selected genes from each method were then submitted to gProfiler (https ://biit.cs.ut.ee/gprofi ler/gost) for function enrichment analysis with options "Organism: Gallus gallus" and "User threshold: 0.05". Both the Fst value and Pooled-heterozygosity score of each bin were Z-transformed according to the formula below and further Manhattan-plotted with in-house R scripts: PANTHER version 11.0 (https ://www.panth erdb.org/tools /csnpS coreF orm.jsp?) 54 , was employed to estimate the likelihood of nonsynonymous (amino-acid changing) coding SNPs to cause a functional impact on the proteins of ISPD, AGMO, and CPZ.
Research ethics statement and data availability. All the animal experiments used in the present study were approved by the South China Agricultural University Institutional Animal Care and Use Committee (Approval number: 2015-A003; Guangzhou, People's Republic of China), and were handled strictly in compliance with the guidelines of this committee.
The genome sequencing raw data has been uploaded into the NCBI SRA database with the accession number SAMN14651083.