Genomic scans for selective sweeps through haplotype homozygosity and allelic fixation in 14 indigenous sheep breeds from Middle East and South Asia

The performance and productivity of livestock have consistently improved by natural and artificial selection over the centuries. Both these selections are expected to leave patterns on the genome and lead to changes in allele frequencies, but natural selection has played the major role among indigenous populations. Detecting selective sweeps in livestock may assist in understanding the processes involved in domestication, genome evolution and discovery of genomic regions associated with economically important traits. We investigated population genetic diversity and selection signals in this study using SNP genotype data of 14 indigenous sheep breeds from Middle East and South Asia, including six breeds from Iran, namely Iranian Balochi, Afshari, Moghani, Qezel, Zel, and Lori-Bakhtiari, three breeds from Afghanistan, namely Afghan Balochi, Arabi, and Gadik, three breeds from India, namely Indian Garole, Changthangi, and Deccani, and two breeds from Bangladesh, namely Bangladeshi Garole and Bangladesh East. The SNP genotype data were generated by the Illumina OvineSNP50 Genotyping BeadChip array. To detect genetic diversity and population structure, we used principal component analysis (PCA), admixture, phylogenetic analyses, and Runs of homozygosity. We applied four complementary statistical tests, FST (fixation index), xp-EHH (cross-population extended haplotype homozygosity), Rsb (extended haplotype homozygosity between-populations), and FLK (the extension of the Lewontin and Krakauer) to detect selective sweeps. Our results not only confirm the previous studies but also provide a suite of novel candidate genes involved in different traits in sheep. On average, FST, xp-EHH, Rsb, and FLK detected 128, 207, 222, and 252 genomic regions as candidates for selective sweeps, respectively. Furthermore, nine overlapping candidate genes were detected by these four tests, especially TNIK, DOCK1, USH2A, and TYW1B which associate with resistance to diseases and climate adaptation. Knowledge of candidate genomic regions in sheep populations may facilitate the identification and potential exploitation of the underlying genes in sheep breeding.

www.nature.com/scientificreports/ Study of population structure gives information on anthropogenic activities and historical processes that have influenced recent gene pools and the genetic relationships among breeds (Ju et al. 2019). Population structure among breeds can be studied using principal component analysis (PCA), admixture and phylogenetic analyses. A range of demographic forces and evolutionary trends affects linkage disequilibrium (LD) patterns on the genome 5 . The LD patterns provide good historical information on the population demography.
Natural and artificial selections leave patterns on the genome that result in differences in allele frequencies among populations 6 . If the selection pressure is high at the level of an individual locus, the frequency of the selected variant increases. In addition, selection will change the diversity pattern around the selected variant through genetic hitchhiking, known as a selective sweep 7 . As a result, different genetic variations and various haplotype structures are fixed over time within separated subpopulations, leading to a wide range of farm animal breeds and distinct genetic populations 8 . Selective sweeps detected in livestock breeds can add to new information about their population history.
Several methods have been developed to scan genome-wide selective sweeps 9 . Most of the methods are based on: (1) increases in derived allele frequency and decreases in genetic variation near a selective sweep (hitchhiking) within a population, (2) haplotype length and structure measured by extended haplotype homozygosity (EHH) or EHH-derived statistics, and (3) the differentiation of genetic populations measured by F ST or related statistics 10 .
To capture any signal in the genome, depending on the number of populations, temporal context scale, and type of selection signatures more than one method is often needed 6 . Therefore, we implemented four complementary statistical tests, F ST , FLK (the extension of the Lewontin and Krakauer), xp-EHH (cross-population extended haplotype homozygosity), and Rsb (extended haplotype homozygosity between-populations). We studied selection signature in 14 indigenous sheep breeds from Iran, Afghanistan, India and Bangladesh, the four neighboring countries located in the Middle East and South Asia having more than 80 indigenous sheep breeds adapted to diverse ecological conditions. The selection signatures can illuminate selection patterns at the genome level of these indigenous sheep breeds, from adaptation to local environment and selection by breeders to improve production.
Genotype quality control. OvineSNP50 BeadChip (Illumina, San Diego, CA, USA) was used to genotype animals. The SNP location information was taken from the Illumina Oar_v4 assembly, retrieved from SNPChiMp v.3 12 .
The genotype data from different breeds were merged using PLINK 13 . We excluded the SNPs located on sex chromosomes and those with unknown chromosomal position. The quality control was performed using We performed a PCA to investigate the population structure and to check whether samples for a breed came from a homogeneous population. PCA was done for the 14 sheep breeds using the smartpca program, which is part of EIGENSOFT 7.2.1 18 .
Linkage disequilibrium (mean of r 2 ) among SNPs was estimated for the breeds using PopLDdecay v1.01 software, and a Perl script was applied to visualize the results 19 .
Admixture analysis. For admixture analysis, quality filtered genotype data were pruned using PLINK based on LD. In a sliding window of 50 SNPs, LD pruning was carried out, moving the window in steps of 5 SNPs at a time, and removing all SNPs within each window exceeding the 1.7 variance inflation factor (VIF) threshold (-indep 50 5 1.7). VIF is known as 1/(1 − r 2 ), with r 2 being the correlation of the squared inter-variant allele count 20 .
We analyzed ancestry using ADMIXTURE v1.3.0 to infer breed origins and quantify the populations' admixture 21 . For a priori defined ancestry component (K), individual ancestry proportions were calculated with ADMIXTURE v1.3.0, which was an assumption of the number of ancestral populations 20 . Using 14-fold cross-validation for K values ranging from 2 to 14, admixture analysis was performed. To identify the most likely number of ancestral populations, the lowest 14-fold cross-validation error was applied. Finally, the admixture graphs were visualized using the R package BITE 22 .

Runs of homozygosity.
Runs of homozygosity (ROHs) were studied for all 14 breeds. Using the R package "detectRUNS" 23 , ROHs were calculated. The sliding window method was applied to calculate ROH segments 24 . Conditions used to detect segments of ROH were: sliding window size (windowSize = 15 SNPs), minimum number of homozygous SNPs in a run (minSNP) = 20, threshold of windows overlapping, homozygous (threshold) = 0.05, minimum number of SNP per kbps (minDensity) = 1/103, maximum distance between two SNPs (maxGap) = 106 bps, and the minimum length of a homozygous run (minLengthBps) = 250,000 bps. By default settings defined in the detectRUNS package, the ROHs detected were divided into five categories. (0 to < 2 Mb, 2 to < 4 Mb, 4 to < 8 Mb, 8 to < 16 Mb and ≥ 16 Mb). For each of the ROH length categories, the mean ROH sum per breed was determined by summing up all the ROHs per animal in that category and by averaging them per breed. The individual genomic inbreeding coefficient (FROH) was also determined as follows: where L ROH is the total length of all ROHs observed in an individual's genome and Lgenome is the sum of the length of the autosomes 23 .
Selection sweep, gene annotation, and functional analysis. Neighbor-joining tree and PCA analysis divided the sheep populations in three distinct categories, IR (contains AFS, MOG, QEZ, ZEL, LOR breeds), IN (contains BGA, BGE, GAR, CHA, IDC breeds), and AF (contains IBL, ARB, BLO, GDK breeds) Table 1. Therefore, we compared pairwise these three categories for selective sweeps analysis. The F ST analysis is a widely used approach to identify genetic differentiations between populations compared to the within-population polymorphic frequency 25 . We performed the F ST to identify genomic regions under increasing differentiation using VCFtools v0.1.15 26 . For each comparison, the mean of F ST value was computed in all 39,348 SNPs. Z transformation of the mean of F ST values (Z(F ST )) was performed using the "scale" command in R software.
The FLK test is an extension of the original Lewontin and Krakauer (LK) statistic 27 . It calculates a population differentiation statistic, which includes a kinship matrix representing the relationship between populations 28 . This test accounts for population structure and differences in the effective population size by modeling the genetic divergence between populations as a result of drift and population division 29 .
For FLK analyses, p-values were computed as explained in the hapFLK software documentation 30 . For each comparison, the negative log p-value was calculated using the hapFLK R script 30 , and the candidate genomic regions under selection were plotted.
Extended haplotype homozygosity (EHH) detects selection signatures by comparing a high frequency and extended homozygosity based haplotype with other haplotypes at the selected locus 31  www.nature.com/scientificreports/ between two populations at the same SNP 31 . The xp-EHH test has a high power to detect selection signatures in small sample sizes, and therefore grouping of genetically similar breeds may help in gaining power 32,31 . Rsb test to identify selective sweeps is based on the same idea of estimation of EHH as xp-EHH test. However in contrast to xp-EHH test, it does not require phasing information 28 . Rsb compares the EHH patterns of the same allele between populations instead of comparing the EHH between alleles within one population, analogous to other statistics that are often focused on contrasting genetic variation between populations 33 .We used the xp-EHH and Rsb approaches 33,34 to determine selected alleles with higher frequency than expected according to their haplotype length to obtain recent and generally segregating selective sweeps. The haplotypes were phased with Beagle 14 , and then xp-EHH and Rsb scores were calculated for each haplotype within a population. Haplotype frequencies were computed for 39,348 SNPs. For each locus, the xp-EHH and Rsb score were calculated using the rehh package 35 in R and the candidate genomic regions under selection were obtained.
For each test, the genes that were considered as candidates were found within the intervals spanning the candidate genome regions and also overlapping candidate genes among the tests were captured using the Ovis Oar_v4 reference genome assembly in the Ensembl 36  The biological enrichment and functional annotation of the genes under selective pressure were defined using Gene Ontology Consortium (http://geneo ntolo gy.org).

Results
Populations and genotype data. After quality control and imputation of missing genotypes from 463 individuals genotype data for 39,531 SNPs from 14 sheep breeds Table 1, 453 individuals and 39,348 SNPs remained for analysis. In details 10 individuals removed due to missing genotype data (-mind), 180 SNPs removed due to missing genotype data (-geno), and 3 SNPs were removed due to minor allele frequency (-maf).
Population genetic structure and linkage disequilibrium. The Neighbor-joining phylogenetic tree analysis divided the 14 breeds into three main branches, IR, IN, and AF. The IR group included AFS, MOG, QEZ, ZEL, and LOR, in a main branch Fig. 1, blue color), which illustrated close relationships in the blue branch. These five breeds are from mountainous and forest areas with cold and temperate climates of Iran. The AF group has two distinct sub-branches, one for the three Afghan breeds (ARB, BLO, GDK), and the other own for the Iranian IBL breed Fig. 1, red color). The IBL sheep is from a hot dry climate in the south-eastern deserts of Iran, bordering Afghanistan and Pakistan and therefore IBL is geographically closer to Afghan breeds than the other Iranian breeds in this study. The ARB, BLO, and GDK breeds formed a dense sub-branch that indicates their   Fig. 1, green color). In this branch, two Bengal breeds (BGA and BGE) and GAR formed a distinct cluster, and two other Indian sheep breeds were placed in two separate clusters. The GAR and BGA which are both named Garole breed live in West Bengal state of India and Bangladesh, respectively however, some breeding isolation between them occurred. Therefore, a close genetic relationship between these two breeds is expected. The LD patterns among the IR and IN groups indicated that the mean of correlation coefficient values (r 2 ) in both groups dropped rapidly at approximately 10 Kb while the AF group showed a slower drop and its r 2 values at 50 Kb was higher than the other groups (Supplementary Figure S3). The average r 2 at 250 Kb for the IR, IN and AF breeds were 0.0351, 0.0230 and 0.0693, respectively. There was a big difference in r 2 values more than 100 Kb between (IR and IN) and AF.
PCA results Fig. 2 also indicated close relationships within the IR, IN, and AF groups and supported separation into the three broad geographic groups that were identified by the neighbor-joining tree Fig. 1. Although the breeds clustered according to geographic origin, a gradient based on the geographic distance was less pronounced Fig. 2. In addition, the first principal component (PC1), explaining 13.3% of the total genetic variation among breeds, clearly separated the IR and IN breeds from the AF breeds, thus forming two clusters. Along with the PC1 projection spectra, both IBL and Afghan breeds formed the AF group but a large genetic variation are shown between them. Among the AF breeds, IBL is clearly distant from the other breeds and supported the phylogenetic results. The subclusters of MOG, GEZ, and AFS breeds overlapped, indicating a close relationship and possible admixture of these breeds from the same region in north-western Iran. The LOR breed clearly distant from the other IR breeds which show geographic distance between the LOR from the west and south-western of Iran and the other IR breeds from the north and north-western of Iran. The patterns of genetic variation observed for the AFS, MOG, and GEZ breeds suggested a recent admixture between these three Iranian breeds.
PC2, explaining 6.8% of the total genetic variation, separated the Afghan breeds from the other breeds, but it did not clearly show geographic distance between the IR and IN breeds (Supplementary Figure S1). PC3, explaining 3% of the total genetic variation, separated the IR from the IN and also showed close genetic relationship among two Bengal breeds and GAR, while genetic distances among IDC, CHA, and the other IN breeds Fig. 2. For a more clear assessment, we did PCA between the IR and IN breeds which were separated by PC1 (Supplementary Figure S2).
The occurrence and extent of breed admixture were examined by estimating individual ancestry proportions from quality filtered and LD pruned genotype data. During pruning the dataset 13,257 of 39,348 SNPs removed and 26,091 SNPs were remained. Admixture analyses were carried out with up to 14 ancestral components (K) Fig. 3. Cross-validation (CV) errors were calculated to identify the most likely number of ancestral populations. The lowest CV error was detected for K = 12 Fig. 3a. Although at K = 10, CV errors had stagnated after a decline, ancestry components up to K = 10 separate breeds, and so it was recognized as the optimal value of K Fig. 3b.The results of Admixture were in general agreement as PCA. Although the AF breeds, especially the IBL from the IR and IN breeds, were separated at the first ancestry components (K = 2) and also at K = 4, the IR breeds were separated from the IN breeds but a substantial IR ancestry is observed in the CHA breed at K = 4. In addition, based on geographic origin the breeds were divided as follows: K = 2: the AF breeds,K = 4: the IN breeds,K = 7: the IR breeds. At K = 10. The breed-specific ancestry components were clearly defined by all breeds except MOG and QEZ and the Afghan breeds (ARB, BLO, and GDK). However, our finding showed that increasing the number of  www.nature.com/scientificreports/ K above 10 did not yield a consistent MOG and QEZ separation. Therefore, four ancestral components distinguished the five IR breeds where AFS, ZEL, and LOR were unambiguously recognized. Similar to PCA findings, the fourth component was shared between MOG and QEZ, confirming a close genetic relationship. There were no differences among the ARB, BLO, and GDK Afghan breeds from K = 2 to K = 14 which indicated close genetic relation sheep of them, confirming results from PCA and the neighbor-joining tree. The Bengal breeds (BGA and BGE) separated from K = 9, but despite expectation, BGA and GAR with the common name and root separated from K = 5. In general, compared with the other IN breeds, closer genetic relationships were seen between BGA, BGE, and GAR confirming PCA and the neighbor-joining tree analyses.  Table 1.  Figure S4). Genomic inbreeding ranged from 0.008 (in the AFS breed) to 0.5 (in the GAR breed).
The Rsb scores were calculated for haplotype frequencies Fig. 6. The top 1% of Rsb, considered as selective sweeps, identified 185 genes for (a) IR and IN breeds, 249 genes for (b) IR and AF breeds, and 233 genes for c) IR and AF breeds (Supplementary Table S3). Many candidate genes specially associated with immune response and heat stress were identified by Rsb test, such as, ATP2B1, ATP2C1, ATP6V1H, BMPR1B, PLCE1, LRP1B, CXCL1, CD19, DOCK1, DOCK4, PTPRA, MAPK3, UNC5C, ANKRD2, BBS9, NAFTC2IP, RNF139, and ZNF695 in immune system and environment adaptation, and IFT22, EIF2A, HSPB1, TBX6, TBX21, GNA12, BMP7, IL16, IL27, IL4R, and IL21R in heat stress. Furthermore, HOXD1, HEXD2, and MTX2 affect the horn traits,and PRLP,    The overlapping candidate genes for (b) IR and AF breeds include, Echinoderm microtubule-associated protein-like 5 (EML5) on chromosome seven, may change the assembly dynamics of microtubules to make microtubules are slightly longer but more dynamic and it is possible that EML5 plays a role during neuronal development in the regulation of cytoskeletal rearrangements, IQ domain-containing protein E (IQCE) on chromosome 24, involved in body development, TRNA-YW Synthesizing Protein 1 Homolog B (TYW1B) on chromosome 24, influenced on the wybutosine biosynthesis pathway. Usherin (USH2A) on chromosome 12, may be involved in the function of synapses and plays an important role in the development and maintenance of cells in the inner ear and retina. SPO11 initiator of meiotic double-stranded breaks (SPO11) on chromosome 13, involved in the production of double-strand breaks (DSB) of DNA and it is specifically involved in the growth of the testis, maintenance of the male germ line, and maturation of sperm. Three overlapping candidate genes for c) IN and AF breeds were detected; the IQCE, TYW1B, and an unknown gene with Ensemble number (ENSOARG00000025902) which all of these three genes were detected before (IR vs. AF).
We also detected overlapping candidate genes for IR vs. IN, IR vs. AF, and IN vs. AF data on the F ST , xp-EHH, Rsb, and FLK tests Fig. 9. For the F ST test PPA2, involved in the immune system, and KCNIP4 plays important role in heart performance and it is related to skeletal muscle growth and also immune response. SYT1, associated with feeding behavior traits such as residual feed intake and TMEFF2, involved in a wide range of traits such as,immune response, milk production and sperm morphology, were detected as overlapping candidate genes for the Rsb test. For the FLK test, PPA2, EML5 genes, which have been found in the previous tests, MGAT5, associate with dry matter intake and NEB, involved in environment adaptation, were detected as overlapping candidate genes on the three different data. We did not find any overlapping candidate genes on the all data by the xp-EHH test.
Biological enrichment analysis of significant biological processes for candidate genes under positive selective pressure revealed 26 Gene Ontology (GO) terms Table 2. These GO terms reflected protein function and biosynthetic processes, including TNIK and DOCK1 genes associated with cytoskeleton organization (GO:0007010)   Fig. 10.

Discussion
The present study investigates the genetic diversity and selective sweeps of 14 sheep breeds from Iran, Afghanistan, India, and Bangladesh. The selective sweeps were studied using the F ST, FLK, xp-EHH and Rsb statistical methods on the three cluster of breeds (IR, IN, and AF). Our goal in the current study was to search the genomes of these indigenous sheep breeds to highlight genetic variants that can be used in developing next-generation productive breeds, better suited to diverse Iran environments, in a comparative scale with Indian, Bengal, and Afghan sheep breeds. Furthermore, the other goal was using four comparable selective sweeps tests to cover all the regions of the genomes and capture maximum candidate genes, as well as review their biological function. The results showed that these breeds' genomes contain multiple regions under selection. These regions contain  Furthermore, the AF cluster showed an IBL sub-cluster and a compact sub-cluster of three Afghan breeds, indicating a closer relationship among Afghan breeds and their genetic distance from IBL.
These findings are consistent with previous research on sheep 38,39,3,40 , which showed that individuals were separated by global population structure patterns according to their geographical origin.   41,38 PCA results demonstrated that the genetic variation was associated with the separation among sheep breeds from different parts of the world. This was further supported by neighbour-joining tree analysis revealing that the population was split according to geographic origin (IR, IN, and AF). Population structure analyses of the IR, IN, and AF breeds clearly reflected the geographic distribution at PC1 and the separation of northern from southern breeds at PC3. Admixture and phylogenetic patterns. In accordance with the previous analyses, admixture results confirmed that the first few ancestral breed components (K = 2 to K = 5) were related to the geographic origins. High levels of breed admixture were detected among the Iranian (IR and IBL) breeds, and also among the IN breeds. A significant IR ancestry is observed in the CHA breed at K = 4, which probably due to the same climate between the CHA from Kashmir and the IR breeds from the northwest of Iran Table1, and also due to historical ties and neighborliness between Iran and India, especially in the Kashmir region. It is possible that the IR breeds and CHA have common ancestors. However, low levels of admixture events among the breeds originating from the different geographical regions were detected. For example, although the GAR and BGA from India and Bangladesh have a common breed name (Garole), they separated at k = 5 ancestral breed components, while BGA and BGE which are known as two different breeds in Bangladesh showed more relationship and they have been separated at k = 9 which confirm the effect of geographic origin in breeds separation. Admixture results confirmed genetic divergence identified through the neighbor-joining and PCA.
Inference based on population neighbour-joining trees based on genome-wide allele frequencies clustered the breeds into three monophyletic clades according to the geographical origin. The deepest population split among the AF breeds separated IBL from the other AF breeds. Among the IN breeds, IDC and CHA showed deeper population splits, in line with geographic clades detected by the PCA and admixture analysis. These results support the previous findings 2, 41,38 . Runs of homozygosity. The history of inbreeding within a population can be estimated from the length distribution of ROH segments 42 . We estimated F ROH to study genomic inbreeding. The average F ROH levels estimated for the breeds was (0.09), which was almost the same as the previous study 3 . In the studied breeds, the range of FROH recorded 0.008 to 0.5, was higher than the previous maxima but the observed minima were consistent with previous findings. Mastrangelo et al. 43 reported the range of F ROH from 0.016 to 0.099 in domestic sheep breeds. www.nature.com/scientificreports/ The IBL and IN breeds had the largest number of ROHs, and therefore showed highest FROH levels, indicating the relationship between the number of ROH segments and FROH levels 3 . The results of ROH segment also showed that more than 93% of the ROH segments were shorter than 8 Mb, which indicated older events of inbreeding and a board effective population size of sheep flocks 43 . The IN breeds displayed a higher FROH variance than the IR and AF breeds which could indicate more effective population size variation in the IN breeds. Inbreeding and extended ROH segments can be increased by small population size and intense selection, thereby continuing to express the deleterious alleles 44 .
Genome-wide selective sweeps. The ability of specific genomic regions to detect selective sweeps depends on the selection of analytical tools appropriate to the biological situation but no single method can detect selective sweeps that are both starting and nearly completed. However, combining several tests increases significantly the power to recognize the region selected 6,7 . Therefore, we used F ST , FLK, xp-EHH, and Rsb test statistics to detect genome-wide selective sweeps in (a) IR and IN breeds, (b) IR and AF breeds, (c) IN and AF populations. F ST was first implemented to measure the degree of genetic differentiation between populations based on variations in allele frequency 45 . The genomic variation information is provided by F ST at a locus between the populations compared to within the populations. Therefore, the F ST is an evidence of selection: high F ST values indicate positive local adaptation 46 . The older selection events between populations are expected to be identified by F ST 47,48 . The xp-EHH test is an extension of EHH 31 , that incorporates information on the relationship between an allele's frequency and LD measurements with neighboring alleles. Therefore, this test may provide maximal statistical power and low ascertainment bias sensitivity 33 . The Rsb test is population comparison test to identify selective sweeps 33 . The test is based on the same idea as the XP-EHH, identifies loci similar to the XP-EHH test under selection, but can be implemented with unphased data 28 . Generally, the xp-EHH and Rsb tests are used to detect recent positive selection within population and between-populations, respectively 9 . The FLK (extended Lewontin and Krakauer test) test is based on the assumption that two new populations are formed by the splitting of a population,calculates a statistic of population differentiation, which incorporates a matrix of kinship describing the relationship between populations 27,28 . For each SNP, the FLK test calculates  www.nature.com/scientificreports/ a global F ST , but allele frequencies are first rescaled using a matrix of population kinship. This matrix, which is estimated from the genome-wide data observed, measures the amount of genetic drift that can be predicted along all branches of the population tree under neutral evolution 27 . Therefore, the integration of these four complementary statistical tests provides a valuable tool for detecting, with greater confidence, positive selection of genomic regions. For F ST and FLK, only the top 1% Z(F ST ) values and the top 1% -log(p-value) were considered, respectively to be representing selective sweeps as recommended in previous studies 29,40 .
Analyses of selective sweeps were reported for several international sheep populations from several countries, including China 49 53 . This study, using F ST , xp-EHH, Rsb, and FLK, detected on average 128, 207, 222, and 252 genomic regions as candidates for selective sweeps, respectively. Although the selected candidate regions are narrow and are distributed across different chromosomes, however for F ST and FLK tests, chromosome 1 showed a low value for IR vs. AF and IN vs. AF comparisons which may indicate the genome of two populations are the same in this region and many common genes were expected to be fixed in both populations 48 , Figs. 4 and 7. Several of these genes encode economically important traits. For example, genes that have directly or indirectly influence traits for adaptation to hot arid conditions and heat tolerance (TRHDE, IL4R,IL21R, and SLC4A4), which are reported as candidate genes involved in heat tolerance on sheep 57 . The heat shock protein B1 (HSPB1) gene which expresses both at mRNA and protein levels under heat stress on poultry 58 , reported in sheep 59 , and cattle 60 . All of these candidate genes were detected in IR vs. AF, and vs. AF clusters, where the AF breeds are common (Supplementary Tables S1, S2, and S3). This indicates that the AF breeds, which are from a hot dry climate, have undergone selection for heat tolerance.
Many of the candidate genes identified in this study are effective in genetic resistance to disease, immune response and climate adaptation, which indicates differential selections among the studied breeds. Since genetic resistance against diseases and harsh environmental conditions are important characteristics of indigenous animal breeds, the identification of a large number of genes in this study points toward the associated genes have been under selection pressure over time due to the natural selection of immune response traits 61,62 . For example, we detected the DOCK family (DOCK1, DOCK4, DOCK10) 63 (Supplementary Tables S1, S2, and S3). Almost all these genes were detected in all three clusters, a (IR and IN), b (IR and AF), and c (IN and AF), which may indicate genetic resistance and high immune response against diseases and harsh environmental conditions in these native breeds.
We detected some genes involved in production traits and indirectly related to climate adaptation, such as, FABP3. Calvo et al. 74 showed linkage disequilibrium between FABP3 gene and quantitative trait loci (QTL) for milk fat content. Other related milk traits candidate genes included the LRP1B and CNTN4 which previously reported on sheep 75 and cattle 76 . The ITPR2 and SLC27A6 are also two examples of important candidate genes detected by Li et al. 75 on sheep and both have been proposed to be candidate genes for milk and fat production in cattle and indirectly involved in climate adaptation 77,78 . All of these genes were found in b (IR and AF), and c (IN and AF) clusters, indicating AF breeds may be under selection pressure related to milk traits but it needs further research to conclude. (Supplementary Tables S1, S2, and S3).
We found several candidate genes involved in body weight and growth traits specially post-weaning gain in all population clusters, such as the TRHD, UBR2, GRM2, GRM3 which also related to climate adaptation indirectly 79 Four of the genes (DOCK1, TYW1B, USH2A, and TNIK) play important roles in resistance against diseases and climate adaptation. DOCK1 located on chromosome 22 is involved in immune response 63,64 . TNIK on chromosome 1 plays several functions in embryonic development, especially during the early embryo to blastocyst stages, participates in the regulation of the inflammatory response against infections 83 , TYW1B on chromosome 24 influences artery disease and blood pressure in human 84 , USH2A on chromosome 12 may be involved in the function of synapses and plays an important role in the development and maintenance of cells in the inner ear and retina 85 .
In total, seven unique candidate genes were detected for IR vs. IN, IR vs. AF, and IN vs. AF comparisons by F ST , Rsb, and FLK analysis, but no overlapping candidate gene was found for the xp-EHH method Fig. 9.
PPA2 on chromosome 6 is associated with immune response and disease resistance in cattle 86 . KCNIP4 gene on chromosome 6 is directly involved in processes related to muscle growth and fat deposit and indirectly climate adaptation in sheep 87 and was reported in cattle involved in bovine growth and calcium metabolism 88 . SYT1 gene on chromosome 3 is associated with feeding behavior traits related to local adaptation 89 , and TMEFF2 gene on chromosome 2 is involved in a wide range of traits such as, immune response, milk production and, sperm morphology 68,90 . For the FLK test, PPA2, EML5, MGAT5, and NEB genes were detected, which PPA2 and EML5 have been found in the previous tests Fig. 9. NEB gene on chromosome 2 is involved in environmental adaptation. Among 1262 selected genomic regions reported by Yudin  www.nature.com/scientificreports/ GO classifications of the candidate genes were performed to enable a better understanding of their molecular functions. Based on the GO biological process (BP) for a significant threshold (p ≤ 0.05), we implemented the GO on 11 overlapping candidate genes. Only six genes (TNIK, DOCK1, SFMBT1, SPO11, USH2A, and TYW1B) associated with the 26 GO terms were identified. In total 11 GOs were related to TNIK, USH2A, TYW1B, and DOCK1, which are associated with local adaptation (resistance against diseases).
In confirmation of our results, Nie et al. 92 , reported different GO terms associated with the TNIK gene in human 92 . Four other GOs associated with DOCK1. DOCK family genes have several biological functions 63,64 .
Absolute correlation among the F ST , FLK, xp-EHH, and Rsb tests were calculated Fig. 10. The xp-EHH, and Rsb are based on the frequency of extended haplotypes between two populations 33,34 , whereas F ST and FLK are based on allele frequencies 6,37 . So as expected, maximum correlations were observed between F ST and FLK, as well as between xp-EHH and Rsb. On the other hand, we detected minimum correlations between haplotype based tests (xp-EHH, and Rsb) and allele based tests (F ST and FLK). These findings are consistent with the previous reports 8,28 .

Conclusions
Our results showed the population structure and selective candidate genomic regions of the 14 indigenous sheep breeds from Middle East and South Asia. This information would be valuable in future study on genetic basis for local adaptation of indigenous breeds. In F ST, FLK, xp-EHH, and Rsb complementary statistical tests, some candidate genomic regions under selective pressure were detected in indigenous sheep breeds and these candidate genomic regions may facilitate identification of the underlying genes and possible exploitation in future sheep breeding.

Data availability
Genotype data from the sheep breeds (Afshari, Moghani, Qezel, Bangladeshi Garole, Bangladesh East, Indian Garole, Changthangi, and Deccani) are available through the Sheep HapMap project 11 . The ZEL, Lori-Bakhtiari, Iranian Balochi, Arabi, Afghan Balochi, and Gadik breeds data are part of the Iranian national genetic evaluations of economic traits conducted at the Animal Breeding Center of Iran. Any request for data should be addressed to the corresponding author.