Introduction

Farm animals contribute in many ways to human survival and economic development, particularly as societies evolve from subsistence agriculture into market-based economies1. However, in the past two decades, farm animal genetic resources have faced an alarming rate of extinction and genetic erosion, as approximately 200 uniquely adapted breeds have become extinct and up to 30% of global livestock breeds (approximately 1,200–1,500) are currently endangered2. This loss of diversity is attributed to multiple causes, including regional economic forces such as the decline in the economic viability of traditional livestock production systems3, destruction of the native habitats (e.g., grazing lands) of livestock breeds4, rapid dissemination of uniform and highly productive breeds at the expense of native genetic resources5 and the extension of markets and economic globalization6. In this context, an integrated analysis of the genomic diversity of farm animals and the associated economies can aid an understanding of economic impact on the changes in farm animal genomic diversity in recent years. Additionally, it will contribute to the development of governmental interventions and the implementation of conservation priorities that are necessary to ensure that ongoing agricultural development will be compatible with the conservation and sustainable use of local livestock diversity7.

Here, we ranked the conservation priorities of 97 native sheep breeds and 53 native cattle breeds from around the world based on genome-wide single-nucleotide polymorphism (SNP) data for the first time. Firstly, we calculated heterozygosity, inbreeding coefficient, and runs of homozygosity (ROH) for each sheep and cattle breed to detect the small and inbred populations that are facing the risk of extinction. Moreover, we implemented the first association analysis between economic indexes and genomic diversity by studying a total of 2,941 animals from 150 native breeds of two globally distributed livestock species: sheep (1,910 animals, 97 breeds; Supplementary Table 1) and cattle (1,031 animals, 53 breeds; Supplementary Table 2). Specifically, the estimations of genomic diversity contributions for sheep and cattle were based on SNP array data genotyped from the Ovine SNP50 (or Infinium HD) and Bovine SNP50 BeadChips, respectively. Through the molecular coancestry8 and allelic richness approaches9, we calculated the breed contribution to genomic diversity at three levels: within-population diversity (ΔGDWS), between-population diversity (ΔGDBS), and total genomic diversity (ΔGDT) for each breed after its removal (Supplementary Tables 1 and 2). For region-wide breeds, the estimation of genomic diversity contributions was implemented by excluding all of the breeds within the region from the total breeds (Supplementary Tables 3 and 4). We then compared the ΔGDWS, ΔGDBS and ΔGDT among the regions of origin (or countries) in terms of the annual gross domestic product (GDP) per capita (GDPPC). We also took into account the increase in the GDP per capita (ΔGDPPC) for the regions during the past two decades in a dual-regression analysis to correlate the contribution of the genomic diversity of the breeds and the past economic development in a region. 
To facilitate comparisons among the regions with different economies, we grouped the regions into three categories (i.e., underdeveloped, developing and developed) according to the average GDPPC (A-GDPPC) during the years 1993–2013. Based on all the above analyses, we were finally able to determine the contribution of each breed to the maximum amount of genomic diversity and establish the core sets of breeds with higher priority for conservation.

Results

Genetic diversity pattern and inbreeding coefficient

Heterozygosity is a measure of genetic variation within a population. We estimated the observed (HO) and expected (HE) heterozygosity for each sheep and cattle breed. For sheep breeds, the highest HO value was observed in the RAA (0.394) breed from Spain, and the lowest was in the SPS breed (0.252) from Yunnan province of China. The highest and lowest expected (HE) heterozygosity were also observed in the RAA breed (0.385) and SPS breeds (0.258), respectively (Supplementary Table 1). For the cattle breeds, CANC breed from Brazil and PRP breed from France had the highest HO (0.362) and HE (0.343) values, respectively, and BALI breed from Indonesia presented the lowest HO (0.061) and HE (0.063) values (Supplementary Table 2). For the estimation of inbreeding coefficients (FIS) of sheep breeds, BGE breed had the highest FIS value (0.157) and DSH breed had the lowest FIS value (−0.065) (Supplementary Table 1). In cattle breeds, the SOM (0.050) and ND2 (−0.108) presented the highest and lowest FIS values, respectively (Supplementary Table 2).

Effective population size

The effective population sizes (Ne) of the sheep and cattle breeds were calculated using the linkage disequilibrium method and are presented in Supplementary Tables 1 and 2. In sheep, Ne ranged from 3,074 (BYK) down to 3 (WNS), while in cattle it ranged from 1,429 (ZBO) down to 1 (BALI). The high degree of inbreeding may be the reason for the extremely low Ne value of BALI, as evidenced by the high FIS and FROH values observed in this breed (Supplementary Table 2). To glean further details, we divided the Ne values into three categories: Ne ≤ 100, 100 < Ne < 200, and Ne ≥ 200, and the number of breeds within each of the three categories is summarized in Table 1 (also see Figs 1a and 2a, Supplementary Figs 1a and 2a and Supplementary Tables 1 and 2).

Table 1 The number of sheep and cattle breeds in different categories of population genetic parameters.
Figure 1

Distribution patterns of Ne (a), NROH (b), LROH (c) and FROH (d) found in 97 sheep breeds. Ne: effective population size. FROH: inbreeding coefficient based on runs of homozygosity. NROH: number of ROH. LROH: mean length of ROH. The Y axis represents the number of breeds that fall into three different ranges of the statistics.

Figure 2

Distribution patterns of Ne (a), NROH (b), LROH (c) and FROH (d) found in 53 cattle breeds. Ne: effective population size. FROH: inbreeding coefficient based on runs of homozygosity. NROH: number of ROH. LROH: mean length of ROH. The Y axis represents the number of breeds that fall into three different ranges of the statistics.

Runs of homozygosity (ROH)

In this study, we calculated the ROH statistics for each breed of sheep and cattle, and subsequently estimated the inbreeding coefficient based on ROH (FROH). The FROH values ranged from 0.0004 (WZS) to 0.2016 (WIL) in sheep breeds (Supplementary Table 1) and from 0.0001 (ZEB) to 0.6842 (BALI) in cattle breeds (Supplementary Table 2). Regarding the number of ROH (NROH) above 500 kb, the sheep breed WIL (NROH = 780) showed the highest NROH value and WZS (NROH = 2) had the lowest (Supplementary Table 1). For the cattle breeds, ZEB (NROH = 1) and BALI (NROH = 1,127) showed the lowest and highest NROH values, respectively (Supplementary Table 2). Finally, we also calculated the average length of ROH above 500 kb for each breed (LROH). For sheep and cattle breeds, LROH values ranged from 26.9 Mb (HLS) to 9.0 Mb (SOA), and from 21.8 Mb (BOR) to 5.1 Mb (ZEB), respectively (Supplementary Tables 1 and 2). To obtain more details, FROH, NROH and LROH values were also divided into three categories: FROH categories (FROH ≥ 0.10, 0.05 < FROH < 0.10, and FROH ≤ 0.05), NROH categories (NROH ≥ 200, 150 < NROH < 200, and NROH ≤ 150) and LROH categories (LROH ≥ 20 Mb, 15 Mb < LROH < 20 Mb, and LROH ≤ 15 Mb), and the numbers of breeds belonging to each category are shown in Table 1 (also see Figs 1 and 2, Supplementary Figs 1 and 2 and Supplementary Tables 1 and 2).
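The ROH summary statistics reported here can be illustrated with a minimal sketch. It assumes the commonly used definition of FROH (total length of an animal's ROH divided by the autosomal genome length covered by SNPs); the function names and the genome length used in the example are hypothetical placeholders, not values taken from this study.

```python
def froh(roh_lengths_kb, autosome_length_kb, min_length_kb=500):
    """ROH-based inbreeding coefficient for one animal.

    Assumed definition: summed length of ROH segments at or above the
    minimum length, divided by the autosomal length covered by SNPs.
    """
    kept = [length for length in roh_lengths_kb if length >= min_length_kb]
    return sum(kept) / autosome_length_kb


def summarize_roh(roh_lengths_kb, min_length_kb=500):
    """Return (NROH, LROH in Mb) for segments at or above the minimum length."""
    kept = [length for length in roh_lengths_kb if length >= min_length_kb]
    n_roh = len(kept)
    mean_length_mb = (sum(kept) / n_roh / 1000.0) if n_roh else 0.0
    return n_roh, mean_length_mb


# Hypothetical animal with three detected segments (in kb); the 300-kb
# segment falls below the 500-kb threshold and is ignored.
segments_kb = [1000, 4000, 300]
genome_kb = 2_500_000  # placeholder autosomal length, not a measured value
f_roh = froh(segments_kb, genome_kb)
n_roh, l_roh_mb = summarize_roh(segments_kb)
```

How per-animal values are aggregated to the breed level (for example, by averaging across animals) is not specified in this section and is left out of the sketch.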

Genetic structure

The relationship between breeds was investigated using the NJ tree based on the Reynolds distance estimated from allele frequency differences, and branch lengths were highly variable. The 97 sheep breeds were clearly divided into African, Asian (Southwestern Asian, Southeastern Asian, and China), and European sheep breeds (Supplementary Fig. 3a). However, sheep from the Americas were clustered with European-derived sheep breeds (Supplementary Fig. 3a). The 53 cattle breeds were clearly divided into European and African cattle breeds, although the American and Southeastern Asian cattle breeds exhibited an admixture with cattle breeds from Europe and Africa (Supplementary Fig. 3b).
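As an illustration of the distance underlying the tree, the following sketch computes a Reynolds-type distance between two breeds from their allele frequencies at biallelic SNPs. It assumes the widely used simplified form (summed squared frequency differences divided by twice the summed between-population heterogeneity term); the function name is hypothetical, and the exact estimator and software used for the NJ tree in this study may differ.

```python
import numpy as np


def reynolds_distance(p1, p2):
    """Reynolds-type distance for biallelic loci (simplified form, assumed).

    p1, p2: reference-allele frequencies per SNP in the two breeds.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    q1, q2 = 1.0 - p1, 1.0 - p2
    # Numerator: squared frequency differences summed over both alleles and all loci.
    numerator = np.sum((p1 - p2) ** 2 + (q1 - q2) ** 2)
    # Denominator: twice the summed probability that alleles drawn from
    # the two breeds differ in state.
    denominator = 2.0 * np.sum(1.0 - (p1 * p2 + q1 * q2))
    return numerator / denominator


same = reynolds_distance([0.5, 0.2], [0.5, 0.2])   # identical breeds
fixed = reynolds_distance([1.0], [0.0])            # fixed for different alleles
```

A full matrix of such pairwise distances would then be passed to a neighbor-joining routine; that step is not shown here.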

Conservation priorities

The mean values obtained using the molecular coancestry and allelic richness methods showed ranges for ΔGDWS, ΔGDBS, and ΔGDT in sheep from −0.200 (SPS) to 0.077 (RAA), −0.067 (KAZ) to 0.218 (SOA), and −0.044 (SPS) to 0.042 (DSH), respectively (Supplementary Table 1). In cattle, these values ranged from −0.781 (BALI) to 0.257 (CANC), −0.198 (CANC) to 0.670 (BALI) and −0.176 (LAG) to 0.172 (HFD), respectively (Supplementary Table 2).
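The logic of a breed's contribution after its removal can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimator used in this study: it uses a coancestry-like identity-in-state matrix built from allele frequencies, weights breeds equally, and reports the contribution as diversity with the breed minus diversity without it, a sign convention chosen here so that a low-diversity, highly distinct breed obtains a negative within-breed and a positive between-breed contribution. All function names are hypothetical.

```python
import numpy as np


def diversity_components(freqs):
    """Return (within, between, total) diversity for a set of breeds.

    freqs: (n_breeds, n_snps) array of reference-allele frequencies.
    """
    p = np.asarray(freqs, dtype=float)
    q = 1.0 - p
    # f[i, j]: probability that two alleles drawn from breeds i and j are
    # identical in state, averaged over SNPs.
    f = (p @ p.T + q @ q.T) / p.shape[1]
    total = 1.0 - f.mean()
    within = 1.0 - np.diag(f).mean()
    between = total - within
    return within, between, total


def contribution_of_breed(freqs, index):
    """Diversity with the breed minus diversity after removing it."""
    freqs = np.asarray(freqs, dtype=float)
    full = np.array(diversity_components(freqs))
    reduced = np.array(diversity_components(np.delete(freqs, index, axis=0)))
    return full - reduced


# Hypothetical frequencies: breeds 0 and 1 are variable and similar,
# breed 2 is fixed at both SNPs (low diversity, strongly differentiated).
example = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
d_within, d_between, d_total = contribution_of_breed(example, 2)
```

In this toy example the fixed breed lowers the within-breed component and raises the between-breed component, the same qualitative pattern reported for BALI.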

As representative of the main cattle breeds reared around the world, six cattle breeds (HFD, HO, AN, ANR, NRC, and BEFM; 11.3%) showed a ΔGDT > 0.1 and 11 (20.8%) breeds showed a ΔGDT < −0.1 in this study (Supplementary Table 2 and Supplementary Fig. 4f). However, none of the native sheep breeds presented extreme values of ΔGDT > 0.1 or <−0.1 (Supplementary Table 1 and Supplementary Fig. 4c). This observation implied a more diverse genetic constitution of sheep compared with cattle. Another possible reason is the differences in number of sheep and cattle breeds analyzed, as the contribution of a single breed to genomic diversity is expected to be lower in a larger sample.

Comparison of conservation priorities in three different economic categories

According to the average GDP per capita (A-GDPPC) during 1993 to 2013, we divided countries or regions into three economic categories: underdeveloped countries/regions, developing countries/regions, and developed countries/regions. For sheep, English breeds showed the highest ΔGDT (0.259) value, Xinjiang breeds showed the highest ΔGDWS (0.278) value, and South African breeds showed the highest ΔGDBS (0.179) value. In contrast, Bangladeshi breeds showed the lowest ΔGDWS (−0.205) value, and Xinjiang breeds had the lowest ΔGDBS (−0.445) and ΔGDT (−0.167) values (Supplementary Table 3). For cattle, ΔGDWS ranged from 2.271 (France) to −1.421 (Indonesia), ΔGDBS ranged from 1.134 (Indonesia) to −1.225 (France), and ΔGDT ranged from 1.047 (France) to −0.293 (Pakistan) (Supplementary Table 4). Further comparisons revealed significant differences in ΔGDWS and ΔGDT between breeds from the three economic categories. For instance, both species showed that the breeds from developed regions had the highest average value of ΔGDT, followed by those from developing regions and underdeveloped regions (Fig. 3). In addition, the breeds in developing and developed regions exhibited significantly (P < 0.05) higher ΔGDWS values than those in underdeveloped regions, whereas the ΔGDBS estimates for breeds from different regions within the three economic categories were not significantly different from each other (Fig. 3a). We also examined the impact of economic growth, which is typically measured as the rate of change in the average annual GDP per capita (ΔGDPPC), on the gain or loss of genomic diversity. Our results showed that ΔGDT was significantly associated with ΔGDPPC during the past two decades in both species, whereas ΔGDWS was significantly associated with ΔGDPPC in cattle but only marginally in sheep (sheep: R2 = 0.087, P = 0.073 (ΔGDWS); R2 = 0.278, P < 0.01 (ΔGDT); cattle: R2 = 0.235, P < 0.05 (ΔGDWS); R2 = 0.318, P < 0.01 (ΔGDT); Fig. 4a,c,d,f). In both species, ΔGDWS and ΔGDT displayed a rapid increase for ΔGDPPC < 20,000 and a slow increase thereafter (Fig. 4).

Figure 3

Comparisons of the genomic diversity contributions of sheep (a) and cattle (b) breeds between regions (or countries) within the three economic categories. Three economic categories: I, underdeveloped; II, developing; III, developed. Three genomic diversity contributions: within-population, ΔGDWS (yellow); between-population, ΔGDBS (pink); total genomic diversity, ΔGDT (green). **P < 0.01; *P < 0.05. The loss or gain of genomic diversity is indicated by the average values of estimates obtained using the molecular coancestry and allelic richness methods.

Figure 4

Regression between the genomic diversity contributions and the increase in GDP per capita (ΔGDPPC) during the years 1993–2013. (a–c) Sheep: within-population (ΔGDWS), R2 = 0.087, P > 0.05; between-population (ΔGDBS), R2 = 0.001, P > 0.05; total genomic diversity (ΔGDT), R2 = 0.278, P < 0.01. (d–f) Cattle: within-population (ΔGDWS), R2 = 0.235, P < 0.05; between-population (ΔGDBS), R2 = 0.153, P > 0.05; total genomic diversity (ΔGDT), R2 = 0.318, P < 0.01. The loss or gain of genomic diversity is indicated by the average values of the estimates obtained using the molecular coancestry and allelic richness methods.
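The R2 values reported for these regressions can be illustrated with an ordinary least-squares fit. The sketch below is a generic illustration assuming a simple linear model of a diversity contribution on ΔGDPPC; the regression model actually fitted in this study is not specified in this section, and the function name and data are hypothetical.

```python
import numpy as np


def linear_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return
    (slope, intercept, coefficient of determination)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical regional values: GDP-per-capita increase and a diversity
# contribution that lies exactly on a line.
delta_gdppc = [1000.0, 5000.0, 10000.0, 20000.0]
delta_gd = [2.0 * value + 1.0 for value in delta_gdppc]
slope, intercept, r2 = linear_r2(delta_gdppc, delta_gd)
```

Because the reported relationship rises quickly and then flattens, a single straight line is only a first approximation of the pattern shown in the figure.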

Discussion

Heterozygosity is an important factor for estimating the genetic variability in domestic animals. Heterozygosity represents the genetic potential and adaptation abilities to the natural environment10. In this study, we found that the livestock in developing and developed regions have higher heterozygosity than those in underdeveloped regions or some island populations. For example, sheep breeds from Southwestern Europe and Northern China obviously harbor higher observed (HO) and expected (HE) heterozygosity than those from Africa and southwestern China, which belong to underdeveloped regions (Supplementary Table 1). Similarly, cattle breeds from Europe and America possess higher HO and HE values than those from Africa and Asia (Supplementary Table 2). Besides economic factors, another reasonable explanation is that breeds with high heterozygosity are from regions near the domestication center or trade routes11,12,13,14,15. Notably, we found that the island breeds always show low heterozygosity. For example, BALI breed, which is distributed on the Bali island of Indonesia, exhibits the lowest heterozygosity (HO = 0.061 and HE = 0.063). Also, GNS (HO = 0.299 and HE = 0.297) and JER (HO = 0.290 and HE = 0.280) breeds from Guernsey and Jersey islands near England harbor lower heterozygosity than breeds from the European continent (Supplementary Table 2). These results suggested that these island breeds have suffered population decline and their genetic variation has been reduced.

Effective population size (Ne) is a widely used parameter for investigating small populations and endangered species. According to the 50/500 rule of thumb16, an Ne value under 50 means that the population is likely to face a serious genetic threat after 5 or more generations, because small populations cannot easily avoid inbreeding depression, whereas an Ne value above 500 means that the population can maintain its evolutionary potential for longer periods of time. However, Frankham et al.17 have proposed a revised criterion under which Ne values of at least 100 and at least 1,000 are needed to control inbreeding and to maintain high genetic diversity, respectively. Based on the revised criterion, more than half of the breeds for both species in this study are facing the risk of extinction, as their Ne values were under 100 (Figs 1a and 2a, Supplementary Figs 1a and 2a and Supplementary Tables 1 and 2). The probable cause of this observation was human intervention in the process of livestock development, such as the introduction of breeds and intensive breeding18,19,20,21,22. Overall, most sheep and cattle breeds do not show a long-term stable status; thus, more attention and appropriate conservation strategies are needed for these native breeds with small effective population sizes.

ROH is a key parameter for detecting the level of mating between related individuals in a closed or small population, and the number and length of ROH can indicate the extent of recent inbreeding and artificial selection23,24. Because of its utility, ROH has been used to investigate genome-wide autozygous regions associated with important traits and diseases in domestic animals and humans25,26. Inbreeding is among the major causes of increased homozygosity, which in turn reduces fitness and brings together recessive alleles that may lead to the expression of unfavorable phenotypes27,28. According to the inbreeding coefficient based on runs of homozygosity (FROH), we have identified 10 sheep breeds (WIL, BOR, WNS, BCS, ZTS, BGE, DSH, BHM, EFB and YXZ) and three cattle breeds (BALI, SH and JER) with an FROH value above 0.10. In addition, these breeds also showed a large number of runs of homozygosity (NROH ≥ 200) (Supplementary Tables 1 and 2). Together, these results suggest that these breeds were founded as small populations or that the populations recently experienced severe inbreeding. Previous studies have provided evidence that domestic animals have suffered serious inbreeding and that the effective population size is declining under the pressure of artificial selection29,30. In our study, most of the sheep and cattle breeds had a ROH length above 10 Mb, suggesting that they shared common ancestors as recently as about 3 generations ago31,32. Also, there was a negative correlation between FROH and Ne, especially in the sheep breeds. Sheep breeds from North China had a lower FROH and a higher Ne than those from other regions, while in cattle breeds the lowest FROH and the highest Ne appeared in Africa and Europe. Interestingly, although some sheep and cattle breeds from the developed regions show high FROH, NROH and LROH and low Ne values, they make contributions to genetic diversity.
For example, 6 sheep breeds (BOR, DSH, ISF, SOA, WIL and GAL) have a Ne value under 200, but they exhibit high ΔGDBS and positive ΔGDT values (Supplementary Table 1). Also, three cattle breeds (SH, GNS and JER) present the same pattern as the sheep breeds. Therefore, these breeds should be subjected to conservation measures to avoid the erosion of their genetic diversity. In addition, the breeds from developing and underdeveloped regions with low Ne values, such as sheep breeds BGE, GAR, ZTS and SPS, and cattle breeds LAG, ND2, BOR, ZMA and BALI (Supplementary Tables 1 and 2), are strongly differentiated from the continental breeds (Supplementary Fig. 3), implying that they may have some special economic traits to be used in breeding programs.

Population structure provides useful information for designing effective strategies to improve the conservation of farm animal genetic resources. In this study, we performed a comprehensive analysis of the population structure of sheep and cattle breeds worldwide. The 97 sheep breeds can be clearly divided into six geographic regions (Europe, Southwestern Asia, Southeastern Asia, China, America and Africa) (Supplementary Fig. 3a). Previous research on 74 diverse sheep breeds collected globally documented a high level of admixture within the European sheep, which showed high co-ancestry with most other breeds33. Compared to the previous study, our data set incorporated sheep breeds from China and identified the most distinct breeds among the sheep breeds. Based on the results of the NJ tree, the breeds from Southeastern Asia, Africa, Southwestern China and several breeds (WIL, BRL, DSH, EFB, ISF and GAL) from Europe have longer branches (Supplementary Fig. 3a), suggesting that these breeds are isolated and possess unique genetic characteristics. Investigations of the genetic structure of cattle have revealed that cattle have two domestication centers: the Fertile Crescent and the Indus valley15. The African taurine shows large divergence from the European taurine and the Asian indicine because of its large proportion of wild African aurochs ancestry34. This is consistent with our result that African cattle show apparent divergence from European cattle (Supplementary Fig. 3b). In addition, 7 African (LAG, BAO, ND2, ZMA, NDAM, SOM and ZEB) and 3 Southeastern Asian breeds (SAHW, GIR and BALI) make a large contribution to the between-breed genetic diversity (ΔGDBS) (Supplementary Table 2). These results reflect the unique genetic resources carried by these breeds.

From the results of loss or gain of genetic diversity, coupled with the evidence of the above statistics (i.e., heterozygosity, Ne, ROH and population structure), our most prominent finding was that breeds from developed regions exhibit higher total genomic diversity than those in underdeveloped and developing regions, and thus, should be given a higher conservation priority (Fig. 3). Since the sample of this study does not cover the global distributions of sheep and cattle, future studies based on more extensive and representative samples are needed to verify this finding. For the potential factors affecting conservation priority, whether the economic situation measured based on the annual GDP per capita plays a direct role in livestock conservation is unclear. One hypothesis is that a better economy induces more effective conservation measures, including higher subsidies for raising local livestock breeds and more financial investment in conservation programs. For example, the European Union has long recognized the importance of conserving animal genetic resources (AnGR) and, in 1992, initiated a policy of economic incentives for farmers keeping native breeds, under EC regulations 2078/92 and 1257/9935. However, breeds from underdeveloped regions, such as African countries (EMZ, NQA, RMA and RDA) and the Yunnan-Kweichow Plateau of China (ZTS, WGS, SPS, TCS, DQS, NLS and WNS), where little or no financial support has been provided for local livestock conservation measures, showed extreme negative values of within-breed genomic diversity (Supplementary Table 1 and Supplementary Fig. 4a). This observation could be due to factors such as long-term genetic isolation, genetic drift, or inbreeding, and thus, effective conservation measures preventing genetic drift in small-sized and isolated breeds and promoting reasonable gene flow with outside breeds should be implemented for the native breeds in underdeveloped regions. 
A second hypothesis is that economic growth is one causative factor for the change in the livestock production system, possibly through effects on breeding programs. Rapid economic growth could have led to an increased need for more productive and synthetic livestock breeds36 and therefore contributed to the increased ΔGDT and ΔGDWS and decreased ΔGDBS values observed in livestock (Fig. 4). Native breeds in developed regions exhibited high values of both ΔGDWS and ΔGDT (Fig. 3), which could also be due to the improvement of breeding programs by avoiding indiscriminate crossbreeding37. However, breeds in developing regions such as Xinjiang of China showed high values of ΔGDWS but negative values of ΔGDT (Supplementary Table 1 and Supplementary Fig. 4a,c), which could be explained by the development scheme of extensive and intensive crossbreeding of sheep with exotic breeds adopted in past decades38. Apart from the potential impact of economic situation on livestock conservation, several other factors may also account for the conservation status of domestic animals. For instance, the development of the breed concept, which originated 200–250 years ago in Europe, has probably affected the number of breeds differentiated and selected within the region. Also, transboundary exchanges of breeds may have played a fundamental role in shaping the genetic diversity of breeds, especially in American and Australian countries.

Another important finding regarding genetic diversity was that breeds in developing regions have been facing the erosion of between-breed genomic diversity (ΔGDBS). Both species are transboundary animals, and many highly productive commercial breeds have been transferred worldwide to upgrade local breeds39,40,41,42. With the rapid economic development and increase in the human population in developing regions, the share of trade in genetic material from developed to developing countries increased between 1995 and 2005 from 20% to 30%43,44. Long-term intensive crossbreeding programs using imported exotic breeds have narrowed the diversity of local genetic materials and led to the loss of between-breed genomic diversity in developing regions45. In northern China, the estimates of both ΔGDBS and ΔGDT were negative, suggesting that northern Chinese sheep have suffered severe genetic erosion induced by exotic breeds in the past, which has resulted in the loss of both total and between-breed genomic diversity (Supplementary Fig. 4b,c). Similar scenarios were also observed in the breeds of Iran, Turkey and Pakistan (Supplementary Fig. 4a,c). However, livestock in underdeveloped regions (including remote and isolated island areas) are commonly used to service the local community46,47 and are subjected to little or no influence from exotic breeds; these populations therefore showed high ΔGDBS values (Supplementary Fig. 4b,e). For example, the diverse and isolated topography of the Yunnan-Kweichow Plateau of China has contributed to the genetically isolated populations of local sheep breeds, which show high and positive ΔGDBS values, but negative ΔGDT values (Supplementary Fig. 4b,c). Similar results were observed in cattle from mainland Africa and Madagascar (Supplementary Fig. 4e,f).
In these scenarios, breeds with high levels of within-breed genomic diversity (ΔGDWS) should be ranked with high conservation priorities to preserve closed populations that will be capable of coping with future challenging environments or diversified food production. However, breeds with large divergence should also be given protection measures to prevent their extinction, owing to their unique genetic resources and special adaptation to the local environment.

As the ovine and bovine SNP BeadChips have generally been developed on European breeds, it is expected that ascertainment bias would impact the estimates of genetic diversity metrics. However, our results should not be significantly affected by ascertainment bias for the following reasons. On the one hand, Herráez et al.48 have demonstrated that the effect of ascertainment bias could be countered if the SNPs with high LD were removed. Based on this finding, we recalculated the heterozygosity and ΔGDT (loss or gain of genetic diversity) using the LD-pruned data, and obtained results similar to those obtained without controlling for LD (Supplementary Fig. 5). On the other hand, Kijas et al.33 calculated the genetic diversity of sheep using two types of SNP chip data, which were genotyped by Roche 454 and Illumina GA sequencing, respectively. The Roche 454 SNPs were developed primarily using animals of European origin, while the Illumina GA SNPs were designed using a larger number of animals selected from multiple regions. The results showed that ascertainment bias was unlikely to heavily influence genetic diversity between breeds and regions. From all the evidence presented above, we believe that ascertainment bias should have little influence on our results.

In conclusion, we show that native sheep and cattle breeds from developed regions in this study make a greater contribution to total genomic diversity than those from developing and underdeveloped regions; thus, a higher conservation priority should be given to these breeds. In contrast, most of the breeds in underdeveloped and developing regions make a small or no contribution at all to total genomic diversity and therefore receive relatively lower rankings in terms of conservation priority. Nevertheless, we noticed that breeds from underdeveloped countries tend to contribute more to between-breed diversity (Fig. 3). This is also a conservation objective, as many of these small, isolated and peripheral populations may have valuable variation (e.g., private alleles) that should be conserved. Also, it should be noted that the lack of samples in some distribution areas of sheep and cattle limits the generality of the results. The mechanisms underlying these observations are not well understood, but the regional annual GDP per capita and its growth rate, which are two important indicators of economic development, could be the missing link and help to explain the differences in the genomic diversity contributions of livestock breeds from different regions. In addition to genomic diversity, other factors, such as sociocultural value, local adaptation and the special uses of some livestock breeds, should also be considered in the development of conservation programs. With an awareness that the conservation recommendations made from the analysis of the contribution of breeds to diversity depend on the conservation interests and scopes, our results could help local authorities and animal breeders to ensure the effective and sustainable utilization of animal genetic resources in the future.

Methods

The methods were carried out in accordance with the approved guidelines of the Good Experimental Practices adopted by the Institute of Zoology, Chinese Academy of Sciences. All experimental procedures and animal collections were conducted under a permit (No. IOZ13015) approved by the Committee for Animal Experiments of the Institute of Zoology, Chinese Academy of Sciences, China.

Ovine sample collection and DNA extraction

Blood or tissue samples from 783 animals representing 40 native Chinese sheep breeds were collected in this study (Supplementary Table 1). In all cases, particular efforts were made, based on both pedigree information and the knowledge of local herdsmen, to ensure that the animals were unrelated and typical of their breeds. Whole blood samples were collected into evacuated tubes containing EDTA and ear marginal tissues were collected and stored in 2-ml microcentrifuge tubes containing 75% ethanol. Genomic DNA was extracted from the tissue samples using the standard phenol/chloroform protocol49 or from blood using the General AllGen Kit (Tiangen Biotech, Beijing, China) following the manufacturer’s instructions.

Ovine SNP datasets genotyping

Of the 40 Chinese native sheep breeds/populations, 35 breeds (see Supplementary Table 1; Dataset I) were genotyped with the Illumina Ovine SNP50 (54,241 SNPs) BeadChip, and 5 breeds (Hu, Wadi, Sishui Fur, Large-tailed Han and Small-tailed Han sheep; see Supplementary Table 1; Dataset II) were genotyped with the Illumina Ovine Infinium HD (685,734 SNPs) BeadChip. The Ovine SNP50 and Infinium HD BeadChips were developed by the International Sheep Genomics Consortium (ISGC; http://www.sheephapmap.org). Details regarding SNP discovery, the design of the ovine array and genotyping procedures for the BeadChips can be found at the following address: http://www.sheephapmap.org/hapmap.php, and in Kijas et al.33 for the SNP50 BeadChip and Anderson et al.50 for the HD SNP BeadChip.

Published Ovine SNP dataset

An Ovine SNP50 BeadChip dataset of 74 worldwide-distributed breeds/populations (Dataset III) was retrieved from a previous study33. A set of 57 native breeds (Supplementary Table 1) that had been sampled and genotyped within the Sheep HapMap project were selected out of the 74 breeds (for information on the breeds and their geographic origins, see Kijas et al.33).

Ovine SNP datasets quality control

All of the ovine SNP BeadChip datasets (Datasets I, II and III) were merged, and quality control procedures were applied using Plink v.1.9 software51. The SNP quality control measures have been detailed elsewhere, in Kijas et al.52, Miller et al.53, Kijas et al.33 and Lv et al.54. Briefly, individuals and SNPs were retained according to the following criteria: (1) individuals with a call rate >95%, (2) SNPs with a minor allele frequency (MAF) >0.05, (3) SNPs with a >95% genotyping rate, and (4) SNPs with physical locations on autosomes. After removing the SNPs and individuals that failed to meet any of the above criteria, 39,641 SNPs and 1,910 individuals were retained in our working dataset.
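The four filters can be sketched on a small genotype matrix as follows. This is an illustration of the filtering logic only, not the Plink implementation used in this study; the function name, the 0/1/2 genotype coding with missing calls as NaN, and the choice to filter individuals before SNPs are assumptions.

```python
import numpy as np


def quality_control(genotypes, is_autosomal, min_call=0.95, min_maf=0.05):
    """Return boolean masks (keep_animals, keep_snps).

    genotypes: (n_animals, n_snps) array coded 0/1/2, np.nan for missing.
    is_autosomal: boolean flag per SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    is_autosomal = np.asarray(is_autosomal, dtype=bool)
    called = ~np.isnan(g)

    # (1) Keep individuals with a call rate above the threshold.
    keep_animals = called.mean(axis=1) > min_call
    g, called = g[keep_animals], called[keep_animals]

    # (3) Genotyping rate per SNP among the retained individuals.
    snp_call_rate = called.mean(axis=0)

    # (2) Minor allele frequency per SNP, ignoring missing calls.
    n_called = np.maximum(called.sum(axis=0), 1)
    freq = np.nansum(g, axis=0) / (2.0 * n_called)
    maf = np.minimum(freq, 1.0 - freq)

    # (4) Combine with the autosome flag.
    keep_snps = (snp_call_rate > min_call) & (maf > min_maf) & is_autosomal
    return keep_animals, keep_snps


# Hypothetical data: the last animal has half of its calls missing,
# SNP 1 is not autosomal and SNP 2 is monomorphic.
toy = [[0, 1, 0, 1],
       [1, 1, 0, 2],
       [2, 1, 0, 0],
       [np.nan, np.nan, 0, 1]]
animals_kept, snps_kept = quality_control(toy, [True, False, True, True])
```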

Bovine SNP datasets

Bovine SNP50 BeadChip datasets were retrieved from previous publications10,34, which included 53 native cattle breeds across their main distributions in the world (Supplementary Table 2). SNPs and individuals that failed to meet any of the following criteria were removed: (1) individuals with a call rate >95%, (2) SNPs with a MAF >0.05, (3) SNPs with a >95% genotyping rate, and (4) SNPs with physical locations on autosomes (see also Decker et al.34). After filtering, 37,827 SNPs and 1,031 animals were available for further analyses.

Genetic diversity

The observed (HO) and expected (HE) heterozygosity of each sheep and cattle breed were estimated using Plink v.1.951. They are important within-breed genetic diversity parameters for setting conservation priorities in domesticated animals. In addition, FIS was calculated using the genepop R package55 based on the observed versus expected number of homozygous genotypes.
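As a rough illustration of these three statistics, the sketch below computes HO, HE and a simple FIS = 1 − HO/HE from a genotype matrix. It assumes complete biallelic data coded as 0, 1 or 2 allele copies, and it is a simplification: it is not the estimator implemented in Plink or genepop, and the function name is hypothetical.

```python
from typing import List, Tuple


def heterozygosity(genotypes: List[List[int]]) -> Tuple[float, float, float]:
    """Return (HO, HE, FIS) averaged over SNPs for one breed.

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j.
    Assumes no missing data.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    ho_sum = 0.0
    he_sum = 0.0
    for j in range(n_snp):
        column = [genotypes[i][j] for i in range(n_ind)]
        p = sum(column) / (2 * n_ind)                  # allele frequency
        ho_sum += sum(g == 1 for g in column) / n_ind  # observed heterozygotes
        he_sum += 2 * p * (1 - p)                      # Hardy-Weinberg expectation
    ho = ho_sum / n_snp
    he = he_sum / n_snp
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis
```

A positive FIS indicates a deficit of heterozygotes relative to the Hardy-Weinberg expectation, which is the signal of interest when screening for inbred breeds.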

Effective population size

Effective population size (Ne) is a well-known parameter in evaluating conservation priorities for a species. Contemporary Ne was estimated with the linkage disequilibrium method using NE ESTIMATOR v.2.1 software56 with the following parameter settings: a random mating model and a Pcrit value of 0.05 as the criterion for excluding rare alleles. To ensure that all loci in the analysis were physically unlinked57, we used Plink v.1.9 software51 (indep-pairwise 100 25 0.05) to remove SNP pairs with r2 ≥ 0.05 before calculating Ne. In addition, we divided the Ne values into three categories: Ne ≤ 100, 100 < Ne < 200, and Ne ≥ 200.

Runs of homozygosity (ROH)

Runs of homozygosity (ROHs) were detected for each individual using the Plink v.1.9 software51. The following settings were generally suitable for ROH identification in domesticated animals58: (1) required minimum density (-homozyg-density 1000) and (2) number of heterozygotes allowed in a window (-homozyg-window-het 1). In this study, we detected ROH with a minimum length of 500 kb (-homozyg-window-kb 500) and a minimum of 50 SNPs (-homozyg-window-snp 50). The number (NROH) and average length (LROH) of ROH for each breed were estimated, and the NROH and LROH were divided into three categories: NROH ≤ 100, 100 < NROH < 200, and NROH ≥ 200; LROH ≥ 20 Mb, 15 Mb < LROH < 20 Mb, and LROH ≤ 15 Mb. In addition, the inbreeding coefficient based on ROH (FROH) was calculated for each animal according to the following formula26,59:

$${F}_{ROH}=\frac{{\sum }_{k}\,Length(RO{H}_{k})}{L}$$
(1)

where the numerator represents the summed length of all ROH longer than 500 kb in an animal and L is the total length of the genome. Total lengths of the cattle and sheep genomes were considered as 2,510,611 kb60 and 2,610,000 kb61, respectively. For sheep and cattle, the mean FROH of each breed was estimated separately. The breed FROH values were divided into three ranges: FROH ≥ 0.10, 0.05 < FROH < 0.10, and FROH ≤ 0.05.
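Equation (1) and the three FROH ranges can be expressed as a short Python sketch. The genome lengths and thresholds are those given in the text; the function names and the range labels ("high", "intermediate", "low") are hypothetical and introduced only for illustration.

```python
from typing import Iterable

# Genome lengths (kb) as stated in the text
CATTLE_GENOME_KB = 2_510_611
SHEEP_GENOME_KB = 2_610_000


def f_roh(roh_lengths_kb: Iterable[float], genome_kb: float, min_kb: float = 500) -> float:
    """Equation (1): summed ROH length divided by genome length.

    Only runs of at least `min_kb` are counted, mirroring the 500 kb
    minimum length used for ROH detection.
    """
    return sum(length for length in roh_lengths_kb if length >= min_kb) / genome_kb


def classify_f_roh(value: float) -> str:
    """Assign a breed-level mean FROH to one of the three ranges in the text.

    The labels are illustrative and do not appear in the original study.
    """
    if value >= 0.10:
        return "high"          # FROH >= 0.10
    if value > 0.05:
        return "intermediate"  # 0.05 < FROH < 0.10
    return "low"               # FROH <= 0.05
```

For example, a sheep carrying 261,000 kb of ROH in total would have FROH = 261,000 / 2,610,000 = 0.10.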

Genetic structure analysis

In order to understand the relationships within and between breeds across each major geographic group, neighbor-joining (NJ) trees were constructed. To prevent SNPs in high LD from distorting the NJ trees, SNPs were pruned using Plink v.1.951. For the sheep and cattle datasets, the 39,641 and 37,827 SNPs were pruned to 34,770 and 29,093 SNPs, respectively, by applying the LD pruning command --indep-pairwise 100 25 0.25. Subsequently, Reynolds genetic distances63 were calculated by Arlequin 3.5.2.264, and the NJ trees were constructed using SplitsTree 462 and visualized by FigTree v.1.3.165.

Analysis of conservation priorities based on the molecular coancestry approach

The method of molecular coancestry8 has proven to be efficient for defining conservation priorities for domestic animals. Unlike prioritization based on Weitzman diversity measures66,67, this method considers within-breed genetic variation, which is of great importance for the management of livestock68. The method assumes that there is a metapopulation including n populations with Ni breeding individuals. In the metapopulation, si denotes the average self-coancestry of the Ni individuals of population i; Dii is the average distance between individuals of the ith population; and Dij is Nei’s minimum distance between populations i and j69. The average global coancestry (\(\bar{f}\)) was calculated as follows70:

$$\bar{f}=\sum _{i=1}^{n}\,\frac{1}{n}[{s}_{i}-{D}_{ii}-\frac{{\sum }_{j=1}^{n}\,{D}_{ij}}{n}]$$
(2)

This equation can also be expressed based on genetic diversity70:

$$(1-\bar{f})=(1-\tilde{f})+\bar{D}=\,1-\tilde{s}+\tilde{D}+\bar{D}=(1-\tilde{s})+(\tilde{s}-\tilde{f})+(\tilde{f}-\bar{f})$$
(3)

In equation (3), \(\tilde{f}\) and \(\tilde{s}\) are the mean coancestry and self-coancestry of the populations, respectively; \(\tilde{D}\) is Nei’s minimum distance between individuals within populations; and \(\bar{D}\) is the average genetic distance over the entire metapopulation. In the derived expression, the total genetic diversity (\({{\rm{GD}}}_{{\rm{T}}}=1-\bar{f}\)) was divided into three partitions: the genetic diversity within individuals (\({{\rm{GD}}}_{{\rm{WI}}}=1-\tilde{s}\)), the genetic diversity between individuals (\({{\rm{GD}}}_{{\rm{BI}}}=\tilde{s}-\tilde{f}\)) and the genetic diversity between populations (\({{\rm{GD}}}_{{\rm{BS}}}=\tilde{f}-\bar{f}\)). The sum of the first two components (GDWI and GDBI) gives the genetic diversity within populations (\({{\rm{GD}}}_{{\rm{WS}}}=1-\tilde{f}\)). As a result, the total genetic diversity GDT is assumed to be the sum of the genetic diversity within populations (GDWS) and the genetic diversity between populations (GDBS), i.e., GDT = GDWS + GDBS.
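The partition in equation (3) can be checked numerically with a small sketch. It assumes that the inputs are already available as a symmetric matrix `f` of average coancestries (with `f[i][i]` the mean coancestry within population i) and a list `s` of mean self-coancestries, and that populations are weighted equally (1/n), as in equation (2). The function name is hypothetical, and this is not the Metapop implementation.

```python
from typing import Dict, List


def partition_diversity(f: List[List[float]], s: List[float]) -> Dict[str, float]:
    """Split total genetic diversity into the components of equation (3).

    f[i][j]: average coancestry between populations i and j
             (f[i][i] is the average coancestry within population i).
    s[i]:    average self-coancestry of population i.
    Populations are given equal weight 1/n (an assumption).
    """
    n = len(s)
    s_tilde = sum(s) / n                                              # mean self-coancestry
    f_tilde = sum(f[i][i] for i in range(n)) / n                      # mean within-population coancestry
    f_bar = sum(f[i][j] for i in range(n) for j in range(n)) / n**2   # global coancestry
    return {
        "GD_WI": 1 - s_tilde,        # within individuals
        "GD_BI": s_tilde - f_tilde,  # between individuals
        "GD_WS": 1 - f_tilde,        # within populations (GD_WI + GD_BI)
        "GD_BS": f_tilde - f_bar,    # between populations
        "GD_T": 1 - f_bar,           # total
    }
```

By construction, GD_WI + GD_BI + GD_BS equals GD_T, which is the identity stated in equation (3).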

The loss or gain of genetic diversity (ΔGDWS, ΔGDBS and ΔGDT) was estimated using Metapop v.2.0 software71 when each of the populations was removed from the dataset. The parameter λ = 1, which gives equal weight to within- and between-breed diversity, was applied in the Metapop calculations71 (Supplementary Tables 1 and 2). It is noteworthy that the initial version of Metapop used loss (−) vs gain (+) of diversity as indicative of high vs low contribution to diversity, respectively. However, in the latest version of Metapop (used in this paper), the sign convention was reversed to be more intuitive: + means a greater contribution and − means a lower contribution.

Analysis of conservation priorities based on the allelic richness approach

Allelic richness is an important measure of genetic diversity for setting conservation priorities in livestock. The rarefaction method developed by Hurlbert is advantageous for estimating allelic richness72,73. With this method, El Mousadik and Petit denoted allelic richness as ai, representing the number of different alleles in a sample of genes taken at random. The calculation formula is as follows74:

$${a}_{i}=\sum _{k=1}^{K}\,(1-{P}_{ik})=\sum _{k=1}^{K}(1-\frac{(\begin{array}{c}{N}_{i}-{N}_{ik}\\ {\rm{g}}\end{array})}{(\begin{array}{c}{N}_{i}\\ {\rm{g}}\end{array})})$$
(4)

where Pik is the probability that allele k does not exist in a random sample of g genes; Ni represents the total number of genes in the sample of a given subpopulation i and Nik represents the number of copies of the kth allele from that subpopulation.
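Equation (4) can be evaluated directly with binomial coefficients. The sketch below takes the allele counts of one locus in one subpopulation and a rarefaction sample size g; the function name is hypothetical.

```python
from math import comb
from typing import Sequence


def allelic_richness(allele_counts: Sequence[int], g: int) -> float:
    """Rarefied allelic richness a_i of equation (4) for one locus.

    allele_counts[k] is N_ik, the number of copies of allele k in
    subpopulation i; their sum is N_i. g is the rarefaction sample size.
    """
    n_i = sum(allele_counts)
    if not 0 < g <= n_i:
        raise ValueError("g must satisfy 0 < g <= N_i")
    total = comb(n_i, g)
    # P_ik = C(N_i - N_ik, g) / C(N_i, g): probability that allele k is
    # absent from a random sample of g genes (comb returns 0 when g > N_i - N_ik)
    return sum(1 - comb(n_i - n_ik, g) / total for n_ik in allele_counts)
```

For a biallelic SNP with five copies of each allele and g = 2, each allele is missed with probability C(5,2)/C(10,2) = 10/45, so a_i = 2 × (1 − 10/45) = 70/45.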

The within-subpopulation component of allelic diversity was calculated according to the following formula75:

$${A}_{S}=(\frac{1}{n}\sum _{i=1}^{n}\,{a}_{i})-1$$
(5)

The average distance between all subpopulations is

$${D}_{A}=\frac{1}{{n}^{2}}[\sum _{i,j=1}^{n}\,{d}_{A,ij}]$$
(6)

where dA,ij is the average allelic distance between subpopulations i and j, and it is equal to

$${d}_{A,ij}=\frac{1}{2}\sum _{k=1}^{K}[(1-{P}_{ik}){P}_{jk}+{P}_{ik}(1-{P}_{jk})]$$
(7)

The total allelic diversity is estimated with the following equation:

$${A}_{T}={A}_{S}+{D}_{A}=[\frac{1}{n}\sum _{i=1}^{n}({a}_{i}+\frac{1}{n}\sum _{j=1}^{n}\,{d}_{A,ij})]-1=[\frac{1}{{n}^{2}}\sum _{k=1}^{K}\,\sum _{i,j=1}^{n}\,(1-{P}_{ik}{P}_{jk})]-1$$
(8)

The three estimators (AS, DA and AT) were calculated using Metapop v.2.0 software as described above (Supplementary Tables 1 and 2).
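The relationship between equations (5) to (8) can be verified with a short sketch for a single locus. It assumes that the absence probabilities P_ik have already been computed (for example by rarefaction) and are supplied as a matrix `P`; the function name is hypothetical, and this is not the Metapop code.

```python
from typing import List, Tuple


def allelic_diversity(P: List[List[float]]) -> Tuple[float, float, float]:
    """Return (A_S, D_A, A_T) for one locus.

    P[i][k] is the probability that allele k is absent from a random
    sample of g genes drawn from subpopulation i.
    """
    n = len(P)
    K = len(P[0])

    # Equation (4): allelic richness of each subpopulation
    a = [sum(1 - P[i][k] for k in range(K)) for i in range(n)]
    # Equation (5): within-subpopulation allelic diversity
    A_S = sum(a) / n - 1

    # Equation (7): allelic distance between subpopulations i and j
    def d(i: int, j: int) -> float:
        return 0.5 * sum(
            (1 - P[i][k]) * P[j][k] + P[i][k] * (1 - P[j][k]) for k in range(K)
        )

    # Equation (6): average distance over all pairs (i, j), including i = j
    D_A = sum(d(i, j) for i in range(n) for j in range(n)) / n**2

    # Right-hand side of equation (8), computed directly
    A_T = sum(
        1 - P[i][k] * P[j][k]
        for k in range(K) for i in range(n) for j in range(n)
    ) / n**2 - 1
    return A_S, D_A, A_T
```

The directly computed A_T agrees with A_S + D_A, because the cross terms of d_A,ij cancel the mean absence probabilities contained in A_S.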

The results obtained by the molecular coancestry and allelic richness methods differed slightly. However, when we compared the estimated values from the two methods using paired t-tests, we found no significant differences for the three estimators (for sheep and cattle, respectively, ΔGDWS: t = 0.01, df = 96, P = 0.999 and t = 0.031, df = 52, P = 0.976; ΔGDBS: t = 0.2, df = 96, P = 0.842 and t = 0.286, df = 52, P = 0.776; and ΔGDT: t = 0.301, df = 96, P = 0.764 and t = 0.429, df = 52, P = 0.670). In order to obtain more accurate results, the average values of the estimates obtained using the two approaches were employed in the following statistical analysis. The average values of ΔGDWS, ΔGDBS and ΔGDT in sheep and cattle were divided into four fractions (<−0.1, −0.1–0, 0–0.1 and >0.1) and then plotted on a world map using the software ArcGIS v.10.1 (ESRI Inc, Redlands, CA, USA) (Supplementary Fig. 4).
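For readers unfamiliar with the paired t-test used in this comparison, the sketch below computes the test statistic and degrees of freedom from two matched lists of per-breed estimates. The function name is hypothetical; obtaining the P value additionally requires the t distribution (for example via `scipy.stats.ttest_rel`), which is left out to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples.

    x[i] and y[i] are the two estimates for the same breed, e.g. from
    the molecular coancestry and allelic richness approaches.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With 97 sheep breeds and 53 cattle breeds, the degrees of freedom are 96 and 52, matching the values reported above.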

Analysis of the effect of regional economics on the conservation priority of different breeds

We further investigated the associations between the changes (loss or gain) in genetic diversity for each breed (or group of breeds) and the economies of the countries (or regions) from which the breeds originated. The average GDP per capita of the countries (or regions) during the years 1993 to 2013 was collected from the World Bank database (http://data.worldbank.org.cn/indicator). Because the 40 native Chinese sheep populations originated from 13 provinces, which differ greatly in their economies and employ different breeding and conservation strategies for domestic animals76, the GDP per capita for the provinces, rather than for China as a whole, was collected for the period from 1993 to 2013 (from the China Statistics Bureau; available at http://data.stats.gov.cn; Supplementary Table 5) and included in the analysis. The countries (or regions) were classified into three categories based on their average GDP per capita (A-GDPPC) during these years: underdeveloped countries/regions (A-GDPPC ≤ $1,500); developing countries/regions ($1,500 < A-GDPPC < $15,000); and developed countries/regions (A-GDPPC ≥ $15,000). For sheep, the breeds came from 38 countries or regions: 9 were underdeveloped; 16 were developing; and 13 were developed. For cattle, the breeds came from 21 countries or regions: 10 were underdeveloped; 3 were developing; and 8 were developed (see Supplementary Tables 3 and 4). Subsequently, we calculated the loss or gain of diversity for each country/region when all of the breeds from the country/region were removed using the two approaches described above (Supplementary Tables 3 and 4). Differences in the loss or gain of genetic diversity averaged over sheep and cattle breeds in the three economic categories were compared using boxplot statistics computed in R software v.3.4.377, and the P value was calculated using the Kolmogorov-Smirnov test78.
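The three economic categories can be written as a small classification function. The thresholds are those given in the text; the function name is hypothetical, and assigning the exact boundary values ($1,500 and $15,000) to the outer categories is an assumption.

```python
def classify_economy(a_gdppc: float) -> str:
    """Classify a country or region by its 1993-2013 average GDP per capita (USD).

    Assumption: values exactly at a threshold fall into the outer
    category (underdeveloped at 1,500; developed at 15,000).
    """
    if a_gdppc <= 1_500:
        return "underdeveloped"
    if a_gdppc >= 15_000:
        return "developed"
    return "developing"
```

Applying this function to each country or province yields the grouping variable used to compare the diversity contributions across economic categories.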

To evaluate the effect of economic development on the loss or gain of genetic diversity for the breeds, we performed regression analysis between the loss or gain of genetic diversity (within subpopulation, ΔGDWS; between subpopulations, ΔGDBS; and total genetic diversity, ΔGDT) and the increase in average GDP per capita (ΔGDPPC) from 1993 to 2013 in each country (or region) using the dual-regression approach. The coefficient of determination (R2) and its significance (P value) were calculated using SPSS v.22.0 software79.
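As an illustration of how a coefficient of determination is obtained, the sketch below fits an ordinary least-squares line with a single predictor and returns R2. It is only a plain linear regression under that assumption; it is not the SPSS procedure used in the study and does not reproduce the dual-regression approach. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    x could hold the GDP per capita increase of each country or region,
    y the corresponding loss or gain of genetic diversity.
    Returns (slope, intercept, R2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```

With a single predictor, R2 equals the squared Pearson correlation between the two variables, which is why it can be computed from the three sums of squares alone.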

Data availability

The data sets analyzed in the present study are available from the corresponding author on reasonable request.