In diploid organisms, single nucleotide differences observed between paternal and maternal chromosomes are called heterozygous sites1. With the continued development and cost reduction of high-throughput DNA sequencing and genotyping technologies, researchers have powerful tools for studying animal genomes through whole-genome molecular markers. However, little has been done to analyze genomic aspects of heterozygosity in livestock populations2, and the limited evidence suggests that increased heterozygosity over time may be attributed to selection3. Heterozygosity-rich regions (HRR), also known as runs of heterozygosity (ROHet), are a recently emerged analytical concept and refer to regions of consecutive heterozygous sites detected between paternal and maternal chromosomes in diploid organisms4. The analysis of HRR aims to identify genomic regions with high genetic variability, to provide information about the populations’ genetic diversity and evolutionary history5, as well as to identify specific segments in the genome where increased genetic diversity could be beneficial6,7. HRR may harbor loci associated to key functional traits such as immune response, survival rate, fertility and other fitness traits8 and avoid the deleterious effects of harmful homozygous genotypes9,10,11. Williams et al.4 first introduced the concept of HRR in livestock and suggested that some of these regions were under balancing selection and contained recessive lethal mutations in cattle. Along this line, further studies were subsequently carried out in cattle6,7, horses3, sheep8,12 and pigs9,13,14, but less commonly in goat species15. Goats (Capra hircus) were domesticated around 10,500 years ago in the Fertile Crescent16. After dispersion from its center of domestication, this species has undergone an intense adaptation process and occupied different agroecological areas worldwide17. Different breeds and goat strains have been selected for milk, meat and fiber production, playing an important role in the livestock sector around the world. Given the wide range of genetic variability and its ability to populate very different geographical areas and climates and produce in conditions of low anthropogenic input, the goat species is the best for studying genetic diversity and adaptation18,19. Thanks to these features, this species provides one of the most suitable models for understanding the patterns and distribution of HRR. Furthermore, Italy is the European country with the highest number of goat breeds, and it provides a precious reservoir of genetic diversity shaped by its varied history, environment, climate, and farming traditions20. Therefore, in this study, we investigated the genomes of Italian and worldwide goat breeds, to detect and characterize their HRR patterns and reveal regions of heterozygosity (HRR islands) which contain candidate genes related to specific traits.


Genetic diversity indices

The genetic diversity indices were calculated per breed and then averaged per geographical group (see Table 1 for details on the dataset). The summary statistics showed a high variability all over the breeds (see Supplementary Table S1). On average, Asian breeds revealed the lowest values of expected (HE = 0.340) and observed (HO = 0.332) heterozygosity, and the highest average level of inbreeding (0.243). Conversely, the Turkish breeds (KIL and KLS) deviated from the Asian trend and showed values in contrast (HO and HE higher than 0.396 and inbreeding lower than 0.091). Similarly, African breeds showed a mean HO of 0.371 and a mean HE of 0.366, but a notable lower FIS equal to 0.152. All European breed groups (Europe, Alpine arch, and Italy) reported average heterozygosity indices higher than 0.386 and average inbreeding coefficient lower than 0.121. Values of the Italy group of breeds spanned from ARG (HO = 0.416, HE = 0.412 and FIS = 0.051) to MNT_I (HO = 0.271, HE = 0.263 and FIS = 0.381) that also represented the range of the whole dataset. ARG, MLG, and JON were the breeds with the highest HO (0.416, 0.415, and 0.415, respectively), while RCC, ARG, and M×S showed the highest HE (0.414, 0.412, and 0.412, respectively). The lowest FIS was estimated in ARG (0.051) and the highest, excluding the island-isolated MNT_I, was reported in GIR (0.174). The MAF averaged per geographical group followed the trend of HE, showing the highest value in Italy group (0.308) and the lowest in the Asia group (0.261). RCC highlighted the highest breed value (0.328), while CAN the lowest (0.229).

Table 1 Dataset composition.

Genetic relationships

A representation of genetic relationships within- and between-breeds comes from multidimensional scaling analysis. Two different graphs of the same MDS analysis were generated: the first (Fig. 1) shows the breeds grouped according to their geographic breeding area as reported in Table 1, and the second (see Supplementary Fig. S1) represents all breeds separately. In Fig. 1, the first component (C1 = 29.88%) clearly separated the European and Italian breeds from the rest of the dataset: African and Asian breeds clustered according to their different breeding location, while European, Alpine, and Italian breeds partially overlapped and showed a north–south gradient of variation. Focusing on this cluster, the spatial breeds grouping highlighted the effective geographic distribution within each country and between countries, with the Southern, Center, and Northern (Alpine cluster) Italian breeds extending along a line. The second component (C2 = 11.25%) reported a partial overlapping between Spanish and Egyptian breeds, as well as between Turkish, Bezoar and the European macro cluster (Alpine, Italian, and French breeds). Considering both C1 and C2 components, the Spanish MLG breed and PYR from France slightly detached from the European macro cluster (see Supplementary Fig. S1). The Neighbor-Net based on pairwise Reynolds’ genetic distances also shows distinct clustering according to the different geographic distribution of breeds (see Supplementary Fig. S2). In particular, a phylogenetic divergence between European and non-European breeds was highlighted, with the MAL and MLG breeds at the basis of the separation. The Italian breeds highlighted a complex interweaving of nodes and positioned at the center of the Net. Alpine and French breeds clustered showing a certain degree of distinction from other European breeds. The Brazilian breed (CAN), MNT_I and breeds from Asia showed the highest degree of divergence with respect to the analyzed goat breeds.

Figure 1
figure 1

Multidimensional scaling (MDS) plot according to C1 (29.88%) and C2 (11.25%) components. The geographical clusters are represented by different shades of a color. See Table 1 for a full definition of the dataset.

Heterozygosity-rich regions

The HRR analysis was performed using two methods of investigation (CR and SW) and reported a total of 13,612 HRR for CR approach and 13,558 HRR for SW approach. Individuals reporting no HRR belonged to Asian breeds (3 animals for KAC, 1 animal for PAT, and 1 animal for TAP). Table 2 shows the descriptive statistics of HRR per breed and geographic cluster. Pearson correlation coefficients were calculated between average values (over the 49 breeds) of the four parameters (NHRR, LHRR, SHRR and DHRR) obtained with the two methods (CR and SW), reporting an r > 0.999 for NHRR, SHRR and DHRR and an r = 0.947 for LHRR. Based on the high overlapping of the HRR results and high correlation coefficients, we here referred only to the results of the CR approach that reported the highest total number of HRR. Figure 2 graphically presented the results for each parameter per breed and geographical group. The Alpine group had the highest average values within the dataset (ANHRR = 12.66, ALHRR = 0.51, ASHRR = 6.38 and ADHRR = 0.0026), with ALP and SAA showing high values above the average both within the group and in the entire dataset. Similar averages were exhibited by the breeds of the Europe group, with MLG breed showing the highest values. The Italy group had mean values of the HRR parameters close to those of Alpine and Europe and showed MON and M×S with the highest values, while MNT_I and GIR were the ones with the lowest. The rest of the goat breeds analyzed showed a fair level of rich heterozygosity and values not far from the group means (ANHRR = 11.59, ALHRR = 0.50, ASHRR = 5.80 and ADHRR = 0.0024). Both African and Asian breeds reported low values for all parameters. Within the first group, breeds showed comparable average values, while in the second group, KIL and KLS breeds from Turkey differentiated and showed values above the average.

Table 2 Descriptive statistics for Heterozigosity-rich regions (HRR) by consecutive runs (CR) and sliding window (SW) methods.
Figure 2
figure 2

Boxplots of the Heterozigosity-rich regions (HRR) indices aggregated by geographic area. (a) NHRR = average number of HRR per breed, (b) LHRR = average length of HRR per breed (in Mbp), (c) SHRR = mean genome length covered by HRR segments (in Mbp), (d) DHRR = average coefficient of diversity. The different colors indicate the two approaches, CR (blue) and SW (light blue). Each boxplot extends from the 25th to the 75th percentile and shows the average (red dashed line) and the median (black horizontal line) group value. The black dots represent the single-breed results.

Heterozygosity islands and gene annotation

According to the filter applied for the identification of HRR islands, 1,665 SNPs were considered to form a total of 166 HRR islands (see Supplementary Table S2). The average number of islands on the overall dataset was 3.39 per breed, and the highest number of HRR and islands was found in MNT_I (103 and 16, respectively), while no HRR islands were detected for the BRK breed. To explain the workflow in HRR detection, we report the results of SAA. The 0.1 percentile threshold for cutting off the SNP-in-HRR p-values allowed the detection of 164 SNPs for both CR and SW. These markers identified 89 in-breed HRR, but only 21 exceeded the frequency (20% within-breed) and length (minimum 4 SNPs) thresholds, for a final detection of 3 HRR islands. Table 3 reports details of each HRR island including genomic coordinates, breeds, and annotated genes. CHI1, CHI11, CHI12 and CHI18 were the chromosomes harboring the highest number of HRR islands (sum of breeds' HRR islands), reporting a total of 33 (3 unique), 16 (4 unique), 25 (4 unique) and 36 (4 unique) islands, respectively. Several breeds had HRR islands in common. Notably, 31 (including 22 Italian breeds) out of 49 breeds shared an island in CHI1 (CHI1-A from 131.88 to 132.54 Mb). Similarly, a portion of the genome ranging from 36.36 to 40.01 Mb on CHI18, including two HRR islands, was largely shared by several European breeds: 19 breeds had the CHI18-A island in common, and 13 breeds the CHI18-B. Finally, 15 breeds with different geographic distributions reported a common HRR island in CHI12 (CHI12-A from 49.80 to 51.28 Mb). All HRR islands are plotted in Fig. 3, highlighting the chromosome and breeds involved. The gene enrichment performed on genes annotated within HRR islands revealed a total of 74 different processes, mainly biological processes involved into regulation of metabolic processes (CHI12-A), sensor organ development (CHI11-C) and activation of immune response (CHI8-C, CHI16-D) (see Supplementary Table S3). No results were found for CHI1-A shared island.

Table 3 Distribution of HRR islands per chromosome.
Figure 3
figure 3

Graphical distribution of Heterozigosity-Rich Regions (HRR) islands per chromosome. Detected HRR in the meta-population (a) and per breed (b). The gap between consecutive islands within a chromosome and between consecutive chromosomes do not correspond to the real distance in bps.


In livestock, selective breeding tends to progressively decrease the diversity and resilience of target breeds21. Locally reared breeds still seem to maintain their rustic characteristics thanks to the lower anthropic impact22. Goats are well known for being the most suitable livestock species for adapting to harsh environments18,19. In particular, local goat breeds represent a way to the sustainability of animal production in marginal areas in both developed and developing countries18,23. The domestication process has triggered different evolutionary trends, which over the centuries have led to the development of several breeds with different productive aptitudes. This study aimed to provide a general overview of the relationship among the studied goat breeds and to highlight their HRR patterns as repositories of advantageous gene diversity related to adaptation and resilience processes6,7. Identifying a species-specific HRR pattern could provide a clearer view of the mechanisms regulating the genome modelling following anthropogenic selection combined with environmental interaction.

Genetic diversity indices

The high degree of variability in the dataset reflected the expected biodiversity19,23,24 among different breeds. The choice of the dataset, composed of goat breeds claiming a general dairy productive orientation, led to an evident imbalance between the European panorama and the rest of the geographical groups. Therefore, the average diversity values per group need to be considered a function of the notable numerical disparity between European goats and breeds from other parts of the world. Colli et al.19 showed a pattern of observed heterozygosity in worldwide goat breeds that matched our results. They highlighted a clear association of this parameter with geography reflecting climatic conditions and breeding management. Therefore, the higher values are probably caused by admixture events between breeds reared in areas with extensive practices (i.e., transhumance in South Italy). The two Sicilian breeds—Argentata dell’Etna and Messinese—represent a known case of occasional admixture linked to the shared breeding area, which impacts on high values of both HO and HE25. Despite their status as cosmopolitan breeds and notoriously more subject to selection, the Saanen and the Camosciata delle Alpi showed relatively high heterozygosity and low molecular inbreeding, proving the proper management of their selective plan. Notably, the feral Montecristo goat resulted in the lowest heterozygosity of the whole dataset, likely reflecting a strong inbreeding due to its geographic isolation. Somenzi et al.26 have retraced the evolutionary history of this feral population, suggesting repeated bottleneck events and founder effects that characterized the demographic history of the insular goat.

Genetic relationships

Genetic relationships analyses pointed out the differences between European and non-European goats, likely as a consequence of gene pool divergence between domesticated breeds and ancestral populations from the regions of the Fertile Crescent19. We can also assume that physical barriers, such as long distances and mountain ranges, have marked this divergence. Moreover, bias related to the design of the Illumina chip panel must be considered to explain the clear divergence of the two macro goat clusters24,27. The Multidimensional-scaling analysis highlighted the high cohesiveness of the European breeds in their totality (Mediterranean, Central-Europe and Alpine goats) when compared with the rest of the data set. The north–south geographic cline and the greater homogeneity of some breeds such as the Malaguena and the Pyrenean, were particularly evident. More significant variability among the different breeds of the other geographic groups was found (e.g., Pakistani vs Turkish goats, Ethiopian vs Egyptian goats vs Malian goats), probably due to several factors involving the diverse histories of local communities, the different levels of gene flow affecting breeds from different geographic areas (e.g. Pakistani vs Turkish goats)19,28, the eventual influence of the European genome29, as well as the original complexity of the African and Asian goat stock origin30,31. The phylogenetic nodes of the Neighbor-Net based on pairwise Reynolds’ genetic distances better represented the genetic relationships among breeds, confirming the genetic closeness of the Europe, Alpine and Italy geographic groups and their divergence from the African and Asiatic strains. The Montecristo goat showed a close relationship to the other insular goat (Corse) and genetic proximity to the Italy group possibly because of the recent inputs of domestic stocks (twentieth century) as already reported in a previous study26.

Heterozygosity-rich regions

HRR are heterozygous genomic regions which are potentially associated with disease resistance, immunity, and adaptation processes. The higher the level of diversity, the better the biological response of species to environmental changes or new diseases. Consecutive Runs and Sliding Window are the two approaches currently used to detect stretches of consecutive homozygosity (ROH) and HRR in livestock species e.g.3,7,8,32,33. Although only few studies have focused on identifying HRR islands with both methods of detection, it seems that the use of the Consecutive Runs approach is preferred3,7,15. In this study, we investigated both methods, highlighting the overlapping results for HRR identification. Strong correlation between CR and SW was estimated for all the parameters, indicating nearly identical results for all breeds. In a similar study on HRR in pigs, Bordonaro et al.14 found a Pearson correlation coefficient higher than 0.96 for all four parameters (NROH, LROH, SROH, FROH), confirming the obtaining of overlapping results between the two approaches. As mentioned by Biscarini et al.6, the choice of the parameter values is of fundamental importance in detecting ROH/HRR because of its effect on results. The information present in the scientific literature regarding the setting of the detection parameters is not very detailed. According to Mulim et al.7, the allowed number of opposite and missing SNP and the gap between two consecutive markers are the factors that mainly affect the detection of runs. In our preliminary screening, we evaluated the number of HRR found by testing several combinations of parameters (the minimum number of SNPs in a run, the minimum length, and the number of missing/opposite markers). The minimum number of SNPs in a HRR (minSNP) and the maximum number of SNP with opposite genotype in a window (maxOppWindow) were the parameters that most influenced the number of HRR found in SW method, while for the CR method it was the number of opposite genotypes allowed in a HRR (maxOppRun)34. Selli et al.8 conducted a methodological study on several combinations of detection parameters in Sliding Window approach to investigate HRR. They found that a reduction in the minimum number of SNPs in a run (minSNP) and in a window (windowSize), as well as the number of homozygous and missing SNPs allowed (maxOppRun and maxMissRun), caused a decrease in HRR found, whereas a reduction in the minimum length of the run (minLengthBps) caused an increase in the number of HRR. Similarly, minSNP and windowSize positively affect the HRR length. In this scenario, it is unclear how different combinations might affect the results, and further analyses are needed6. In general, the within-breed occurrence of HRR is lower than that of ROH2,3,6. Indeed, we found an average number of HRR per breed lower than 18, while Cortellari et al.23 found an average number of ROH of 31.5 in the same Italian breeds data. Other studies on HRR reported an average of 9.5 in semi-feral Chillingham cattle population known to be characterized by a strong inbreeding4, less than 9 in the italian Maremmana cattle breed6, 121.5 in Pinzgauer cattle breed2, 57.8 in commercial turkeys35, 52.2 in a local horse breed3, ~ 40 in Duroc pig9 and 139.6 in sheep8. These notable differences may be attributed to the species of interest, to the breeds analysed, and to the different SNP arrays and parameters used for the HRR detection. All the cited studies also found that HRR were much shorter than ROH, confirming our general mean of ~ 500 kb in length and reflecting low coverage of the entire genome (maximum of 6.38 Mb for the Alpine breed group). These results might be related to the absence of missing and opposite SNPs in the detection approach, that usually reduces the size of detected ROH6 and, similarly, of HRR. We highlighted a notable difference between the results obtained for European (Italy, Alpine, Europe groups) and non-European clusters (Africa, Asia, Brazil). We assume that this result, also reported by Li et al.15 in chinese goats compared to worldwide breeds, could come from ascertainment bias related to the SNP array design based on European goat breeds24,27. Worth of note are the outcomes of Maltese × Sarda (M×S) goats. This population represents the only crossbreed sample included in the data set and the high values for all the indices were expected and probably related to the combination of relatively different genomes.

Similarly, Mulim et al.7 reported the borderline case of the Montana cattle that combines different genomes due to the crossbreeding between Bos taurus indicus and Bos taurus taurus and, consequently, showed the highest amount of HRR in their dataset.

Heterozygosity islands and gene annotation

The identification of recurrent HRR for each breed highlighted wide regions of shared heterozygosity in the chromosomes CHI1, CHI11, CHI12, and CHI18. Williams et al.4 observed that heterozygosity-rich regions in the Chillingham cattle genome appear to be not randomly distributed but rather grouped in particular chromosomal locations. Moreover, some of the detected heterozygosity islands had already been recognized as homozygosity islands in previous studies15,36. Classifying the same chromosomal regions as both homozygosity-rich and heterozygosity-rich emphasizes the diversity among breeds within the goat species. The different breeding strategies and the different production orientations15 rather than the adaptation to different environments have differentially shaped the genome of breeds14. CHI1-A (131.88–132.54 Mb) was the most shared island within the whole dataset. STAG1 and PCCB genes, mapped within CHI1-A, are linked to reproduction-related traits and were reported to experience balancing selection in goat breeds, playing a significant role in the domestication of the species15. Notably, this HRR island was not present in either Bezoar or Montecristo, the two wild goat populations in the dataset. The second most common HRR island was CHI18-A (36.36–37.20 Mb). It was detected mainly in Italian and Alpines breeds, matched a ROH island found in goats from Europe, Africa and Asia by Bertolini et al.36, and it was reported as a possible signature of selection for fiber production with a low number of SNPs supporting this finding37. This region harbors genes involved in productive mechanisms in other species, such as the RIPOR1 gene for feeding intake control in the Nellore cattle breed38 or the PSKH1 gene listed as a candidate gene for mean staple length, live weight, greasy fleece weight, and mean fiber diameter in Merino sheep39. The EDC4 gene was reported to mediate the post-transcriptional regulation of IL-6 cytokine playing an important role in immune response40. The nearby island CHI18-B (39.56–40.01 Mb) was detected in 7 Alpine breeds out of 10 and in 5 breeds of the Italy group, allowing us to hypothesize a possible correlation to a well-defined geographic area. This short region harbors the ZFHX3 gene, also known as ATBF1, which has a well-known function in humans as a transcription factor that regulates myogenic and neuronal differentiation, with similar effects on Chinese native goat breeds where its polymorphisms influence animal growth rate41. As far as found in literature, a study on 3 Chinese local goat breeds confirmed the association of this gene with development traits42. In our results, we did not find this HRR island in any of the six Asian breeds. Bertolini et al.36 found two ROH regions in goat chromosome 12 (from 43 to 44 Mb in European breeds and from 50 to 51 Mb in worldwide goats) which overlapped two of our HRR hotspots CHI12-B (43.91–44.47 Mb) and CHI12-A (49.80–51.28 Mb), harboring genes that are related to ectodermal, nervous system, and hearing function (GJB2 and GJB6)43,44 as well as gonad development (SAP18)45. CHI12-A was also reported as a ROH hotspot by Li et al.15 in Chinese and other worldwide goat breeds; the authors underlined this region in common with a ROH hotspot in Chinese sheep breeds46 and possibly under parallel selection between goats and sheep species before the domestication due to the presence of a series of genes (GJA3, GJB2 and GJB6) associated with perception senses, such as sight and hearing essential for surviving in harsh environments (lack of food and presence of wild enemies). These genes were also identified under positive selection for growth (body size, skeletal and embryonic development) in Boar goat47 and in Barki goat and sheep48. Signatures of selection localized in CHI12-A (49.80–51.28 Mb) were reported in goats by Serranito et al.22 which listed a series of annotated genes involved in the adaptation of small ruminants to arid environments (RNF17, ATP12A, GJB2) and high altitude (ATP12A, PARP4, ZMYM5, PSPC1, CENPJ, GJB2). Moreover, the interleukin17D (IL17D) gene, spanning from ~ 50.90 to ~ 50.93 Mb and belonging to the IL17 family of cytokines, was closely associated to host defense and immune response in humans49,50. This region perfectly matched the one found in 5 commercial and local goat breeds by Biscarini et al.51. CHI11-A (37.76–38.56 Mb) was a shared island among 11 breeds, mainly from the Alpine and Italy groups. This region overlapped the ROH island found by Bertolini et al.36, which detected a signature of selection in a previous study37. In our dataset, the two cosmopolitan breeds (Saanen and Camosciata delle Alpi) reported this HRR island in which several mapped genes have a role in livestock production. In particular, the PPP4R3B gene was associated to thermotolerance in the African N’Dama cattle breed52; Berihulay et al.53 found this gene involved in gluconeogenesis and lipidic metabolism in Abergelle goats. The PNPT1 gene was reported implicated into RNA transport in pigs53 and Bandur sheep54. Previous studies linked the EFEMP1 gene with traits of interest in livestock species. Zhang et al.55 identified this gene as a significant influencer of the oleic acid content in the meat of the Wagyu × Angus cattle breed. Or else, EFEMP1 was previously identified as a differentiated expressed gene in muscle for residual feed intake in Nelore steers and seems to be regulated by a transcriptional factor (TCF4)56. The worth of notice are the islands in CHI8, even though they are present only in few breeds: CHI8-A (74.95–75.32 Mb) and CHI8-C (39.05–39.29 Mb) seem to have a strong correlation with immune response. In particular, the AQP3 gene in the first island was found in a ROH hotspot in Ganxi and Guangfeng goats by Li et al.15 who stated its involvement in immunoreactions. The CD274 and JAK2 genes in CHI8-C were reported in several biological processes for immune response in our gene enrichment results.

Heterozygosity islands in feral Montecristo goat

The evolutionary history of the feral Montecristo goat, geographically isolated in the homonymous island, subject to repeated bottleneck phenomena, and in the absence of anthropic management, albeit with a partial introduction of domestic germplasm26, represents an outlier case in the analyzed dataset. Moreover, notwithstanding the highest inbreeding coefficient (FIS) and the lowest heterozygosities (HO and HE) of the whole dataset, Montecristo goat reported the highest number of HRR. Events of balancing selection characterizing this feral goat population might be assumed during the shape of its genomic structure. Despite a high inbreeding, these events may have led to an increase in short segments of heterozygosity, that are strongly associated with fitness and survival traits2,12. Indeed, FAF1 gene in CHI3-A (24.83–25.57 Mb) was highly associated with tolerance to Theileria infection in African cattle57. On chromosome 3, HRR islands (CHI3-C and CHI3-D) harbored two genes (SPATA1 and ASH1L) connected to reproduction traits in livestock species58,59,60. Moreover, SYT11 may play an important role in adaptation to different environmental conditions in Kenyan goat breeds61, while ARHGEF2 gene’s function in livestock is still unclear, but may be involved in epithelial barrier permeability affecting host-microbial interactions in the rumen62. The HRR island in CHI6-A (23.81–24.43 Mb) was also found as a HRR region in MNT_I by Somenzi et al.26, which attributed a reproduction function due to the presence of PPP3CA gene associated to fecundity traits and litter size in small ruminants. The HRR islands in CHI8-B (38.47–38.69 Mb), in CHI15-A (65.98–66.54 Mb), and CHI15-B (72.58–73.07 Mb) showed other candidate genes related with immune response traits such as RANBP663, GUCY1A264, PIWIL465, and CNTN566. The IL31RA gene in CHI20-A (23.34–23.95 Mb) is a cytokine receptors known to be involved in human inflammation and allergic diseases67. Moreover, SLC38A9 gene in the same HRR island was identified as an indicator of heat stress in bovine68. Finally, the last gene worth of interest is EML1 in CHI21-C (64.17–64.51), which plays a role in processes of exocytosis (the process of releasing vesicle content to the extracellular environment) in the molecular pathway of Ab production by B lymphocytes69. However, we cannot exclude that the high number of HRR islands found in the feral goat Montecristo derives from the bias linked to the chip design based on the domesticated goat genome.


This work studied the genetic relationships among goat breeds with dairy aptitude and investigated their continuous heterozygosity patterns. We confirmed a notable divergence between the European lineage and the Asian and African breeds. European goats have shown a clear north–south geographic cline and genetic interconnections probably due to gene flow between geographically close breeds. The investigation of heterozygosity-rich regions highlighted specific portions of the goat genome associated with different functional factors. The distribution of the HRR islands in the goat genome seems to be mainly related to the breeds' geography. A species-specific HRR pattern, possibly shaped by adaptation mechanisms, might provide a clearer view of the mechanisms modelling the genome as a function of anthropic selection. Interestingly, some heterozygosity hotspots showed overlap with ROH islands reported in the literature. This possibly indicates genomic areas targeted by selective breeding which has shaped the goat genome differently in different production contexts. The methodological part of the manuscript gave new insight into the standardization of HRR detection, highlighting the point of overlap between two different detection methods. Further investigation is needed to improve the parameter settings of HRR detection, possibly involving different species and breeds genotyped with different densities of SNP arrays.


Dataset and quality control

A dataset including worldwide goat breeds was generated by merging the Italian Goat Consortium2 (IGC2) and the ADAPTmap repositories described in Cortellari et al.20 and Stella et al.18, respectively. The breeds involved in the analyses were selected for their milk-production aptitude, excluding those for meat or fiber, and grouped according to their geographic breeding area, resulting in a reduced dataset comprehensive of 1,289 individuals belonging to 49 goat breeds (Table 1). Italian breeds were divided into two different groups (Southern and Center breeds labelled as Italy, while Northern breeds as Alpine arch) due to their isolation and divergent morpho-aptitude factors. The random sampling selection procedure implemented in the bite r package70 was used to select a maximum of 30 representative individuals per breed. All animals were genotyped with the Illumina GoatSNP50k BeadChip, including a total of 53,347 SNPs24. Chromosomal coordinates of SNPs were referred to the ARS1 genome assembly (Assembly GCA_001704415.1) using the commands --update-map and --update-chr. The software PLINK ver. 1.971 was used to perform filtering and quality control. After removing the unmapped SNPs and markers located on sexual chromosomes, the additional quality parameters were set as follows: minor allele frequency ≥ 0.05 (--maf 0.05), SNP genotype call rate ≥ 0.95 (--geno 0.05) and individual genotype call rate ≥ 0.80 (--mind 0.20), resulting in 48,544 SNPs and 1287 goats.

Genetic diversity indices

The software Plink ver. 1.971 was used to estimate the observed and expected heterozygosity (HO and HE), the molecular inbreeding coefficient (FIS) and calculate the minor allele frequencies (MAF) for each breed, using the commands --hardy, --het and --freq respectively. Genetic diversity indices values were averaged per geographical group.

Genetic relationships

To explore the genetic relationships among breeds, the multidimensional scaling (MDS) analysis was performed based on pairwise identity-by-state (IBS) distances between individuals (--read-genome --cluster --ppc 0.0001 --mds-plot 30 eigvals) using Plink ver. 1.971. In addition, the Neighbor-Net based on pairwise Reynolds’ genetic distances was obtained using software Arlequin ver. and plotted using SplitsTree4 ver. 4.14.873.

Heterozygosity-rich regions detection

Heterozygosity-rich regions (HRR) were investigated using the R package detectRUNS ver. 0.9.633, by two different methods of detection: Sliding Window (SW)74 and Consecutive Runs (CR)75. The absence of clear guidelines from literature led us to test range of values for the detection parameters34. Screening involved several combinations, varying the minimum number of SNPs in a HRR (10–25), the minimum length (103 bp to 106 bp), and the number of missing/opposing markers (0–3). The evaluation criteria to choose the parameters setting were based on (i) avoiding missing or opposite genotype per run; (ii) obtaining a comparable number of HRR between the two methods (SW and CR). The following parameters were finally set: (i) the minimum number of SNPs included in the HRR was 10; (ii) the number of missing or opposite genotypes were set to zero; (iii) the maximum gap between consecutive SNPs was set to 1 Mb; (iv) the minimum HRR length was set to 250 kb. Additional parameters were set for the SW method: v) sliding window of 10 SNPs for HRR; vi) no missing or opposite genotypes were allowed in the window; (vii) the minimum density of one SNP every 100 kb; (viii) the threshold to call a SNP within a HRR was set to 0.05. The HRR pattern per breed was investigated through the average HRR number (NHRR), the average length of HRR in Mbp (LHRR) and the mean genome length covered by HRR segments in Mbp (SHRR). Finally, the total length of the genome covered by HRR per breed was divided by the total autosomal genome length covered by the SNP array (~ 2.4 Gb) to evaluate the coefficient of diversity (DHRR)9,14. In addition, the four descriptive breed-statistics were aggregated by geographic area (ANHRR, ALHRR, ASHRR and ADHRR).

Heterozygosity islands detection and gene annotation

HRR islands per breed were identified comparing the results of SW and CR methods of detection. Based on the standard normal z-score, obtained from all the SNPs-within-HRR incidence per breed and method, p-values were calculated and the top 0.1% of SNPs were selected to constitute an island14,76,77. Only those markers identified simultaneously with both methods were considered. HRR islands in a breed were considered those regions with frequency ≥ 20% and a minimum of 4 SNPs. The genomic coordinates of HRR islands were examined using the Ensemble browser for the goat genome, according to the assembly ARS1 (Assembly GCA_001704415.1) to retrieve annotated gene lists. Gene Ontology (GO) and the enrichment analysis of annotated genes were conducted for each island using the open-source Database for Annotation, Visualization, and Integrated Discovery ver. 2021 package78. For the GO terms and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway analysis, the level of significance for the enriched biological processes was set as p-value < 0.05. Corrections for multiple testing were made by applying the Bonferroni test.