Genomic information is currently used in animal breeding programs to enable selection for difficult to measure traits, increase the overall rate of genetic gain, and to improve the understanding of genetic and biological causes underlying phenotypic variation. Genomic selection (GS) is an approach which uses genome-wide markers simultaneously to predict breeding values1. This approach has been shown to increase the rate of genetic gain when pedigree-based selection is suboptimal1, which is the case for lowly heritable traits. For instance, GS based on simulated data showed an increase in reliability of breeding values for young animals when using genomic (r2 > 60%) versus parent average (r2 = 32%) information, equivalent to approximately 20 offspring2. Furthermore, genetic gain can be increased using genomic information by shortening the generation interval1. Alternatively, genetic markers scattered across the genome offer an opportunity to conduct genome-wide association studies (GWAS) to characterize genes underlying genetic variation for traits of interest.

The success of GS and GWAS is dependent on linkage disequilibrium (LD) or gametic disequilibrium between the markers and causal mutations3 because generally only the markers are observed and the causal mutations are unknown. The LD between a marker and a causal mutation can be considered as the proportion of causal mutation variance that can be captured by the marker variance4,5. Knowing the degree of LD makes it possible to define the density of genetic markers necessary to achieve a certain accuracy of prediction and to determine when the estimates of genetic marker effects should be updated. It has been well documented that simply increasing marker density does not necessarily improve prediction accuracies. Although increased marker density improves resolution, it can also decrease power and add noise to the analyses through the use of non-informative SNPs. Furthermore, increased marker density can dilute individual marker effects if, for example, two markers are associated with the same QTL and the two markers are in high LD with each other.

LD is defined as a non-random association between alleles at different loci6, and it is commonly represented by the |D′| and r2 metrics7. The extent of LD can vary between and within species due to evolutionary history and population structure mainly characterized by insertions, deletions, chromosomal rearrangements, or inversions4. This association between markers and causal mutations may change over time due to recombination and selection4, necessitating the re-estimation of marker effects.

Estimates of LD have been reported in sheep for some domestic pure and crossbred populations, as well as in wild sheep, by using microsatellites and SNP markers4,8,9,10,11,12,13,14. Nevertheless, there are few studies that report LD estimates for Brazilian Santa Inês sheep using SNPs. Ovine populations have retained a relatively high level of genetic diversity, unlike bovine populations, which justifies the importance of LD mapping in many breeds within the species15. Moreover, LD estimates between different breeds can be informative relative to the overall diversity level in a species and the selection level applied to them.

Therefore, the aim of the current study was to characterize LD structure in Brazilian Santa Inês sheep for the first time, given its commercial importance for meat production, reproductive efficiency, and tropical adaptation in Brazil, and to compare the LD observed in the Santa Inês breed with that of other breeds. Beynon et al.16 mentioned the importance of studies focused on breeds as a chance to identify variation and understand the biological mechanisms that enable these breeds to survive in different local environments.

Many studies have evaluated imputation accuracy17 and the accuracy of genomic estimated breeding values using different marker panel densities in sheep18,19,20. The appropriate panel density could be specific to each species and breed depending on overall LD structure. Unfortunately, the current genotyping costs in sheep are greater than the economic value of breeding animals21. Consequently, we also aimed to provide an estimate of the marker density required for genomic studies in the Santa Inês breed.

Results and Discussion

Descriptive statistics

After quality control (QC), 38,168 autosomal SNPs remained, comprising approximately 53% of the entire panel. The SNPs retained after QC spanned a total of 299.63 megabases (Mb) of the genome, with a mean (standard deviation) distance between adjacent SNPs of 0.07 (0.075) Mb. This value was close to that obtained by Liu et al. in Spanish Churra sheep (0.06 Mb)14. SNPs were evenly distributed throughout the genome as the distances between adjacent markers ranged from 0.064 to 0.085 Mb. The chromosomes differed in size and number of SNPs, with chromosome 24 (OAR24) being the smallest (44.21 Mb). Liu et al.14 reported similar results with the same SNP panel, with OAR24 being the smallest chromosome (44.85 Mb) and OAR2 the largest (263.11 Mb). The number of SNPs per chromosome was proportional to the size of each chromosome. Descriptive statistics of the SNPs and LD (r2 and |D′|) for each chromosome are presented in Table 1.

Table 1 Descriptive analyses, MAF, F, Ne, and average linkage disequilibrium (r2 and |D′|) between adjacent and all pairwise SNP pairs by chromosome.

In addition, 35% of the SNPs (18,716) had minor allele frequency (MAF) lower than 0.20, with a mean MAF over all SNPs of 0.35. According to another sheep study, 33% of the SNPs had MAF lower than 0.2022. Extending our comparison to other species, the mean MAF was relatively higher than those found for Bos taurus indicus, with values ranging from 0.19 to 0.2523,24. The MAF is important because LD, independent of the metric used, is a function of allelic frequency. In general, low MAF may correspond to a larger difference in allele frequency of coupled alleles, which can result in lower estimates of LD as measured by either r2 or |D′|25. Consequently, applying QC and the choice of QC criteria can affect the distribution and extent of LD6.

Inbreeding coefficient and effective population size

For a better understanding of the population described in this study, inbreeding coefficient (F) and effective population size (N e ) were estimated for all chromosomes together and for each chromosome separately, using genomic information. The estimate of F was 0.04, a relatively low coefficient for a population that originated from the same commercial herd. Using pedigree information to estimate the inbreeding coefficient, Pedrosa et al. found values equal to 0.02 in the Santa Inês breed26. Al-Mamun et al. found average inbreeding coefficients for Merino, Border Leicester and Poll Dorset equal to −0.013, 0.09 and 0.02, respectively13. A recently published study in ovine found average inbreeding coefficients based on excess of homozygosity (standard deviation- SD) of −0.008 (0.031), ranging from −0.079 to 0.30112. Compared with Kijas et al.11 and Liu et al.14, the F estimated from the Santa Inês breed was lower. Negative inbreeding coefficients occur when the number of observed homozygous loci is lower than the expected, suggesting that the population is more heterogeneous than expected, perhaps due to the composite nature of the breed.

In the N e estimation process, genetic distance between markers was estimated by a fixed ratio across the whole genome of one Mb per centiMorgan (cM). Prieur et al. evaluated three different methods to transform the genetic distance in ovine, and concluded that the estimation process using CRIMAP software (v2.503) was more accurate27. However, Prieur et al. also verified that the ranking for r2 and N e between breeds were not affected by the method used and mentioned that the LD estimator was not different between methods27.

The N e estimated herein was 96 in the current generation. Kijas et al.15 observed N e equal to 520 in the Brazilian Santa Inês breed; however, only 47 animals were used in their study. Pedrosa et al. also estimated N e using pedigree information and found a relatively low value (76) in Santa Inês26. These differences in N e can be due to the number of animals used (395 vs. 47 vs. 17,097) and the source of relationship information (genomics vs. pedigree). Al-Mamun et al. found values of N e ranging from 140 (Border Leicester breed) to 348 (Merino breed)13. Brito et al.12 found values of N e in the most current generations in multi-breed sheep populations ranging from 125 to 974. Using a Spanish Churra sheep population, García-Gámez et al.28 and Chitneedi et al.29 estimated N e equal to 159 and 83, respectively.

The presence of artificial selection in the population under study was verified through the reduction of N e over the generations. In this study, N e ranged from 1,705 to 28,191 between 16 and 296 generations, respectively, before the current generation. Mastrangelo et al. estimated the N e at 295 generations ago to be 747 animals in Barbaresca sheep30. Liu et al. observed N e equal to 4,472 and 160 at 2,000 and 5 generations ago, assuming that one Mb is equivalent to one cM14. Brito et al.12 reported estimates of effective population size of 5,537 animals 1,000 generations ago to 687 in the most recent generation. We hypothesize that the large difference in N e between the current and historic generations could be because the breeds that comprise the composite breed of Santa Inês were divergent historically and, thus, these estimates include multiple divergent breeds. The Santa Inês breed is relatively new, having only begun in the 1950s by non-systematic crossing of the Brazilian Somali, Bergamasca and Morada Nova breeds31. This illustrates that the large estimates of historic N e reflect time points before the formation of the breed, and even before the domestication of ovine.

We also estimated the N e for each chromosome. Chromosome 6, OAR6, exhibited the smallest N e , which was in contrast to the results of Liu et al. that reported the smallest N e for OAR1014.

Linkage disequilibrium analysis between adjacent SNPs

The average (SD) r2 and |D′| values estimated between adjacent SNPs from the 26 autosomal chromosomes were 0.166 (0.2189) and 0.617 (0.3349), respectively. Using the dairy sheep breed Frizarta, Kominakis et al. estimated r2 and |Dʹ| equal to 0.18 and 0.50, respectively, at an average inter-marker distance of 0.031 Mb32. Mastrangelo et al. observed average r2 (SD) in Sicilian sheep equal to 0.155 (0.2040)33. Al-Mamun et al. also reported LD estimates from multiple domesticated sheep (Ovis aries) breeds including: Merino (MER), Border Leicester (BL), Poll Dorset (PD) and crossbred populations (i.e., F1 crosses of Merino and Border Leicester (MxB) and MxB crossed to Poll Dorset (MxBxP)). The authors used the same genotype panel but adopted a different data quality control (MAF < 0.01) and reported a mean r2 of 0.12 (MER), 0.20 (BL), 0.19 (PD), 0.13 (MxB) and 0.13 (MxBxP); and mean |D′| of 0.52 (MER), 0.72 (BL), 0.69 (PD), 0.54 (MxB) and 0.55 (MxBxP)13. In the Barbaresca sheep breed, the mean r2 across autosomes was 0.215, with an average distance between adjacent SNP pairs of 0.063 Mb30.

A study published with multi-breed sheep reported mean (SD) r2 of 0.26 (0.100)12. The estimates of r2 are relatively consistent across sheep populations, with the exception of larger r2 values reported by Brito et al. Nevertheless, we should consider that the distance between markers was much shorter in Brito et al. than herein (4.74 kb versus 70 kb in the present study), which can be one reason for the increase in r2. Additionally, Brito et al. reported LD levels less than 0.10 for SNP located more than 0.04 Mb apart12. A recent study from Michailidou et al. observed a mean r2 equal to 0.121, 0.098, and 0.092 in Boutsko, Chios, and Karagouniko, respectively, with the average intermarker distance 0.27 Mb for all breeds34.

Sheep populations have been associated with lower levels of LD in comparison to other ruminant and nonruminant species. Although the comparison between species is difficult due to differences in genome size as well as in the quality control applied, mean values between adjacent SNPs of 0.32 (r2) and 0.69 (|D′|) were estimated from the Australian Holstein-Friesian cattle population using 9,195 SNPs with the mean SNP distance equal to 0.25 Mb6. The mean r2 values for pigs of the Landrace (87 animals), Yorkshire (96 animals), Hampshire (78 animals) and Duroc (90 animals) breeds were 0.36, 0.39, 0.44, and 0.46, estimated from 40,144, 39,110, 32,370 and 34,129 SNPs spaced at average distances of 0.06, 0.06, 0.07, and 0.07 Mb, respectively35.

The average LD (SD) between adjacent SNPs within the same chromosome ranged from 0.135 (0.1972) to 0.194 (0.2423) for r2 and from 0.568 (0.3391) to 0.650 (0.3368) for |D′| (Table 1). Chromosomes 6, 11, 12, 14, 17, 20, 21, 23 and 24 had an average r2 below the 0.16 threshold24. Considering the r2 metric between adjacent SNPs, chromosomes 2, 10 and 16 had higher levels of LD compared to other chromosomes. The high level of LD present on OAR10 was similar to that observed by Al-Mamun et al.13.

Linkage disequilibrium analysis among all pairwise SNPs

The average (SD) for r2 and |D′| estimated between all pairwise SNPs on the 26 autosomal chromosomes were 0.018 (0.032) and 0.225 (0.213), respectively. In a study which used microsatellite markers to evaluate LD using chromosomes 1–10 of domestic sheep (Ovis aries) with mean distance between markers ranging from 10 to 40 Mb, a mean (SD) value of 0.211 (0.004) for |D′| was estimated10. Al-Mamun et al. who also used domesticated sheep (Ovis aries), found mean r² between all pairwise SNPs (0.05 Mb mean distance) of 0.007 (MER), 0.013 (BL), 0.018 (PD), 0.009 (BxM) and 0.012 (BxMxP); and mean |D′| of 0.168 (MER), 0.29 (BL), 0.27 (PD), 0.18 (BxM) and 0.19 (BxMxP)13. Additionally, Miller et al. using non-domesticated sheep (Ovis canadensis and Ovis dalli) and the same genotype panel but adopting a different QC (MAF < 0.10), reported a mean r2 (SD) of 0.042 (0.067)4. Considering the confidence interval obtained for the estimates presented in this study as well as in the studies previously reported, it is possible to assume that estimates of r2 and |D′| across all SNP combinations on a chromosome are relatively consistent across sheep populations.

Figures 1 and 2 illustrate r2 and |D′|, respectively, as a function of the intermarker distance for chromosomes 1 and 24. Supplementary Fig. S1 and S2 depict r2 and |D′|, respectively, for the other chromosomes. Overall, the relationship between LD and intermarker distance suggests that as intermarker distance decreases, LD increases. A notable exception is chromosome 1. On this chromosome, r2 presented secondary high peaks around the interval from 100 to 150 Mb (Fig. 1). On all chromosomes, the maximum |D′| was observed between many SNP pairs with high intermarker distances (Fig. 2). We contend that this might occur due to the dependence of |D′| on allele frequency. The unexpected increase in LD between some SNP pairs with larger intermarker distances could also be explained by selection. It is possible that favorable alleles for different traits were selected, resulting in a high degree of LD over longer intermarker distances, even extending to SNP pairs on different chromosomes. Another potential reason for high r2 values when intermarker distance was large is genome assembly errors, which could explain the phenomenon on chromosome 1.

Figure 1

Linkage disequilibrium (LD) measured by r2 plotted as a function of intermarker distance (Mb) for chromosomes 1 (OAR1) and 24 (OAR24).

Figure 2

Linkage disequilibrium (LD) measured by |D′| plotted as a function of intermarker distance (Mb) for chromosomes 1 (OAR1) and 24 (OAR24).

The average (SD) r2 between all pairwise SNPs located on the same chromosome with intermarker distance greater than or equal to 0.10 and lower than 0.20 Mb was 0.1033 (0.0807) across all chromosomes. Zhao et al. observed r2 values equal to 0.044, 0.132 and 0.158 in Sunite, German Mutton Merino and Dorper sheep, respectively, in the same marker distance interval36. Additionally, García-Gámez et al. observed r2 equal to 0.086 for SNPs within the same marker distance interval in a Spanish Churra sheep population28. Similarly, Chitneedi et al. observed an average r2 of 0.066 in Spanish Churra sheep using high-density imputed genotypes29.

Using LD categories defined by Espigolan et al., Table 2 shows the average intermarker distances between pairwise SNPs exhibiting low LD (r2 ≤ 0.16), medium LD (0.16 < r2 < 0.70), and high LD (r2 > 0.70)24. Higher levels of r2 (greater than 0.70) were found at distances between markers smaller than 0.768 Mb with 3,296 combinations of SNPs (0.01% of all combinations). For medium levels of r2 (0.16 to 0.70), distances lower than 5.277 Mb were observed with 273,659 combinations of SNPs (0.849%). Considering low levels of r2 (lower than 0.16) distances found were higher than 15.110 Mb with 31,939,376 combinations of SNPs (99.140%).
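The binning described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example pairs are hypothetical, and assigning a value of exactly 0.70 to the medium class is an assumption, since the category definitions leave that boundary open.

```python
def ld_category(r2, low=0.16, high=0.70):
    """Classify an r2 value as 'low', 'medium' or 'high' LD.

    Thresholds follow the categories quoted in the text; r2 == high is
    placed in 'medium' here, which is an assumption.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if r2 <= low:
        return "low"
    if r2 > high:
        return "high"
    return "medium"


def summarise_by_category(pairs):
    """Return {category: (count, mean distance in Mb)}.

    pairs is an iterable of (r2, distance_mb) tuples, one per SNP pair.
    """
    totals = {}
    for r2, distance_mb in pairs:
        category = ld_category(r2)
        count, total = totals.get(category, (0, 0.0))
        totals[category] = (count + 1, total + distance_mb)
    return {cat: (n, s / n) for cat, (n, s) in totals.items()}


# Hypothetical SNP pairs, not data from the study.
example_pairs = [(0.05, 20.0), (0.10, 10.0), (0.30, 4.0), (0.90, 0.5)]
summary = summarise_by_category(example_pairs)
```

A summary of this kind yields one count and one mean intermarker distance per LD class, which is the layout of Table 2.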

Table 2 Mean intermarker distance and frequency for each category of linkage disequilibrium (high, medium and low) according to r2 metrics.

Relationship between linkage disequilibrium, inbreeding coefficient and effective population size

The relationships between r2, |D′|, MAF, F, and N e are reported in Table 1. The mean MAF was similar across all chromosomes. The correlation between the two measures of LD was 0.75 when LD was estimated between adjacent SNP and 0.97 when estimated among all pairwise SNP. Although |D′| tends to overestimate LD values compared to r2 as reported by Zhao et al.37, both LD metrics exhibited the same behavior (Table 1). This is expected since these metrics are defined similarly as a function of allele frequency. The differences between the two metrics (r2 and |D′|) are related to the weight applied to the allele frequencies. Given |D′| is entirely dependent on the frequency of the alleles, |D′| possibly inflates LD estimates37. On the other hand, the r2 proposed by Hill and Robertson7 aims to reduce this frequency dependence.

According to Hill and Robertson7, LD (numerator of r2) and F have a linear relationship as shown in the equation below7. In a population under selection, the number of homozygotes tends to increase for many favorable alleles. Consequently, the inbreeding coefficient and LD between these selected alleles increase7.


where \({D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}\) is the numerator of r2, \({\rho }_{A}\) is the probability of allele A at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, and \({\rho }_{AB}\) is the probability of the pair of AB markers; \({p}_{0}\) and \({q}_{0}\) are the frequencies of the A and B alleles, respectively, in generation zero or with initial equilibrium. A positive relationship (0.22) was observed between the D2 estimated by equation (1) as a function of inbreeding coefficients and the average D2 observed between adjacent SNPs on each chromosome. A possible justification for the low correlation could be the relatively limited number of SNPs per chromosome on the panel used in the current study. The SNPs contained on the panel used herein cover only 299.6 Mb out of a total of 2,615.52 Mb, equivalent to 11% of the sheep genome. However, a few negative values were observed (e.g., −0.08) when estimating the correlation between D2 estimated by F (equation (1)) and average D2 between all pairwise SNPs on the chromosome. Additionally, equation (1) was derived under the assumption of finite and natural populations7.

The expectation of D at generation t can be derived from c (the recombination rate) and \({N}_{e}\). This is given by38:


A negative correlation between D, which is the numerator of |D′|, and both r2 and effective size (N e ) is expected. Considering N e as an indicator of selection, lower N e values are a result of high selection pressure, and consequently a reduction in the number of breeding animals and genetic diversity. A negative relationship between average LD between all pairwise SNPs on a chromosome and N e was observed (−0.16), as expected. However, the correlation between average LD between adjacent SNPs and N e was positive (0.35). One potential reason for the observed discrepancy is the fact that N e was estimated based on the LD between all pairwise SNPs rather than LD between adjacent SNPs. For instance, Lindblad-Toh et al. also observed that the effective population size and the inbreeding coefficient were reduced during dog domestication, resulting in a decrease of LD39.

Haplotype blocks

The construction of haplotypes with only two (frequency = 1,879) to twenty-one (frequency = 1) markers was consistent with the low LD among pairwise SNP reported in this study. The mean size of haplotype blocks and the frequency of the number of SNPs for each chromosome are reported in Table 3. Short haplotype blocks in common among breeds have been observed by others17. The average distance (SD) between markers that formed the haplotype blocks was 0.04 (0.033) Mb. Considering the size of the sheep genome and the average distance between SNP that formed the haplotype blocks, it was possible to indirectly infer the minimum number of markers needed for genomic analyses, which was 61,415 SNPs. However, due to the high standard deviation of the distance between markers that formed the haplotype, it is important to use this number with caution.

Table 3 Summary of mean and standard deviation (SD) of intermarker distance in haplotype blocks for each chromosome and frequency of haplotype blocks size.


The extent of LD among adjacent markers for the Santa Inês breed resembled those of previously reported results in other breeds of domesticated sheep. The mean LD values between all SNP pairs on each chromosome were consistent with domestic and wild sheep (Ovis canadensis and Ovis dalli) and they were lower than the estimates reported in other species. The findings reported in this study will be useful to provide a theoretical reference in determining the number of markers needed for future GS and GWAS in Santa Inês sheep.


Animal resources, genotyping and quality control

All experimental procedures employed in the present study that relate to animal experimentation were performed in accordance with resolution number 07/2016, approved under the Institutional Animal Care and Use Committee Guidelines of the School of Veterinary Medicine of the Federal University of Bahia (UFBA) and sanctioned by the president, Prof. Claudio de Oliveira Romão, to ensure compliance with international guidelines for animal welfare.

The dataset included the genotypes of 396 animals from the Santa Inês sheep breed collected between 2016 and 2017. These animals were fed in confinement for 54 to 92 days on average, during four different periods with slightly different nutritional management. This herd is located at the Experimental Farm of São Gonçalo dos Campos, the city of São Gonçalo dos Campos, Bahia, Brazil, and it is associated with the Federal University of Bahia (UFBA).

To characterize the Santa Inês sheep population, the relationship between animals was estimated using a genomic relationship matrix, G, as described in VanRaden (2008)40. The G matrix was constructed by using the PREGSF90 software in the BLUPF90 package41,42,43. The average relationship between animals (SD) was 0.001 (0.0634), with minimum and maximum values equal to −0.135 and 0.934, respectively. The hierarchically clustered heatmap of the G matrix was constructed using the gplots R package44 and is presented in Fig. 3. The heatmap represents the relationship among individuals, with darker shades (red) representing low relationship between animals and lighter tones (light yellow) representing a high degree of relationship. The blocks observed in the heatmap represent individuals with stronger degrees of relationship than the overall mean relationship. By analyzing each block, we observed an overall relationship mean (standard deviation) within all blocks equal to 0.004 (0.0606), varying from −0.023 (0.0291) to 0.079 (0.1514). Random blocks with darker tones within the Fig. 3, for example, showed a lower mean (standard deviation) degree of relationship, with value equal to 0.001 (0.0555). None of the blocks can be considered as an exclusively full-sib or half-sib group45, although they include full-sib and half-sib relationships. Inside the most defined diagonal block, for example, 13 full-sib animal pairs and 350 half-sib animal pairs are represented. In the population as a whole, there are one twin animal pair, 38 full-sib animal pairs and 3,089 half-sib animal pairs. The structure of this population can be observed by a distribution printed into the left of Fig. 3, which presents the frequency of pairs by relationship degree. The major density of animal pairs is near zero, representing the overall low relationship among them. 
It is also possible to observe a higher density of animal pairs above zero, close to 0.25, 0.5 and 1.0, representing the half-sibs, full-sibs and twins, as well as a mass below zero. The genetic structure of the sample might influence the LD results. For instance, a population with an elevated level of relationship will probably also have a higher level of inbreeding and, consequently, a higher LD level. Therefore, the complex breeding history of Santa Inês may have influenced the estimates of LD.
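To make the construction of a genomic relationship matrix concrete, the following is a minimal pure-Python sketch of the first method of VanRaden, G = ZZ′ / (2 Σ p(1 − p)). It is not the PREGSF90 implementation used in the study, which may apply additional scaling or blending options; the function name and the toy genotypes are hypothetical, and allele frequencies are assumed to be estimated from the genotyped sample itself.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))).

    genotypes: list of per-animal lists holding allele counts (0, 1 or 2)
    for each SNP, with no missing values (a simplifying assumption).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale
             for k in range(n_animals)]
            for i in range(n_animals)]


# Three hypothetical animals genotyped at three SNPs.
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
g_matrix = genomic_relationship_matrix(toy_genotypes)
```

Because genotypes are centred on the sample allele frequencies, off-diagonal elements can be negative, which is consistent with the negative relationship values reported above.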

Figure 3

Hierarchically clustered heatmap of the genomic relationship among the individuals. At the top left, there is a histogram (green line) of the number of pairs of individuals (y axis = count) at each relationship degree (x axis = value). A vertical dashed green line is on the relationship degree equal to zero. At the bottom right, there is a heatmap of the relationship among the individuals. In both the histogram and the heatmap, the color gradient from dark red to light yellow represents the variation of the relationship degree from low to high, respectively.

DNA was extracted from tissue samples of the Longissimus dorsi muscle collected from the left hemi-carcass and stored in 2.0 milliliter (ml) Eppendorf tubes. DNA extraction was performed according to protocols for lysis buffer and RNase. A high-density SNP panel (Illumina High-Density Ovine SNP BeadChip®) containing 54,241 SNP was used for genotyping. Chromosomal coordinates for each SNP were obtained from the ovine genome sequence assembly, Oar_v3.1.

Quality control (QC) of the genomic data was performed by the GenABEL R package46 for LD analyses47. The PREGSF90 interface of the BLUPF90 program41,42,43 was used to edit the genomic data for F, N e , MAF, and haplotype analyses. SNPs with a call rate lower than 0.90, MAF lower than 0.05 and p-value lower than 0.1 for the Hardy-Weinberg Equilibrium Chi-square test were excluded. One sample with a call rate lower than 0.9 was also removed. Table 4 summarizes the number of SNPs per chromosome before and after QC. We considered only the autosomal chromosomes (OAR1 to OAR26) in this study resulting in 38,168 SNPs retained for further analysis.
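The SNP-level filters described above can be illustrated with a short Python sketch. It is not the GenABEL or PREGSF90 code used in the study: the function names are hypothetical, and the Hardy-Weinberg test is assumed here to be a Pearson chi-square test with one degree of freedom and no continuity correction.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.90, min_maf=0.05, min_hwe_p=0.1):
    """Apply the call rate, MAF and HWE thresholds quoted in the text."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / float(n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2.0 * n_called)
    maf = min(p, 1.0 - p)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p
```

For example, a SNP with genotype counts 100/200/100 and no missing calls is retained, whereas one with counts 200/0/200 fails the Hardy-Weinberg filter because no heterozygotes are observed.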

Table 4 The number of SNPs per chromosome before and after quality control.

Inbreeding coefficient and effective population size

The inbreeding coefficient (F) was calculated as a function of the difference between the observed and expected number of homozygous loci by using the PLINK software48. This is given by

$${F}_{i}=\frac{{O}_{i}-{E}_{i}}{{L}_{i}-{E}_{i}}$$

where \({F}_{i}\) is the estimated inbreeding coefficient of the ith animal; \({O}_{i}\) is the number of homozygous loci observed in the ith animal, \({E}_{i}\) is the number of homozygous loci expected and \({L}_{i}\) is the number of genotyped autosomal loci48.
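As a small worked illustration, the excess-homozygosity estimator can be written as F = (O − E) / (L − E) using the quantities defined above. The sketch below is not PLINK's code: the function names are hypothetical, and the expected number of homozygous loci is assumed to follow simple Hardy-Weinberg proportions without any small-sample correction.

```python
def expected_homozygous(allele_freqs):
    """Expected number of homozygous loci under Hardy-Weinberg proportions."""
    return sum(p * p + (1.0 - p) * (1.0 - p) for p in allele_freqs)


def inbreeding_coefficient(observed_hom, expected_hom, n_loci):
    """F = (O - E) / (L - E) for one animal."""
    if n_loci <= expected_hom:
        raise ValueError("number of loci must exceed expected homozygous loci")
    return (observed_hom - expected_hom) / (n_loci - expected_hom)


# Hypothetical animal genotyped at 100 loci, each with allele frequency 0.5.
e_hom = expected_homozygous([0.5] * 100)        # 50 expected homozygous loci
f_excess = inbreeding_coefficient(52, e_hom, 100)
f_deficit = inbreeding_coefficient(48, e_hom, 100)
```

With 52 observed homozygous loci the estimate is positive, and with 48 it is negative, which shows how negative coefficients arise when an animal is less homozygous than expected.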

Effective population size (N e ) was obtained using the SNeP software49. This software reconstructs the history of the effective population size over past generations from the relationship between N e , linkage disequilibrium represented by r2, and recombination rate (c), using the following equation50:

$$E[{r}^{2}|{c}_{t}]={(\alpha +4{N}_{e(t)}f({c}_{t}))}^{-1}$$

Therefore, by solving equation (4), we have:

$${N}_{e(t)}={(4f({c}_{t}))}^{-1}(E{[{r}^{2}|{c}_{t}]}^{-1}-\alpha )$$

where \({N}_{e(t)}\) is the effective population size at generation t, which is \({(4f({c}_{t}))}^{-1}\)51; \({c}_{t}\) is the recombination rate in generation t, which is proportional to the physical distance between markers; r2 is LD; and \(\alpha \) is the adjustment for mutation rate. The parameter \(\alpha \) can assume three different values: 1, 2 or 2.252. When we consider \(\alpha \) equal to 1, \({N}_{e}c\) tends towards 0 and we assume that there is no selection or mutation. On the other hand, when mutation does occur, the parameter \(\alpha \) can be equal to 2 or 2.2. The value of 2.2 comes from the equilibrium expression \(\frac{E[{({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}]}{E[{\rho }_{A}(1-{\rho }_{A}){\rho }_{B}(1-{\rho }_{B})]}\), which was equal to \(\frac{5}{11}\) and whose reciprocal is 2.2. In this expression, \({\rho }_{A}\) is the probability of allele A at marker (or SNP) 1, \({\rho }_{B}\) is the probability of allele B at marker (or SNP) 2, and \({\rho }_{AB}\) is the probability of the pair of AB markers, following Ohta & Kimura52. Tenesa et al. proposed \(\alpha \) equal to two53.

In our study, the \({N}_{e}\,\,\)by chromosome was the result of a harmonic mean due to a relatively small number of SNPs in each chromosome. The physical distance was transformed to genetic distance considering one Mb as one centimorgan (cM).
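The solved expression for \({N}_{e(t)}\) can be evaluated directly, as in the hedged Python sketch below. It is not the SNeP implementation: the function names are hypothetical, and the mapping function is assumed to be linear, f(c) = c, with one Mb taken as one cM as stated above.

```python
def mb_to_morgan(distance_mb, cm_per_mb=1.0):
    """Convert physical distance (Mb) to genetic distance (Morgans)."""
    return distance_mb * cm_per_mb / 100.0


def effective_population_size(mean_r2, c, alpha=1.0):
    """N_e(t) = (4 f(c))^-1 * (E[r2 | c]^-1 - alpha), assuming f(c) = c.

    mean_r2: average r2 of SNP pairs in a distance bin
    c: recombination rate (Morgans) for that bin
    alpha: mutation adjustment (1, 2 or 2.2 in the text)
    """
    if not 0.0 < mean_r2 <= 1.0 or c <= 0.0:
        raise ValueError("mean_r2 must be in (0, 1] and c must be positive")
    return (1.0 / mean_r2 - alpha) / (4.0 * c)


# Hypothetical bin: SNP pairs 1 Mb apart with a mean r2 of 0.2.
c_bin = mb_to_morgan(1.0)                       # 0.01 Morgans
ne_bin = effective_population_size(0.2, c_bin)  # about 100
```

Larger values of alpha lower the estimate for the same r2 and distance, which is one reason estimates from different studies are not always directly comparable.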

Linkage disequilibrium analysis

The estimation of LD was performed in two ways for each chromosome: (1) between neighboring pairs of SNPs (adjacent SNPs) and (2) between all pairwise combinations of SNPs (pairwise SNPs), using the function LD in the R package genetics47,54. |D′| scales the difference between the observed frequency of the allele pair AB, where A is an allele of marker (or SNP) 1 and B an allele of marker 2, and the frequency expected from each allele separately. The |D′| parameter ranges from 0 to 1 and is given by55:

$$|{D}^{{\prime} }|=\left|\frac{D}{{D}_{max}}\right|$$

$$D={\rho }_{AB}-{\rho }_{A}{\rho }_{B}$$

$${D}_{max}=\left\{\begin{array}{ll}{\rm{\min }}({\rho }_{A}{\rho }_{b},{\rho }_{a}{\rho }_{B}), & D > 0\\ {\rm{\max }}(-{\rho }_{A}{\rho }_{B},-{\rho }_{a}{\rho }_{b}), & D < 0\end{array}\right.$$

Here \({\rho }_{A}\,\,\)is the probability of allele A at marker 1, \({\rho }_{a}\) is the probability of allele a at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, \({\rho }_{b}\) is the probability of allele b at marker 2, and \({\rho }_{AB}\) is a probability of the pair of AB markers. Maximum likelihood was used to estimate \({\rho }_{AB}\) because genotype AB/ab is not distinguishable from genotype aB/Ab56.

The squared correlation between the markers, given by r2, is expressed as7:

$${r}^{2}=\frac{{D}^{2}}{({\rho }_{A}{\rho }_{a}{\rho }_{B}{\rho }_{b})}$$

where \(\,{D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}\), \({\rho }_{A}\,\,\)is the probability of allele A at marker 1, \({\rho }_{a}\) is the probability of allele a at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, and \({\rho }_{b}\) is the probability of allele b at marker 2.

In total, four LD estimates were obtained: (1) |D′| between adjacent SNPs; (2) |D′| between all pairwise SNPs; (3) r2 between adjacent SNPs; and (4) r2 between all pairwise SNPs.
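The definitions of D, |D′| and r2 given above translate into a few lines of Python. The sketch assumes the haplotype frequency of AB is already known, whereas in the study it was estimated by maximum likelihood with the R package genetics; the function name and the example frequencies are hypothetical.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and B (marker 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = max(-p_a * p_b, -q_a * q_b)
    d_prime = abs(d / d_max)
    r2 = d * d / (p_a * q_a * p_b * q_b)
    return d, d_prime, r2


# Equal allele frequencies: both metrics indicate moderate LD.
d1, dp1, r2_1 = ld_metrics(0.40, 0.5, 0.5)
# Unequal allele frequencies: |D'| reaches 1 while r2 stays low.
d2, dp2, r2_2 = ld_metrics(0.10, 0.1, 0.5)
```

The second example shows the allele-frequency dependence discussed in the results: when one haplotype is absent, |D′| equals 1 even though r2 is only about 0.11.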

Haplotype blocks

The haplotype blocks were identified by following the approach suggested by Gabriel et al.57, which was implemented via PLINK48. Blocks were partitioned according to whether the upper and lower confidence limits on estimates of pairwise |D′| fell within certain threshold values. The desired SNP panel density was estimated as the ratio of the total length of the ovine genome (in Mb) to the mean distance between the markers that composed the haplotype blocks.
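The density calculation is a single division, sketched below in Python. The genome length (2,615.52 Mb) and the reported requirement (61,415 SNPs) are taken from this article; the function name is hypothetical, and the unrounded spacing of about 0.0426 Mb is inferred from those two figures rather than stated in the text.

```python
def required_markers(genome_length_mb, mean_spacing_mb):
    """Approximate number of evenly spaced markers needed to cover the genome."""
    if mean_spacing_mb <= 0.0:
        raise ValueError("mean spacing must be positive")
    return genome_length_mb / mean_spacing_mb


GENOME_LENGTH_MB = 2615.52

# Using the rounded mean spacing reported for haplotype blocks (0.04 Mb).
markers_rounded = required_markers(GENOME_LENGTH_MB, 0.04)

# Spacing implied by the reported requirement of 61,415 SNPs (inferred value).
implied_spacing_mb = GENOME_LENGTH_MB / 61415
```

The rounded spacing gives roughly 65,000 markers, so the reported figure of 61,415 presumably reflects the unrounded mean distance; either way, the large standard deviation of the spacing means the result is best read as an order-of-magnitude guide.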

Data availability

Data are available on request.


All experimental procedures involving sheep were approved by the Institutional Animal Care and Use Committee Guidelines from School of Veterinary Medicine of University Federal of Bahia – UFBA and sanctioned by the president Prof. Claudio de Oliveira Romão (n° 07/2016). All experiments were performed in accordance with relevant guidelines and regulations.