Abstract
For genomic selection to be successful, there must be sufficient linkage disequilibrium (LD) between the markers and the causal mutations. The objectives of this study were to evaluate the extent of LD in ovine using the Santa Inês breed and to infer the minimum number of markers required to reach reasonable prediction accuracy. In total, 38,168 SNPs and 395 samples were used. The mean LD between adjacent marker pairs measured by r2 and |D′| was 0.166 and 0.617, respectively. Across all chromosomes, LD values between adjacent marker pairs ranged from 0.135 to 0.194 for r2 and from 0.568 to 0.650 for |D′|. The average r2 between all pairwise SNPs on each chromosome was 0.018. SNPs separated by 0.10 to 0.20 Mb had an estimated average r2 equal to 0.1033. The identified haplotype blocks consisted of 2 to 21 markers. Moreover, estimates of the average coefficient of inbreeding and effective population size were 0.04 and 96, respectively. LD estimated in this study was lower than that reported in other species and was characterized by short haplotype blocks. Our results suggest that a higher density SNP panel is recommended for the implementation of genomic selection in the Santa Inês breed.
Introduction
Genomic information is currently used in animal breeding programs to enable selection for difficult to measure traits, increase the overall rate of genetic gain, and to improve the understanding of genetic and biological causes underlying phenotypic variation. Genomic selection (GS) is an approach which uses genome-wide markers simultaneously to predict breeding values1. This approach has been shown to increase the rate of genetic gain when pedigree-based selection is suboptimal1, which is the case for lowly heritable traits. For instance, GS based on simulated data showed an increase in reliability of breeding values for young animals when using genomic (r2 > 60%) versus parent average (r2 = 32%) information, equivalent to approximately 20 offspring2. Furthermore, genetic gain can be increased using genomic information by shortening the generation interval1. Alternatively, genetic markers scattered across the genome offer an opportunity to conduct genome-wide association studies (GWAS) to characterize genes underlying genetic variation for traits of interest.
The success of GS and GWAS depends on linkage disequilibrium (LD), or gametic disequilibrium, between the markers and causal mutations3 because generally only the markers are observed and the causal mutations are unknown. The LD between a marker and a causal mutation can be considered as the proportion of causal mutation variance that can be captured by the marker variance4,5. Through the knowledge of the degree of LD, it is possible to define the density of genetic markers necessary to achieve a certain accuracy of prediction and to determine when the estimates of genetic marker effects should be updated. It has been well documented that simply increasing marker density does not necessarily improve prediction accuracies. Although increased marker density improves resolution, it can also decrease power and add noise to the analyses through the use of non-informative SNP. Furthermore, increased marker density can dilute individual marker effects if, for example, two markers are associated with the same QTL and the two markers are in high LD with each other.
LD is defined as a non-random association between alleles at different loci6, and it is commonly represented by the |D′| and r2 metrics7. The extent of LD can vary between and within species due to evolutionary history and population structure mainly characterized by insertions, deletions, chromosomal rearrangements, or inversions4. This association between markers and causal mutations may change over time due to recombination and selection4, necessitating the re-estimation of marker effects.
Estimates of LD have been reported in ovine for some domestic pure and crossbred populations, as well as in wild sheep, by using microsatellites and SNP markers4,8,9,10,11,12,13,14. Nevertheless, there are few studies that report LD estimates for Brazilian Santa Inês sheep using SNP. Ovine populations have retained a relatively high level of genetic diversity, unlike bovine, which justifies the importance of LD mapping in many breeds within species15. Moreover, LD estimates between different breeds can be informative relative to the overall diversity level in a species and the selection level applied to them.
Therefore, the aim of the current study was to characterize LD structure in Brazilian Santa Inês sheep for the first time, given its commercial importance for meat production, reproductive efficiency, and tropical adaptation in Brazil, and to compare the LD observed in the Santa Inês breed with that of other breeds. Beynon et al.16 mentioned the importance of studies focused on breeds as a chance to identify variation and understand the biological mechanisms that enable these breeds to survive in different local environments.
Many studies have evaluated imputation accuracy17 and the accuracy of genomic estimated breeding values using different marker panel densities in sheep18,19,20. The appropriate panel density could be specific to each species and breed depending on overall LD structure. Unfortunately, the current genotyping costs in sheep are greater than the economic value of breeding animals21. Consequently, we also aimed to provide an estimate of the marker density required for genomic studies in the Santa Inês breed.
Results and Discussion
Descriptive statistics
After quality control (QC), 38,168 autosomal SNPs remained, comprising approximately 53% of the entire panel. The SNPs retained after QC spanned a total of 299.63 megabases (Mb) of the genome, with a mean (standard deviation) distance between adjacent SNP of 0.07 (0.075) Mb. This value was close to that obtained by Liu et al. in Spanish Churra sheep (0.06 Mb)14. SNPs were evenly distributed throughout the genome as the distances between adjacent markers ranged from 0.064 to 0.085 Mb. The chromosomes differ in size and SNP quantity, with chromosome 24 (OAR24) being the smallest (44.21 Mb). Liu et al.14 observed a similar pattern with the same SNP panel, with OAR24 being the smallest chromosome (44.85 Mb) and OAR2 the largest (263.11 Mb). The number of SNPs per chromosome was proportional to the size of each chromosome. Descriptive statistics of the SNP and LD (r2 and |D′|) for each chromosome are presented in Table 1.
In addition, 35% of the SNPs (18,716) had minor allele frequency (MAF) lower than 0.20, with a mean MAF over all SNPs of 0.35. According to another sheep study, 33% of the SNPs had MAF lower than 0.2022. Extending our comparison to other species, the mean MAF was relatively higher than those found for Bos taurus indicus, with values ranging from 0.19 to 0.2523,24. The MAF is important because LD, independent of the metric used, is a function of allelic frequency. In general, low MAF may correspond to a larger difference in allele frequency of coupled alleles, which can result in lower estimates of LD as measured by either r2 or |D′|25. Consequently, applying QC and the choice of QC criteria can affect the distribution and extent of LD6.
Inbreeding coefficient and effective population size
For a better understanding of the population described in this study, inbreeding coefficient (F) and effective population size (N e ) were estimated for all chromosomes together and for each chromosome separately, using genomic information. The estimate of F was 0.04, a relatively low coefficient for a population that originated from the same commercial herd. Using pedigree information to estimate the inbreeding coefficient, Pedrosa et al. found values equal to 0.02 in the Santa Inês breed26. Al-Mamun et al. found average inbreeding coefficients for Merino, Border Leicester and Poll Dorset equal to −0.013, 0.09 and 0.02, respectively13. A recently published study in ovine found average inbreeding coefficients based on excess of homozygosity (standard deviation- SD) of −0.008 (0.031), ranging from −0.079 to 0.30112. Compared with Kijas et al.11 and Liu et al.14, the F estimated from the Santa Inês breed was lower. Negative inbreeding coefficients occur when the number of observed homozygous loci is lower than the expected, suggesting that the population is more heterogeneous than expected, perhaps due to the composite nature of the breed.
In the N e estimation process, genetic distance between markers was estimated by a fixed ratio across the whole genome of one Mb per centiMorgan (cM). Prieur et al. evaluated three different methods to transform the genetic distance in ovine, and concluded that the estimation process using CRIMAP software (v2.503) was more accurate27. However, Prieur et al. also verified that the ranking for r2 and N e between breeds were not affected by the method used and mentioned that the LD estimator was not different between methods27.
The N e estimated herein was 96 in the current generation. Kijas et al.15 observed N e equal to 520 in the Brazilian Santa Inês breed, however, in their study only 47 animals were used. Pedrosa et al. also estimated N e using pedigree information and found a relatively low value (76) in Santa Inês26. These differences in N e can be due to the number of animals used (395 vs. 47 vs. 17,097) and the source of relationship information (genomics vs. pedigree). Al-Mamun et al. found values of N e ranging from 140 (Border Leicester breed) to 348 (Merino breed)13. Brito et al.12 found values of N e in the most current generations in multi-breed sheep populations ranging from 125 to 974. Using a Spanish Churra sheep population, García-Gámez et al.28 and Chitneedi et al.29 estimated N e equal to 159 and 83, respectively.
The presence of artificial selection in the population under study was verified through the reduction of N e over the generations. In this study, N e ranged from 1,705 to 28,191 between 16 and 296 generations, respectively, before the current generation. Mastrangelo et al. estimated the N e at 295 generations ago to be 747 animals in Barbaresca sheep30. Liu et al. observed N e equal to 4,472 and 160 at 2,000 and 5 generations ago, assuming that one Mb is equivalent to one cM14. Brito et al.12 reported estimates of effective population size of 5,537 animals 1,000 generations ago to 687 in the most recent generation. We hypothesize that the large difference in N e between the current and historic generations could be because the breeds that comprise the composite breed of Santa Inês were divergent historically and, thus, these estimates include multiple divergent breeds. The Santa Inês breed is relatively new, having only begun in the 1950s by non-systematic crossing of the Brazilian Somali, Bergamasca and Morada Nova breeds31. This illustrates that the large estimates of historic N e reflect time points before the formation of the breed, and even before the domestication of ovine.
We also estimated the N e for each chromosome. Chromosome 6, OAR6, exhibited the smallest N e , which was in contrast to the results of Liu et al. that reported the smallest N e for OAR1014.
Linkage disequilibrium analysis between adjacent SNPs
The average (SD) r2 and |D′| values estimated between adjacent SNPs from the 26 autosomal chromosomes were 0.166 (0.2189) and 0.617 (0.3349), respectively. Using the dairy sheep breed Frizarta, Kominakis et al. estimated r2 and |Dʹ| equal to 0.18 and 0.50, respectively, at an average inter-marker distance of 0.031 Mb32. Mastrangelo et al. observed average r2 (SD) in Sicilian sheep equal to 0.155 (0.2040)33. Al-Mamun et al. also reported LD estimates from multiple domesticated sheep (Ovis aries) breeds including: Merino (MER), Border Leicester (BL), Poll Dorset (PD) and crossbred populations (i.e., F1 crosses of Merino and Border Leicester (MxB) and MxB crossed to Poll Dorset (MxBxP)). The authors used the same genotype panel but adopted a different data quality control (MAF < 0.01) and reported a mean r2 of 0.12 (MER), 0.20 (BL), 0.19 (PD), 0.13 (MxB) and 0.13 (MxBxP); and mean |D′| of 0.52 (MER), 0.72 (BL), 0.69 (PD), 0.54 (MxB) and 0.55 (MxBxP)13. In the Barbaresca sheep breed, the mean r2 across autosomes was 0.215, with an average distance between adjacent SNP pairs of 0.063 Mb30.
A study published with multi-breed sheep reported mean (SD) r2 of 0.26 (0.100)12. The estimates of r2 are relatively consistent across sheep populations, with the exception of larger r2 values reported by Brito et al. Nevertheless, we should consider that the distance between markers was much shorter in Brito et al. than herein (4.74 kb versus 70 kb in the present study), which can be one reason for the increase in r2. Additionally, Brito et al. reported LD levels less than 0.10 for SNP located more than 0.04 Mb apart12. A recent study from Michailidou et al. observed a mean r2 equal to 0.121, 0.098, and 0.092 in Boutsko, Chios, and Karagouniko, respectively, with the average intermarker distance 0.27 Mb for all breeds34.
Sheep populations have been associated with lower levels of LD in comparison to other ruminant and nonruminant species. Although the comparison between species is difficult due to differences in genome size as well as in the quality control applied, mean values between adjacent SNPs of 0.32 (r2) and 0.69 (|D′|) were estimated from the Australian Holstein-Friesian cattle population using 9,195 SNP with the mean SNP distance equal to 0.25 Mb6. The mean r2 values for pigs of the Landrace (87 animals), Yorkshire (96 animals), Hampshire (78 animals) and Duroc (90 animals) breeds were 0.36, 0.39, 0.44, and 0.46, estimated from 40,144, 39,110, 32,370 and 34,129 SNP spaced at average distances of 0.06, 0.06, 0.07, and 0.07 Mb, respectively35.
The average LD (SD) between adjacent SNP within the same chromosome ranged from 0.135 (0.1972) to 0.194 (0.2423) for r2 and from 0.568 (0.3391) to 0.650 (0.3368) for |D′| (Table 1). Chromosomes 6, 11, 12, 14, 17, 20, 21, 23 and 24 had an average r2 below the 0.16 threshold24. Considering r2 between adjacent SNPs, chromosomes 2, 10 and 16 had higher levels of LD compared to other chromosomes. The high level of LD present on OAR10 was similar to that observed by Al-Mamun et al.13.
Linkage disequilibrium analysis among all pairwise SNPs
The average (SD) for r2 and |D′| estimated between all pairwise SNPs on the 26 autosomal chromosomes were 0.018 (0.032) and 0.225 (0.213), respectively. In a study which used microsatellite markers to evaluate LD using chromosomes 1–10 of domestic sheep (Ovis aries) with mean distance between markers ranging from 10 to 40 Mb, a mean (SD) value of 0.211 (0.004) for |D′| was estimated10. Al-Mamun et al., who also used domesticated sheep (Ovis aries), found mean r2 between all pairwise SNPs (0.05 Mb mean distance) of 0.007 (MER), 0.013 (BL), 0.018 (PD), 0.009 (BxM) and 0.012 (BxMxP); and mean |D′| of 0.168 (MER), 0.29 (BL), 0.27 (PD), 0.18 (BxM) and 0.19 (BxMxP)13. Additionally, Miller et al., using non-domesticated sheep (Ovis canadensis and Ovis dalli) and the same genotype panel but adopting a different QC (MAF < 0.10), reported a mean r2 (SD) of 0.042 (0.067)4. Considering the confidence interval obtained for the estimates presented in this study as well as in the studies previously reported, it is possible to assume that estimates of r2 and |D′| across all SNP combinations on a chromosome are relatively consistent across sheep populations.
Figures 1 and 2 illustrate r2 and |D′|, respectively, as a function of the intermarker distance for chromosomes 1 and 24. Supplementary Fig. S1 and S2 depict r2 and |D′|, respectively, for the other chromosomes. Overall, the relationship between LD and intermarker distance suggests that as intermarker distance decreases, LD increases. A notable exception is chromosome 1. On this chromosome, r2 presented secondary high peaks around the interval from 100 to 150 Mb (Fig. 1). On all chromosomes, the maximum |D′| was observed between many SNP pairs with high intermarker distances (Fig. 2). We contend that this might occur due to the dependence of |D′| on allele frequency. The unexpected increase in LD between some SNP pairs with larger intermarker distances could also be explained by selection. It is possible that favorable alleles for different traits were selected, resulting in a high degree of LD over longer intermarker distances, even extending to interchromosomal pairs of SNP. Another potential reason for high r2 values when intermarker distance was large is assembly errors, potentially explaining the phenomenon on chromosome 1.
The average (SD) r2 between all pairwise SNPs contained on the same chromosome with intermarker distance greater than or equal to 0.10 and lower than 0.20 Mb was 0.1033 (0.0807) across all chromosomes. Zhao et al. observed r2 values equal to 0.044, 0.132 and 0.158 in Sunite, German Mutton Merino and Dorper sheep, respectively, in the same marker distance interval36. Additionally, García-Gámez et al. observed r2 equal to 0.086 for SNP also within the same marker distance interval in a Spanish Churra sheep population28. Similarly, Chitneedi et al. observed an average r2 of 0.066 in Spanish Churra sheep using high-density imputed genotypes29.
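The distance-bin average described above can be sketched in a few lines. This is only an illustration: the function name and the toy pairs are hypothetical, and the bin limits follow the interval stated in the text (distance greater than or equal to 0.10 Mb and lower than 0.20 Mb).

```python
def mean_r2_in_bin(pairs, lower_mb=0.10, upper_mb=0.20):
    """Average r2 over SNP pairs whose distance falls in [lower_mb, upper_mb).

    pairs: iterable of (distance_mb, r2) tuples for SNP pairs on one chromosome.
    Returns None when no pair falls inside the bin.
    """
    values = [r2 for distance_mb, r2 in pairs if lower_mb <= distance_mb < upper_mb]
    if not values:
        return None
    return sum(values) / len(values)


# Toy data (hypothetical values, not taken from the study).
toy_pairs = [(0.05, 0.40), (0.10, 0.12), (0.15, 0.08), (0.20, 0.05)]
print(mean_r2_in_bin(toy_pairs))
```

Only the second and third toy pairs fall inside the bin, so the pair at exactly 0.20 Mb is excluded, which matches the half-open interval used in the text.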
Using LD categories defined by Espigolan et al., Table 2 shows the average intermarker distances between pairwise SNPs exhibiting low LD (r2 ≤ 0.16), medium LD (0.16 < r2 < 0.70), and high LD (r2 > 0.70)24. Higher levels of r2 (greater than 0.70) were found at distances between markers smaller than 0.768 Mb with 3,296 combinations of SNPs (0.01% of all combinations). For medium levels of r2 (0.16 to 0.70), distances lower than 5.277 Mb were observed with 273,659 combinations of SNPs (0.849%). Considering low levels of r2 (lower than 0.16) distances found were higher than 15.110 Mb with 31,939,376 combinations of SNPs (99.140%).
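As a quick arithmetic check, the shares of SNP pairs in each LD category can be recomputed from the three pair counts quoted above. The dictionary keys and function name below are illustrative choices, not names used by the authors.

```python
# Pair counts per LD category as quoted in the text.
pair_counts = {"low": 31_939_376, "medium": 273_659, "high": 3_296}


def category_shares(counts):
    """Return the percentage of all SNP pairs that falls in each category."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = category_shares(pair_counts)
for name, share in shares.items():
    print(f"{name}: {share:.3f}%")
```

The recomputed shares agree with the reported 99.140%, 0.849% and 0.01% to the precision given in the text.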
Relationship between linkage disequilibrium, inbreeding coefficient and effective population size
The relationships between r2, |D′|, MAF, F, and N e are reported in Table 1. The mean MAF was similar across all chromosomes. The correlation between the two measures of LD was 0.75 when LD was estimated between adjacent SNP and 0.97 when estimated among all pairwise SNP. Although |D′| tends to overestimate LD values compared to r2 as reported by Zhao et al.37, both LD metrics exhibited the same behavior (Table 1). This is expected since these metrics are defined similarly as a function of allele frequency. The differences between the two metrics (r2 and |D′|) are related to the weight applied to the allele frequencies. Given |D′| is entirely dependent on the frequency of the alleles, |D′| possibly inflates LD estimates37. On the other hand, the r2 proposed by Hill and Robertson7 aims to reduce this frequency dependence.
According to Hill and Robertson7, LD (numerator of r2) and F have a linear relationship as shown in the equation below7. In a population under selection, the number of homozygotes tends to increase for many favorable alleles. Consequently, the inbreeding coefficient and LD between these selected alleles increase7.
where \({D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}\) is the numerator of r2, \({\rho }_{A}\) is the probability of allele A at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, and \({\rho }_{AB}\) is the probability of the pair of AB markers; \({p}_{0}\) and \({q}_{0}\) are the frequencies of the A and B alleles, respectively, in generation zero or with initial equilibrium. A positive correlation (0.22) was observed between the D2 estimated by equation (1) as a function of inbreeding coefficients and the average D2 observed between adjacent SNPs on each chromosome. A possible justification for the low correlation could be the relatively limited number of SNPs per chromosome on the panel used in the current study. The SNPs contained on the panel used herein cover only 299.6 Mb out of a total of 2,615.52 Mb, equivalent to 11% of the sheep genome. However, a few negative values were observed (e.g., −0.08) when estimating the correlation between D2 estimated by F (equation (1)) and average D2 between all pairwise SNPs on the chromosome. Additionally, equation (1) was derived under the assumption of finite and natural populations7.
The expectation of D at generation t can be derived from c (the recombination rate) and \({N}_{e}\). This is given by38:
A negative correlation between D, which is the numerator of |D′|, and both r2 and effective size (N e ) is expected. Considering N e as an indicator of selection, lower N e values are a result of high selection pressure, and consequently a reduction in the number of breeding animals and genetic diversity. A negative relationship between average LD between all pairwise SNPs on a chromosome and N e was observed (−0.16), as expected. However, the correlation between average LD between adjacent SNPs and N e was positive (0.35). One potential reason for the observed discrepancy is the fact that N e was estimated based on the LD between all pairwise SNPs rather than LD between adjacent SNPs. For instance, Lindblad-Toh et al. also observed that the effective population size and the inbreeding coefficient were reduced during dog domestication, resulting in a decrease of LD39.
Haplotype blocks
The construction of haplotypes with only two (frequency = 1,879) to twenty-one (frequency = 1) markers was consistent with the low LD among pairwise SNP reported in this study. The mean size of haplotype blocks and the frequency of the number of SNPs for each chromosome are reported in Table 3. Short haplotype blocks in common among breeds have been observed by others17. The average distance (SD) between markers that formed the haplotype blocks was 0.04 (0.033) Mb. Considering the size of the sheep genome and the average distance between SNP that formed the haplotype blocks, it was possible to indirectly infer the minimum number of markers needed for genomic analyses, which was 61,415 SNPs. However, due to the high standard deviation of the distance between markers that formed the haplotype, it is important to use this number with caution.
Conclusions
The extent of LD among adjacent markers for the Santa Inês breed resembled those of previously reported results in other breeds of domesticated sheep. The mean LD values between all SNP pairs on each chromosome were consistent with domestic and wild sheep (Ovis canadensis and Ovis dalli) and they were lower than the estimates reported in other species. The findings reported in this study will be useful to provide a theoretical reference in determining the number of markers needed for future GS and GWAS in Santa Inês sheep.
Methods
Animal resources, genotyping and quality control
All experimental procedures employed in the present study that relate to animal experimentation were performed in accordance with the resolution number 07/2016 approved by Institutional Animal Care and Use Committee Guidelines from the School of Veterinary Medicine of University Federal of Bahia – UFBA and sanctioned by the president Prof. Claudio de Oliveira Romão to ensure compliance with international guidelines for animal welfare.
The dataset included the genotypes of 396 animals from the Santa Inês sheep breed collected between 2016 and 2017. These animals were fed in confinement for 54 to 92 days on average, during four different periods with slightly different nutritional management. This herd is located at the Experimental Farm of São Gonçalo dos Campos, the city of São Gonçalo dos Campos, Bahia, Brazil, and it is associated with the Federal University of Bahia (UFBA).
To characterize the Santa Inês sheep population, the relationship between animals was estimated using a genomic relationship matrix, G, as described in VanRaden (2008)40. The G matrix was constructed by using the PREGSF90 software in the BLUPF90 package41,42,43. The average relationship between animals (SD) was 0.001 (0.0634), with minimum and maximum values equal to −0.135 and 0.934, respectively. The hierarchically clustered heatmap of the G matrix was constructed using the gplots R package44 and is presented in Fig. 3. The heatmap represents the relationship among individuals, with darker shades (red) representing low relationship between animals and lighter tones (light yellow) representing a high degree of relationship. The blocks observed in the heatmap represent individuals with stronger degrees of relationship than the overall mean relationship. By analyzing each block, we observed an overall relationship mean (standard deviation) within all blocks equal to 0.004 (0.0606), varying from −0.023 (0.0291) to 0.079 (0.1514). Random blocks with darker tones within Fig. 3, for example, showed a lower mean (standard deviation) degree of relationship, with a value equal to 0.001 (0.0555). None of the blocks can be considered as an exclusively full-sib or half-sib group45, although they include full-sib and half-sib relationships. Inside the most defined diagonal block, for example, 13 full-sib animal pairs and 350 half-sib animal pairs are represented. In the population as a whole, there is one twin animal pair, 38 full-sib animal pairs and 3,089 half-sib animal pairs. The structure of this population can be observed in the distribution shown at the top left of Fig. 3, which presents the frequency of pairs by relationship degree. The highest density of animal pairs is near zero, representing the overall low relationship among them. It is also possible to observe a higher density of animal pairs above zero, close to 0.25, 0.5 and 1.0, representing the half-sibs, full-sibs and twins, as well as a mass below zero. The genetic structure of the sample might influence the LD results. For instance, a population with an elevated level of relationship will probably also have a higher level of inbreeding and, consequently, a higher LD level. Therefore, the complex breeding history of Santa Inês may have influenced the estimates of LD.
Figure 3. Hierarchically clustered heatmap of the genomic relationship among the individuals. At the top left, there is a histogram (green line) of the number of pairs of individuals (y axis = count) at each relationship degree (x axis = value). A vertical dashed green line is on the relationship degree equal to zero. At the bottom right, there is a heatmap of the relationship among the individuals. In both the histogram and the heatmap, the color gradient from dark red to light yellow represents the variation of the relationship degree from low to high, respectively.
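For readers unfamiliar with the genomic relationship matrix G mentioned above, the following is a minimal sketch of one common construction, the first method of VanRaden (2008), G = ZZ′ / (2Σp(1 − p)), where Z holds allele counts centered by twice the allele frequency. It assumes complete genotypes coded 0/1/2 and allele frequencies taken from the sample itself; the options actually used in PREGSF90 are not stated in the text and may differ, and the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from 0/1/2 allele counts.

    genotypes: list of rows (one per animal), each a list of allele counts.
    Assumes no missing genotypes and at least one polymorphic SNP.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP, from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Center each genotype by its expectation 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(n_snps)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z_i, z_k)) / scale for z_k in centered]
        for z_i in centered
    ]


# Toy example with three animals and three SNPs (hypothetical data).
toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
for row in genomic_relationship_matrix(toy):
    print([round(value, 3) for value in row])
```

Off-diagonal elements of such a matrix are centered near zero for unrelated animals and can be negative, which is consistent with the negative minimum relationship reported above.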
DNA was extracted from tissue samples of the Longissimus dorsi muscle collected from the left hemi-carcass and stored in 2.0 milliliter (ml) Eppendorf tubes. DNA extraction was performed according to protocols for lysis buffer and RNase. A high-density SNP panel (Illumina High-Density Ovine SNP BeadChip®) containing 54,241 SNP was used for genotyping. Chromosomal coordinates for each SNP were obtained from the ovine genome sequence assembly, Oar_v3.1.
Quality control (QC) of the genomic data was performed by the GenABEL R package46 for LD analyses47. The PREGSF90 interface of the BLUPF90 program41,42,43 was used to edit the genomic data for F, N e , MAF, and haplotype analyses. SNPs with a call rate lower than 0.90, MAF lower than 0.05 and p-value lower than 0.1 for the Hardy-Weinberg Equilibrium Chi-square test were excluded. One sample with a call rate lower than 0.9 was also removed. Table 4 summarizes the number of SNPs per chromosome before and after QC. We considered only the autosomal chromosomes (OAR1 to OAR26) in this study resulting in 38,168 SNPs retained for further analysis.
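The three SNP-level filters listed above can be sketched as follows. This is an illustration under stated assumptions rather than the GenABEL or PREGSF90 implementation: genotypes are coded as 0/1/2 allele counts with None for missing calls, the Hardy-Weinberg test is a plain one-degree-of-freedom Pearson chi-square without continuity correction, and the function name is hypothetical. The thresholds are those given in the text.

```python
import math


def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05, min_hwe_p=0.1):
    """Return True if one SNP passes call-rate, MAF and HWE filters.

    genotypes: non-empty list of allele counts (0, 1, 2) or None for missing.
    """
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_call_rate:
        return False
    n = len(called)
    p = sum(called) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    if min(p, q) < min_maf:
        return False
    # Pearson chi-square against Hardy-Weinberg genotype expectations.
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value >= min_hwe_p


# Toy SNP in exact Hardy-Weinberg proportions (hypothetical data).
print(snp_passes_qc([0] * 25 + [1] * 50 + [2] * 25))
```

The MAF filter runs before the chi-square so that the expected genotype counts in the denominator are never zero.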
Inbreeding coefficient and effective population size
Inbreeding coefficient (F) was calculated as a function of the difference between the observed and expected number of homozygous loci by using the PLINK software48. This is given by
\[{F}_{i}=\frac{{O}_{i}-{E}_{i}}{{L}_{i}-{E}_{i}}\tag{3}\]
where \({F}_{i}\) is the estimated inbreeding coefficient of the ith animal; \({O}_{i}\) is the number of homozygous loci observed in the ith animal, \({E}_{i}\) is the number of homozygous loci expected and \({L}_{i}\) is the number of genotyped autosomal loci48.
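The excess-homozygosity estimator described above, F = (O − E) / (L − E), can be sketched as below. It assumes that the expected homozygosity at each locus is p² + (1 − p)² under Hardy-Weinberg proportions; PLINK applies its own small-sample handling of the expectation, which is not reproduced here, and the function name is hypothetical.

```python
def inbreeding_coefficient(genotypes, allele_freqs):
    """Excess-homozygosity inbreeding coefficient for one animal.

    genotypes: allele counts (0, 1, 2) per locus, or None for a missing call.
    allele_freqs: frequency of the counted allele at each locus.
    """
    observed = 0.0  # O: observed homozygous loci
    expected = 0.0  # E: expected homozygous loci under Hardy-Weinberg
    n_loci = 0      # L: genotyped loci
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n_loci += 1
        if g != 1:
            observed += 1
        expected += p * p + (1 - p) * (1 - p)
    return (observed - expected) / (n_loci - expected)


# Toy animal that is heterozygous at every locus (hypothetical data).
print(inbreeding_coefficient([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5]))
```

An animal with fewer homozygous loci than expected gets a negative value, which is how the negative coefficients discussed in the Results arise.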
Effective population size (N e ) was obtained by the SNeP software49. This software provides a history of the effective population size, that is, the number of past generations based on the relationship between N e , linkage disequilibrium represented by r2, and recombination rate (c) by using the following equation50:
\[E[{r}^{2}]={(\alpha +4{N}_{e}c)}^{-1}\tag{4}\]
Therefore, by solving equation (4), we have:
\[{N}_{e(t)}={(4f({c}_{t}))}^{-1}(E{[{r}^{2}|{c}_{t}]}^{-1}-\alpha )\tag{5}\]
where \({N}_{e(t)}\) is the effective population size at generation t, which is \({(4f({c}_{t}))}^{-1}\)51; \({c}_{t}\) is the recombination rate in generation t, which is proportional to the physical distance between markers, r2 is LD, and \(\alpha \) is the adjustment for mutation rate. The parameter \(\alpha \) can assume three different values: \(1,\,2\) or \(2.2\)52. When we consider \(\alpha \) equal to 1, \({N}_{e}c\) tends towards 0 and we assume that there is no selection or mutation. On the other hand, when mutation does occur, the parameter \(\alpha \) can be equal to 2 or 2.2. The value of 2.2 comes from the result of the equilibrium expression \(\frac{E[{({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}]}{E[{\rho }_{A}(1-{\rho }_{A}){\rho }_{B}(1-{\rho }_{B})]}\) that was equal to \(\frac{5}{11}\). In this expression, \({\rho }_{A}\) is the probability of allele A at marker (or SNP) 1, \({\rho }_{B}\) is the probability of allele B at marker (or SNP) 2, and \({\rho }_{AB}\) is the probability of the pair of AB markers, following Ohta & Kimura52. Tenesa et al. proposed \(\alpha \) equal to two53.
In our study, the \({N}_{e}\,\,\)by chromosome was the result of a harmonic mean due to a relatively small number of SNPs in each chromosome. The physical distance was transformed to genetic distance considering one Mb as one centimorgan (cM).
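The relationship between r2, recombination rate and effective population size described above can be sketched as follows, using the usual form N e = (1 / (4c)) × (1 / r2 − α). The sketch assumes that f(c) is simply c (no mapping function) and that 1 Mb equals 1 cM, that is, 0.01 Morgan, as stated in the text. The function names are hypothetical, the inputs are toy values, and the value of α used by the authors is not stated, so it is left as a required argument.

```python
def mb_to_morgan(distance_mb):
    """Convert physical distance to recombination rate, assuming 1 Mb = 1 cM."""
    return distance_mb * 0.01


def effective_population_size(r2, recombination_rate, alpha):
    """Solve E[r2] = 1 / (alpha + 4 * Ne * c) for Ne.

    r2: mean r2 of SNP pairs at the given recombination distance.
    recombination_rate: c in Morgans.
    alpha: mutation adjustment (1, 2 or 2.2 in the text).
    """
    return (1.0 / (4.0 * recombination_rate)) * (1.0 / r2 - alpha)


# Toy inputs (hypothetical): mean r2 of 0.2 for pairs 1 Mb apart.
c = mb_to_morgan(1.0)
for alpha in (1.0, 2.0, 2.2):
    print(alpha, effective_population_size(0.2, c, alpha))
```

Larger values of α lower the estimate for the same r2, which is one reason estimates from different studies are not directly comparable unless the same adjustment is used.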
Linkage disequilibrium analysis
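The estimation of LD was performed in two ways for each chromosome: (1) between neighboring pairs of SNPs (adjacent SNPs) and (2) for the pairwise combination of all SNPs (pairwise SNPs), using the function LD in the R package genetics47,54. |D′| scales the difference between the observed frequency of the allele pair AB, where A is an allele of marker (or SNP) 1 and B an allele of marker 2, and the frequency expected from each allele separately. The |D′| parameter ranges from 0 to 1 and is given by55:
\[|{D}^{{\prime} }|=\frac{|D|}{{D}_{max}}\tag{6}\]
and
\[{D}_{max}=\begin{cases}\min ({\rho }_{A}{\rho }_{b},\,{\rho }_{a}{\rho }_{B}) & \text{if } D > 0\\ \min ({\rho }_{A}{\rho }_{B},\,{\rho }_{a}{\rho }_{b}) & \text{if } D < 0\end{cases}\tag{7}\]
where
\[D={\rho }_{AB}-{\rho }_{A}{\rho }_{B}\tag{8}\]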
Here \({\rho }_{A}\,\,\)is the probability of allele A at marker 1, \({\rho }_{a}\) is the probability of allele a at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, \({\rho }_{b}\) is the probability of allele b at marker 2, and \({\rho }_{AB}\) is a probability of the pair of AB markers. Maximum likelihood was used to estimate \({\rho }_{AB}\) because genotype AB/ab is not distinguishable from genotype aB/Ab56.
The squared correlation between the markers, given by r2, is expressed as7:
\[{r}^{2}=\frac{{D}^{2}}{{\rho }_{A}{\rho }_{a}{\rho }_{B}{\rho }_{b}}\tag{9}\]
where \(\,{D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}\), \({\rho }_{A}\,\,\)is the probability of allele A at marker 1, \({\rho }_{a}\) is the probability of allele a at marker 1, \({\rho }_{B}\) is the probability of allele B at marker 2, and \({\rho }_{b}\) is the probability of allele b at marker 2.
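Both LD measures used in this section can be computed from the haplotype frequency of AB and the two allele frequencies. The sketch below uses the standard definitions, D = ρAB − ρAρB, Lewontin's |D′| = |D| / Dmax, and r2 = D² / (ρAρaρBρb). It takes the haplotype frequency as given, so the maximum likelihood step used to estimate ρAB from unphased genotypes is not reproduced, and the function name is hypothetical.

```python
def ld_statistics(p_AB, p_A, p_B):
    """Return (D, |D'|, r2) for two biallelic, polymorphic markers.

    p_AB: frequency of the AB haplotype.
    p_A, p_B: frequencies of allele A at marker 1 and allele B at marker 2.
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B
    # The maximum attainable |D| depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, d_prime, r2


# Toy example (hypothetical): a rare B allele found only on A haplotypes.
D, d_prime, r2 = ld_statistics(p_AB=0.1, p_A=0.5, p_B=0.1)
print(D, d_prime, r2)
```

In this toy case |D′| equals 1 while r2 is only about 0.11, which illustrates why |D′| tends to give higher values than r2 when allele frequencies at the two markers differ.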
In total, four LD estimates were obtained: (1) |D′| between adjacent SNPs; (2) |D′| between all pairwise SNPs; (3) r2 between adjacent SNPs; and (4) r2 between all pairwise SNPs.
Haplotype blocks
The haplotype blocks were identified by following the approach suggested by Gabriel et al.57, as implemented in PLINK48. Blocks were partitioned according to whether the upper and lower confidence limits on the estimates of pairwise |D′| fell within certain threshold values. The desired SNP panel density was estimated as the ratio of the total length of the ovine genome (in Mb) to the average distance between the markers that composed the haplotype blocks.
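The panel-density calculation amounts to a single division, sketched below with the figures quoted elsewhere in the article (a genome length of 2,615.52 Mb and a mean within-block marker distance reported as 0.04 Mb). The function name is hypothetical. Note that the rounded distance of 0.04 Mb gives a somewhat larger count than the reported 61,415 SNPs; the reported figure would correspond to an unrounded distance of roughly 0.0426 Mb, which is an inference from the arithmetic and not a value stated by the authors.

```python
def required_marker_count(genome_length_mb, mean_marker_distance_mb):
    """Minimum number of evenly spaced markers for a target spacing."""
    return genome_length_mb / mean_marker_distance_mb


GENOME_LENGTH_MB = 2615.52   # total sheep genome length quoted in the text
REPORTED_DISTANCE_MB = 0.04  # mean distance within haplotype blocks (rounded)
REPORTED_MARKERS = 61_415    # minimum marker count reported in the text

# Count implied by the rounded distance.
print(round(required_marker_count(GENOME_LENGTH_MB, REPORTED_DISTANCE_MB)))

# Distance implied by the reported marker count.
print(GENOME_LENGTH_MB / REPORTED_MARKERS)
```

Because the standard deviation of the within-block distance is nearly as large as its mean, small changes in the assumed spacing move the resulting count by several thousand markers.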
Data availability
Data are available on request.
Declarations
All experimental procedures involving sheep were approved under the Institutional Animal Care and Use Committee Guidelines of the School of Veterinary Medicine of the Federal University of Bahia (UFBA) and sanctioned by the president, Prof. Claudio de Oliveira Romão (n° 07/2016). All experiments were performed in accordance with relevant guidelines and regulations.
References
Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–23 (2008).
Pritchard, J. K. & Przeworski, M. Linkage Disequilibrium in Humans: Models and Data. Am. J. Hum. Genet. 1–14 (2001).
Miller, J. M., Poissant, J., Kijas, J. W., Coltman, D. W. & The International Sheep Genomics Consortium. A genome-wide set of SNPs detects population substructure and long range linkage disequilibrium in wild sheep. Mol. Ecol. Resour. 314–322, https://doi.org/10.1111/j.1755-0998.2010.02918.x (2011).
Lu, D. et al. Linkage disequilibrium in Angus, Charolais, and Crossbred beef cattle. Front. Genet. 3, 1–10 (2012).
Khatkar, M. S. et al. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics 9, 187 (2008).
Hill, W. G. & Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–31 (1968).
Kalinowski, S. T. & Hedrick, P. W. Estimation of linkage disequilibrium for loci with multiple alleles: basic approach and an application using data from bighorn sheep. Heredity (Edinb). 87, 698–708 (2001).
Meadows, J. R. S., Chan, E. K. F. & Kijas, J. W. Linkage disequilibrium compared between five populations of domestic sheep. BMC Genet. 9, 61 (2008).
McRae, A. F. et al. Linkage Disequilibrium in Domestic Sheep. Genetics 160, 1113–1122 (2002).
Kijas, J. W. et al. Linkage disequilibrium over short physical distances measured in sheep using a high-density SNP chip. Anim. Genet. 45, 754–757 (2014).
Brito, L. F. et al. Genetic diversity of a New Zealand multi-breed sheep population and composite breeds’ history revealed by a high-density SNP chip. BMC Genet. 18, 25 (2017).
Al-Mamun, H. A., Clark, S. A., Kwan, P. & Gondro, C. Genome-wide linkage disequilibrium and genetic diversity in five populations of Australian domestic sheep. Genet. Sel. Evol. 47, 90 (2015).
Liu, S. et al. Estimates of linkage disequilibrium and effective population sizes in Chinese Merino (Xinjiang type) sheep by genome-wide SNPs. Genes and Genomics 1–13, https://doi.org/10.1007/s13258-017-0539-2 (2017).
Kijas, J. W. et al. Genome-wide analysis of the world’s sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 10, (2012).
Beynon, S. E. et al. Population structure and history of the Welsh sheep breeds determined by whole genome genotyping. BMC Genet. 16, 65 (2015).
Ventura, R. V. et al. Assessing accuracy of imputation using different SNP panel densities in a multi-breed sheep population. Genet. Sel. Evol. 48, 71 (2016).
Bolormaa, S. et al. Multiple-trait QTL mapping and genomic prediction for wool traits in sheep. Genet. Sel. Evol. 49, 62 (2017).
Bolormaa, S. et al. Genomic prediction of reproduction traits for Merino sheep. Anim. Genet. 48, 338–348 (2017).
Daetwyler, H. D., Kemper, K. E., van der Werf, J. H. J. & Hayes, B. J. Components of the accuracy of genomic prediction in a multi-breed sheep population. J. Anim. Sci. 90, 3375–3384 (2012).
Raoul, J., Swan, A. A. & Elsen, J.-M. Using a very low-density SNP panel for genomic selection in a breeding program for sheep. Genet. Sel. Evol. 49, 76 (2017).
Kijas, J. W. et al. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds. PLoS One 4, e4668 (2009).
Matukumalli, L. K. et al. Development and Characterization of a High Density SNP Genotyping Assay for Cattle. PLoS One 4, (2009).
Espigolan, R. et al. Study of whole genome linkage disequilibrium in Nellore cattle. BMC Genomics 14, 305 (2013).
Wray, N. R. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Res. Hum. Genet. 8, 87–94 (2005).
Pedrosa, V. B., Santana, J. L., Oliveira, P. S., Eler, J. P. & Ferraz, J. B. S. Population structure and inbreeding effects on growth traits of Santa Inês sheep in Brazil. Small Rumin. Res. 93, 135–139 (2010).
Prieur, V. et al. Estimation of linkage disequilibrium and effective population size in New Zealand sheep using three different methods to create genetic maps. BMC Genet. 18, 68 (2017).
García-Gámez, E., Sahana, G., Gutiérrez-Gil, B. & Arranz, J. J. Linkage disequilibrium and inbreeding estimation in Spanish Churra sheep. BMC Genet. 13, (2012).
Chitneedi, P. K., Arranz, J. J., Suarez-Vega, A., García-Gámez, E. & Gutiérrez-Gil, B. Estimations of linkage disequilibrium, effective population size and ROH-based inbreeding coefficients in Spanish Churra sheep using imputed high-density SNP genotypes. Anim. Genet. 48, 436–446 (2017).
Mastrangelo, S. et al. Genome-wide analysis in endangered populations: A case study in Barbaresca sheep. Animal 11, 1107–1116 (2017).
ARCO. Assistência aos rebanhos de criadores de ovinos - Associação Brasileira de Criadores de ovinos. http://www.arcoovinos.com.br/index.php (2017).
Kominakis, A., Hager-Theodorides, A. L., Saridaki, A., Antonakos, G. & Tsiamis, G. Genome-wide population structure and evolutionary history of the Frizarta dairy sheep. Animal 11, 1680–1688 (2017).
Mastrangelo, S. et al. Genome wide linkage disequilibrium and genetic structure in Sicilian dairy sheep breeds. BMC Genet. 15, 108 (2014).
Michailidou, S. et al. Genomic diversity and population structure of three autochthonous Greek sheep breeds assessed with genome-wide DNA arrays. Mol. Genet. Genomics, https://doi.org/10.1007/s00438-018-1421-x (2018).
Badke, Y. M., Bates, R. O., Ernst, C. W., Schwab, C. & Steibel, J. P. Estimation of linkage disequilibrium in four US pig breeds. BMC Genomics 13, 24 (2012).
Zhao, F. et al. Estimations of genomic linkage disequilibrium and effective population sizes in three sheep populations. Livest. Sci. 170, 22–29 (2014).
Zhao, H. H., Fernando, R. L. & Dekkers, J. C. M. Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics 175, 1975–1986 (2007).
Hill, W. G. & Robertson, A. The effect of linkage on limits to artificial selection. Genet. Res. 8, 269–294 (1966).
Lindblad-Toh, K. et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438, 803–819 (2005).
VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–23 (2008).
Legarra, A., Aguilar, I. & Misztal, I. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92, 4656–4663 (2009).
Misztal, I., Legarra, A. & Aguilar, I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 92, 4648–4655 (2009).
Aguilar, I. et al. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93, 743–52 (2010).
Warnes, G. R. et al. gplots: Various R programming tools for plotting data. R Packag. version 2, 1 (2009).
Visscher, P. M. Whole genome approaches to quantitative genetics. Genetica 136, 351–358, https://doi.org/10.1007/s10709-008-9301-7 (2009).
Aulchenko, Y. Package GenABEL - R package reference manual. 143. Available at https://cran.r-project.org/web/packages/GenABEL/index.html (2015).
R Core Team R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/ (2013).
Purcell, S. et al. PLINK: a toolset for whole-genome association and population-based linkage analysis (PLINK 1.07). Am. J. Hum. Genet. 81, http://pngu.mgh.harvard.edu/purcell/plink/ (2007).
Barbato, M., Orozco-terWengel, P., Tapio, M. & Bruford, M. W. SNeP: A tool to estimate trends in recent effective population size trajectories using genome-wide SNP data. Front. Genet. 6, 1–6 (2015).
Sved, J. A. Linkage Disequilibrium and Homozygosity of Chromosome Segments in Finite Populations. Theor. Popul. Biol. 2, 125–141 (1971).
Hayes, B. J., Visscher, P. M., McPartlan, H. C. & Goddard, M. E. Novel Multilocus Measure of Linkage Disequilibrium to Estimate Past Effective Population Size. Genome Res. 635–643, https://doi.org/10.1101/gr.387103 (2003).
Ohta, T. & Kimura, M. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68, 571–580 (1971).
Tenesa, A. et al. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17, 520–526 (2007).
Warnes, M. G. & Leisch, F. Genetics: Population genetics (2005).
Hill, W. G. Estimation of linkage disequilibrium in randomly mating populations. Heredity (Edinb) 33, 229–239 (1974).
Leisch, F., Man, M. & Warnes, M. G. R-Package ‘genetics’ Ver.1.3.8.1. 43 (2013).
Gabriel, S. B. et al. The Structure of Haplotype Blocks in the Human Genome. Science 296, 2225–2229 (2002).
Acknowledgements
This work was supported by the São Paulo Research Foundation (FAPESP, Fundação de Amparo à Pesquisa do Estado de São Paulo; processes 2015/25024-5 and 13/04504-3), Brazil. G.B.M. is the recipient of a productivity fellowship from CNPq. We are indebted to the Federal University of Bahia (UFBA, Brazil) for the partnership in sheep production and to the Biotechnology Lab (ESALQ-USP, Brazil) for support in genotyping.
Author information
Contributions
A.B.A. and G.B.M. designed the research. A.B.A. analyzed the data and drafted the manuscript. A.B.A., G.A.R. and J.P. participated in editing the genotypic data. L.F.B.P., G.G.P.C. and G.B.M. provided the biological material and phenotypes. A.B.A. and G.A.R. participated in the collection of samples for DNA extraction, and L.L.C. contributed lab methodologies. G.A.R., J.P., G.M., M.L.S. and G.B.M. corrected the manuscript and contributed important modifications. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alvarenga, A.B., Rovadoscki, G.A., Petrini, J. et al. Linkage disequilibrium in Brazilian Santa Inês breed, Ovis aries. Sci Rep 8, 8851 (2018). https://doi.org/10.1038/s41598-018-27259-7
DOI: https://doi.org/10.1038/s41598-018-27259-7