Introduction

Selection and adaptation processes in animal populations succeed by means of genetic change. However, most of the changes that have occurred through domestication and subsequent selective breeding have been achieved without understanding or monitoring changes at the genetic level; one example is the establishment of the Booroola gene in Australian sheep flocks selected for a high incidence of multiple births (Piper and Bindon, 1982). During the past decade, with the application of genomic technologies, scientists attempting to discover regions of the genome associated with commercially important traits have accumulated a substantial amount of genotype data for domestic livestock in laboratories around the world (eg Walling et al, 2000). Arguably, however, these data have been under-utilised. For example, the large amounts of genotypic data collected for experiments to identify quantitative trait loci (QTLs) can be used to address additional questions regarding the genetic structure of the populations used in the studies.

Suffolk and Texel are the two main terminal-sire breeds of sheep used in the UK. The origins of the two breeds are distinct. Texel sheep originate from the Island of Texel in The Netherlands and were brought to the UK in the 1970s, with separate importations from France and The Netherlands. In contrast, the Suffolk sheep evolved from the mating of Norfolk horn ewes with Southdown rams in the late 18th century, and were traditionally used around the rotational system of farming in southeast England. Breed improvement in both cases has been achieved through the subjective appraisal of breed characteristics and perceived commercial (eg carcass) characteristics, and latterly through intensive selection within structured breeding programmes known as sire reference schemes. The current selection uses a selection index designed to improve the yield of lean meat (Simm and Dingwall, 1989). Therefore, genetic change has been achieved without knowledge of the underlying genetic architecture of the breeds.

Genetic diversity differs between domesticated breeds of sheep, including the breeds in this study, with evidence suggesting lower levels of heterozygosity in Suffolk sheep than in other breeds, including Texel sheep (Farid et al, 2000). More specifically, within some extensively investigated areas of the sheep genome, for example, the prion protein (PrP) gene, Suffolk sheep have lower levels of genetic variability than other breeds (O'Doherty et al, 2000, 2001), with Suffolk animals possessing only three different PrP alleles in comparison to the five present in Texel and many other breeds.

The UK sheep genome-mapping project has collected DNA from large numbers of Texel and Suffolk sheep, comprising several large half-sib families, in commercial flocks in the UK. Families from both breeds have been genotyped for many markers within selected chromosomal regions, enabling a more definitive comparison of the heterozygosity of the Texel and Suffolk breeds than has previously been possible. The primary aim of this study is to use this large and comprehensive data set to investigate whether heterozygosity of microsatellite markers differs between Texel and Suffolk sheep at various locations across the genome. The size and thoroughness of this data set mean that it may serve as a case study for expected heterozygosity differences in domestic livestock populations with distinct breed histories.

Materials and methods

DNA samples and genotyping

Blood samples were collected from c. 6-month-old lambs over a 2-year period from five half-sib families (T1–T5) of Texel sheep (n=623) and three half-sib families (S1–S3) of Suffolk sheep (n=489). The family size varied from 75 to 276 offspring per sire. All animals were born and reared in commercial flocks across the UK. DNA was extracted from the blood using a standard salt-extraction method on fresh samples, and a phenol–chloroform extraction on blood samples that had been frozen. Sires of these eight families were genotyped for microsatellite markers in up to seven regions of the genome across chromosomes 2, 3, 4, 5, 11, 18 and 20 (Table 1). All offspring were subsequently genotyped for all markers that were heterozygous in their sire, with an average of 6.18 informative typed markers per chromosome per family. Dams of progeny were not genotyped. The dams used had a mean coefficient of coancestry (Falconer, 1989) of 0.86% and 0.83% for the Suffolk and Texel breeds, respectively, and hence, due to the low level of relatedness, allele frequencies subsequently calculated were assumed to be indicative of the breeds. Alleles were determined on a within-family basis because individual families were run on separate genotyping gels; hence, common reference points were not available to distinguish between gel variations. Sires were run on all gels containing their offspring to remove the within-family variation across gels.

Table 1 Summary of genomic regions investigated and families used within each region

Data analysis

Allele frequencies were estimated for each family for each marker that was genotyped in at least one family from each breed. This comprised 52 markers across the seven chromosomes, with the number of markers on any specific chromosome varying from four to 10. Markers and informative families are summarised in Appendix A. Allele frequencies were estimated using an expectation maximisation (EM) algorithm (Dempster et al, 1977), based on the frequencies of the alleles transmitted by the dam. Sire-transmitted alleles were not used in the estimation of allele frequency. The dam's allelic contribution for each marker to each of her progeny was determined using Mendelian laws of inheritance. When the allelic contribution from the dam could not be determined, that is, when the progeny genotype was equal to the sire genotype, each of the two alleles was given a probability that it had originated from the dam based on the allele frequencies from the previous iteration. Both alleles were initially given equal probability that they had originated from the dam. The algorithm was iterated until convergence of allele frequencies (to six decimal places) was achieved for each marker.
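The estimation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dam_allele`, `em_dam_allele_freqs`), the representation of genotypes as 2-tuples of allele labels, and the example data are all hypothetical. It handles one marker in one half-sib family with a heterozygous sire.

```python
from collections import Counter


def dam_allele(sire, child):
    """Return the dam-transmitted allele, or None when it is ambiguous.

    sire and child are 2-tuples of allele labels; the sire is assumed
    to be heterozygous (hypothetical data layout for this sketch).
    """
    sire_set = set(sire)
    if set(child) == sire_set:
        return None                      # child genotype equals sire genotype
    a, b = child
    in_sire = (a in sire_set, b in sire_set)
    if all(in_sire):
        return a                         # child homozygous for one sire allele
    if in_sire[0]:
        return b                         # a came from the sire, b from the dam
    if in_sire[1]:
        return a
    raise ValueError("child genotype is inconsistent with the sire")


def em_dam_allele_freqs(sire, offspring, tol=1e-6, max_iter=1000):
    """Estimate dam-transmitted allele frequencies by EM for one marker."""
    if not offspring:
        raise ValueError("no offspring genotypes supplied")
    known = Counter()
    n_ambiguous = 0
    for child in offspring:
        allele = dam_allele(sire, child)
        if allele is None:
            n_ambiguous += 1
        else:
            known[allele] += 1

    n = len(offspring)
    s1, s2 = sire
    w1 = 0.5                             # equal initial probability for s1 and s2
    previous = None
    freqs = {}
    for _ in range(max_iter):
        # E-step: share ambiguous offspring between the two sire alleles.
        counts = dict(known)
        counts[s1] = counts.get(s1, 0.0) + w1 * n_ambiguous
        counts[s2] = counts.get(s2, 0.0) + (1.0 - w1) * n_ambiguous
        # M-step: re-estimate frequencies from the expected counts.
        freqs = {allele: c / n for allele, c in counts.items()}
        if previous is not None and all(
            abs(freqs[x] - previous.get(x, 0.0)) < tol for x in freqs
        ):
            break
        previous = freqs
        denom = freqs[s1] + freqs[s2]
        w1 = freqs[s1] / denom if denom > 0 else 0.5
    return freqs


# Hypothetical family: sire 1/2, ten offspring, three of them ambiguous.
example = [(1, 3)] * 4 + [(1, 1)] * 2 + [(2, 2)] + [(1, 2)] * 3
estimates = em_dam_allele_freqs((1, 2), example)
```

In this toy family the three offspring that share the sire's genotype are split between alleles 1 and 2 in proportion to the current frequency estimates, so the estimate for allele 1 settles near 0.4 and that for allele 2 near 0.2.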

The expected heterozygosity for an individual marker (k) within a family can be calculated as

where Hk is the expected heterozygosity for marker k, pa is the allele frequency of allele a and nk is the total number of alleles for marker k. The expected heterozygosity may be shown to be a function of the number of alleles and the variance of allele frequency.

Equation (1) may be rewritten as

Since

We may write

As allele frequencies for every marker in any family sum to one, the numerator of (E(p))^2 also sums to one. Therefore, equation (3) can be rewritten as

The expected heterozygosity is therefore a function of the number of alleles for a particular marker and the variance of allele frequency. Hence, from the numbers of alleles observed and the allele frequencies calculated in the Texel and Suffolk families, the following three statistics were derived for each informative marker within each family:

  i) expected heterozygosity for marker k (Hk);

  ii) total number of alleles for marker k (nk);

  iii) variance of allele frequency for marker k (Vk).
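The relationship between these three statistics can be written out step by step. The following is a sketch that assumes the standard definition of expected heterozygosity (one minus the sum of squared allele frequencies) and takes E(·) and Var(·) as the mean and the population variance over the nk alleles of marker k; both are assumptions made for illustration, chosen to agree with the verbal description of the derivation.

```latex
\begin{align*}
H_k &= 1 - \sum_{a=1}^{n_k} p_a^2
     = 1 - n_k\,E(p^2),
     \qquad E(p^2) = \frac{1}{n_k}\sum_{a=1}^{n_k} p_a^2, \\
\mathrm{Var}(p) &= E(p^2) - \bigl(E(p)\bigr)^2, \\
H_k &= 1 - n_k\left[\mathrm{Var}(p) + \bigl(E(p)\bigr)^2\right], \\
E(p) &= \frac{1}{n_k}\sum_{a=1}^{n_k} p_a = \frac{1}{n_k}, \\
H_k &= 1 - n_k V_k - \frac{1}{n_k}, \qquad V_k = \mathrm{Var}(p).
\end{align*}
```

Under these assumptions, Hk increases with the number of alleles and decreases as the allele frequencies become more uneven, which is why the three statistics were examined separately for each informative marker in each family.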

This provided 190 measurements of each of the three statistics across all markers and families (36 informative markers for family S1, 12 informative markers for family S2, etc).
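As a numerical illustration, the three statistics can be computed for a single marker from its estimated allele frequencies. The sketch below assumes that Hk is one minus the sum of squared allele frequencies and that Vk is the population variance of the frequencies over the observed alleles; the function name `marker_statistics` and the example frequencies are hypothetical.

```python
def marker_statistics(freqs):
    """Return (H_k, n_k, V_k) for one marker.

    freqs is a list of allele frequencies summing to one. Alleles with
    zero frequency are treated as unobserved (an assumption of this sketch).
    """
    p = [f for f in freqs if f > 0]
    if not p:
        raise ValueError("at least one allele must have a positive frequency")
    n_k = len(p)                                    # number of alleles
    h_k = 1.0 - sum(f * f for f in p)               # expected heterozygosity
    mean = sum(p) / n_k                             # equals 1 / n_k
    v_k = sum((f - mean) ** 2 for f in p) / n_k     # population variance
    return h_k, n_k, v_k


# Hypothetical marker with three alleles.
h_k, n_k, v_k = marker_statistics([0.5, 0.3, 0.2])

# Under the assumptions above the three statistics are linked by
# H_k = 1 - n_k * V_k - 1 / n_k.
identity_gap = abs(h_k - (1.0 - n_k * v_k - 1.0 / n_k))
```

For the example frequencies, `h_k` is 0.62 with three alleles, and `identity_gap` is zero up to floating-point rounding.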

Each of the three measurements was analysed by REML in Genstat using the model:

where y is the heterozygosity, number of alleles or variance of allele frequency of marker k on chromosome j in family l within breed i, μ is the mean, bi is the fixed effect of breed i, cj is the fixed effect of chromosome j, mk is the random effect of marker k, and eijkl is the random error term associated with marker k on chromosome j in family l within breed i.
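Written out from the terms defined above, and assuming that they combine additively, the model takes the following form. The normality assumptions on the random terms are the usual ones for a REML analysis and are stated here as assumptions; the variance symbols are introduced only for illustration.

```latex
\[
y_{ijkl} = \mu + b_i + c_j + m_k + e_{ijkl},
\qquad
m_k \sim N(0, \sigma_m^2),
\quad
e_{ijkl} \sim N(0, \sigma_e^2)
\]
```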

In addition, the linear and quadratic covariates of family size were included in the analysis of nk. Including the covariates of family size for Hk and Vk did not significantly improve the statistical model, and hence they were omitted from these analyses.

For each of the three REML analyses, the predicted breed and chromosomal means were extracted with the standard errors of the differences between breeds and between the same chromosomes across breeds. Differences between chromosomal means across breeds were tested using a t-test, to ascertain whether differences were greater than could be explained by chance alone. To account for the number of tests performed on the seven chromosomes, a Bonferroni correction was used for each statistic by adjusting the significance level to P=0.05/7=0.0071.
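The Bonferroni adjustment is simple arithmetic and can be sketched as follows. The function names and the p-values in the example are hypothetical and serve only to illustrate the thresholding.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that holds the overall level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Seven chromosomal tests, as in the text: 0.05 / 7 is about 0.0071.
chromosome_threshold = bonferroni_threshold(0.05, 7)

# Hypothetical p-values, one per chromosome, for illustration only.
flags = significant_after_bonferroni([0.20, 0.03, 0.001, 0.45, 0.60, 0.15, 0.02])
```

The same function covers any number of tests, so it applies equally when the number of tests is the number of markers on a single chromosome.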

If a significant difference was detected between the two breeds within a chromosomal region, the data were further analysed using the model:

where all symbols are as previously described, except that bmik is the fixed effect of marker k within breed i.

The predicted means were extracted from the REML analyses for markers on chromosomes that previously had significant differences between breeds, along with the standard errors of the differences between breeds. Differences of marker means between breeds were tested using a t-test to determine which marker(s) were responsible for the differences between breeds. To account for the number of tests performed on each chromosome, and hence reduce the risk of declaring chance results significant, a Bonferroni correction was used for each statistic by adjusting the significance level to P=0.05/r, where r is the number of markers tested on an individual chromosome.

Results

Results for all three statistics across all markers are presented in Appendix B.

Heterozygosity

The estimated REML breed means and chromosomal means within the breed for Hk are presented in Table 2. The mean Hk across all genomic regions studied, although marginally higher in the Texel population, was not significantly different between the two breeds (P=0.19). However, significant differences were observed on chromosome 4, with Texels having significantly higher expected heterozygosity. While moderately large differences between the breeds were present on chromosomes 2 and 20, after incorporating the Bonferroni correction for multiple testing, no significant differences of Hk were observed between the two breeds on any other chromosomes. The results from the analyses with individual markers (Table 3) indicate that the lower level of Hk in the Suffolk breed on chromosome 4 is primarily due to two adjacent markers (BMS648 and BM3212). Hk for these markers was very highly significantly different between the two breeds (0.704 and 0.642 in Texels versus 0.351 and 0.137 in Suffolks). After the incorporation of the Bonferroni correction for multiple testing, no other markers exhibited significantly different levels of expected heterozygosity.

Table 2 REML estimated means of heterozygosity grouped by chromosome in each breed, the differences between breed and P-values between the two groups
Table 3 REML estimated means of heterozygosity per marker located on chromosome 4

Number of alleles observed per marker

Markers had more alleles in the Texel breed, with on average 0.62 more alleles per marker than in the Suffolk breed (Table 4). This was wholly due to the significantly larger number of alleles per marker on chromosome 20, where Texel animals had, on average, 2.28 more alleles per marker than Suffolk animals. No significant differences between the breeds for the number of alleles observed per marker were detected on any other chromosomes. The additional alleles on chromosome 20 in the Texel breed were due to significant differences at four markers in an interval extending from the distal region of the major histocompatibility complex (MHC) to 15 cM distal to the MHC locus (Table 5). OMHC1, CSRD0226, TGLA387 and BM1818 had significantly more alleles in the Texel breed. Of all markers tested on chromosome 20, only OLADRBps and McMA23 had a higher estimated mean nk in the Suffolk breed, although the differences for these two markers were not significant.

Table 4 REML estimated means of numbers of alleles per marker, grouped by chromosome in each breed, the differences between breed and P-values between the two groups
Table 5 REML estimated means of numbers of alleles per chromosome 20 marker, the differences between breeds and P-values between the two groups

The linear and quadratic covariates for family size were 0.053 and −1.37 × 10^−4, respectively, indicating increasing numbers of allele variants identified in larger families. The quadratic covariate highlights the finite nature of the number of alleles within a breed, and demonstrates an optimal family size past which additional allele variants are less likely to be identified.
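The family size at which the fitted quadratic stops rising can be worked out from the two reported coefficients. The sketch below assumes that the covariates apply to uncentred family size, which the text does not state; if family size was centred before fitting, the turning point would shift accordingly, so the figure is illustrative only. The function names are hypothetical.

```python
LINEAR = 0.053        # reported linear covariate for family size
QUADRATIC = -1.37e-4  # reported quadratic covariate for family size


def covariate_contribution(family_size, linear=LINEAR, quadratic=QUADRATIC):
    """Fitted contribution of family size to the number of alleles (n_k)."""
    return linear * family_size + quadratic * family_size ** 2


def turning_point(linear=LINEAR, quadratic=QUADRATIC):
    """Family size at which the quadratic reaches its maximum."""
    if quadratic >= 0:
        raise ValueError("a maximum exists only for a negative quadratic term")
    return -linear / (2.0 * quadratic)


# About 193 offspring per sire under the uncentred-covariate assumption.
optimum = turning_point()
```

Under that assumption the turning point lies at roughly 193 offspring per sire, inside the observed range of 75 to 276, which is consistent with diminishing returns in the detection of new alleles in the largest families.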

Allele frequency variance

The overall estimated Vk means were not significantly different between the two breeds (Table 6). The largest difference was observed on chromosome 4, where Vk means were higher in Suffolk animals. However, the standard errors of Vk estimates are high and thus this result was not statistically significant.

Table 6 REML estimated means of allele frequency variance per marker, grouped by chromosome in each breed, the differences between breed and P-values between the two groups

The analysis was repeated, including linear and quadratic covariates for the number of alleles; these were −0.017 and 7.37 × 10^−4, respectively, both of which were significantly different from zero. While illustrating the negative correlation between nk and Vk, this alternative model produced larger standard errors and hence did not change the overall conclusions (results not shown).

Discussion

This study has detected a significant difference in the levels of expected heterozygosity between Suffolk and Texel sheep on chromosome 4, with Texels having increased heterozygosity. Further, a cluster of markers with increased numbers of alleles was also detected in the Texel breed along a region of chromosome 20 when compared to the Suffolk breed. Other than these two 15 cM segments, no significant evidence for consistent differences between the two breeds was detected elsewhere in the regions of the genome covered by this study. This is in contrast to the previously published but smaller study of Farid et al (2000), in which Suffolks were apparently less heterogeneous than Texels. Our study has both a greater number of animals and a greater number of markers than the Farid et al (2000) study.

Consider the observed differences on chromosome 20. It is not possible to tell from our data set whether the breed differences in the number of alleles are due to the Texel breed having an enhanced number of alleles, or the Suffolk breed having fewer alleles, compared to some common ancestor breed. Paterson et al (1998) genotyped three of the same microsatellite markers used in this study in a feral population of Soay sheep. These markers were OLADRBps, OMHC1 and BM1818, and the total numbers of alleles at these markers in the Soay sheep were 6, 5 and 7, respectively, indicating greater similarity with Suffolks than with Texels. While the number of animals genotyped for each marker differed, the minimum number was 887; with such a large population, numbers of alleles can be considered indicative of the breed. However, the Soay breed often undergoes population crashes (Clutton-Brock et al, 1992), that is, minor genetic bottlenecks, perhaps accounting for the relatively low numbers of alleles at these markers. In contrast, in a population of 200 Scottish Blackface lambs, Schwaiger et al (1995) observed 19 alleles at an MHC region marker, albeit one not genotyped in this study (OLADRB).

Population bottleneck effects in the Suffolk breed may be suggested as a reason for the observed differences; however, they may be ruled out, as the numbers of alleles and heterozygosity did not differ consistently elsewhere in the genome. Some aspect of the evolutionary history of the two breeds will account for the difference. Although a number of tests are available to distinguish between the different evolutionary processes responsible for causing decreases in heterozygosity (Tajima, 1983; Fay and Wu, 2000), these would require more detailed knowledge of the structure and history of the populations used in this study than is likely to be available.

It is possible that some form of selective sweep in the Suffolk breed can explain our results. The process of selection can lead to a reduction in genetic variation. The regions that differ between the breeds in numbers of alleles and in heterozygosity both extend across a distance of c. 15 cM. It is worth noting that linkage disequilibrium in sheep populations has been demonstrated to be extensive (McRae et al, 2002), extending for tens of centimorgans, that is, at least as far as the c. 15 cM regions between BMS648-BM3212 and OMHC1-BM1818 (Maddox et al, 2001) that showed differential heterozygosity and numbers of alleles in this study. Thus, regions of this size are consistent with selective sweeps during breed formation.

If selective sweeps could explain the lower heterozygosity and number of alleles in the Suffolk breed, genes underlying the traits undergoing selection would be expected to be located within these regions of the genome. Proximal to the region on chromosome 4 is the gene encoding Inhibin beta A (INHBA). The role of INHBA is to inhibit follicle-stimulating hormone (FSH) secretion with subsequent effects on fertility. A gene with a bearing on such a trait could easily become the focus of direct or indirect selection, subsequently lowering genetic variation and hence levels of heterozygosity.

Likewise, it should be noted that the region on chromosome 20 with increased numbers of alleles in Texel animals incorporated a segment of the MHC. The MHC of sheep has a similar structure to the HLA system of humans, with distinct class I and II regions each containing a number of expressed genes (Trowsdale, 1993, 1995). Marker OLADRBps is located within the MHC class II nonexpressed genes (Blattman and Beh, 1992) and marker OMHC1 is located in the MHC class I region (Groth and Wetherall, 1994). The well-established association between the MHC and the ability of a host's immune system to respond to parasitic infection caused O'Brien and Evermann (1988) to suggest that species or populations with low MHC diversity may be more vulnerable to infectious disease. Subsequently, the maintenance of genetic diversity at the MHC of vertebrates has become a paradigm for the manner in which genetic diversity may be maintained in natural populations (Hedrick, 1994). This paradigm is further supported by experimental evidence in human populations of heterozygous advantage of MHC genotypes for diseases such as hepatitis (Thurz et al, 1997) and AIDS (Carrington et al, 1999). In addition, evidence in other species suggests that the accumulation of homozygosity across the genome may result in increased parasitic burden (Cassinello et al, 2001). However, it must be realised that factors other than MHC diversity per se play a role in disease resistance. The actual effectiveness of an animal's specific alleles in enabling it to respond to an infectious challenge will be a more important factor than heterozygosity per se, at least when considering a specific disease, and genes elsewhere in the genome will also play an important role. As an empirical example, the high disease susceptibility and decline in desert bighorn sheep could not be explained by low MHC variation (Gutierrez-Espeleta et al, 2001).

In summary, this study has demonstrated that the expected degree of heterozygosity and numbers of alleles per marker locus throughout the regions of the genome that were investigated are generally similar for Texel and Suffolk sheep, except for a region on chromosome 4 and another region within and distal to the MHC on chromosome 20. It is interesting to note that these differences occur in regions rather than at isolated markers, and the size of these regions is consistent with observed lengths of linkage disequilibrium in sheep. The reasons for these breed differences are unknown, but possible effects from previous selective sweeps merit further attention.