Founder-specific inbreeding depression affects racing performance in Thoroughbred horses

The Thoroughbred horse has played an important role in both sporting and economic aspects of society since the establishment of the breed in the 1700s. The extensive pedigree and phenotypic information available for the Thoroughbred horse population provides a unique opportunity to examine the effects of 300 years of selective breeding on genetic load. By analysing the relationship between inbreeding and racing performance of 135,572 individuals, we found that selective breeding has not efficiently alleviated the Australian Thoroughbred population of its genetic load. However, we found evidence for purging in the population that might have improved racing performance over time. Over 80% of inbreeding in the contemporary population is accounted for by a small number of ancestors from the foundation of the breed. Inbreeding to these ancestors has variable effects on fitness, demonstrating that an understanding of the distribution of genetic load is important in improving the phenotypic value of a population in the future. Our findings hold value not only for Thoroughbred and other domestic breeds, but also for small and endangered populations where such comprehensive information is not available.


Results and Discussion

S2: Comparison between different methods of measuring inbreeding and purging
Comparison between genealogical measures of inbreeding. There are a number of ways to account for inbreeding and genetic load in a population. We compared a number of genealogical and genomic measures to determine the optimal coefficients to analyse inbreeding trends in individuals and populations. We used the stochastic gene-dropping program GRain to calculate genealogical inbreeding values (2) (See SI Materials and Methods). This simulation method allows accurate calculation of a number of ancestral inbreeding coefficients by accounting for alleles that are identical by descent (IBD) multiple times in the pedigree.
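The gene-dropping idea can be illustrated with a minimal sketch. It is not GRain's implementation: the function name, the pedigree layout (a dictionary listing parents before offspring) and the replicate count are assumptions made for illustration. Unique alleles are assigned to founders and dropped through the pedigree by Mendelian sampling, and the fraction of replicates in which the target individual carries two copies of the same founder allele estimates its inbreeding coefficient.

```python
import random

def gene_drop_inbreeding(pedigree, target, n_reps=20000, seed=1):
    """Estimate F for `target` by gene dropping at one neutral locus.

    pedigree: dict id -> (sire, dam), None for an unknown parent.
              Parents must be listed before their offspring.
    """
    rng = random.Random(seed)
    ibd_count = 0
    for _ in range(n_reps):
        alleles = {}
        next_label = 0
        for ind, (sire, dam) in pedigree.items():
            pair = []
            for parent in (sire, dam):
                if parent is None:
                    # Unknown parent: assign a new, unique founder allele.
                    pair.append(next_label)
                    next_label += 1
                else:
                    # Mendelian transmission of one of the parent's two alleles.
                    pair.append(rng.choice(alleles[parent]))
            alleles[ind] = tuple(pair)
        first, second = alleles[target]
        ibd_count += (first == second)
    return ibd_count / n_reps

# Hypothetical pedigree: X is the offspring of a full-sib mating (expected F = 0.25).
example_pedigree = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "X": ("C", "D"),
}
estimate = gene_drop_inbreeding(example_pedigree, "X")
```

Because the estimate is stochastic, it approaches the exact pedigree value only as the number of replicates grows.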
First, we calculated Wright's classical inbreeding coefficient (F) (4). Although F is the traditional method for calculating inbreeding in a population, it does not always reflect the genetic load of the population (5). There are a number of coefficients that can be calculated from pedigree data to account for purging of load and favourable selection. These measures rely on the principle that selection will remove deleterious alleles from the population, so alleles that are IBD multiple times in a pedigree are more likely to have neutral or positive effects on fitness.
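For readers unfamiliar with how F is obtained from a pedigree, the following is a minimal sketch using the standard recursive kinship definition (F of an individual equals the kinship between its parents). The function names and the example pedigree are hypothetical, and the sketch assumes the pedigree dictionary lists parents before offspring.

```python
from functools import lru_cache

def make_inbreeding_fn(pedigree):
    """pedigree: dict id -> (sire, dam), None for unknown; parents listed first."""
    order = {ind: rank for rank, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown ancestors are assumed unrelated
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        # Always expand the younger individual through its parents.
        if order[a] < order[b]:
            a, b = b, a
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    def inbreeding(ind):
        sire, dam = pedigree[ind]
        return kinship(sire, dam)

    return inbreeding

# Hypothetical pedigree: X from a full-sib mating, Y from a half-sib mating.
example_pedigree = {
    "A": (None, None), "B": (None, None), "E": (None, None),
    "C": ("A", "B"), "D": ("A", "B"), "G": ("A", "E"),
    "X": ("C", "D"), "Y": ("C", "G"),
}
inbreeding = make_inbreeding_fn(example_pedigree)
```

The recursion makes explicit that F depends only on shared ancestry of the parents, which is why it cannot by itself reflect past purging.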
Theoretically, a population can be purged of some or all of its genetic load, so that individuals with a high inbreeding coefficient may show little or no evidence of inbreeding depression (6)(7)(8).
For this reason, accounting for potential purging provides a better reflection of genetic load than simply measuring F (9). A number of measures have been proposed, which all operate under the assumption that an allele which has been IBD more than once in an individual's pedigree is less likely to have a deleterious effect on phenotype than one that is IBD for the first time.
The first of these coefficients, proposed by Ballou (5), measures the proportion of the genome that has been IBD one or more times in previous generations:
F a_BAL = [F a(s) + (1 - F a(s)) F (s) + F a(d) + (1 - F a(d)) F (d)] / 2
where F a(s) is the ancestral inbreeding coefficient of the sire, F (s) is the inbreeding coefficient of the sire, F a(d) is the ancestral inbreeding coefficient of the dam, and F (d) is the inbreeding coefficient of the dam. Importantly, this measure accounts for alleles that have been IBD in each parent's pedigree, so F a_BAL can be greater than 0 for individuals with F=0 (8).
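As a worked illustration of Ballou's recursion, the sketch below computes an offspring's ancestral inbreeding coefficient from the four parental values defined above. The function name and the numerical inputs are hypothetical. In this study the coefficients were obtained by gene dropping in GRain rather than from this closed-form expression.

```python
def ballou_ancestral_inbreeding(f_a_sire, f_sire, f_a_dam, f_dam):
    """Ballou's ancestral inbreeding coefficient of an offspring.

    Each parent contributes the share of its genome that was already exposed
    to IBD in its ancestors (f_a) plus the share that is newly IBD in the
    parent itself ((1 - f_a) * f); the two parental terms are then averaged.
    """
    sire_term = f_a_sire + (1.0 - f_a_sire) * f_sire
    dam_term = f_a_dam + (1.0 - f_a_dam) * f_dam
    return 0.5 * (sire_term + dam_term)

# Hypothetical case: an inbred sire (F = 0.25) mated to an unrelated, non-inbred dam.
# The offspring has F = 0 but a non-zero ancestral inbreeding coefficient.
offspring_f_a = ballou_ancestral_inbreeding(0.0, 0.25, 0.0, 0.0)
```

The example reproduces the point made in the text: an individual with F=0 can still have F a_BAL above 0 when one of its parents is inbred.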
In contrast, Kalinowski's coefficient (F a_KAL ) only accounts for alleles that are currently IBD and have also been IBD at least once in the past (10). Hence, when F=0, F a_KAL is also 0, resulting in a strong correlation between F and F a_KAL (Fig. S1). A major shortcoming of F a_BAL and F a_KAL is that they only record whether an allele has been IBD one or more times, so they only account for the purging of highly lethal recessive alleles (11).
The ancestral history coefficient (A HC ) differs from F a_BAL and F a_KAL in that it accounts for the number of times an allele has been IBD in an individual's pedigree (2). This calculation is based on the assumption that the more times an allele has been IBD in an individual's pedigree, the more likely it is to have a neutral or beneficial effect on phenotype. It is therefore possible for an individual with a comprehensive and inbred pedigree to have an A HC >1. A HC is closely correlated with F a_BAL because both measures account for all inbreeding events throughout the pedigree, although the former always holds a higher value because it quantifies the number of times an allele has been IBD (Fig. S1). This measure provides the most comprehensive and accurate reflection of purging in inbred individuals.
Comparison between genomic measures of inbreeding. The accuracy of genealogy-based inbreeding measures is highly reliant on the base population used. Each pedigree estimate assumes that the founders are unrelated, making such estimates highly inaccurate for populations without reliable and comprehensive records. Our population has a complex pedigree with an average depth of 24.70 generations, allowing us to assume that these estimates are fairly accurate. However, the stochastic nature of recombination and the increase of allele frequencies through selection can lead to variability between probability-based and actual levels of autozygosity (approximately 2.43%) (12, 13). For this reason, an increasing number of studies use high-density genomic information for inbreeding estimates (14). Genomic measures of inbreeding assume that increased levels of alleles IBD from common ancestors will result in higher levels of homozygosity (F H ) (14, 15).
To more accurately distinguish between alleles that are identical by state (IBS) and IBD, inbreeding levels are now often measured using runs of homozygosity (ROH). These long homozygous segments show evidence of a common ancestor, as they have not been broken down by recombination in meiosis (12, 16). Because recombination progressively breaks up segments over generations, shorter ROH segments correspond to ancient inbreeding, whereas longer ones correspond to recent consanguineous events (12, 17, 18). To differentiate between new and old inbreeding, we measured ROH with the minimal thresholds 1, 2, 5, 8, 12 and 16 Mb, which correspond to 50, 25, 10, 6, 5 and 3 generations, respectively (19). We also measured each individual's genomic level of homozygosity (F H ), which does not distinguish between alleles that are IBS and IBD.
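The link between ROH length and the age of inbreeding can be sketched as follows. The sketch rests on two assumptions that are not stated in the text: the commonly used approximation that a segment inherited from a common ancestor g generations back has an expected length of 100/(2g) cM, and a uniform recombination rate of 1 cM per Mb. The function names, segment lengths and genome length are hypothetical. The generation counts quoted in the text are rounded values taken from (19), so they need not match this approximation exactly for every threshold.

```python
def expected_generations(roh_length_mb, cm_per_mb=1.0):
    """Approximate generations to the common ancestor for an ROH of given length.

    Assumes expected segment length L (cM) = 100 / (2 * g), so g = 100 / (2 * L).
    """
    length_cm = roh_length_mb * cm_per_mb
    return 100.0 / (2.0 * length_cm)

def f_roh(segment_lengths_mb, min_length_mb, genome_length_mb):
    """Proportion of the genome covered by ROH at or above a length threshold."""
    covered = sum(length for length in segment_lengths_mb if length >= min_length_mb)
    return covered / genome_length_mb

# Hypothetical individual: four ROH segments on a hypothetical 2,000 Mb genome.
segments = [3.0, 6.0, 13.0, 20.0]
f_roh5 = f_roh(segments, 5.0, 2000.0)    # counts the 6, 13 and 20 Mb segments
f_roh12 = f_roh(segments, 12.0, 2000.0)  # counts the 13 and 20 Mb segments only
```

Raising the threshold restricts the estimate to longer segments and therefore to more recent inbreeding, which is why the thresholds give nested, closely correlated coefficients.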
Pairwise relationships showed close correlations between each ROH threshold. There was very little difference between F ROH1 and F ROH2 (Fig. S2), possibly because the 45,451 SNP panel did not provide adequate density to distinguish between these thresholds (3, 20, 21). There is increasing evidence that high-density panels are needed for accurate estimations of smaller ROH lengths (3, 20, 21). Some individuals in our dataset showed no evidence of recent inbreeding: six individuals did not have an ROH over 12 Mb (five generations), and 23 had none over 16 Mb (three generations).
Comparison between genealogical and genomic inbreeding estimates. As expected, there are some correlations between pedigree- and genomic-based measures. F showed the closest correlation with F ROH16 (0.35, Fig. S3), probably reflecting recent inbreeding events (12, 22).
Shorter ROH regions may not always correspond to autozygous segments, or may not be detected due to insufficient SNP coverage, explaining their lower correlations with F (21).
Interestingly, our study shows a lower correlation between F and F ROH than many other studies, which report correlations of ~0.7 (22, 23). This is probably because much of F in the Thoroughbred population is attributable to ancestors many generations back. The SNP density used in our analysis was probably not high enough to capture these distant inbreeding events.
The close relationship between F a_KAL and F results in similar correlations with ROH coefficients (Fig. S3). Since A HC and F a_BAL reflect levels of purging, rather than the proportion of the genome that is IBD, it is unsurprising that they show no significant relationship with any genomic inbreeding measure. F H had no relationship with any pedigree-based measure of inbreeding, indicating that its failure to distinguish between IBD and IBS alleles makes it a crude measure of inbreeding (Fig. S3).
Which measure of inbreeding is best? One major shortcoming that we have identified in using ROH estimations to quantify inbreeding is the lack of concordance in the parameters used to define an ROH (24), making it difficult to compare results between studies. For example, some studies allow for one heterozygous SNP in an ROH because of genotyping errors, whereas others argue that this makes estimations inaccurate, particularly if the heterozygous SNP is at the end of the ROH (24). Additionally, a 50k SNP panel may not be sufficiently dense to accurately detect ROH, and increasing panel density can identify different ROH regions (3, 20, 21). Furthermore, the discovery of long ROH segments in outbred human populations (25) brings into question their accuracy for capturing levels of inbreeding. Recombination 'hotspots' and high linkage disequilibrium are proposed to account for these long IBD segments persisting for many generations (21, 25, 26). Considering that the Thoroughbred population has one of the highest linkage disequilibrium rates of any domestic animal population (27), the accuracy of using ROH measurements to reflect inbreeding levels in the population is questionable.
In contrast, F is a well-known and widely used method, so comparisons between different studies are much easier to make (although the accuracy and number of generations of the pedigree can confound these results). For populations with deep and complex pedigrees, such as the Thoroughbred population, F allows us to estimate whole-population inbreeding levels, as well as those for deceased individuals. In this respect, F is highly useful for studying inbreeding trends over time and predicting future implications for the population. However, F does not account for factors such as genetic drift and selection in the population, making ROH advantageous in this respect (17). For this reason, using F ROH measures may be the most accurate way of measuring individual inbreeding levels in the Thoroughbred population.
Additionally, pedigree information can be used to estimate levels of purging and account for the selection of favourable alleles in a population. However, F a_BAL and F a_KAL do not effectively detect the purging of mildly deleterious alleles (11, 28). Therefore, we propose that the best coefficient for quantifying purging is A HC . In a deep and complex pedigree (such as the Thoroughbred pedigree), this coefficient best captures the effectiveness of selective breeding practices in increasing the frequency of favourable alleles, and the purging of highly and mildly deleterious alleles.
Consequently, we have chosen F and A HC measures for the further analysis of the effects of inbreeding on fitness. We have also chosen ROH thresholds of 5 and 12 Mb (corresponding to 10 and 5 generations, respectively). We chose the 5 Mb threshold to account for old inbreeding: with the SNP panel used in our study, any ROH estimate below 5 Mb may not truly reflect ROH coverage. We chose the 12 Mb threshold to reflect new inbreeding, as 10% of the individuals in our dataset did not have any ROH in the 16 Mb category.

S3: Further output from linear mixed models.
In our analyses we implemented multiple measures of racing performance to account for talent, consistency, and constitutional soundness. The first measure, cumulative earnings, accounts for the amount of prizemoney a Thoroughbred earns throughout its racing career, and is based on the assumption that an individual's ability will be reflected by the amount of prizemoney that it earns. However, this measure can favour individuals that perform inconsistently but win one big prizemoney race in their career. For this reason, we included earnings per start, which favours individuals that perform consistently well in high-class races. Individuals that contest only one or two races in their lifetime could still have high cumulative earnings and earnings per start, so we also included career length as a performance measure. Career length does not account for long breaks between race starts on account of injuries or poor recovery, so total number of starts was also included in the analysis. Lastly, winning strike rate was included as a measure of a horse's consistency, because top-class individuals should win most, if not all, of their races. Because they are derived from the same racing records, these measures are all correlated (S4).
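The five performance measures can be derived from a horse's race records, as in the minimal sketch below. The record layout, the function name and the choice to express career length in days between first and last start are assumptions made for illustration and may differ from the definitions used in the analysis.

```python
from datetime import date

def performance_measures(starts):
    """Summarise one horse's career.

    starts: list of (race_date, prizemoney, won) tuples, one per race start.
    """
    total_starts = len(starts)
    cumulative_earnings = sum(prize for _, prize, _ in starts)
    wins = sum(1 for _, _, won in starts if won)
    race_dates = [race_date for race_date, _, _ in starts]
    return {
        "cumulative_earnings": cumulative_earnings,
        "earnings_per_start": cumulative_earnings / total_starts,
        # Assumed definition: days between the first and last start.
        "career_length_days": (max(race_dates) - min(race_dates)).days,
        "total_starts": total_starts,
        "winning_strike_rate": wins / total_starts,
    }

# Hypothetical career of four starts with one win.
example_starts = [
    (date(2015, 1, 1), 0.0, False),
    (date(2015, 1, 11), 50000.0, True),
    (date(2015, 1, 21), 10000.0, False),
    (date(2015, 1, 31), 0.0, False),
]
summary = performance_measures(example_starts)
```

Since all five values are computed from the same records, a horse with a single high-value win can score well on earnings while scoring poorly on starts and strike rate, which is the reason for analysing the measures together.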
We found that sex and year of birth significantly affected all measures of racing performance. In all models, female horses had lower levels of performance. For prizemoney measures, this discrepancy is probably due to bigger, stronger males being able to win more races with higher prizemoney. This probably also accounts for males having a higher winning strike rate, as females racing against males will have less chance of winning. The lower total starts and career length observed for female horses can be explained by female horses being retired earlier for breeding purposes. Although some stallion prospects may also be retired early to stud, castrated males will continue racing for longer due to having no residual breeding value.
We used the linear mixed models to estimate the predicted values of each racing performance measure over a range of F and A HC coefficients (Fig S5). These predictions follow the trends seen in the regression coefficients, with increasing F decreasing performance, and increasing A HC enhancing it.
Using the numerator relationship matrix incorporated into the linear mixed models, we calculated the estimated breeding values (EBVs) for all individuals in the pedigree. The extensive pedigree information available for each individual in our sample increases the accuracy of our EBV information. However, it has been reported that phenotypic information over multiple generations will also increase the accuracy of EBV estimates (29).
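For context, a numerator relationship matrix can be built from a pedigree with the standard tabular method, sketched below. This is an illustration only and not the software used in the analysis: the function name and example pedigree are hypothetical, and the pedigree dictionary is assumed to list parents before offspring. Each diagonal element equals 1 + F for that individual, and each off-diagonal element is the additive relationship between two individuals.

```python
def relationship_matrix(pedigree):
    """Tabular method. pedigree: dict id -> (sire, dam), None for unknown."""
    ids = list(pedigree)
    index = {ind: i for i, ind in enumerate(ids)}
    n = len(ids)
    a = [[0.0] * n for _ in range(n)]
    for i, ind in enumerate(ids):
        sire, dam = pedigree[ind]
        s = index.get(sire)  # None when the parent is unknown
        d = index.get(dam)
        # Diagonal: 1 plus half the relationship between the parents.
        if s is not None and d is not None:
            a[i][i] = 1.0 + 0.5 * a[s][d]
        else:
            a[i][i] = 1.0
        # Off-diagonal: average relationship of each older animal with the parents.
        for j in range(i):
            value = 0.0
            if s is not None:
                value += 0.5 * a[j][s]
            if d is not None:
                value += 0.5 * a[j][d]
            a[i][j] = a[j][i] = value
    return ids, a

# Hypothetical pedigree: X is the offspring of full sibs C and D.
example_pedigree = {
    "A": (None, None), "B": (None, None),
    "C": ("A", "B"), "D": ("A", "B"),
    "X": ("C", "D"),
}
ids, nrm = relationship_matrix(example_pedigree)
```

In a mixed model this matrix specifies the expected covariance between the breeding values of relatives, which is how information from relatives contributes to each individual's EBV.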
We found that the distribution of EBVs in the population has changed over time (Fig. S6). The EBVs for each phenotypic measure showed similar trends, probably because of the correlations between measures. For this reason, we only present EBVs for cumulative earnings in the main body of the paper.

S4: The greatest ancestral contributions to the Thoroughbred population.
We determined the 20 ancestors with the greatest marginal contributions to the current population using iterations as implemented by PEDIG 1.0 (30). Calculating ancestral contributions rather than only contributions from founders accounts for bottlenecks in the pedigree, and is particularly advantageous in our population because it allows us to estimate and understand the contributions of particularly successful and popular breeders to the current population. In this respect, marginal contributions are advantageous when modelling the heterogeneity of inbreeding depression. However, it is important to note that an ancestor will pass on different sets of genes to each of its descendants (31), so marginal contributions could overestimate redundancies in the pedigree, as close relatives may have inherited completely different genetic information from the same common ancestor. We found large differences in the raw and marginal contributions of some ancestors in our dataset (Fig. S7, Table S1), such that selecting individuals based on raw contributions would result in a largely different sample space.
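The distinction between raw and marginal contributions can be made concrete with a small sketch of the raw calculation. This is not the PEDIG algorithm: it computes only raw contributions, that is, the expected fraction of genes in a reference population that trace to each ancestor. The function name and pedigree are hypothetical, and the pedigree dictionary is assumed to list parents before offspring. Marginal contributions additionally discount the share already explained by previously selected ancestors, which is not implemented here.

```python
def raw_contributions(pedigree, reference):
    """Expected genetic contribution of every individual to the reference group.

    pedigree: dict id -> (sire, dam), None for unknown; parents listed first.
    reference: list of ids forming the current (reference) population.
    """
    contribution = {ind: 0.0 for ind in pedigree}
    for ind in reference:
        contribution[ind] = 1.0 / len(reference)
    # Walk from youngest to oldest, passing half of each animal's
    # accumulated contribution to each known parent.
    for ind in reversed(list(pedigree)):
        for parent in pedigree[ind]:
            if parent is not None:
                contribution[parent] += 0.5 * contribution[ind]
    return contribution

# Hypothetical pedigree: A and B contribute only through their offspring C.
example_pedigree = {
    "A": (None, None), "B": (None, None), "E": (None, None),
    "C": ("A", "B"),
    "X": ("C", "E"), "Y": ("C", "E"),
}
contributions = raw_contributions(example_pedigree, ["X", "Y"])
```

In this example C has a raw contribution of 0.5, and A and B have raw contributions of 0.25 each. Because everything A and B contribute passes through C, a marginal calculation would assign them nothing once C has been selected, which shows how raw and marginal rankings can differ.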
All of the greatest marginal contributors in our dataset are closely related, some of them sharing common ancestors, and others having been mated to produce a number of successful and influential progeny (Figure S8). The large contributions made by these close relatives demonstrate the narrow population bottleneck from which the breed has originated, mirrored in the initial large increase in the F coefficient at the foundation of the breed (Figure S9).
We found that many of these individuals were reported to be highly influential sires and dams in the early days of the Thoroughbred breed formation. Historically, there are considered to be four great sire lines responsible for the early formation of the Thoroughbred breed. Two of these, Of the 20 ancestors analysed for their partial contributions, five contributed over 5% of the genomes of the current population and ten contributed over 2% (Figure S1). We selected the top ten individuals for further analysis because they all had a pFi of over 0.005.

S5: Whole-population inbreeding trends over time.
As expected in a closed population with a small number of dominant founders (32), F has increased consistently since the foundation of the Thoroughbred population (Figure S9a). A HC values have also increased in the population over time (Figure S9b). The exponential increase of A HC over time agrees with our previous findings that selection for alleles contributing favourably to performance has increased their frequency over time. This result is in concordance with the positive relationship found between A HC and performance, indicating that an individual with a higher A HC has a greater accumulation of these favourable alleles in their genome.
Of the 135,572 individuals included in our racing performance analysis, the average F was 0.139 and the average A HC was 1.973. The large difference in these values further demonstrates that the selection for favourable alleles derived from the individuals in the early breed formation has increased their frequency over time.
We found that F levels rapidly increased after the bottleneck at the foundation of the breed. F then increased slowly in relation to A HC over later generations (Figure S9c). There is a collection of individuals from 1930 onwards that have a lower F level than the majority of the population.
These are the result of a parent that has an unknown pedigree or from an outbreeding event, often with one parent originating from a different continent. Although the F of these individuals is low, or 0, most of them have an A HC value above 0. This indicates that both parents have inbreeding events in their own pedigree, which is not captured in F (see S2). We suggest that analysing both F and A HC is needed to thoroughly examine inbreeding trends and effects in a population over time.

S6: Pedigree structure and missing ancestors
Although the individuals in our analysis trace back most of their pedigree lines to the founders of the population, a small number of ancestors have incomplete pedigree information. There are a number of reasons why their pedigrees are not completely recorded (33). Firstly, an ancestor may not be registered in the stud book because its owner could not pay the stud book fees. This was particularly relevant during the Great Depression in the 1930s. Secondly, before 1980 it was acceptable in Australia for horses to be registered for racing without having a complete pedigree in the stud book. This was based on the assumption that to be competitive in Thoroughbred races, these horses would have to be of Thoroughbred origin. Thirdly, pedigree records of some horses were lost when they were shipped from England to Australia in the early 19th century. For horses with American bloodlines, many pedigree records were lost during the American Civil War.
Additionally, when DNA testing was introduced in the late 20th century, one individual was found to have false parentage. The proportion of ancestors with missing pedigree information by year is displayed in Fig. S10. These individuals accounted for 1.4% of the total ancestors in our pedigree file.
We estimated the proportion of missing ancestors by generation for all individuals used in our racing performance analysis (Fig. S11). No individuals in the racing performance data set have missing parents, and most individuals (80%) have a complete ancestry up to 6 generations (Fig. S11). Considering that the majority of F is captured in the first 6 generations of a pedigree (22), we consider that this pedigree structure makes the inbreeding estimates used in our analyses highly accurate.

Fig. S10: Proportion of ancestors with incomplete pedigree information from the genealogy of the Australian Thoroughbred horses that were used in the racing performance analysis (n=119,637). The date of birth for these individuals was listed as after the studbook was closed (in 1792), so they are not considered to be founders of the Thoroughbred population. Data points are given as the proportion of individuals with missing pedigree information from all individuals born over a 10-year period.
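The per-generation pedigree completeness described in S6 can be sketched as follows. The function name and example pedigree are hypothetical, and the sketch is only one simple way to count known ancestors: generation g of an individual's pedigree has 2^g ancestor slots, and completeness is the fraction of those slots filled by a recorded ancestor.

```python
def completeness_by_generation(pedigree, ind, max_generations):
    """Proportion of known ancestors of `ind` in generations 1..max_generations.

    pedigree: dict id -> (sire, dam), None for an unknown parent.
    An ancestor reached through several lines fills several slots.
    """
    proportions = []
    current = [ind]
    for generation in range(1, max_generations + 1):
        known_parents = []
        for animal in current:
            for parent in pedigree.get(animal, (None, None)):
                if parent is not None:
                    known_parents.append(parent)
        proportions.append(len(known_parents) / 2 ** generation)
        current = known_parents
    return proportions

# Hypothetical pedigree: the dam D has an unknown dam of her own.
example_pedigree = {
    "A": (None, None), "B": (None, None),
    "C": ("A", "B"), "D": ("A", None),
    "X": ("C", "D"),
}
completeness = completeness_by_generation(example_pedigree, "X", 3)
```

Here X has both parents recorded, three of its four grandparent slots filled, and no recorded great-grandparents, so the completeness over three generations is 1.0, 0.75 and 0.0.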