Distortion of Mendelian segregation across the Angus cattle genome uncovering regions affecting reproduction

Nowadays, the availability of genotyped trios (sire-dam-offspring) in the livestock industry enables the implementation of the transmission ratio distortion (TRD) approach to discover deleterious alleles in the genome. Various biological mechanisms at different stages of the reproductive cycle such as gametogenesis, embryo development and postnatal viability can induce signals of TRD (i.e., deviation from Mendelian inheritance expectations). In this study, TRD was evaluated using both SNP-by-SNP and sliding windows of 2-, 4-, 7-, 10- and 20-SNP across 92,942 autosomal SNPs for 258,140 genotyped Angus cattle including 7,486 sires, 72,688 dams and 205,966 offspring. Transmission ratio distortion was characterized using allelic (specific- and unspecific-parent TRD) and genotypic parameterizations (additive- and dominance-TRD). Across the Angus autosomal chromosomes, 851 regions were clearly found with decisive evidence for TRD. Among these findings, 19 haplotypes with recessive patterns (potential lethality for homozygote individuals) and 52 regions with allelic patterns exhibiting complete or quasi-complete absence for homozygous individuals in addition to under-representation (potentially reduced viability) of the carrier (heterozygous) offspring were found. In addition, 64 (12) and 20 (4) regions showed significant influence on the trait heifer pregnancy at p-value < 0.05 (after chromosome-wise false discovery rate) and 0.01, respectively, reducing the pregnancy rate up to 15%, thus, supporting the biological importance of TRD phenomenon in reproduction.

Analytical models of transmission ratio distortion.Allelic parameterization of TRD.As described by Casellas et al. 4,13 , for a particular locus, the probability of allele transmission (P) from heterozygote parents (A/B) to offspring was parameterized including one overall TRD effect (α) on a parent-unspecific model or differentiating between sire-(α s ) and dam-specific TRD effects (α d ) on a parent-specific model: Flat priors (uniform distribution) were assumed for all TRD parameters within a parametric space ranging from − 0.5 to 0.5.Under a Bayesian implementation, the conditional posterior probabilities of the TRD parameters are defined as: where; y is the column vector of genotypes of the offspring generation.The likelihood of data is a multiplication of the corresponding probabilities for each offspring as: where n is the total number of offspring and P off and y i is the probability and the genotype of the i th offspring, respectively.The probability of the genotype of each offspring was defined by parents' genotypes and TRD parameters.Thus, the probability of a heterozygous offspring from a heterozygous-by-heterozygous mating becomes: Detailed information about the implemented algorithms were described in Id-Lahoucine et al. 5 and Id-Lahoucine 26 .
Genotypic parameterization of TRD.As developed by Casellas et al. 12 , genotypic parameterization can be modeled by assuming additive (α g ) and dominance (δ g ; or over-/ under-dominance) parameters, regardless of the origin of each allele.Following Casellas et al. 14 , the probability of the offspring (P off ) from heterozygous-byheterozygous mating are: For heterozygous-by-homozygous mating, correction for overall losses of individuals in terms of genotypic frequency are needed to guarantee P off (AA) + P off (AB) + P off (BB) = 1.Thus, genotypic frequencies in offspring from AA × AB mating as an example become: Under a Bayesian implementation, the conditional posterior probabilities of the TRD parameters are defined as: Flat priors were assumed for both α g and δ g within a deepened parametric space (i.e., the parametric space of a parameter is conditioned to the other parameter).Thus, the parameter space for α g initially ranges [− 1, 1] with a p(α g ) = ½ and becomes conditioned to δ g when δ g > 0, being restricted to [− 1 + δ g , 1 − δ g ] with a p(α g ) = 2/ (2-2 × δ g ).On the other hand, the parametric space for δ g competent ranges [− 1, |α g |] with a p(δ g ) = 1/(1 + α g ).Notice that these conditions were made to avoid negative probabilities for a given offspring from a particular mating.

Statistical analyses.
The analyses of TRD were evaluated SNP-by-SNP and using a sliding windows approach for haplotypes of 2-, 4-, 7-, 10-and 20-SNP across 92,942 SNPs.For haplotype analyses, the biallelichaplotype procedure described by Id-Lahoucine et al. 5 was implemented following the same parameterization described above.The analyses were performed within a Bayesian framework using TRDscan v.1.0software 5 with a unique Monte Carlo Markov chain of 110,000 iterations where the first 10,000 iterations were discarded as burn-in.The statistical relevance of TRD was evaluated using a Bayes factor 27 (BF).The BF estimates was obtained across iterations with a lag interval of 10 iterations.Both allelic and genotypic parameterizations were compared using the deviance information criterion 28 (DIC).In order to optimize the TRD analyses, the following steps were considered following Id-Lahoucine et al. 5 .Firstly, a minimum of 1,000 informative offspring was considered to guarantee minimal statistical power to characterize TRD across the whole genome.Secondly, a minimum number of informative parents (≥ 20 heterozygous sires and/or ≥ 100 heterozygous dams) were considered to minimize possible false TRD from genotyping errors.As post analyses criteria, the approximate empirical null distribution of TRD 5 at < 0.001% margin error was applied in order to exclude TRD generated by chance (i.e., gametes sampling fluctuations).In the same way, regions with few heterozygous sires displaying full skewed transmission and completely explaining the observed TRD in the corresponding region were discarded as potential genotyping errors.Subsequently, regions with a large credible interval for TRD effects (i.e., coefficient of variation > 20%), potentially as a result of unstable convergence, were filtered out.Finally, in order to combine and integrate all the results to obtain clear highlighted peaks of TRD across the whole genome, the kernel smoothing 29,30 (parametric technique) was applied.The smoothed estimate of BF for the ith base pair (bp) within the range κ i to κ n , was calculated using weighted Gaussian kernel ( where σ is the bandwidth and (κ i -κ j ) is the distance in base pairs.Following Id-Lahoucine 26 , the smoothing process was implemented with a bandwidth of 500,000 bp, which is suggested to be a rationale distance to obtain a considerable initial number of candidate regions in TRD analyses.

Characterization of TRD effects on reproductive phenotypes.
As an additional analyses, the effects of TRD regions (SNPs or haplotypes) were evaluated using the heifer pregnancy trait as recorded in the whole American Angus database.To determine the effects of the alleles, pregnancy rate between matings at risk and control were compared in two ways (for each region separately): AB × AB (risk) with AA × -(control) and AB × -(risk) with AA × AA (control).This first comparison allows to determine the impact of recessive TRD regions whereas the second is useful for allelic TRD regions.The rationale behind these matings is that we do not expect to observe BB offspring for recessive TRD regions, thus, both heterozygous parents are needed for the test.On the other hand, the presence of one single heterozygous parent is enough for testing allelic TRD regions as AB offspring could also present reduced viability.This interaction effect was included in the following animal model: where; PHN was the phenotypic recorded as binary traits (i.e., pregnant or not pregnant), INT is the interaction effect between parent genotypes (recorded as 1 and 0 for mating at risk and control, respectively), CG is a contemporary group (fixed effect comprised of the unique combination of herd-breeding year-season-breeding group-synchronization), ADH is the age the heifer's dam (fixed effect), HA is heifer age at breeding (covariate), SS is first service sire (random effect), A is the animal additive genetic effect and e is the random residual term.The effects included in the model are similar to those used in the national genetic evaluation of American Angus (Angus Genetics Inc., St. Joseph, MO, USA).The analysis was performed using a linear model (assuming Gaussian distribution for random effects).The animal additive genetic follows a multivariate normal distribution, i.e., MVN(0, Gσ a 2 ), where σ a 2 was the genetic variance and G was the genomic relationship matrix constructed with 88,959 SNPs (minor allele frequency > 0.001) using VanRaden's first method 31 .The significance of the interaction effect was tested with a t-test.The total number of genotyped heifers with a pregnancy record was 21,297.The total number of pregnancy records where at least one parent is genotyped was 70,869.When considering the maternal grandsire genotype (i.e., the sire of the heifer), the number of informative records increased to 76,719.

Results and discussion
Characterization of TRD on Angus genome.Single nucleotide polymorphism and haplotype alleles were identified exhibiting distorted segregation ratios with decisive evidence (BF ≥ 100 according to Jeffreys' scale 32 ) across the Angus genome.After the implementation of the different strategies to minimize possible TRD artifacts, a total of 99,580 genomic regions with TRD were identified (including totally or partially overlapped windows) after exclusively keeping the allele-region (SNP or the haplotype allele) with the highest BF.Among them, 5,027 corresponded to SNPs and 9,913, 14,106, 18,088, 21,368 and 31,078 corresponded to haplotype windows of 2-, 4-, 7-, 10-and 20-SNP, respectively.This large number was a result of the sliding window approach, the different window sizes applied and the level of linkage disequilibrium (LD).Thus, it is important to mention that the signals of TRD observed for individual SNPs and/or short haplotype windows are also observed in windows linked to them.Given that the different TRD patterns observed across adjacent regions were potentially generated from one single mutation.Here, we assumed that the best candidate allele harboring the causal variant (or in strong LD with it) will correspond to the allele-region with the highest BF 5 .Thus, after combining and integrate all the results taking into account the LD using the smoothing process, 990 core regions has been highlighted across the whole genome.Within these regions, 139 regions were excluded as they were plausibly explained by genotyping errors or convergence instability after individually checking (visual inspection) the mean and standard deviation of TRD parameters and the corresponding distribution of the offspring across matings.Notice that for genotyping errors could be anticipated when checking the number of heterozygous sires (with at least 10 offspring) that transmitted one allele with a probability > 90% and the distribution of their offspring.Following Id-Lahoucine et al. 5 , the strategy used is based on discarded TRD that was generated fully from few heterozygous sires (e.g., < 3) with a large number of offspring, these sires potentially are homozygous and genotyped incorrectly as heterozygous.

Relevant insight of TRD findings on Angus genome with deleterious alleles.. The whole Angus
genome was characterized with 851 non-overlapping TRD regions, being 177 SNPs and 258, 165, 103, 78 and 70 haplotype windows of 2-, 4-, 7-, 10-and 20-SNP, respectively.Among these findings, it is important to highlight that the majority of regions were detected with more than one of the applied models (i.e., parent-unspecific and -specific allelic model, genotypic model).Despite this overlap, different statistical evidence was observed for TRD estimates for the different models, suggesting different degrees of fit, and consequently, distinctive patterns of inheritance.
Allelic patterns.The majority of TRD regions (657) presented an allelic pattern (i.e., identified with the allelic model with strong relevance).Loci with parent-specific TRD were 3 and 131 for dam-and sire-TRD, respectively.In order to target the most promising regions, following Id-Lahoucine 26 , a moderate-to-high TRD of > 0.20 was considered with at least 5,000 under-represented offspring.That is, 52 regions were selected as the most relevant (Table 1 (first 20 regions) and S1 Table (full list)).The average number of under-represented offspring across 52 regions was 10,099, being 41,008 the maximum number of under-represented offspring (Fig. 1).This finding shows that under the hypothesis of lethality, potentially 41,008 offspring would be lost given one single deleterious allele.This particular region was found with 79,200 informative offspring and a frequency of 0.08 (corresponding to the haplotype allele that is under-transmitted).In addition, among these regions, the penetrance (TRD magnitude) via sire and dam was equal or slightly different in 51 TRD regions (S1 Table ).In contrast, only one single region exhibited sire-specific TRD whereas was null via dam (Fig. 2).It is important to add, even those regions had moderate-to-high TRD signals, part of them may have TRD linked to specific families and where further research is required to better target the causal mutation.On the other hand, it is very interesting to mention that one single region of the allelic pattern was observed with opposite sire-and dam-TRD, where sires showed a preferential transmission of one allele (3,055 AB and 1,014 BB offspring from AB (sire) × BB mating) whereas dams showed a preferential transmission for the opposite allele (1,074 AB and 1,870 BB offspring from BB × AB (dam) mating).In fact, an opposite sire-and dam-TRD were also observed on other regions displaying an excess or deficiency of heterozygous offspring (e.g., sire-TRD = − 0.03, dam-TRD = 0.04, 14,406 AA, 16,618 AB and 10,300 BB from AB × AB mating), but this remarkable region showed a peculiar pattern which adds complexity to TRD phenomenon in cattle.
Genotypic patterns.The genotypic model highlighted 19 regions with recessive patterns (Table 2) and 9 with either deficiency or excess of heterozygous offspring.Here, a minimal 10 ≥ non-observed homozygous offspring was required to target recessive TRD.Thus, the number of non-observed homozygous offspring for these regions with recessive pattern ranges from 10 to 564.The lethality among these regions was diverse, some with potentially full lethality (i.e., full absence of homozygous haplotype) or with reduced viability of offspring in homozygous state.Here, 9 haplotypes were detected with full absence of homozygous offspring.The lethality on other regions was observed with different degrees, comprising the smallest change of mortality to 40%.For an illustrative example, a specific region (AR.9) with 176 AA, 1,528 AB and 740 BB offspring from AB × AB matings had an anticipated rate of mortality of 76% ((740-176)/740*100)); notice that 1,528 (AB) / 740 (BB) ≈ 2 maintains the expected Mendelian ratio.It is important to consider that for this pattern the reduced viability was observed only on homozygous offspring and not heterozygous offspring as in the case of allelic TRD patterns.
On the other hand, for the detected regions with recessive patterns, 3 physically close haplotypes (AR.16.a, AR. 16  www.nature.com/scientificreports/heterozygous sires and dams (Table 2), which potentially points to the same causal mutation (SNP, deletion, etc.).The LD between these 3 haplotype alleles (biallelic-haplotype genotypes) were 0.76, 0.43 and 0.51.This result gives extra evidence supporting the TRD found in this particular region.

Model comparison.
The DIC values across models supported the inheritance pattern of TRD region described in the previous sections.Specifically, for recessive TRD regions the genotypic model was favored in comparison to the allelic model with differences up to 209.40 DIC units (average across the 19 regions was 38.99).In contrast, among the regions with allelic pattern (52), 49 fit better for the allelic model, displaying deference of DIC units ranging up to 630.76 with an average of 116.07 DIC units.The remaining regions (3 from 52), despite displaying low DIC for the genotypic model, the distribution of offspring across matings presented an allelic pattern.Their DIC advantage was coming from the combination of additive and dominance parameters that maximizes the likelihood of data, resulting in a similar or better fit for both models.

Validation and comparison of TRD phenomenon and lethality across breeds.. In general, when
comparing TRD findings between Angus and Holstein breeds 26 , the observed prevalence and magnitude of TRD were higher in Angus population.Whereas the number of regions in Angus was 851 with an average of 0.27 overall TRD magnitude, the number of regions identified in another study from our group in Holstein genome was 604 with an average of 0.22 TRD magnitude 26 .In relation to statistical evidence, 814 and 560 regions presented a log 10 (BF) ≥ 10 for overall TRD for Angus and Holsteins, respectively.It is important to mention that this is not a limitation of statistical power because the number of trios used in Holsteins (283,791) was even slightly superior to Angus (205,954).The observed differences between breeds could be explained partially by the different genotype density used for TRD analyses in both breeds, where higher density SNP array was used in Angus (92,942 SNPs) compared to Holsteins (47,910 SNPs).The advantage of using high-density genotypes, which enables the whole genome to be explored more deeply, allows the potential discovery of more candidate deleterious alleles.On the other hand, similar patterns of overall and sire-TRD were observed in both Angus and Holstein breeds in similar positions across the genome.Among the 851 and 604 characterized TRD regions in Angus and Holsteins, respectively, 353 regions presented similar allelic TRD patterns with 46 of them being specific sire-TRD.Regarding the recessive TRD pattern, only one single region identified in Angus with recessive pattern was physically close to a known lethal allele located in BTA21:21,184,869-21,188,198 (AR.16,Table 2) in Holstein cattle with recessive inheritance as well (Holstein haplotype 0 33,34 ).The causative mutation in Holstein haplotype 0, responsible for the brachyspina syndrome, was a 3.3 Kb deletion in the FA complementation group I (FANCI) gene 34 .If we assume no recent common ancestor between both breeds, it is probably the results of independent www.nature.com/scientificreports/mutations in the same genes which generated similar TRD patterns in both breeds, and consequently, may support the biological function of those genes on reproduction-related traits.Within the same context, one of the detected regions with recessive TRD pattern overlapped with a previously reported candidate lethal allele by Jenko et al. 24 located in BTA14:8,064,004-8,927,881 (AR.11,Table 2) in Aberdeen Angus.This reported haplotype was found to be associated with decreased insemination success and longer interval between insemination and calving 24 .The candidate gene for this haplotype was Zinc finger and AT-hook domain containing (ZFAT) which is associated with prenatal or perinatal lethality in the Mouse Informatics Database 24 .In addition, previously characterized lethal alleles by Jenko et al. 24 in Simmental (BTA13:73,746,516-74,973,171) and Limousin (BTA23:27,923,154-28,649,349) were also physically overlapping with our findings, specifically among the relevant 52 allelic TRD regions (AA.33 and AA.44, S1 Table ).On the other hand, among the 7 recessive lethal haplotypes reported by Hoff et al. 23 in Angus, 3 were found overlying with our results in our Angus data but displaying allelic patterns: BTA8:62,040,920-63,000,189 (AA.21),BTA1:27,786,985-29,095,768 (AA.2) and BTA4:82,467,969-83,996,686 (AA.14).Hoff et al. 23 identified a candidate gene located in BTA1, glycogen branching enzyme (GBE1), which found to produce recessive phenotypes in mammals.
Validation of the identified TRD regions using reproductive phenotypes: heifer pregnancy trait.Significant effects of TRD were found in the heifer pregnancy data.In total, 64 and 20 regions showed significant effects at p-value < 0.05 and 0.01, respectively.Particularly, when comparing between AA × AA and AB × -(mating risk), 49 and 12 regions displayed significant effect for the interaction effect on parent genotypes (at p-value < 0.05 and 0.01, respectively; Table 3 (first 30 regions) and S2 Table (full list)).The number of significant regions after controlling false discovery rate (FDR) at chromosome level 35,36 was 8 and 2 (at q-value < 0.05 Table 3. Effects of transmission ratio distortion (TRD) regions on heifer pregnancy and distribution of pregnancy among at risk (one or both parents carrying the deleterious allele) and control matings.a Number of SNPs on the window; b Parents' genotypes; c Number of matings; FDR: chromosome-wise false discovery rate; Full list of regions is provided in S2 and 0.01, respectively).The maximum observed effect was − 0.085.Hence, whereas for non-risk mating (i.e., AA × AA) the average pregnancy rate was 0.87, the observed pregnancy rate reduced down to as low as 0.74, that is, 15% reduced the pregnancy rate.It is important to mention that these effects were supported by the distribution of the pregnancy rate among both sire × dam and sire × maternal grandsire matings.The use of sire × maternal grandsire matings, allows increasing the number of informative matings by using phenotypes of non-genotyped heifers.These results support the relevance of the allelic TRD pattern, where the presence of the deleterious allele in one single parent is enough to reduce the pregnancy success of the animals.In addition, among these 49 regions, only 3 regions (Reg.12 (AA15), Reg.21 (AA.24) and Reg.44 (AA.50)) presented high TRD magnitude (> 0.20) and exhibiting more than 5,000 under-represented offspring.However, the average TRD magnitude and the number of under-represented among the 49 regions significant with the heifer pregnancy was 0.32 and 2,608, respectively (Table 3 and S2 Table).
On the other hand, for AB × AB risk mating (recessive pattern), 15 and 8 regions displayed significant effect for the interaction effect on parent genotypes (at p-value < 0.05 and 0.01, respectively; Table 4).After chromosomewise FDR, the number of regions reduced to 4 and 2 (at q-value < 0.05 and 0.01, respectively).The region (Reg.52; Table 4) with the largest observed effect was − 0.27, with a pregnancy rate of 0.58 (corresponding to the 31% reduced the pregnancy rate) but with only 9 informative records, using maternal grandsire matings, the observed pregnancy rate was 0.72 with 106 records.Only one of the recessive TRD region (AR.18,Table 1; Reg.63, Table 4) showed a significant effect on heifer pregnancy, with a significant effect of − 0.115, reducing the pregnancy rate to 0.75, that is, 11% reduced the pregnancy rate.In addition, it is important to mention that those regions, found with a significant effect when comparing between AA × -and AB × AB matings, a reduced pregnancy rate was observed in the distribution of AA × AA and AB × -matings in some of these regions as well.In fact, their allelic TRD pattern anticipates that one single copy of the deleterious allele could generate a pregnancy loss and not only in the presence of the two copies (homozygous state).Finally, TRD regions that do not impact pregnancy rate are still important, as they potentially impact a different stage of the reproductive cycle, emphasizing the importance of investigating the consequences of all TRD regions.

Conclusions
The analysis of a large genomic dataset allowed the characterization of TRD of the whole genome of Angus breed.Different parametrization uncovered 19 regions with recessive patterns (potential lethality for homozygote individuals) and 52 regions with allelic patterns.The allelic TRD discoveries exhibited complete or quasi-complete absence for homozygous individuals in addition to under-representation (potentially reduced viability) of carrier (heterozygous) offspring and also parent-specific TRD patterns.Using heifer pregnancy data, 64 and 20 regions showed significant effects at p-value < 0.05 and 0.01, reducing the progeny rate up to 15%.After chromosomewise false discovery rate, the number of regions decreased to 12 and 4 at q-value < 0.05 and 0.01, respectively.The overlapping of TRD regions with recently published candidate lethal alleles in Angus supported the consistency of TRD findings.These novel findings in Angus present candidate genomic regions putatively carrying lethal and semi-lethal alleles providing opportunities to reduce the rates of embryonic losses or death of offspring as a way of improving fertility and fitness in beef cattle populations.

Figure 1 .Figure 2 .
Figure 1.Number of observed and expected offspring for each sire and dam mating and offspring genotypes (sire × dam:offspring) of the TRD region with the highest number of under-repented offspring on the Angus genome.Individual SNP with an overall TRD = 0.21 and log 10 (BF) = 3,553.34.

Table 1 .
Potential candidate lethal or semi-lethal haplotype alleles with allelic transmission ratio distortion (TRD) patterns (allelic parametrization) in Angus cattle.

Table 2 .
Potential candidate lethal or semi-lethal haplotype identified with recessive transmission ratio distortion (TRD) patterns (genotypic parametrization) in Angus cattle.a Number of SNPs on the window; b Parents' genotypes; c Offpsring genotype; BF: Bayes factor; Full list of regions is provided in S2 Table.

Table 4 .
Effects of transmission ratio distortion (TRD) region on heifer pregnancy trait and distribution of pregnancy among at risk (both parents carrying the deleterious allele) and control matings.a Number of SNPs on the window; b Parents' genotypes; c Number of matings, FDR: chromosome-wise false discovery rate.