Introduction

The non-Mendelian segregation of markers, known as distorted segregation, is a common biological phenomenon and has been reported since the early twentieth century (Mangelsdorf and Jones, 1926; Sandler et al., 1959; Rick, 1966; McCouch et al., 1988; Paterson et al., 1988; Brummer et al., 1993; Xu et al., 1997; Kaló et al., 2000; Lu et al., 2002; Barchi et al., 2010). It may lead to a biased estimate of the recombination fraction and affect the accuracy of linkage groups (Lorieux et al., 1995a, 1995b). For example, slight but significant segregation distortion results in a reduced estimate of the recombination fraction (Cloutier et al., 1997; Kaló et al., 2000), and an overwhelming number of heterozygous individuals in the F2 population leads to a false genetic linkage of markers (Kaló et al., 2000) and the overestimation of the recombination fraction (Lashermes et al., 2001). These conclusions are not contradictory and can be clearly explained. More specifically, two linked segregation distortion loci (SDL) underestimate the recombinant fraction in most cases and overestimate the recombinant fraction under an additive model with opposite additive effects (Zhu et al., 2007). Therefore, the importance of accurate genetic linkage groups necessitates an in-depth study of marker segregation distortion.

To date, several approaches have been proposed to construct linkage groups. Lander and Green (1987) developed a multi-point method using a hidden Markov chain model. Jiang and Zeng (1997) extended the multi-point method to handle dominant and missing markers. However, a question remains: how can distorted markers be utilised in the construction of linkage groups? The simplest method is to exclude significantly distorted markers from linkage groups, but this treatment usually reduces the coverage and saturation of the genome (Wang et al., 2005). The most common method is to insert distorted markers into a linkage group. If the new linkage group differs substantially from the old one, the recombination fraction between distorted markers should be re-estimated. However, the traditional approach does not work well because a new variable, the selection coefficient, is involved (Kärkkäinen et al., 1996; Kreike and Stiekema, 1997; Faris et al., 1998). To overcome this issue, Lorieux et al. (1995a, 1995b) regarded the selection coefficient as a parameter and adopted the maximum likelihood method to estimate the recombination fraction and selection coefficient simultaneously under a fitness model. Compared with the traditional method, this approach leads to more precise linkage groups, and software implementing it, named MapDisto, is available (Lorieux, 2012). Recently, Zhu et al. (2007) further extended the multi-point method to handle distorted, dominant and missing markers under the framework of a quantitative genetics model for viability selection (Luo et al., 2005). However, epistatic distorted markers have not been considered in the above methods.

Epistasis, the interaction between loci, has been shown to have a strong association with segregation distortion (Bomblies et al., 2007; Alheit et al., 2011). Epistatic SDL has a significant implication for inbreeding depression (Phillips, 2008), which is mainly manifested as hybrid male or female sterility. Törjék et al. (2006) reported that marker segregation distortion is due to reduced fertility caused by epistasis. Kubo et al. (2008) showed that hybrid male sterility is caused by epistasis between two novel genes, S24 and S35, on rice chromosomes 5 and 1. Similar results have also been found in Drosophila (Chang and Noor, 2010), alfalfa (Li et al., 2011), rice (Xie and Chen, 2012; Yang et al., 2012) and Arabidopsis lyrata (Leppälä et al., 2013). Thus, the Dobzhansky–Muller model, in which hybrid inviability is assumed to be caused by epistasis (Dobzhansky, 1936; Muller, 1942), has been widely accepted. In addition, McMullen et al. (2009) investigated genome-wide segregation distortion among nested association mapping populations and indicated that epistasis affected fitness. Therefore, epistatic SDL should be considered in the construction of precise linkage groups.

In this study, we integrated the fitness model for viability selection with the liability model and developed a new method to correct the recombination fraction between epistatic distorted markers in backcross and F2 populations. A series of simulated data sets along with a real data set was analysed to validate the proposed method, and the statistical properties of the new method were summarised and confirmed.

Materials and methods

Genetic model in a backcross population

The new method in this study was developed on the basis of a backcross population. The extension to F2 populations is mentioned briefly in a subsequent section. In this study, the recombinant fraction between epistatic distorted markers was corrected, and the molecular marker information from all n individuals was used to detect the epistatic SDL under the liability and fitness models. The gametic and zygotic selections in the backcross are the same. Thus, the two cases are discussed together.

Liability model

If the selection in a backcross is controlled by two linked SDL, with a recombinant fraction of r, the liability zj of the jth individual may be described by the following model:

$$z_j = x_{j1}a_1 + x_{j2}a_2 + x_{j1}x_{j2}\,i + \varepsilon_j \qquad (1)$$

where ak is the main effect of the kth SDL (k=1, 2); i is the epistatic effect between the two SDL; two genotypes for any one locus are assumed to be SS and Ss, respectively; xjk is the dummy variable defined as xjk=1 for SDL homozygote SS and as xjk=−1 for SDL heterozygote Ss; and εj ~ N(0, σ²) is a normally distributed residual error. For convenience, σ² is set to 1 (Luo et al., 2005). Model (1) can be simply expressed as

$$z_j = X_j b + \varepsilon_j \qquad (2)$$

where Xj=(xj1, xj2, xj1xj2) is the design row for individual j and b=(a1, a2, i)′ is the vector of effects.

We hypothesise that the liability is subject to natural selection. An individual will survive if zj≥0 and will be eliminated from the population if zj<0. As all of the sampled individuals have survived the viability selection, the liability of each observed individual will follow a truncated normal distribution with a cumulative probability:

$$P(z_j \ge 0) = \Phi(X_j b) \qquad (3)$$

This result may be considered to be the relative fitness for individual j and is denoted by Φ(Xjb). Because four possible genotypes for two linked SDL exist, the relative fitness (l=1,…,4) can be easily defined. Therefore, the expected frequencies of the four genotypes after selection are easily calculated and are listed in Table 1.

Table 1 Expected frequencies of four genotypes under the liability and fitness models in a backcross population

Fitness model

In the fitness model, the viability coefficients for the S1s2, s1S2 and s1s2 gametes relative to S1S2 are defined to be v, u and x, respectively, which means that the fitnesses for S1S1S2S2, S1S1S2s2, S1s1S2S2 and S1s1S2s2 in the backcross are 1, v, u and x, respectively. The case u=v=x=1 indicates no selection, which is a typical Mendelian segregation. Therefore, the expected frequencies (l=1,…,4) of the above four genotypes among surviving individuals are also easily calculated and are listed in Table 1.
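The expected frequencies under the fitness model can be sketched in a few lines. This is a minimal illustration, assuming the F1 parent is in coupling phase (S1S2/s1s2), so that the four gametes S1S2, S1s2, s1S2 and s1s2 arise with frequencies (1−r)/2, r/2, r/2 and (1−r)/2 before selection; the function name is hypothetical and the output is meant to mirror, not replace, Table 1.

```python
def backcross_fitness_frequencies(r, u, v, x):
    """Expected frequencies among survivors of the four backcross genotypes.

    Order of the returned list: S1S1S2S2, S1S1S2s2, S1s1S2S2, S1s1S2s2,
    with viabilities 1, v, u and x, respectively.
    """
    # Frequencies before selection (assumes a coupling-phase F1 parent).
    before = [(1 - r) / 2, r / 2, r / 2, (1 - r) / 2]
    viability = [1.0, v, u, x]
    # Weight each genotype by its viability, then renormalise.
    weighted = [p * w for p, w in zip(before, viability)]
    total = sum(weighted)
    return [w / total for w in weighted]


# With u = v = x = 1 the Mendelian frequencies are recovered.
print(backcross_fitness_frequencies(0.2, 1.0, 1.0, 1.0))
```

After normalisation the common denominator is (1−r)(x+1)+r(u+v), the same term that appears in the log-likelihood used later for the variance of r.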

Relationship between parameters in the above two models

The expected frequencies of one genotype under the liability and fitness models should be the same, that is, (l=1,…,4). Therefore, the relationship between parameters in the two models can be expressed as
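Equating the two sets of expected frequencies suggests a direct mapping from liability effects to viabilities. The sketch below assumes that the relative fitness of a genotype is Φ(x1·a1 + x2·a2 + x1·x2·i) with σ²=1 and that viabilities are expressed relative to S1S1S2S2; it is one reading of the definitions above rather than a transcription of equation (4), and the function name is hypothetical.

```python
from statistics import NormalDist


def liability_to_fitness(a1, a2, i):
    """Convert liability effects (a1, a2, i) to viabilities (u, v, x)."""
    phi = NormalDist().cdf  # standard normal cumulative distribution

    def fitness(x1, x2):
        # x = +1 codes the homozygote SS, x = -1 the heterozygote Ss.
        return phi(x1 * a1 + x2 * a2 + x1 * x2 * i)

    base = fitness(1, 1)        # S1S1S2S2, the reference genotype
    v = fitness(1, -1) / base   # S1S1S2s2
    u = fitness(-1, 1) / base   # S1s1S2S2
    x = fitness(-1, -1) / base  # S1s1S2s2
    return u, v, x


# Pure epistasis (a1 = a2 = 0, i = 0.5) gives u + v < x + 1 and uv < x.
u, v, x = liability_to_fitness(0.0, 0.0, 0.5)
print(u + v < x + 1, u * v < x)
```

Under this reading, a single SDL with no epistasis (for example a1=0.5, a2=i=0) yields u+v=x+1 and uv=x, the two conditions that the statistical properties section associates with unbiased estimates from methods I and II.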

Likelihood function and parameter estimation in a backcross

Although the genotypes of the two SDL in the above two models are unobserved, the genotypes of the markers flanking the SDL are observed. Assume that two loci, S1 and S2, are located between markers A and B and between markers C and D, respectively, and that the recombination fractions between A and S1, between S1 and B, between B and C, between C and S2 and between S2 and D are r1, r2, rBC, r3 and r4, respectively. The expected frequencies of the 16 observed genotypes of markers A, B, C and D are calculated and listed in Table 2.

Table 2 Expected frequencies of the 16 genotypes of markers A, B, C and D under the epistatic SDL genetic model in a backcross population

Let nk and pk (k=1,…,16) be the observed number and expected frequency of the kth genotype for the four markers, and let n=∑nk be the total number of individuals. The likelihood function in a backcross is

$$L \propto \prod_{k=1}^{16} p_k^{\,n_k} \qquad (5)$$

However, obtaining the maximum likelihood estimates from equation (5) is complicated. Thus, the complete information that includes all 64 genotypes for four markers and two SDL was used to construct the likelihood function, which is expressed as

$$L_c \propto \prod_{k=1}^{16}\prod_{l=1}^{4} p_{kl}^{\,n_{kl}} \qquad (6)$$

where pkl and nkl (k=1,…,16; l=1,…,4) are the expected frequency and the observed number for the kth marker genotype and the lth SDL genotype, respectively, and . Theoretically, the Newton–Raphson method may be used to obtain the maximum likelihood estimates in equation (6). Here, we adopt the expectation–maximisation (EM) algorithm (Dempster et al., 1977). The logarithm likelihood function is

where . The maximum likelihood estimate of each parameter is found by setting its partial derivative to zero and solving the equation to obtain

where , , , , and . The estimates for r1 and r2 were used to correct the recombination fraction between markers A and B: rAB=r1+r2−2r1r2; similarly, rCD=r3+r4−2r3r4. When m markers are located in a linkage group, the number of estimates for rAB is . Among these estimates, some may be overestimated and some may be underestimated; in this study, the median is our suggested estimate, which is validated by Monte Carlo simulation experiments. Although only selection parameters u, v and x were estimated, these parameters in the fitness model can be transferred to those in the liability model using equation (4). Therefore, only the estimates of parameters in the fitness model are given in this study.
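The final correction step, combining the two sub-interval estimates and taking the median over the available estimates, can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the input pairs (r1, r2) are assumed to come from separate fits of the model, which are not shown here.

```python
from statistics import median


def combine_adjacent(r1, r2):
    """Recombination fraction across two adjacent intervals: r1 + r2 - 2*r1*r2."""
    return r1 + r2 - 2 * r1 * r2


def suggested_estimate(pairs):
    """Median of the corrected rAB values obtained from several (r1, r2) fits."""
    return median(combine_adjacent(r1, r2) for r1, r2 in pairs)


# Three hypothetical fits for the same marker interval A-B.
fits = [(0.04, 0.05), (0.05, 0.05), (0.03, 0.09)]
print(suggested_estimate(fits))
```

The median is used here because, as the text notes, individual estimates may be too large or too small; a median is less sensitive to a single extreme fit than a mean.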

Variance of recombination fraction

The expected Fisher’s information score of the recombination fraction is given by

where ln L = (nAB+nab) ln(1−r) + (nAb+naB) ln r + nAb ln v + naB ln u + nab ln x − n ln[(1−r)(x+1)+r(u+v)]. For large samples, the variance of r was estimated by
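For readers who want to check the variance numerically, the sketch below works from the two-locus log-likelihood with the normalising term n ln[(1−r)(x+1)+r(u+v)] subtracted. Treating u, v and x as known and taking expectations of the second derivative with respect to r gives Var(r) ≈ r(1−r)[(1−r)(x+1)+r(u+v)]² / [n(x+1)(u+v)]. This closed form is derived here under those assumptions and is not copied from the source equation; the function names are hypothetical.

```python
import math


def log_likelihood(r, u, v, x, counts):
    """Two-locus backcross log-likelihood; counts = (nAB, nAb, naB, nab)."""
    n_AB, n_Ab, n_aB, n_ab = counts
    n = n_AB + n_Ab + n_aB + n_ab
    d = (1 - r) * (x + 1) + r * (u + v)  # normalising constant
    return ((n_AB + n_ab) * math.log(1 - r) + (n_Ab + n_aB) * math.log(r)
            + n_Ab * math.log(v) + n_aB * math.log(u) + n_ab * math.log(x)
            - n * math.log(d))


def variance_r(r, u, v, x, n):
    """Large-sample variance of r with u, v and x treated as known (assumption)."""
    d = (1 - r) * (x + 1) + r * (u + v)
    return r * (1 - r) * d ** 2 / (n * (x + 1) * (u + v))


# Without selection the familiar binomial variance r(1 - r)/n is recovered.
print(variance_r(0.1, 1.0, 1.0, 1.0, 100))
```

When u+v=x+1 the expression reduces to r(1−r)/n, which agrees with the statement in the statistical properties that the variance of the new method equals that of method I in this case.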

Genetic model under zygotic selection in the F2 population

Liability model

The liability zj of the jth F2 individual under study could be described by the following model:

$$z_j = \sum_{k=1}^{2}\left(x_{jk1}a_k + x_{jk2}d_k\right) + x_{j11}x_{j21}\,i + x_{j11}x_{j22}\,j_{12} + x_{j12}x_{j21}\,j_{21} + x_{j12}x_{j22}\,l + \varepsilon_j \qquad (11)$$

where ak and dk are the additive and dominant effects of the kth SDL (k=1, 2), respectively; i, j12, j21 and l are the additive-by-additive, additive-by-dominant, dominant-by-additive and dominant-by-dominant interaction effects of the two SDL, respectively; xjk1 and xjk2 are dummy variables defined as xjk1=1 and xjk2=0 for SDL homozygote SS, xjk1=0 and xjk2=1 for SDL heterozygote Ss and xjk1=−1 and xjk2=0 for SDL homozygote ss (k=1, 2); and the other variables are similar to those in model (1). As nine possible genotypes for two linked SDL exist, the relative fitness fl (l=1,…,9) can be easily calculated, and both the results and the expected frequencies are listed in Table 3.

Table 3 Expected frequencies of the nine genotypes under zygotic and gametic selection in the F2 population

Fitness model

Two SDL under study are linked with a recombination fraction of r. In zygotic selection, the viabilities of S1S1S2s2, S1S1s2s2, S1s1S2S2, S1s1S2s2, S1s1s2s2, s1s1S2S2, s1s1S2s2 and s1s1s2s2 relative to S1S1S2S2 are assumed to be v2, v1, u2, x4, x3, u1, x2 and x1, respectively. Their expected frequencies (l=1,…,9) are also listed in Table 3.
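The nine expected frequencies under zygotic selection can be generated by enumerating gamete pairs. The sketch below assumes a coupling-phase F1 (gametes S1S2, S1s2, s1S2, s1s2 with frequencies (1−r)/2, r/2, r/2, (1−r)/2 in both sexes) and codes each genotype by its number of S alleles at the two loci; the function name and the dictionary layout are hypothetical.

```python
from itertools import product


def f2_zygotic_frequencies(r, fitness):
    """Expected F2 genotype frequencies after zygotic selection.

    `fitness` maps (number of S1 alleles, number of S2 alleles) to viability,
    e.g. (2, 2) -> 1 for S1S1S2S2 and (1, 1) -> x4 for S1s1S2s2.
    Returns the normalised frequencies and four times the normalising constant.
    """
    # Gametes coded as (S1 allele present, S2 allele present).
    gametes = {(1, 1): (1 - r) / 2, (1, 0): r / 2,
               (0, 1): r / 2, (0, 0): (1 - r) / 2}
    before = {}
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        key = (g1[0] + g2[0], g1[1] + g2[1])
        before[key] = before.get(key, 0.0) + p1 * p2
    weighted = {k: p * fitness[k] for k, p in before.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}, 4 * total


# Viabilities v2, v1, u2, x4, x3, u1, x2, x1 placed on their genotypes.
fit = {(2, 2): 1.0, (2, 1): 0.9, (2, 0): 0.8, (1, 2): 0.7, (1, 1): 0.6,
       (1, 0): 0.5, (0, 2): 0.4, (0, 1): 0.3, (0, 0): 0.2}
freqs, d = f2_zygotic_frequencies(0.2, fit)
print(round(sum(freqs.values()), 12), d)
```

Four times the normalising constant equals (1−r)²(x1+1)+2r(1−r)(u2+v2+x2+x3)+r²(u1+v1)+2(1−2r+2r²)x4, which is the quantity D used later in the variance of r under zygotic selection.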

Relationship between parameters in the above two models

The expected frequencies of one genotype under the liability and fitness models should be the same, that is, (l=1,…,9). Therefore, the relationship between parameters in the two models can be expressed as

Genetic model under gametic selection in the F2 population

Liability model

The genetic model under gametic selection is the same as model (11). Assume that female gametes and their mating processes are normal, that is, the frequencies of female gametes S1S2, S1s2, s1S2 and s1s2 are (1−r)/2, r/2, r/2 and (1−r)/2, respectively. If the marker segregation ratio deviates from the Mendelian ratio, the distortion is derived from the male gametes of an F1 plant. Note that the frequencies of the nine genotypes under gametic selection are the same as those under zygotic selection and that genotypes S1S1S2S2, S1S1s2s2, s1s1S2S2 and s1s1s2s2 are uniquely derived from the crosses S1S2/S1S2, S1s2/S1s2, s1S2/s1S2 and s1s2/s1s2, respectively. Thus, the frequencies of male gametes S1S2, S1s2, s1S2 and s1s2 are 2(1−r)f1/d1, 2rf3/d1, 2rf7/d1 and 2(1−r)f9/d1, respectively, and the expected frequencies of the nine genotypes in F2 can be calculated as in Supplementary Table S1 (listed in Table 3). Comparing columns 4 and 6 in Table 3 gives the following equations: 2f2=f1+f3, 2f4=f1+f7, 2f6=f3+f9, 2f8=f7+f9 and 2f5=f1+f9=f3+f7.

Fitness model

Let the viabilities of male gametes S1s2, s1S2 and s1s2 relative to S1S2 be vg, ug and xg, respectively. The expected frequencies of the nine genotypes under gametic selection are listed in Table 3.
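Under gametic selection only the male gamete pool is distorted, so the genotype frequencies follow from pairing normal female gametes with reweighted male gametes. The following is a minimal sketch under that assumption; the function name is hypothetical and genotypes are again coded by their counts of S alleles at the two loci.

```python
def f2_gametic_frequencies(r, ug, vg, xg):
    """Expected F2 genotype frequencies when selection acts on male gametes only."""
    # Gametes coded as (S1 allele present, S2 allele present).
    female = {(1, 1): (1 - r) / 2, (1, 0): r / 2,
              (0, 1): r / 2, (0, 0): (1 - r) / 2}
    # Male gametes S1S2, S1s2, s1S2, s1s2 weighted by viabilities 1, vg, ug, xg.
    male_raw = {(1, 1): (1 - r), (1, 0): r * vg,
                (0, 1): r * ug, (0, 0): (1 - r) * xg}
    total = sum(male_raw.values())
    male = {g: w / total for g, w in male_raw.items()}
    genotype = {}
    for fg, pf in female.items():
        for mg, pm in male.items():
            key = (fg[0] + mg[0], fg[1] + mg[1])
            genotype[key] = genotype.get(key, 0.0) + pf * pm
    return genotype


freqs = f2_gametic_frequencies(0.2, 0.8, 0.6, 0.5)
print(round(sum(freqs.values()), 12))
```

Because the male gamete weights have the same form as the backcross genotype weights, this construction is in line with the later remark that the parameter relationship under gametic selection matches the backcross case.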

Relationship between parameters in the above two models

The expected frequencies of one genotype under the liability and fitness models should be the same, that is, (l=1,…,9). The relationship between parameters in the two models can be expressed as

which is the same as equation (4) in the backcross. In fact, this relationship is reasonable. Under the situation of gametic selection, selection occurs during the gamete production stage but not the mating process. As we know, these gametes are formed in the F1 plant stage, which is similar to a backcross.

Likelihood function and parameter estimation in the F2 population

If pl and nl (l=1,…,9) are the expected frequency and observed number of the lth genotype for the two SDL, and n=∑nl is the total number of individuals, then the general likelihood function in F2 can be expressed as

where pl is under gametic selection or under zygotic selection.

Parameter estimation under zygotic selection

The genotypes of an SDL are unobserved if the SDL does not reside at the marker position. As described for the backcross, the information for four markers flanking the two SDL can be used to estimate all of the parameters. However, there are 4096 (64 × 64) gamete combinations and 729 genotypes for four markers and two SDL, so parameter estimation with this approach is time consuming. To reduce the running time, the information for three markers (A, B and C) flanking the two SDL (S1 and S2) is utilised. The order of these loci is A, S1, B, S2 and C, and the recombination fractions for the four linked intervals are r1, r2, r3 and r4, respectively. There are 27 genotypes (observed) for three markers and nine genotypes (unobserved) for the two SDL. Thus, the complete information likelihood function is

where pkl and nkl (k=1,…,27; l=1,…,9) are the expected frequencies and observed numbers of the kth marker genotype and the lth SDL genotype, respectively. The logarithm likelihood function and the partial derivative for each parameter are given in Supplementary Material A. The estimations of the parameters are

where d, , , , and (i=1,2,3,4; j=1,2) are listed in Supplementary File zygotic.xls. The estimates for r1 and r2 are used to estimate the recombination fraction between markers A and B: rAB=r1+r2−2r1r2, which is the corrected recombinant fraction. When m markers are located in a linkage group, the number of estimates for rAB is m−2. Similarly, the median is the suggested estimate.

Parameter estimation under gametic selection

Four parameters, r, ug, vg and xg, under gametic selection need to be estimated. The procedures and algorithm for the parameter estimation are similar to those under zygotic selection. Similarly, we obtain

where d, (i=1,2,3,4), , and are listed in Supplementary File gametic.xls. The strategy for estimating r is the same as that under zygotic selection.

Variance of recombination fraction

The variances of recombination fraction r under gametic and zygotic selection in the F2 population are

respectively, where K=2r(u1+v1+x1+1)+2(2r−1)(2x4−u2−v2−x2−x3)−2(x1+1) and D=(1−r)²(x1+1)+2r(1−r)(u2+v2+x2+x3)+r²(u1+v1)+2(1−2r+2r²)x4.
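A short worked derivation may help in checking the signs in K. Assuming the first term of D is (1−r)²(x1+1), as implied by the zygotic fitness model, differentiating D term by term with respect to r reproduces K with minus signs before u2, v2, x2 and x3. This is an algebraic observation and not a statement from the source about how K was defined.

```latex
\begin{aligned}
D &= (1-r)^2(x_1+1) + 2r(1-r)(u_2+v_2+x_2+x_3) + r^2(u_1+v_1) + 2(1-2r+2r^2)\,x_4,\\[4pt]
\frac{\mathrm{d}D}{\mathrm{d}r}
  &= -2(1-r)(x_1+1) + 2(1-2r)(u_2+v_2+x_2+x_3) + 2r(u_1+v_1) + 2(4r-2)\,x_4\\
  &= 2r(u_1+v_1+x_1+1) + 2(2r-1)(2x_4-u_2-v_2-x_2-x_3) - 2(x_1+1).
\end{aligned}
```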

Detection of selection type in the F2 population

The χ2-test of Pham et al. (1990) is used to determine whether the numbers of AA, Aa and aa genotypes in F2 (nAA, nAa and naa) follow the Mendelian segregation ratio of 1:2:1:

If the test is significant, selection exists. To further clarify the selection type (that is, gametic vs zygotic), Lorieux et al. (1995b) suggest two χ2 tests,

where is the allelic frequency of A in F2. Gametic selection occurs if but not is significant; zygotic selection occurs if is significant (Lorieux et al., 1995b).
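The first of these tests can be illustrated with a standard Pearson goodness-of-fit statistic. The sketch below assumes the usual form Σ(O−E)²/E with 2 degrees of freedom, for which the χ² tail probability is exactly exp(−χ²/2); it does not implement the two follow-up tests of Lorieux et al. (1995b), whose formulas are not reproduced in this text. The function name is hypothetical.

```python
import math


def chi_square_121(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of a 1:2:1 segregation ratio (2 df)."""
    n = n_AA + n_Aa + n_aa
    observed = (n_AA, n_Aa, n_aa)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the survival function is exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


chi2, p = chi_square_121(40, 50, 10)
print(chi2, p)
```

A small P value indicates segregation distortion at the marker; the choice between gametic and zygotic selection then relies on the two additional tests described above.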

Statistical properties

At present, there are three approaches available. The first is the method that does not consider the effect of distorted markers, named method I, implemented by MapMaker v3.0 (Lander et al., 1987) or JoinMap v4.0. The second is the method that considers the effect of distorted markers, named method II (Lorieux et al., 1995a, 1995b; Zhu et al., 2007). The third is the new method described in this study, which considers the effect of epistatic distorted markers. Compared with methods I and II, some properties of the new method in a backcross population are summarised below:

1. The new method is equal to method I when u=v=x=1, and the new method is equal to method II when u≠1, v≠1 and x=1. This finding means that the new method is general and that methods I and II are special cases.

2. An unbiased estimate for the recombinant fraction can be obtained when x+1=u+v or for method I; x=uv or for method II; and for all situations for the new method.

3. The overestimate for the recombinant fraction occurs when or u+v>x+1 for method I and or uv>x for method II. The underestimate for the recombinant fraction occurs when or u+v<x+1 for method I and or uv<x for method II.

4. Two linked SDL affect the estimates of the recombinant fraction for all marker intervals within the two linked SDL. The evidence is shown in Supplementary Material B.

5. The variance of recombinant fraction r for the new method is equal to and less than that for method I when u+v=x+1 and u+v<x+1, respectively. If u+v>x+1, two situations occur: the variance of r for the new method is greater and less than that for method I when and , respectively. The evidence is shown in Supplementary Material C.

Results

Monte Carlo simulation

Effect of heritability, marker interval length and sample size on the estimate of map distance

Nine equally spaced markers were simulated on a single-chromosome segment in a backcross population. Two SDL were placed at the middle of the second and seventh marker intervals. Three levels were set up for each factor in Monte Carlo simulated experiments: (1) SDL heritability, 2, 5 and 10%; (2) interval length between adjacent markers, 5, 10 and 15 cM; and (3) sample size, 100, 200 and 300. All of the simulated parameters are shown in Supplementary Table S2. For each parameter combination, 200 replicated experiments were conducted, and the absolute bias and s.d. among the estimates from the 200 replicates were used to estimate the precision. All of the results for the backcross population are listed in Figures 1 and 2. The results showed that all of the estimates from the new method were unbiased (Figure 1). The two linked SDL do not affect the estimates of the map distances for marker intervals 1 and 8 (outside the two SDL) but do affect the estimates of the map distances for marker intervals 2–7 (within the two SDL) when methods I and II are adopted (Figure 1). In addition, the absolute bias and s.d. of the new method increase as the SDL heritability and marker interval length increase, and they decrease as the sample size increases (Figures 1 and 2). Similar results are also observed in F2 (Supplementary Figures S1 and S2).
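The replicate-based summary used here (bias and s.d. across simulated data sets) can be sketched on a much simpler design than the one in the paper. The example below assumes only two loci, with the SDL located exactly at the markers, and samples surviving backcross individuals directly from the post-selection genotype weights (1−r), rv, ru and (1−r)x; it then applies the naive recombinant-count estimator. The function name, seed and parameter values are illustrative assumptions.

```python
import random


def simulate_naive_estimate(r, u, v, x, n, replicates, seed=1):
    """Mean and s.d. of the naive estimate (recombinants / n) across replicates."""
    rng = random.Random(seed)
    # Unnormalised post-selection weights for genotypes AB, Ab, aB, ab.
    weights = [(1 - r), r * v, r * u, (1 - r) * x]
    estimates = []
    for _ in range(replicates):
        sample = rng.choices(range(4), weights=weights, k=n)
        recombinants = sum(1 for g in sample if g in (1, 2))
        estimates.append(recombinants / n)
    mean = sum(estimates) / replicates
    sd = (sum((e - mean) ** 2 for e in estimates) / (replicates - 1)) ** 0.5
    return mean, sd


# 200 replicates of 300 individuals, true r = 0.1, with and without selection.
print(simulate_naive_estimate(0.1, 1.0, 1.0, 1.0, 300, 200))
print(simulate_naive_estimate(0.1, 0.446, 0.446, 1.0, 300, 200))
```

In the second call u+v<x+1, so the mean of the naive estimates is expected to fall below the true value of 0.1, which is the direction of bias reported for method I in this situation.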

Figure 1 Effect of SDL heritability (a), interval length (b) and sample size (c) on the estimate of map distance in a backcross population. (a) Interval length, 10 cM; sample size, 300; (b) SDL heritability, 5%; sample size, 300; and (c) SDL heritability, 5%; interval length, 10 cM.

Figure 2 Effect of SDL heritability (a), interval length (b) and sample size (c) on the s.d. of the estimates from the new method in a backcross population. (a) Interval length, 10 cM; sample size, 300; (b) SDL heritability, 5%; sample size, 300; and (c) SDL heritability, 5%; interval length, 10 cM.

Effect of the SDL genetic model on linkage map construction

Eight genetic modes of SDL, listed in Figure 3 and Supplementary Table S3, were set up to investigate the effect of the SDL genetic model on the map distance under the liability and fitness models. The sample size was 300, and the marker interval length was 10 cM. The other parameters were the same as those in the above simulated experiment. All the results in the backcross are listed in Figure 3. The results were as follows: (1) All the estimates from the new method were unbiased. (2) Using method I, the estimates under SDL genetic modes 5–8 were unbiased because and u+v=x+1 (Figures 3e–h, and Supplementary Table S3); the estimates under SDL genetic modes 1 and 3 were underestimated because and u+v<x+1 (Figures 3a and c, and Supplementary Table S3); and the estimates under SDL genetic modes 2 and 4 were overestimated because and u+v>x+1 (Figures 3b and d, and Supplementary Table S3). Using method II, the estimates under SDL genetic modes 7 and 8 were unbiased because and uv=x (Figures 3g and h, and Supplementary Table S3); the estimates under SDL genetic modes 1, 3 and 5 were underestimated because and uv<x (Figures 3a, c and e, and Supplementary Table S3); and the estimates under SDL genetic modes 2, 4 and 6 were overestimated because and uv>x (Figures 3b, d and f, and Supplementary Table S3). (3) The bias was proportional to the size of the above difference. For example, the bias of the estimates from method I in Figure 3d is larger than that in Figure 3b because in Figure 3d is larger than 0.62 in Figure 3b.

Figure 3 Effect of SDL genetic mode on the estimate of map distance in a backcross population. The SDL parameters are as follows: (a) a1=a2=i=0.5; (b) a1=−a2=−i=0.5; (c) a1=a2=0, i=0.5; (d) a1=a2=0, i=−0.5; (e) a1=−a2=0.5, i=0; (f) a1=a2=0.5, i=0; (g) a1=−0.5, a2=i=0; and (h) a1=0.5, a2=i=0. The relationship among the parameters in the liability and fitness models is shown in Supplementary Table S3: and u+v<x+1 (a, c); and u+v>x+1 (b, d); and u+v=x+1 (e–h); and uv<x (a, c, e); and uv>x (b, d, f); and uv=x (g and h).

Effect of selection type in the F2 population on linkage map construction

Gametic and zygotic selections of SDL in the F2 population were simulated to investigate the effect of the selection type on map distance. SDL heritability was set at 5%, sample size was 300 and marker interval length was 10 cM. The other parameters were the same as those in the first simulated experiment. All of the data sets were first analysed by the χ2-test to determine the selection type. The results are listed in Table 4 and are consistent with the theoretical results. Each data set was then analysed twice: once under gametic selection and once under zygotic selection. The purpose was to determine which method was better when adjacent markers showed inconsistent selection types. The results are listed in Table 5. They showed that the new method works well when adjacent markers share the same selection type. If gametic selection occurs at the ith locus and zygotic selection occurs at the (i+1)th locus, it is not obvious which method should be used for parameter estimation. In this case, the absolute bias under the zygotic selection approach was less than that under the gametic selection approach. Therefore, we recommend the zygotic selection approach to address this case.

Table 4 χ2-tests for marker segregation distortion, and gametic and zygotic selection
Table 5 Comparison of the gametic and zygotic model methods under gametic and zygotic selection in the F2 population

Real data analysis in rice

To further demonstrate the new method, a real data set for a backcross population (Oryza sativa/Oryza longistaminata//O. sativa) (Causse et al., 1994) was downloaded from the McCouch RiceLab website (http://ricelab.plbr.cornell.edu/Causse_at_al_1994) and re-analysed. The data set is composed of 617 markers on 12 chromosomes. On the basis of 12 linkage groups constructed by MapDisto v1.7.7 (Lorieux, 2012), all of the map distances between flanking markers were corrected by the software package DistortedMap, which implements the new method (Supplementary file DistortedMap). All of the results are listed in Supplementary Table S4 and Supplementary Figure S3. To further illustrate the new method, all of the map distances on chromosome 3, which has several severely distorted segregation regions, were calculated by methods I and II and the new method (Table 6). In regions with normal Mendelian segregation, the estimates of the recombinant fraction by the three methods were similar, such as for markers CDO375, RCH and RZ696, and almost all the estimates for u, v and x were closer to 1 than those in regions with distorted segregation. In the distorted segregation region between markers RZ585 and RZ284, the χ2-test for marker RZ284 was significant (χ2=18.60, P=1.61e−5), and the map distances of 4.75 and 5.29 cM, calculated by methods I and II, respectively, were less than the 6.52 cM calculated by the new method, indicating that the results from methods I and II were underestimates because u+v=0.93<x+1=1.30 for method I and uv=0.19<x=0.30 for method II.

Table 6 Comparison of the map distances on chromosome 3 from methods I, II and the new method in a rice data analysis

Discussion

Although the new method proposed for linkage map correction in this study is based on the epistatic genetic model of SDL, it is suitable not only for the above model but also for normal (Supplementary Table S5) and distorted (Figure 3) markers. When no SDL is identified in a linkage group, the estimates for map distances by the above three approaches are almost unbiased and close to the true values (Supplementary Table S5). We also calculated the s.d. of the estimates by two approaches: one using Fisher’s information (SD1) and the other using the variation of the estimates across 200 replicates (SD2). For the former, similar results were observed across the above three methods; for the latter, slightly increased results were found from method I to II and from method II to the new method (Supplementary Table S5). These findings are reasonable because the number of parameters gradually increased in the above three methods, and the accumulated errors from their corresponding estimates were also increased. If multi-SDL are considered in a linkage group, the corrected results for the genetic distance are more accurate using the new method than using methods I and II (Supplementary Table S5). However, SD1 and SD2 are slightly larger for the new method than for methods I and II. The theoretical evidence is given in Supplementary Material C.

With respect to statistical properties 2 and 3 in the backcross, the evidence exists. Using method I, the recombinant fraction is estimated by . If the expectations of n2, n3 and n in the liability and fitness models are used to estimate n2, n3 and n, respectively, the two properties can be demonstrated. In the liability model, . If , then , which is an unbiased estimate; if , then , an overestimate; and if , then , an underestimate. In the fitness model, . By using a similar approach, the statistical properties 2 and 3 can be obtained. Using method II, the recombinant fraction is estimated by (Lorieux et al., 1995a). In the fitness model, the estimate is changed to . If uv=x, then , which is an unbiased estimate; if uv>x, then , an overestimate; and if uv<x, then , an underestimate. In the liability model, . By using a similar approach, the statistical properties 2 and 3 can also be obtained. These properties have been confirmed by the Monte Carlo simulation studies and real data analysis in this study.
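The direction of bias described in this paragraph can be reproduced by substituting expected counts into each estimator. The sketch below uses the backcross genotype weights (1−r), rv, ru and (1−r)x for counts n1 to n4. The method I estimator (n2+n3)/n is stated in the text; the method II estimator is not legible in this copy, so a product-ratio form √(n2n3)/(√(n1n4)+√(n2n3)) is assumed here purely for illustration. Function names are hypothetical.

```python
import math


def expected_method_I(r, u, v, x):
    """Plug-in value of (n2 + n3) / n under the fitness model."""
    return r * (u + v) / ((1 - r) * (x + 1) + r * (u + v))


def expected_method_II(r, u, v, x):
    """Plug-in value of an assumed product-ratio estimator (see lead-in)."""
    recomb = r * math.sqrt(u * v)        # sqrt(n2 * n3), up to a common factor
    parental = (1 - r) * math.sqrt(x)    # sqrt(n1 * n4), up to the same factor
    return recomb / (parental + recomb)


r = 0.1
print(expected_method_I(r, 0.6, 0.9, 0.5))   # u + v = x + 1
print(expected_method_I(r, 1.0, 1.0, 0.5))   # u + v > x + 1
print(expected_method_II(r, 0.5, 0.8, 0.4))  # uv = x
print(expected_method_II(r, 0.5, 0.8, 0.8))  # uv < x
```

Under these assumptions the plug-in value equals r when u+v=x+1 (method I) or uv=x (method II), exceeds r when the left-hand side is larger, and falls below r when it is smaller, matching statistical properties 2 and 3.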

If two SDL of one SDL-by-SDL interaction are placed in the same linkage group, this interaction does not affect the estimate of the recombinant fraction outside the SDL intervals. In Supplementary Material B, the estimates for the recombinant fractions outside the SDL intervals are obtained as R1=(nAb+naB)/n, R2=(nBc+nbC)/n, R3=(nDe+ndE)/n and R4=(nEf+neF)/n. Obviously, the four estimates are independent of the viability parameters, which is evidence of the above result. Similar evidence was also observed in the Monte Carlo simulation studies. If two SDL of one SDL-by-SDL interaction are placed in different linkage groups, this interaction does not affect the estimates of the recombinant fraction. In Supplementary Material B, the estimates for the two recombinant fractions involved in this interaction are obtained as and . Obviously, the two estimates for both r1 and r2 are independent of the viability parameters, representing evidence of the above result. In addition, the results in Figures 3g and h showed that the estimate for the recombination fraction is unaffected by the distorted markers when there is only one SDL in a linkage group. This finding is consistent with the previous results in Bailey (1949), Lorieux et al. (1995a, 1995b) and Zhu et al. (2007).

In linkage group construction, multi-point approaches are widely used. Among these approaches, Lander and Green (1987) first proposed a Markov chain multi-point approach to utilise missing markers. Jiang and Zeng (1997) then extended the method of Lander and Green (1987) to address dominant and missing markers, and Zhu et al. (2007) further extended the multi-point method to address distorted, dominant and missing markers. In this study, epistatic distorted markers are considered as well. In fact, once the transition probability matrix H(r) from markers A to B is determined, the multi-point method including epistatic distorted markers should work well. Here, we provide these matrices as follows:

for a backcross population,

for the gametic selection approach in F2, and

for the zygotic selection approach in F2.

In animal and plant genetics, epistasis for viability selection has been detected in the studies of Chang and Noor (2010), Li et al. (2011) and Kubo et al. (2008), which motivated the development of the method in this study. This method may be extended to additional biparental populations, for example, recombinant inbred lines. The new method deals only with recombinant fraction correction, not with linkage group construction. For the latter, the MapMaker, MapManager, JoinMap and MapDisto programs are available. Once linkage groups have been constructed and distorted markers exist, the new method can be used to correct bias.

Data archiving

All simulated datasets are available from the public website: http://jpkc.njau.edu.cn/swtj/show.asp?classid=44&classtype=26. The real dataset can be retrieved from: http://ricelab.plbr.cornell.edu/Causse_at_al_1994.