Introduction

Crossovers in meiosis break the physical linkage among loci and allow formation of recombinant chromosomes and ensure chromosome segregation in meiosis I (Koehler et al. 1996, Hassold et al. 2021). Although crossovers and the resultant recombinations have been studied for more than a hundred years (Sturtevant 1913; Sturtevant 1915), many related questions remain unanswered. Due to their profound importance in sexual reproduction, substantial research efforts have focused on better understanding the among-organism and individual variation of crossover rate (e.g., Stapley et al. 2017; Haenel et al. 2018), and on the other hand, recent technologies have been utilized for detecting recombinations at the gamete level (Dréau et al. 2019; Bell et al. 2020; Yang et al. 2022). Crossovers also have implications for statistical properties essential in population genetics, such as the variance of genetic relatedness (Veller et al. 2020). However, one aspect that has gained little attention in the era of high-throughput sequencing is the interpretation of genetic map distances.

Recombinant gametes or offspring can be utilized to build linkage maps that quantify the physical order and map distance (i.e., expected number of crossovers) between loci. Map distances are estimated with mapping functions that attempt to account for the non-additivity of the recombination frequencies due to multiple crossovers between adjacent loci. The two most widely recognized mapping functions are probably those of Haldane (1919) and Kosambi (1944), (e.g., Lynch and Walsh 1998; Visscher et al. 2006; Hill and Weir 2011; Otto and Payseur 2019). However, modern sequencing methods and increasing marker density have reduced the utility of these functions in linkage map reconstruction; the probability of multiple crossovers between closely positioned adjacent loci is negligible and all mapping functions yield essentially the same result, r = d, where r is recombination frequency and d is map distance in Morgans (e.g., Purcell et al. 2007). Map distances are estimated only for adjacent loci; they are, by definition, additive over longer intervals. Consequently, recombination frequencies over intermediate or long map distances (e.g., 50 cM) do not follow any simple association.

Although the importance of mapping functions in linkage map construction has decreased over time with increasing access to dense marker data, inverse mapping functions have recently been utilized to predict genetic shuffling in meiosis (Veller et al. 2019) and from that the variance in genetic relatedness (Veller et al. 2020). It is technically trivial to translate map distance to recombination frequency with an inverse of mapping function, but as all mapping functions effectively yield the same map for high-density data, it is not clear which inverse mapping function to use. Veller et al. (2020) observed that empirical variance in genetic relatedness among human (Homo sapiens) subjects did not match those predicted with the inverse of Kosambi function, indicating that the inverse of Kosambi function is invalid for translating map distances into recombination frequencies.

Here, we show with empirical data from humans and from nine-spined (Pungitius pungitius) and three-spined (Gasterosteus aculeatus) sticklebacks, that the inverse of Kosambi, Haldane, or linear mapping functions do not translate additive map distances correctly to recombination frequencies. To that end, we propose a new approach to translate map distances to recombination frequencies using a piecewise function based on the probability of no crossovers between the markers. We demonstrate this approach and its performance with empirical data from humans, nine- and three-spined sticklebacks.

Materials and methods

Number of crossovers and recombination frequency

An odd number of gametic crossovers between two loci cause recombination, and the recombination frequency of two loci is equal to the probability of an odd number of gametic crossovers between them. Assuming that there is no chromatid interference, any positive number of crossovers in the bivalent leads to equal proportions of recombinant and non-recombinant gametes, while the absence of crossovers between two loci always leads to non-recombinant gametes. From this follows that recombination frequency, r, can be expressed as a function of probability of no crossovers between two loci in the bivalent p0, so that r = ½(1 − p0) (Mather 1938; Weeks et al. 2009). To translate map distance into recombination frequency, the association between map distance and p0 is necessary.

Map distance of two loci is the expected number of gametic crossovers between them. However, the distance does not tell the variation around the expectation or the likelihood for no bivalent crossovers (p0) in that particular interval. Depending on the number of crossovers in the bivalent and their localization, the same map distance can be associated with different values of p0 and recombination frequency (Fig. 1). To address this ambiguity, we formulate a piecewise function p0(k) that gives the probability of no crossovers between the two loci in the bivalent with k crossovers.

Fig. 1: Number and location of crossovers affect the recombination frequency.
figure 1

A An example of a chromosome with always two crossovers in the bivalent so that one occurs in the yellow and the other one in the blue area. The locations of the crossovers within their distinct regions are independent. B The table shows the probability for 0, 1, or 2 crossovers between the markers (p0, p1, and p2, respectively) and the expected number of crossovers in the bivalent, the resulting recombination frequency of the marker pair (in the gamete) and the estimated map distance for every marker pair using different mapping functions. C Graph showing the relationship between the recombination frequency and map distance with different mapping functions; the linear function was capped at 0.5, shown with the dashed line. Notably, intervals A–C, A–D, and A–E have the same recombination frequencies but different map distances, whereas marker pairs A–C and B–D have the same map distances but different recombination frequencies.

Derivation of p0(k)

For deriving the p0(k) function, a model of crossover localization per map distance is needed. We assume that when there are k crossovers in the bivalent they occur in k distinct regions, and within those regions the exact localization is independent of other crossovers in the bivalent. Let d be the map length of the whole chromosome. If there are k(k > 0) crossovers in the bivalent, they are assumed to occur in k non-overlapping regions of equal size, \(\frac{d}{k}\), such that every crossover occurs in a different chromosomal region. Within each region, their localizations are uniformly distributed and independent across the regions. For a given marker pair, \(p_0 = \mathop {\sum}\nolimits_{k = 0}^n {P\left( {\left. {{{{\mathrm{no}}}}\,{{{\mathrm{crossovers}}}}} \right|\,k} \right)} P\left( {k\,{{{\mathrm{crossovers}}}}} \right) = \mathop {\sum}\nolimits_{k = 0}^n {p_kp_0\left( k \right)}\), where n is the highest possible number of crossovers in the bivalent, pk is the probability of k crossovers in the bivalent and p0(k) is the probability of no crossovers between the markers when there are k crossovers in the bivalent. For markers at map positions mi and mj (mj ≥ mi), p0(k) is defined as:

$$p_0\left( k \right) = \left\{ {\begin{array}{*{20}{l}} {1,\,if\,k\, = \,0} \\ {1 - \frac{{\left| {m_j - m_i} \right|}}{{\frac{d}{k}}},\,if\,\lceil\frac{{m_j}}{{\frac{d}{k}}}\rceil = \lceil\frac{{m_i}}{{\frac{d}{k}}}\rceil\,and\,k\, > \,0} \\ {\left( {1 - \frac{{\left| {b_i - m_i} \right|}}{{\frac{d}{k}}}} \right)\frac{{\left| {b_j - m_j} \right|}}{{\frac{d}{k}}},\,if\,\lceil\frac{{m_j}}{{\frac{d}{k}}}\rceil - \lceil\frac{{m_i}}{{\frac{d}{k}}}\rceil = 1\,and\,k\, > \,0} \\ {0,\,if\,\lceil\frac{{m_j}}{{\frac{d}{k}}}\rceil - \lceil\frac{{m_i}}{{\frac{d}{k}}}\rceil\, > \,1\,and\,k\, > \,0} \end{array}} \right.$$
(1)

where \(\lceil{\frac{{m_i}}{{\frac{d}{k}}}}\rceil\) and \(\lceil{\frac{{m_j}}{{\frac{d}{k}}}}\rceil\) are the crossover regions of the markers mi and mj, respectively, and bi and bj are the upper boundaries for these regions, respectively, so that \(b_i = \lceil{\frac{{m_i}}{{\frac{d}{k}}}}\rceil\frac{d}{k}\) and \(b_j = \lceil{\frac{{m_j}}{{\frac{d}{k}}}}\rceil\frac{d}{k}\). Notation refers to ceiling function, and \(\lceil{\frac{{m_j}}{{\frac{d}{k}}}}\rceil\) is the least integer greater than, or equal to \(\frac{{m_j}}{{\frac{d}{k}}}\).

This function is also applicable for multiple loci. In the case of three markers (mi, mj, mk), the recombination frequencies derived with this function meet the criteria \(r_{ij} + r_{jk} \ge r_{ik}\), where rij, rjk, and rik are the recombination frequencies between markers mi and mj, mj and mk, and mi and mk, respectively (Karlin and Liberman 1978; Weeks 1994; see also Supplementary Methods). Example of applying the function is presented in Supplementary Fig. S1.

Our approach is built on three premises: (1) Absence of chromatid interference; (2) even spacing of crossovers due to crossover interference; and (3) the fact that the number of bivalent crossovers in meiosis I varies across individuals, chromosomes, sexes, and species. The second and the third premises have been demonstrated several times in the literature and are merely facts. Crossover interference causes even distribution of crossovers by their physical distances (micrometers) (Zhang et al. 2014; Zickler and Kleckner 2015) and approximately by base pairs, as the chromatin packing ratio per nucleus is roughly constant (see Supplementary Fig. S2 in Veller et al. 2019). However, the spatial distribution per map distance cannot be concluded directly. The number of crossovers per chromosome is not constant but varies between chromosomes, nuclei, individuals, sexes, and species (e.g., Stapley et al. 2017) and hence affects the spatial distribution of crossovers (Charles 1938; Zhang et al. 2014). While the absence of chromatid interference is a common assumption in the literature (e.g., Weinstein 1936; Zhao and Speed 1996; Sandor et al. 2012), this has been empirically tested and confirmed only in a few organisms, including humans and yeast (Zhao et al. 1995; Mancera et al. 2008; Hou et al. 2013; Wang et al. 2019).

In contrast to existing mapping functions, the proposed model is not based on the renewal process (Zhao and Speed 1996) and it does not model crossover interference parametrically. Instead, crossover interference is implemented structurally by assuming that crossovers occur in close proximity less often than would be expected by chance in the absence of crossover interference.

We applied the function for sex-specific recombination data in the 21 chromosomes of nine- and three-spined sticklebacks and in the 22 human autosomes. The total map lengths were derived from the linkage maps and the likelihoods for different numbers of crossovers in the bivalent were inferred from the observed number of crossovers (see ‘Inference of crossover frequency’ below). Implementation of the p0(k) function in R is provided in the ‘Data availability’ section.

Stickleback linkage maps

The crossover frequencies and locations were estimated from the linkage maps described in Kivikoski et al. (2021). For the nine-spined stickleback, high-density linkage maps (22,468 markers informative to conclude crossover) were reconstructed with Lep-MAP3 software (Rastas 2017) from a data set of 133 parents and 938 F1 offspring. The parental fish, 46 females and 87 males, were wild-caught individuals from the Baltic Sea coast of Finland (Helsinki, 60 °13’N, 25 °11’E) that were artificially crossed in laboratory to produce the aforementioned F1 offspring (Kivikoski et al. 2021, see also Rastas et al. 2016). Five females were each crossed with a different male, forming five full-sib families, and the other 41 females were each crossed with two different males, which formed 41 half-sib families.

Identification of the single nucleotide polymorphisms (SNPs) of the parental and the F1 fish were based on whole-genome sequencing of the parents (5–10X coverage; Illumina Hiseq platforms, BGI Hong Kong) and DarTseq (Diversity Arrays Technology, Pty Ltd) genotyping of the F1 fish (Kivikoski et al. 2021). The read mapping was conducted with BWA-mem (ver. 0.7.15, Li 2013) and the variants were called with SAMtools mpileup (ver 1.9, Li et al. 2009) following the Lep-MAP3 software pipeline (Rastas 2017). The linkage maps were built with Lep-MAP3, and the number of paternal and maternal crossovers were inferred from the observed changes in the haplotype phase of the F1 offspring. Crossovers could not be inferred for four crosses with only single offspring, and the final dataset included 934 offspring in total.

The three-spined stickleback linkage maps were based on previously published sequencing data (Pritchard et al. 2017). In short, ninety wild-caught parental fish (Baltic Sea, Helsinki, Finland, 60°13’N, 25°11’E) were crossed such that the males (n = 30) were each crossed with two different females (n = 60). This yielded 60 and 30 full-sib and half-sib families, respectively, with 517 F1 offspring. Genotyping of the parental and the F1 fish were based on genotype-by-sequencing, according to the Restriction-site Associated DNA (RAD) sequencing protocol of Elshire et al. (2011). The crossing, rearing protocols and sequencing data are explained in more detail in Leder et al. (2015) and Pritchard et al. (2017). For this study, the RAD reads from Pritchard et al. (2017) were mapped to three-spined stickleback reference genome (v4, Peichel et al. 2017) and the variants were called following the Lep-MAP3 pipeline as for the nine-spined sticklebacks. This yielded 28,187 informative markers to build linkage maps with Lep-MAP3.

For all linkage maps (maternal and paternal maps of the nine- and three-spined stickleback), the genetic distances between the adjacent markers were calculated from recombination frequency with the Haldane mapping function. The distances are additive for non-adjacent markers. There were, on average, 1070 and 1342 markers per chromosome in the nine- and three-spined stickleback maps, respectively. Hence, the inter-marker distances were short: on average 19,571 bp corresponding to 0.054 cM and 0.106 cM in the paternal and maternal maps of the nine-spined stickleback, respectively, and 15,461 bp corresponding to 0.043 cM and 0.075 cM in the paternal and maternal maps of the three-spined stickleback, respectively. As all conventional functions yield very similar results for small recombination frequencies, the choice of the mapping function has a minor impact on the map distances.

Analysis of human recombination data

To evaluate the general applicability of the new function, we analyzed human data from Halldorsson et al. (2019). This consisted of sex-specific linkage maps (their Supplementary Data S1 and S2) and the crossover data (their Supplementary Data S4). Crossover locations and counts were obtained from the column ‘medsnp’ of the sex-specific linkage maps for the 41,092 probands with both paternal and maternal crossover information. All crossovers were used irrespective of their status regarding the gene conversions (complex, non-complex, or not assessed; see Halldorsson et al. 2019). Moreover, no probands were discarded based on the total number of crossovers in them; the highest number of crossovers per proband per chromosome was 17 maternal crossovers in chromosome 13. The number of markers in the linkage maps ranged from 17,894 to 90,036 depending on the chromosome. We used R (ver. 4.1.1 R Core Team 2018) with seed value 2021 to sample 1.5% of the markers of every chromosome, which yielded 268–1351 markers (i.e., 35,778–911,925 marker pairs) per chromosome. For every marker pair, we estimated the sex-specific recombination frequency by calculating the proportion of the studied probands (n = 41,092) with an odd number of crossovers between the markers.

Inference of crossover frequency

Assuming there is no chromatid interference, the probability of observing k crossovers in a randomly sampled meiotic product depends on the number of crossovers in the bivalent so that \(P( k ) = {n \choose k}\left( {\frac{1}{2}}\right)^n\), where n is the number of crossovers in the bivalent and 0 ≤ k ≤ n (Weinstein 1936). The number of crossovers in the bivalent varies not only between the sexes and chromosomes but also between individuals and individual meioses (see Broman and Weber 2000 for an example). Therefore, the observed crossovers in the gamete pool are a sample of crossovers from meioses with a different number of crossovers in the bivalent. As the sampling function above is known, the multinomial distribution of the number of crossovers in the bivalent can be estimated. We used the expectation-maximization (EM) algorithm of Yu and Feingold (2001) to estimate the multinomial distribution for different numbers of crossovers in bivalent. The algorithm approximates the multinomial distribution that maximizes the likelihood of the data consisting of the numbers of meiotic products with 0 … N observed crossovers in each chromosome. According to Yu and Feingold (2001), for data where the highest number of observed crossovers in a single meiotic product is N, it is sufficient to estimate the multinomial distribution between 0 and 2N−1. The algorithm was applied separately for maternal and paternal crossovers and for each chromosome by pooling all meiotic products (n = 934, n = 517, n = 41,092 for nine- and three-spined stickleback and human data, respectively, Supplementary Tables S1S6). The estimated multinomial distribution includes the maximum-likelihood estimate for no crossovers in the bivalent. We applied the bootstrapping test of Yu and Feingold (2001) to estimate if, in case of an estimate above zero, the deviation from that is statistically significant (p < 0.05). For the chromosomes with non-significant p-values, we used a restricted multinomial distribution that restricts the likelihood of no crossovers to zero for further analyses. This choice was made to assume the obligate crossover as a null hypothesis.

Chromosomes with p-value below 0.05 were chr15 of the three-spined stickleback (paternal meioses) and in humans chr21 and chr22 (both maternal and paternal meioses) and chr3 (paternal meioses). For humans, meioses with no crossovers in chromosomes 21 and 22 have been previously reported in cytological studies (e.g. Wang et al. 2017; Hassold et al. 2021), but we are not aware of such findings for the chr3 and further verification is needed. For the sticklebacks, this is the first study testing the obligate crossover hypothesis.

Performance assessment of functions

The performance of p0(k) and the three inverse mapping functions in predicting recombination frequency from map distance were assessed by calculating the mean absolute error of the predictions and the intra-chromosomal component of genetic shuffling, \(\overline r\) (Veller et al. 2019). The equations for the inverse mapping functions were: \(r = \frac{1}{2}( {1 - e^{ - 2d}})\) (Haldane 1919), \(r = \frac{1}{2}\,tan\,h\left( {2d} \right)\) (Kosambi 1944), and \(\left\{ {\begin{array}{*{20}{c}} {r = d,\,when\,d\, \le \,0.5} \\ {r = \frac{1}{2},\,when\,d\, > \,0.5} \end{array}} \right.\) (linear). In all functions, r is the recombination frequency and d is the map distance in Morgans.

The mean absolute error of the predicted recombination frequency was calculated as \(\frac{1}{n}{\sum} {\left| {r_e - r_p} \right|}\), where re and rp are empirical and predicted recombination frequencies per marker pair, respectively, and n is the total number of marker pairs. The intra-chromosomal component of \(\overline r\) is the part of genetic shuffling due to crossover rate and localization. A higher number of crossovers, and their even distribution increase \(\overline r\), while a low crossover rate and terminal or aggregated localization decrease it. Here, we calculated the intra-chromosomal component of \(\overline r\), by first predicting the recombination frequency from map distance and then converting it to shuffling according to Eq. (10) of Veller et al. (2019) \(\mathop {\sum}\nolimits_{i\, < \,j} {r_{ij}/{{\Lambda} \choose 2}}\), where rij is the rate of shuffling, i.e. recombination frequency of locus pair (i, j), Λ is the number of loci, and \({{\Lambda} \choose 2}\) is the number of locus pairs.

Results

Map distance and recombination frequency do not have a fixed association

Mapping functions formulate a fixed association between recombination frequency and the expected number of crossovers in a gamete, the map distance. However, the same additive map distance can be associated with different recombination frequencies depending on the crossover positions and the context of the focal loci (Fig. 1). This demonstrates that inverse mapping functions have limitations in predicting recombination frequency from map distance. Maternal crossovers in chromosome 8 of the nine-spined stickleback have a distribution similar to the idealized example (Fig. 2), suggesting that a structural model of crossover localization may be applicable in predicting recombination frequency from map distance. Another limitation of the mapping functions (and their inverses) is that they do not account for the variation in the number of bivalent crossovers. Inferred probabilities for bivalent crossover counts show that the number of crossovers varies between sexes, among chromosomes, and in meioses in all three studied species (Fig. 3, Supplementary Tables S1S6). This further indicates that the inverse of the Kosambi or Haldane mapping function may not be suitable for different species, sexes, or chromosomes.

Fig. 2: Observed spatial distribution of maternal crossovers in gametes with two maternal crossovers.
figure 2

A Each dot corresponds to one offspring (n = 265) and coordinates show the map positions of the two crossovers, ‘A’ referring to the crossover closer to chromosome start and ‘B’ to the one closer to chromosome end (r = 0.029, p = 0.64). B Density plots show the distribution of the two crossovers. Densities are calculated in 5 cM windows. In both panels, the dashed-line rectangles show the expected distribution of crossovers expected under the model presented here.

Fig. 3: Observed (gametic) and inferred (bivalent) crossover frequency distributions in the nine-spined stickleback (P. pungitius), three-spined stickleback (G. aculeatus), and human (H. sapiens).
figure 3

Each bar shows the proportion of offspring with a certain number of crossovers (Observed) and the inferred proportions of meiosis with a certain number of crossovers in the bivalent in maternal and paternal meioses (Inferred). The inference is based on the expectation-maximization algorithm by Yu and Feingold (2001). Chromosomes are ordered from shortest to longest by length in base pairs. Crossover counts 6–11 and 12–21 are grouped for readability.

Inverse Kosambi and Haldane mapping functions underestimate recombination frequency

For the three studied species, empirical recombination frequencies lie between the predictions of the inverse Kosambi and linear function (Fig. 4, Supplementary Figs. S2S4). This shows that inverse Kosambi and Haldane mapping functions systematically underestimate the recombination frequencies and the linear function overestimates them. The poor performance of the inverse Haldane function is not surprising as it assumes no crossover interference. However, the fact that the inverse Kosambi function, which does implement crossover interference, systematically underestimates recombination frequencies implies that its crossover interference model does not reflect the underlying biology.

Fig. 4: Examples of empirical recombination frequencies (left panels) and predictions by the new function (right) in the nine-spined stickleback (top), three-spined stickleback (middle), and human (bottom).
figure 4

Solid lines show the three inverse mapping functions; the linear function was capped at 0.5, shown with the dashed line. Each orange and blue dot is a marker pair in maternal and paternal data, respectively.

The new p0(k) function outperforms the existing functions

We evaluated the new function against existing functions by calculating the mean absolute error of predictions and the intra-chromosomal component of genetic shuffling for the two sexes of three different species. The predictions made with the new p0(k) function depict the pattern of empirical data (Fig. 4, Supplementary Figs. S5S7), and the mean absolute error of those predictions are clearly lowest in five out of the six cases. In nine-spined stickleback males, most meioses have one crossover and the linear function gives a slightly lower error than p0(k) (Table 1). The excellent overall performance (Supplementary Tables S7S9) demonstrates that the new function works for different species and on different types of chromosomes. Consistent with the mean absolute error, the new function is superior when assessed on the intra-chromosomal components of \(\overline r\) estimated from the empirical and predicted recombination frequencies. For each species and for both sexes, the predictions made by the p0(k) function gave results closest to the empirical results (Table 2), demonstrating its potential for the application.

Table 1 Means of a per-marker pair absolute error in the predicted recombination frequencies for both sexes of the three study species.
Table 2 Intra-chromosomal components of genetic shuffling (\(\overline r\)) for autosomes.

Altogether, our analyses show that the inverses of the Kosambi and the Haldane mapping functions systematically underestimate the recombination frequencies. Although the linear function works for chromosomes with an overall crossover rate close to 1 (map length ca. 50 cM), the p0(k) function does not underestimate the recombination frequency to the same extent as the other two functions. Overall, the new p0(k) function gives qualitatively and quantitatively the best results.

Discussion

Intrinsic limitations of mapping functions

Map distance tells the expected number of crossovers between two loci and can be used as a proxy of recombination; the longer the map distance the higher the recombination frequency. In principle, translating a map distance to a recombination frequency is trivial, one only needs an inverse of a mapping function such as Kosambi’s (1944) or Haldane’s (1919). However, the problem with this approach is that in modern high-density linkage maps, map distances of non-adjacent loci are additive and an inverse mapping function is not guaranteed to give the correct recombination frequency.

The intrinsic limitations of mapping functions, namely that they describe the interference only at a general level and do not account for the variation in the strength of crossover interference or the number of crossovers, have been previously recognized (Crow 1990; Zhao and Speed 1996; Otto and Payseur 2019). However, empirical tests on the performance of inverse mapping functions with modern data are scarce.

Inverse Kosambi and Haldane mapping functions err with high-density data

Here, we demonstrated that additive map distance can yield an array of different recombination frequencies (Fig. 1) and showed with empirical data from humans and nine- and the three-spined sticklebacks that the inverse of Kosambi and Haldane mapping functions underpredict the recombination frequencies (Fig. 4). We also formulated a new function that outperforms those mapping functions in this task and yields lower error (Table 1) and a more accurate estimate of genetic shuffling (Table 2). The fact that inverse mapping functions fail to predict recombination frequencies implies that those models of crossover interference do not predict crossover localization correctly.

Similar findings were reported by Veller et al. (2020), who showed that the per-chromosome variances of genetic relatedness estimated with the inverse of Kosambi function were higher than those from cytological data that should approximate “true” variance. Our results regarding both sexes of all three studied species were concordant: inverse Kosambi and Haldane functions systematically underpredict recombination frequencies and consequently genetic shuffling (\(\overline r\)), causing overestimation of variance in genetic relatedness, which decreases as a function of \(\overline r\) (Veller et al. 2020).

Stickleback males have primarily one crossover per bivalent (Fig. 3, Supplementary Tables S2 and S4) and in those chromosomes the linear function had the best performance (Supplementary Tables S7S9). The linear function predicts recombination frequency equal to map distance, whereas the p0(k) gives the map distance per the total map length of the chromosome, which yields an underestimate in certain chromosomes. This indicates that the linear function is adequate and should be preferred for chromosomes that mainly have one bivalent crossover, especially in species where this is a norm, such as in Caenorhabditis elegans (Meneely et al. 2002; Hollis et al. 2020) or Lepidoptera (Davey et al. 2017).

Accurate estimates of recombination frequency are needed in predicting genetic shuffling and from that the variance in genetic relatedness. On the other hand, map distance per base pair (cM/Mbp) is used as a measure of recombination in comparative genomics, especially in non-model organisms (Stapley et al. 2017; Martin et al. 2019). This metric is easy to obtain from linkage maps, but as shown here, meaningful interpretation of map distance requires knowledge of the recombination process and interpretation with an inverse mapping function can lead to biased estimates.

Limitations of the approach

In contrast to existing methods, the approach presented here builds on the fact that the crossover rate varies among organisms and meioses, which leads to differences in spatial distributions of crossover sites, and this variation should be implemented in the model. However, this approach requires knowledge about the probability of different numbers of crossovers in the bivalent, which is not needed by other methods and that cannot be concluded directly from the total map length. These can be inferred from gametic crossovers obtained from linkage maps (as done here) or analysis of haploid offspring (Liu et al. 2015). Alternatively, cytological methods can be used to obtain the bivalent crossover counts directly (Froenicke et al. 2002; Wang et al. 2017). Especially with gametic crossovers that contain the sampling variance caused by the fact that only two chromatids are involved in one crossover event, the sample size must be sufficient to obtain reasonably accurate probability distributions. Implicitly, gametic crossovers were, as a null expectation, assumed to present an unbiased sample of those in the bivalent and that meiotic drive or selection for crossover count has not occurred. However, effects of natural selection could also be involved, for example if the offspring are studied in later life-stages instead of direct investigation of gametes.

In the p0(k) function itself, the most important assumption is the uniform distribution of crossovers per map distance and the arbitrarily defined breakpoints of the “segments”. These assumptions are simplifications that allow mathematically tractable formula. The distribution of crossovers per map distance can be studied from the gametic crossovers only to some extent because they present a subset of those in the bivalent. However, the likelihood for gametes with a certain number of crossovers that show all bivalent crossovers can be calculated and the inherent uncertainty can be estimated (Supplementary Methods). Based on the distribution of gametic crossovers per map distance, the assumptions of the model approximate the data in many chromosomes (Supplementary Figs. S8S10). However, the non-random distribution of error per map distance shows that all assessed functions have a systematic bias (Supplementary Figs. S11S13).

The motivation for the approach presented here came from wild individuals of the nine-spined stickleback, which is not a canonical organism to study crossovers. In contrast to humans for example, the number of crossovers in meiosis varies very little, especially in males. Despite the overall differences in crossover rates (map lengths) of human and sticklebacks, the presented method was demonstrated to work for human data as well.

Conclusions

The most salient finding of this study is that the inverse of the Kosambi and the Haldane mapping functions systematically underpredict recombination frequencies. Another caveat of using inverse mapping functions to translate additive map distances to recombination frequencies is that they yield one prediction per map distance, which does not match empirical findings. These findings demonstrate that (intermediate) map distances must be interpreted with care and context-specifically. We also formulated a piecewise function that allows the association between map distance and recombination frequency to be ambiguous. The fact that this function outperforms existing mapping functions in this task indicates that its implementation of crossover interference is more concordant with the data when compared with functions devised earlier. However, the presented function does not replace mapping functions in building linkage maps; it only replaces their inverses in predicting recombination from map distance.