A high-resolution map of non-crossover events reveals impacts of genetic diversity on mammalian meiotic recombination

During meiotic recombination, homologue-templated repair of programmed DNA double-strand breaks (DSBs) produces relatively few crossovers and many difficult-to-detect non-crossovers. By intercrossing two diverged mouse subspecies over five generations and deep-sequencing 119 offspring, we detect thousands of crossover and non-crossover events genome-wide with unprecedented power and spatial resolution. We find that both crossovers and non-crossovers are strongly depleted at DSB hotspots where the DSB-positioning protein PRDM9 fails to bind to the unbroken homologous chromosome, revealing that PRDM9 also functions to promote homologue-templated repair. Our results show that complex non-crossovers are much rarer in mice than humans, consistent with complex events arising from accumulated non-programmed DNA damage. Unexpectedly, we also find that GC-biased gene conversion is restricted to non-crossover tracts containing only one mismatch. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility.

Finally, human NCO events show a strong overall bias towards G/C bases (68%) 18,19 , as opposed to A/T bases [20][21][22] , occurring via an unknown mechanism. This phenomenon is thought to have driven regional differences in the GC-content of many species genome-wide [23][24][25] . Possible causes of this GC-bias include either subtle event initiation biases 24,26 , or heteroduplex DNA repair pathways 27 . However, simple models of heteroduplex DNA repair favouring G/C bases at mismatching bases are difficult to reconcile with the fact that most NCO events convert a contiguous set of SNPs, with no evidence of repair template switching. Moreover, not all SNPs in individually studied hotspots show GC bias 28 . It has been unclear to what extent these findings for humans might generalise to other species. Importantly, lack of power has prevented resolution thus far of fundamental questions about meiotic recombination, including any precise estimate of the length of underlying NCO tracts 18,19,29,30 , the total number of homologous recombination events per meiosis, and where NCO events position relative to PRDM9 binding sites and DSBs, although one recent study suggested a fairly broad distribution 27 .
To investigate links between genetic variation and repair outcomes, we mapped both CO and NCO events in mice, including mice humanized at Prdm9 17 , in both sexes. Critically, we also gathered complementary H3K4me3 and DMC1 ChIP-seq data (DMC1 data generated elsewhere 31 ) in the male parental, or closely related, animals 14,16,17,19,31,32 , allowing us to analyse PRDM9 binding and DSB formation. Because H3K4me3 marks PRDM9 binding sites, at individual hotspots H3K4me3 signal strength approximates PRDM9 binding levels. DMC1 marks DSB sites prior to repair processing, so DMC1 signal strength at individual hotspots increases both with the rate at which DSBs occur, and the average time until these DSBs are repaired. Together, these data provide an unprecedented opportunity to investigate each step of 7 To find both CO and NCO events, we developed and applied an HMM-based algorithm (Supplementary Note) to infer "background" states (B6/B6, B6/CAST and CAST/CAST) across the genome in each mouse to test potential gene conversions against (Fig. 1b). CO events correspond to background changes. SNPs with genotypes not matching their local background represent possible NCO events, but sequencing errors also mimic NCO events. Following careful filtering to exclude such errors (Methods and Supplementary Table 1) we identified 183 NCOs and 295 CO events on autosomes from the 11 F2 animals (Fig. 1c) and 1,392 NCOs and 2,205 COs in the F5 mice (Fig. 1d). This represents ~3-fold more NCO events identified by direct sequencing, which avoids ascertainment biases, than in the largest previous mammalian study 19 , which was performed in humans -allowing inter-species comparisons. Sequencing-based validation of F2 events (Methods) estimated that 91% of the identified NCO events are real, while using simulations we estimate our power to identify those NCO events containing at least 1 SNP as 63% or above (Methods and Supplementary Fig. 1a, b). In the F5 mice, we were able to identify both de novo and parentally inherited NCO and CO events; and we were able to assign a subset to the maternal or paternal meiosis (Methods). NCO events can be assigned to a genetic background by determining whether they result from a DSB on the B6 or CAST chromosome.

Overall event properties
NCO and CO events, as well as DMC1 and H3K4me3, show enrichment nearer to telomeres ( Fig. 1e and Supplementary Fig. 1c). This is broadly similar to patterns observed in other mice [33][34][35] and humans 15 , although COs show stronger enrichment than NCOs. Surprisingly and in contrast to events in human females, at least 99.4% of observed NCO events were "simple" and comprised contiguous tracts of converted SNPs, with no non-converted SNPs amongst them.
Similarly, 99.4% of COs were simple background switches. This implies that complex NCOs are extremely unusual in mice. Moreover, we observed a very high overlap of both CO and NCO events with recombination hotspots, stronger than observed in humans 15,18,19 . In F2 mice (whose parents have the same F1 genetic background as the ChIP-seq samples), 96% of CO events and 92% of NCO events overlap ChIP-seq peaks (adjusted for false-positive NCO events and chance overlap; 84% unadjusted). This reduces only modestly in F5 mice, where only Prdm9 Humcontrolled recombination hotspots are active (Supplementary Table 2), so hotspots identified in the heterozygous F1 mouse are still informative for meioses occurring in F4 mice. Thus, recombination hotspots identifiable by ChIP-seq account for essentially all recombination in mice, with little recombination in the remainder of the genome. Our findings confirm that female recombination mainly occurs in hotspots also active in male mice, in which our ChIP-seq data were gathered 36 .
NCO and CO events both occur in individual hotspots with probability approximately proportional to their estimated heat using either DMC1 or H3K4me3 signal strength (Fig. 2b, c): over 50% of all hotspot-associated F2 NCO or CO events occur in only the 4,000 hottest hotspots. Strong dominance of Prdm9 Castcontrolled over Prdm9 Hum -controlled hotspots is observed for both event types, and in our ChIP-seq data (Fig. 2d, e and Supplementary Fig. 2a,   b). Because binding sites for the humanized allele have not experienced evolutionary hotspot erosion 14 , this phenomenon cannot explain the dominance of the Prdm9 Cast allele. Instead, it could be due to a greater number of strong PRDM9 Cast binding targets genome-wide or possibly due to higher expression of Prdm9 Cast .
After accounting for sampling variation (Methods), we estimated correlation between recombination rates at different scales (Fig. 2f, g and Supplementary Fig. 2c, d). This revealed sex differences, and strong (>70%) correlation between NCO and CO rates, although we also find very strong evidence that these events do differ in their positioning along the chromosome, especially at broad scales, and the NCO rate is much higher than the CO rate at all scales.

Length, number, and positioning of NCO tracts
We leveraged the high SNP density in our system and large number of events to estimate the underlying NCO event tract lengths (accounting for the fact that if a NCO event does not contain a SNP, it is not observed; Methods), separately for hotspots controlled by Prdm9 Cast and Prdm9 Hum . The data show relatively good fits to an exponential distribution (Fig. 2h), but with significant differences in estimated mean NCO tract length (p=0.0018): 30 bp for Prdm9 Cast (95% confidence interval (CI) 25-35 bp), and 41 bp for Prdm9 Hum (35)(36)(37)(38)(39)(40)(41)(42)(43)(44)(45)(46)(47)(48). This is unexpected and implies that Prdm9 alleles can differ in basic properties of how recombination events resolve. These tract length estimates are at the lowest end of the broad existing estimates for humans and mice 18,19,29,30 .
Previous studies using microscopy have reported 200-400 visible DMC1 foci marking individual DSB sites per meiosis in mice 29,37,38 . However, some of these DSBs, e.g. those occurring on the X-chromosome in males, might invisibly repair using the sister chromatid, which is present following replication. We directly estimated the total number of DSBs that repair from the homologous chromosome per meiosis. Using our tract length estimates (Methods), we inferred an average total of 300.5 DSBs (95% CI 258. 5-370.5) Fig. 3a, d), slightly less strongly than clustering of mapped DSBs 42 , but with a far tighter range than the DMC1 and H3K4me3 ChIP-seq signals, which identify single-stranded resection tracts around DSBs and histone methylation resulting from PRDM9 binding, respectively (Fig. 3e, f). CO events spread more broadly (Fig. 3b, c), consistent with previous studies 11,39,42 . Thus, NCO gene conversion appears restricted to sites very close to initiating DSBs themselves, and more distantly positioned NCOs 27 appear to only occur rarely.

GC-biased gene conversion is controlled by SNP density and explains complex NCO and CO events
Our NCO events show strong evidence of AT-to-GC bias, though initially weaker than seen in humans 18 , for both Prdm9 Cast -controlled (64%) and Prdm9 Hum -controlled (60%) hotspots (p<2x10 -9 ; Supplementary Table 3). We next focussed on NCO events within Prdm9 Humcontrolled hotspots for further investigation, because the genomic GC-content has not evolved alongside this allele. We tested for a difference in NCO tracts containing a single SNP with those containing multiple SNPs (Fig. 4a). Surprisingly, this revealed GC-bias to occur exclusively in single-SNP NCO tracts, which show a near-identical GC-bias (68%) in both males and females (Supplementary Table 4). In complete contrast, no bias (p=0.92) is seen for all multiple-SNP tracts combined, and the difference relative to single-SNP tracts is highly significant (p=1.1x10 -7 ). We exactly replicated this finding (p=5.6x10 -4 ) in NCO events within hotspots in humans 19  A restriction of GC-bias to single-SNP tracts might reflect either some GC-biased process preventing longer events occurring, or a direct impact of the number of SNPs within heteroduplex DNA on whether GC-bias occurs. To distinguish between these possibilities, we stratified SNPs by distance to their nearest SNP and measured their GC-bias if they fell within NCO events (Fig. 4b). Strikingly, SNPs near to other SNPs, and therefore almost always coconverted with them, show no GC-bias evidence. Conversely SNPs further than typical NCO tract lengths, >100 bp from the nearest SNP, show the ~68% bias observed in humans, in whom SNP density is much lower 18,19 . This implies that local genetic diversity itself influences GCbiased gene conversion at NCOs, and therefore there must be at least two distinct processes operating to repair heteroduplex stretches formed at DSBs, one which is strongly GC-biased, and another which dominates when multiple mismatches exist, and shows no GC-bias.
To further characterise GC-bias, we estimated conversion rates of different types of SNP in the donor and recipient chromosomes at single-SNP NCO sites ( Fig. 4c and Supplementary Note).
We normalised these relative to their conversion rates in multi-SNP events ( Supplementary Fig.   3b), or to flanking SNP composition (Fig. 4c), both of which show no GC-bias and gave nearidentical results. The simplest model which can explain the data is if there are two distinct conversion rates, with observed NCO rates lower if the recipient chromosome (i.e. the homologue in which the DSB occurs) carries a G or a C, and higher if the recipient carries an A or a T. For example, G/C transversions appear to convert at the lower rate. This could be explained by a model where a GC-biased process can resolve heteroduplex DNA in favour of the recipient chromosome, if it carries a G and/or C base -effectively "blocking" conversion of that base. If so, higher local heterozygosity, which disrupts this process, would be expected to actually increase local NCO rates.
Interestingly, we do not observe a consistent GC-bias for CO events, which are accompanied by long conversion tracts of ~500 bp in size 30 . However, we did observe a very small number of "complex" events, incorporating non-converted markers surrounded by converted markers, and resulting from the same meiosis. We hypothesised that these might result from occasional operation of the GC-biased process. If so, the above results suggest that complex events might result from "blocking" of conversion of particular markers where the recipient chromosome carries a G or C base. This motivates examining the non-converted markers surrounded by converted markers within complex CO and NCO events. We observed a total of 12 such markers within NCO events and 7 within CO events. Remarkably, for 18 of these 19 cases the recipient chromosome carries a G or C base (p= 7.6x10 -5 by 2-sided binomial test). Therefore, in our mice essentially all of the complex NCO and CO events we observe can be explained in terms of the action of a GC-biased process which normally only operates within single-SNP conversion tracts. A recent study of one human hotspot 43 found a similar GC-bias of 87-100% for complex male CO events, so it seems likely this process operates across species.
Moreover, the bias of nearly 100% towards the recipient carrying a G/C, compared to the ~68% bias of all single-SNP NCO events, might suggest that non-biased heteroduplex repair occurs even in some tracts containing only a single heteroduplex site. For example, the bias might only impact either G or C recipient bases, but not both, which would cap the bias at NCO sites to (at most) 67%, very close to the observed fraction.

Hotspots where the homologue is not bound by PRDM9 show reduced homologous recombination
For NCO events, we can identify on which homologue the underlying DSB occurred. In the F2 mice, we observed a bias: 60% of the observed NCOs were initiated on the B6 background (p<10 -3 ). This is due to the behaviour of the Prdm9 Cast allele, which accounts for 80% of observed NCO events (Fig. 2e), and which shows a strong preference for binding to the B6 background (Supplementary Fig. 5a; 66% of NCOs, p<10 -3 ), explained by evolutionary hotspot erosion of CAST-controlled hotspots on the CAST genetic background 14,16 . As expected, the Prdm9 Hum allele binds and initiates recombination events equally on both backgrounds ( Supplementary Fig. 5a, p=0.63). The fraction of NCOs initiating on the B6 background correlates highly with the fraction of DMC1 and H3K4me3 ChIP-seq signals originating from that background ( Supplementary Fig. 5b, e). Because the ChIP-seq data only reflect male meiosis but observed NCO events originate from both males and females, this high correlation implies similar hotspot behaviour in both sexes. The similar increases in PRDM9 binding, DSB formation, and NCO formation on the B6 chromosome imply that no strong compensation mechanism acts to equalise the number of DSBs or recombination events on different homologues, although weaker compensation that we lack power to detect might occur.
Recombination hotspots can be separated into "asymmetric" cases where DSBs occur mainly on one homologous chromosome, and "symmetric" cases where DSBs occur equally on both homologues. Using H3K4me3 and DMC1 ChIP-seq data, we estimated the fraction of PRDM9 binding and DSB formation on the B6 vs. CAST chromosome in each hotspot 17 . Among the most asymmetric hotspots, we observed SNPs or indel polymorphisms within 96% of identified motifs overall ( Supplementary Fig. 5f), implying their asymmetry is almost always driven by sequence changes disrupting PRDM9 binding on one homologue. Therefore, asymmetric binding, and hotspots, are conserved between the sexes and between F2 and F5 animals.
As in other hybrid mice 17 , we observed that the ratio of DMC1 to H3K4me3 ChIP-seq signal increases roughly two-fold at asymmetric hotspots compared to symmetric hotspots ( Supplementary Fig. 5g), a fact most easily explained by delayed repair of DSBs forming at positions whose homologue is not bound by PRDM9 17 . We tested whether such asymmetry might also disrupt DSB repair processing, by quantifying asymmetry (Methods) and measuring the numbers of NCO and CO events actually occurring in asymmetric vs. symmetric hotspots, relative to their expectations according to DMC1 and H3K4me3 signal strength (Methods and Supplementary Note).
H3K4me3 signal strength approximates PRDM9 binding levels, while DMC1 levels at individual hotspots reflect numbers of DSBs (as well as repair timing). As we found across events overall ( Fig. 2), with all else being equal we expect any two groups of hotspots that are matched to have the same total DMC1 or H3K4me3 signal to also have similar numbers of CO and NCO events.
However, when we grouped Prdm9 Hum -controlled hotspots according to their (a)symmetry we instead observed a strong depletion of both NCO and CO events in the most asymmetric hotspots (p=10 -27 and p=10 -23 , respectively, after controlling for factors influencing power; Fig. 5), whether DMC1 or H3K4me3 was used. We replicated this signal for Prdm9 Cast ; for both males and females; and for de novo and inherited events in F5 mice, as well as events in F2 mice (Supplementary Note and Supplementary Fig. 6), so this is a general property of asymmetric hotspots in both sexes. Because the Prdm9 Hum allele in particular did not co-evolve alongside the mouse genome, asymmetric hotspots controlled by this allele reflect chance genetic variation disrupting PRDM9 binding sites on one homologue or the other, implying a mechanistic impact of asymmetry on recombination independent of hotspot erosion or other evolutionary forces.
Importantly, we found that this homologous recombination deficiency is driven by PRDM9 binding asymmetry alone, rather than SNP diversity elsewhere within hotspots (Methods).
Furthermore, for DSBs occurring on the less-bound chromosome of asymmetric hotspots, we found that NCO events occur at the expected rate for symmetric hotspots (Supplementary Note).
This implies that when DSBs occur at asymmetric hotspots on the more frequently bound chromosome, the resulting lack of observed NCO or CO recombination events must be specifically driven by a lack of PRDM9 binding to its homologue, rather than any impact of diversity per se, even within PRDM9 binding motifs.
Because H3K4me3 reflects the level of PRDM9 binding, the lack of homologous recombination at asymmetric hotspots might in principle be explained if DSBs occur less often at these sites.
However, this seems impossible to reconcile with the dramatic excess DMC1 signal observed at asymmetric hotspots for a given level of H3K4me3 signal. Instead, our results can be explained if, at DSB sites whose homologue is not bound by PRDM9, repair sometimes takes place using the sister chromatid rather than the homologous chromosome. This must occur in both sexes.
Sister-based repair can explain the lack of observable COs and NCOs in such cases, and the identical magnitude of depletion of each type of event, implying that asymmetric hotspots fundamentally disrupt interactions between homologous chromosomes. This implicates PRDM9 binding motifs as one genetic "signal" allowing identification of DNA on the homologue, to facilitate homologous repair.
Interestingly, sister chromatid repair is thought to operate on meiotic DSBs on the X chromosome in males 12 , which, similar to autosomal DSBs in asymmetric hotspots, exhibit a very strong increase in DMC1 signal relative to H3K4me3 signal, probably owing to the late timing of their repair 17 . If repair via the sister chromatid occurs later in general, this would explain the observed elevation of DMC1 signal relative to H3K4me3 at asymmetric hotspots.

Discussion
Although complex NCO events are common in human females 18,19 , with an incidence increasing with maternal age, we found that such events are near absent in mice. We suggest that this difference may reflect differences in the timespan of dictyate arrest, which occurs before the completion of female recombination, and lasts decades in humans vs. months in mice. These findings support the hypothesis that complex NCO events in humans might reflect the repair of non-programmed DNA damage occurring over time, consistent with the fact that they are different in other ways: they mainly occur outside PRDM9 hotspots 19 , are often >1 kb long, and show GC-bias regardless of their size.
Our results indicate a sex-averaged NCO rate in mice carrying humanized Prdm9 of around 10 -6 per base. This is strikingly below human estimates of around 4.1x10 -6 and 7.7x10 -6 in males and females respectively 19 , meaning humans show even greater increases in the NCO:CO ratio relative to mice -for unknown reasons. Nonetheless, our minimum estimates of total DSB counts in mice are consistent with previous microscopy studies, suggesting that most DSBs in mice repair using the homologous chromosome, at least those not found in asymmetric hotspots or impacted by GC-biased gene conversion. Speculatively, perhaps a higher rate of NCO events in humans may aid synapsis by providing more potential inter-homologue interaction sites during meiosis. However, an elevation in homologous recombination events also increases the potential for mispairing at some sites, with consequences including diseases caused by non-allelic homologous recombination.
Using DMC1 31 and H3K4me3 ChIP-seq data, one can infer that DSBs form at a small subset 11 of PRDM9 binding sites in each cell on average proportionally to the rate of PRDM9 binding to each site 17 . However, we find that the processing, repair and eventual recombination outcomes at each PRDM9 binding site all depend strongly on the sequence of the homologue. In this study, we revealed for the first time that both CO and NCO events are depleted, in both sexes, at "asymmetric" recombination hotspots where one homologue is not strongly bound by PRDM9 (primarily due to mutations in the PRDM9 binding motif on the homologue; Fig. 5). We also showed that this reduction in homologous recombination cannot be explained by an increase in genetic diversity alone, as has been suggested 16 ; only nearby SNPs that abolish PRDM9 binding symmetry have any effect on the CO or NCO rate. Furthermore, for DSBs occurring on the lessbound homologue in asymmetric hotspots, we found that NCOs occur at the expected rate for symmetric hotspots. This proves that homologous recombination outcomes at each DSB are strongly influenced by PRDM9 binding to the homologue and/or modifying its histones. That is, when the homologous site is not bound by PRDM9, the DSB is less able to repair by homologous recombination, which is critical for synapsis and fertility.
Others have recently demonstrated a depletion of COs at asymmetric recombination hotspots relative to DMC1 signal strength 16 . However, their data could not discriminate between the following causal possibilities: (i) genetic diversity per se was responsible; (ii) DMC1 elevation through repair delay entirely drove the signal; or (iii) these sites were preferentially repaired instead by NCO recombination. Here, we have ruled out these possibilities by demonstrating an equal depletion of CO and NCO recombination events exclusively at asymmetrically bound sites, even relative to expected event numbers from PRDM9 binding data (H3K4me3). This in turn allows us to confidently infer that some process downstream of DSB formation, but upstream of the CO vs. NCO repair decision, is disrupted at asymmetric sites. We suggest that this disrupted process may be homology search, rather than mismatch repair 16 (which would impact NCO events occurring on both homologues similarly). One hypothesis is that PRDM9 binding and/or chromatin marks on the unbroken homologue assist the homology search machinery with the challenging task of finding the correct homologous template for repair; without this assistance, homology search and homologous recombination are impaired at asymmetric sites. Together with previous work 16 , our data indicate that suppression of recombination is a general property of asymmetric hotspots. This mechanism can explain the wider asynapsis and infertility seen in male mice where asymmetric hotspots predominate 17,44 , although additional factors must act to explain sex differences in hybrid fertility.
We also confirmed our previous finding that when DSBs form at asymmetric hotspots, they show a two-fold excess of DMC1 signal relative to H3K4me3 signal on the more strongly bound homologue, which is consistent with delayed DSB repair 17 . Taking into account their elevated DMC1 signals, asymmetric hotspots behave oddly, by showing a combination of (at least in males) slower DSB repair, and an inability to engage with their homologues for this repair, via either CO or NCO, in both sexes. These phenomena could be explained by a model in which many DSBs in asymmetric hotspots fail to interact with their homologue and are instead repaired late from the sister chromatid. That is, asymmetric hotspots may behave like DSBs on the X chromosome in males, which repair late and from the sister chromatid, and show excess DMC1 signal 12,17 . Other models would involve either more or fewer DSBs occurring at asymmetric hotspots and seem unlikely, because they would require strong pairing of homologues prior to DSB formation in order to distinguish symmetric and asymmetric binding sites.
In addition to altering PRDM9 binding symmetry and the resulting recombination outcomes, local genetic differences between homologous chromosomes can also alter the process of GCbiased gene conversion (gcBGC) at NCO sites. We confirmed gcBGC operates downstream of DSB formation, and this implies it must act on repair of heteroduplex DNA with mismatching bases, formed during DSB repair towards recombination. We found that in both humans and mice, within hotspots gcBGC acts almost exclusively on potential conversion tracts containing only a single SNP (i.e. mismatch), with essentially identical bias in each species (68% of NCOs convert A/T to G/C) 18,19 . This single-SNP preference can explain why most multi-SNP NCO tracts are simple stretches of markers without "cherry-picking" of markers converted towards GC. Because heteroduplex DNA is expected to form for all possible NCO and CO tracts containing SNPs, not just those containing single SNPs, our results imply more than one pathway for heteroduplex repair (Fig. 6). In the first, gcBGC acts to favour the strand on which the DSB occurs: if this strand carries a G or C at the SNP, conversion is prevented from occurring. We calculated that such a process would need to block gene conversion at G/C recipient sites 53% of the time to account for the observed 68% overall GC-bias in observed events ( Fig. 6 and Supplementary Note). Almost all observed complex NCOs and COs, though very rare, appear to be explained by this GC-bias preventing conversion of individual markers, with the background on which the DSB occurred carrying a G or C base in 95% of such non-converted markers we observed. Similar behaviour was observed in a study of male COs within a single human hotspot 45 . Among suggested drivers of mammalian gcBGC 20 , these properties of strong (almost 100%) base-specific and strand-specific biases, and operating at very fine scales (single SNPs), appear most consistent with the action of base excision repair (BER) rather than mismatch repair (MMR) proteins.
The alternative, non-GC-biased repair pathway of heteroduplex DNA instead can act on multiple-SNP stretches, and must show a strand bias -this time favouring the incoming strand, copied from the homologous chromosome on which the DSB did not occur (Fig. 6). Otherwise, if heteroduplex mismatch repair had no strand bias, half of potential NCOs would repair invisibly, and so to account for our estimate of the NCO rate (~274 NCOs per meiosis) there would need to be twice as many DSBs per meiosis (~600), which is far outside the range of previous experimental observations 29,39,40 . Moreover, although resolution of heteroduplex DNA towards the broken chromosome within potential NCO events would be invisible, at 19 observed CO events within highly (>95%) asymmetric hotspots containing a mutation within their PRDM9 motif, we observe transmission of the "cold" binding site allele to offspring in 95% of cases. Therefore, for those longer conversion tracts within CO events at least, heteroduplex repair appears overwhelmingly biased towards the unbroken homologue. If this mechanism for resolving heteroduplex mismatches is also used for single-SNP stretches in some cases, this would explain why gcBGC appears weaker for NCO events in general, compared to complex NCO and CO events. We suggest that this non-GC-biased process, impacting longer stretches including multiple SNPs, is consistent with properties of MMR proteins, several of which are known to be essential for meiosis in mice 45,46 . If these hypotheses are correct, it is interesting that BER and MMR appear able to favour different strands.
Together, these gcBCG results imply that SNP density within hotspots influences recombination events downstream of DSB formation. For example, the same SNP will show different conversion rates and biases in different individuals, depending on nearby heterozygosity patterns in those individuals. Interestingly, this predicts a slightly higher NCO gene conversion rate in more diverse than less diverse regions. Another unexpected influence on NCO events, again impacting downstream of DSB formation, is the PRDM9 allele, with PRDM9 Hum -controlled NCOs having an average length 11 bp (37%) longer than PRDM9 Cast -controlled NCOs. It is unclear whether this reflects PRDM9 binding directly, or some indirect impact, e.g. how PRDM9 binds relative to nucleosome positions.
The effects of genetic diversity on PRDM9 binding and DSB formation are increasingly well understood [14][15][16][17] . However, there has been little prior knowledge of how genetic diversity can influence recombination after DSB formation (though the converse-how recombination affects genetic diversity-is highly studied). Here we have uncovered the first examples of local genetic diversity affecting both the ability to undergo homologous repair (due to polymorphisms altering PRDM9 binding symmetry), and the decision to repair via GC-biased gene conversion (depending on the number and spacing of local sequence polymorphisms), both downstream of DSB formation. Interestingly, our results predict that the reduction of homologous recombination at asymmetric hotspots should somewhat mitigate hotspot erosion caused by the overtransmission of alleles that disrupt PRDM9 binding 14,17 . Further work will need to investigate the exact mechanisms by which PRDM9 binding to the unbroken homologue promotes homologous repair, or the exact repair pathways that can explain the observed GC-biased or strand-biased NCO recombination outcomes.

Mouse breeding, library preparation and sequencing
CAST/Eij (CAST) mice were sourced from MRC Harwell (UK). The C57BL/6J (B6) line humanized at the Prdm9 zinc-finger array (B6 Hum ) was generated previously 17 . Breeding of CAST and B6 Hum mice (F0) was carried out in both directions (using females and males of each type) to generate (B6xCAST)F1 hybrid, heterozygous offspring. To study properties of the humanized Prdm9 allele, we genotyped as previously 17  We sequenced to obtain coverage of ~10x for the F0/F4 mice, and 20x for the F2 and F5 mice via the Illumina Hiseq2500 (F0 and 4 F2 mice) or Hiseq4000 platforms (remaining mice).
Genomic DNA was fragmented to an average size of 500 bp and incorporated into libraries using established Illumina paired-end protocols (Nextera DNA Library Prep). Sequencing reads were aligned to mm10 using BWA 48 (v. 0.7.0) followed by Stampy 49 (v. 1.0.23, option bamkeepgoodreads). We used Picard tools (v. 1.115) (http://broadinstitute.github.io/picard) to merge bam files from different lanes for the same sample and identify duplicate reads.
GenomeAnalysisTK-3.3-0 (GATK) was used for local Indel realignment followed by base quality score recalibration, and variant calling, using known Indel targets and SNPs between B6 and CAST from the Mouse Genome Project (MGPv4) data 50 . We filtered variants using GATK's Variant Quality Score Recalibrator (VQSR), employing the set of variants present on the Affymetrix Mouse Diversity Genotyping Array as a set of true positive variation 51 . We used the annotations "HRun", "HaplotypeScore", "DP", "QD", "FS", "MQ", "MQRankSum", and "ReadPosRankSum" to train VQSR, and a sensitivity threshold of 90% for the true positive set to define the set of newly genotyped sites that passed VQSR filtration. To remove potential hidden heterozygous sites within the F0 individuals, we removed all variants not genotyped as matching the homozygous reference allele in B6, or the homozygous alternative allele for CAST from MGPv4 50 . We obtained 13,946,562 and 13,940,079 reliable autosomal SNPs from the F2 and F5 samples, or roughly one SNP for about every 170 bp, which were used for downstream analysis.

Identifying unique NCO and CO events
Using the HMM method described in the Supplementary Note to define a background state (homozygous CAST background, heterozygous, or homozygous B6 background) along the genome in each mouse, we identified state changes as CO events. Autosomal genotypes in F2 and F5 mice conflicting with their background were investigated as potential NCO events, but mainly represented sequencing errors. The region chr6:37000000-56000000 (mm10) was removed since it was observed to be not fully inbred in the F0 founders. We filtered to remove false positive sites (Supplementary Table 1), apparently heterozygous sites in the F0 mice, false heterozygous calls exhibiting unequal numbers of reads supporting the two alleles, false homozygous calls due to low read depth, and others. The number of potential converted sites dropped greatly e.g. from 863,082 SNPs potentially converted to 183 distinct identified NCOs within the F2 mice.
For COs and NCOs identified in F5 animals, they were treated as inherited if the parents carried an identical event, and otherwise de novo. We used a previously described HMM algorithm 52 to identify parent-of-origin in de novo COs (those occurring in the germ cells of the F4 parents).
For de novo NCOs, we were only able to confidently assign parental origin when one of the parents was heterozygous at the converted sites, while the other was homozygous. In this case, the NCO must be inherited from the heterozygous parent.
We removed duplicate inherited CO and NCO events, yielding a set of unique events for downstream analyses. Of 1,575 observed NCO events, only 9 were "complex" and involved background switching within the event. Of 1,116 observed de novo CO events from F2 and F5 animals, 7 were complex e.g. a CO accompanied by a NCO event.

NCO validation by Sanger sequencing
To validate a subset of NCO events detected in F2 mice (19 within and 9 outside a hotspot), we PCR-amplified short regions (around 200 bp) overlapping the identified NCO sites using genomic DNA from the 2 F0 mice, the F2 mouse carrying the NCO, and up to 3 other related and/or unrelated F2 mice, using standard conditions (cycling conditions and primer sequences available upon request). PCR products were purified using the QIAquick PCR Purification Kit (Qiagen) and analysed by direct Sanger sequencing (Source Bioscience, UK). Sequence data comparison and analysis was carried out using Chromas LITE (version 2.1.1). By comparison of genotypes, we identified true NCOs vs. false positives. This confirmed all 19 NCOs overlapping a hotspot (100%), and of 9 non-hotspot cases, 4 were confirmed (44%). This suggests almost all hotspot-overlapping NCO events are likely real, and a higher false positive rate in NCOs away from hotspots. Given that 84.2% of the F2 NCO events overlap hotspots (Supplementary Table   2), we estimate an overall fraction of validated detected NCO events as 0.842 + 0.44 x 0.158, i.e.

Estimating power to identify NCOs
To estimate the power of our method to detect NCO events of varying tract lengths, we simulated NCOs with different mean tract lengths and ran our pipeline for identifying NCO events, including our filters. Because F2 events are controlled by both Prdm9 Hum and Prdm9 Cast and F5 de novo events are controlled by Prdm9 Hum alone, we performed two sets of simulations by using data from 11 F2 samples and 72 F5 samples. Because most recombination events overlap hotspots, we simulated NCOs in hotspot regions. For each mean tract length, we sampled 2,000 hotspots with probabilities proportional to their H3K4me3 enrichment. Within each hotspot, we sampled the centre of the NCO tract according to the distribution of NCOs around PRDM9 motifs after correcting for SNP density, and we sampled its tract length from an exponential distribution with a pre-defined mean tract length (which we varied from 10 to 100 bp with step size 10 and from 150 to 300 bp with step size 50). Sampled NCO tracts containing 0 SNPs were not counted as potentially detectable. Across these 2,000 tracts, different animals possessed different ancestral backgrounds. For each tract in each animal, we checked if any of the other animals had a different ancestral background consistent with a gene conversion event in the first animal. If so, we sampled such a "donor" mouse (other events were ignored). We copied the sequencing information corresponding to the converted sites from the donor mouse, such as the allele depth, and we copied the sequencing information for the background from the recipient, such as mate-pair information. Then, we applied the same filters to this simulated sequencing data at each sampled tract. We calculated our power by dividing the total number of simulated tracts left after filtering by the total number of simulated tracts overlapping at least one SNP (Supplementary Figure 1a, b).
Lysis was performed in 1% SDS lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris pH 8.0, 2x protease inhibitors). Sonication was performed in a Bioruptor Twin sonication bath (Diagenode) at 4°C for three 5-minute periods of 30s on, 30s off at high power. Sonicated lysates were diluted 1:10 in IP wash buffer (containing 500 mM LiCl) instead of dilution buffer for antibody incubation. This yielded roughly 1 ng of ChIP DNA per testis. ChIP and total chromatin DNA samples were sequenced in multiplexed paired-end Illumina HiSeq2500 libraries (rapid run), yielding 63-71 million 51-bp read pairs per replicate after filtering (one ChIP replicate per testis plus one input sample). Sequencing reads were processed and peaks were called as described in our previous work 17,53 . Haplotype assignment of ChIP signal and removal of PRDM9independent H3K4me3 peaks were performed as described 17 . The percentage of ChIP-seq read pairs originating from signal (as opposed to background) was estimated to be 87.4%, a significant improvement over our prior, less stringent, experimental method (which yielded 62-71% of read pairs from signal) 17 .

DMC1 ChIP-seq
DMC1 ChIP-seq data were generated elsewhere 31 . Briefly, single-stranded DNA sequencing (SSDS) DMC1 ChIP-seq was performed as described previously 54 , using testes from a male (B6xCAST)F1 Prdm9 Hum/Cast mouse. ChIP and total chromatin DNA samples were sequenced in multiplexed paired-end Illumina HiSeq2500 libraries (rapid run), yielding 252 million 51-bp read pairs. We processed the data following the algorithm provided by Khil et al. 2012 to map the reads to mm10 and obtain type I reads 54 . We then called DMC1 peaks as previously 17 . We defined NCO and CO events as occurring within hotspots if they were less than 1 kb away from either a DMC1 peak or an H3K4me3 peak (covering 4% of the genome).

Testing for correlation of different recombination-related phenotypes at chosen scales
We assumed that given an underlying vector of (binned) mean values " along the genome, the k th recombination-related quantity (number of observed recombination or NCO events in various classes), %" , follows a Poisson distribution with mean %" , in interval . The " means vary along the genome and represent the underlying recombination rate parameters; this model is accurate provided (as is likely to be case) for a single meiosis, the number of expected events in each bin is small. Then the variance

Motif Analysis
We used a Bayesian, ab initio motif finding algorithm to identify motifs within DSB hotspots 17,53 . For each DSB hotspot that is controlled by Prdm9 Cast , a 1,000 bp sequence (centred on the hotspot centre) was extracted from the reference sequence (mm10). Ab initio motif identification was performed on the centre 600-bp sequences from the top 1,000 hotspots (ranked by DMC1 heat) that contained no bases overlapping annotated repeats. Motif calling proceeded in two stages: seeding motif identification, and motif refinement. Each seeding motif was obtained by first counting all 10-mers present in all input sequences, and from the top 50 most frequently occurring 10-mers, the one with the greatest over-representation in the central 300 bp of each peak sequence was chosen. This seeding 10-mer was then refined for 50 iterations as described in Davies et al., 2016. This refined motif was then force-called on the full set of the hotspots (without filtering) by re-running the refinement algorithm, providing a probability of motif occurrence within each hotspot, and also identifying the most likely motif location in each case.
This motif was reported for each peak, along with position and strand. We did the same for DSB hotspots controlled by Prdm9 Hum and a 48-bp human motif was identified. We identified distinct sequence motifs, and their locations, within 97% of hotspots controlled by Prdm9 Cast and 74% of hotspots controlled by Prdm9 Hum (Supplementary Fig. 3a) 14,17,19,55 .
We used the SNPs generated as described above to determine whether each motif contains a SNP within its span. The distance from a motif to an event was defined as the distance from the centre of the motif to the nearest converted marker (lower bound for NCOs), or zero if a converted marker fell within the motif itself. We associated events <1 kb from a motif with that motifcontaining hotspot.

Estimation of NCO tract length for human-controlled and CAST-controlled events
To estimate NCO tract length, we assumed the converted tract follows an exponential distribution with rate parameter λ, where 1/λ is the mean tract length. While exponential tract lengths are not a fully accurate model, we can view this as a summary of tract properties, estimating the probability of co-conversion of pairs of markers as the distance between them increases. We computed a composite likelihood function for our NCOs and estimated λ via maximal likelihood. Specifically, for each converted site, viewing this site as a "focal" site, we examined the SNPs nearby and recorded for each SNP its distance from the focal SNP, and whether that SNP was also converted. If the SNP was also converted, then it was still in the gene conversion tract, otherwise it was not. Using this approach allowed our approach to be independent of SNP density, because we conditioned on SNP positions in our analysis. The probability that a SNP nearby a converted site is also converted is where d is the distance from the nearby SNP to the converted site. The probability that a SNP nearby a converted site is not in the tract is 1-Pr(in). All the NCOs are independent so we can multiple these probabilities for each SNP in the windows to get the (composite) likelihood of the data: Pr D = Pr in T (1 − Pr (in)) V@T
Here x=1 if the SNP nearby is also converted and x=0 otherwise. By maximising the likelihood using grid search for 1/λ from 1 to 1000 with step 0.1, we gained an estimate of tract length.
Because pairs of SNPs are not in fact independent, this is not a true likelihood (though the resulting estimator is statistically consistent as the number of independent conversion events increases), and so to estimate uncertainty in the resulting estimates, we utilised bootstrapping of NCO events.
To perform bootstraps, we separated autosomal genomes into 258 non-overlapping 10 Mb blocks (the last block in each chromosome is shorter than 10 Mb). We re-sampled 258 blocks with replacement, where the probability of sampling each block is proportional to the length of that block, and from the resulting bootstrapped set of NCOs, re-estimated tract length via the same procedure. CIs were calculated from a total of 10,000 bootstraps. We implemented this procedure for two sets of NCO events; those overlapping human-controlled and those overlapping CAST-controlled hotspots, respectively.

Hotspot symmetry estimates
Sequence differences between the CAST and B6 genomes allowed us to quantify the fraction of ChIP-seq signal (either DMC1 or H3K4me3), coming from the B6 and CAST chromosomes.
This also allowed us to determine whether individual hotspots in these hybrids were 'symmetric', with DSBs occurring equally on both chromosomes, or 'asymmetric', with a preference towards either the CAST or B6 chromosome.
Using SNPs distinguishing the B6 and CAST genomes, each type I read pair from a hybrid DSB library (DMC1 ChIP-seq) was assigned to one of the categories 'B6', 'CAST', 'unclassified' or 'uninformative' as in 17 , replacing PWD with CAST. For each DSB hotspot, the B6 cutting ratio was then computed as the fraction of 'B6' reads mapped within 1 kb of the hotspot centre, over the sum of 'B6' and 'CAST' reads in that region. We followed a similar approach for H3K4me3 ChIP-seq, further correcting for background signal as in 17 . For both DMC1 and H3K4me3, we required ≥10 informative reads to define the B6 cutting ratio.
To order hotspots based on their symmetry, if the fraction of cuts estimated on the B6 and CAST chromosome were x and 1-x, respectively, we defined the overall hotspot "symmetry" as 4x(1-x), which ranges from 0 for hotspots with events completely on one chromosome to 1 for hotspots with events occurring equally on both chromosomes 17 Supplementary Fig. 6 we used average homologous heat, defined as 2hx(1-x), which averages homologous heat over the strand an event occurs on.

Estimation of the fraction of asymmetric/symmetric hotspots containing a motif-disrupting variant
To estimate the proportion of hotspots of different levels of initiation on B6/CAST chromosomes containing SNPs within their PRDM9 binding motifs, we filtered to include only hotspots containing a clear motif (posterior probability >0.99), and at least 20 informative reads in our DMC1 data in order to accurately estimate the proportion of reads from B6, and 5 sequencing reads from each homologue covering the motif region, to provide power to identify variants if present. Supplementary Fig. 5f shows the fraction of hotspots in each binned level of initiation on the B6 chromosome containing a SNP or Indel (using GATK prior to VQSR, or Platypus).

Testing whether asymmetry or SNP density affects the resolution of recombination events
We fitted a generalised linear regression model to discern whether hotspot asymmetry or local Note.

Data availability
The datasets generated and analysed during the current study will be made available in public repositories (GEO and SRA) prior to publication. The H3K4me3 ChIP-seq data are currently available with GEO accession GSE119727.

Code availability
The computer code developed for the analysis of the datasets in the current study will be made available in Github prior to publication.

Ethical compliance
All experiments involving research animals received local ethical review approval from the