Maximum-likelihood method identifies meiotic restitution mechanism from heterozygosity transmission of centromeric loci: application in citrus

Polyploidisation is a key source of diversification and speciation in plants. Most researchers consider sexual polyploidisation through unreduced gametes to be its main origin. Unreduced gametes are useful in several crop breeding schemes. Their formation mechanism, i.e., First-Division Restitution (FDR) or Second-Division Restitution (SDR), greatly impacts the gametic and population structures and, therefore, the breeding efficiency. Previous methods to identify the underlying mechanism required the analysis of a large set of markers over large progeny. This work develops a new maximum-likelihood method to identify the unreduced gamete formation mechanism both at the population and individual levels using independent centromeric markers. Knowledge of marker-centromere distances greatly improves the statistical power of the comparison between the SDR and FDR hypotheses. Simulated data demonstrated the importance of selecting markers very close to the centromere to obtain significant conclusions at the individual level. This new method was used to identify the meiotic restitution mechanism in nineteen mandarin genotypes used as female parents in triploid citrus breeding. SDR was identified for 85.3% of 543 triploid hybrids and FDR for 0.6%. No significant conclusions were obtained for 14.1% of the hybrids. At the population level, SDR was the predominant mechanism for the 19 parental mandarins.

proposed that, in citrus, 2n eggs result from the abortion of the second meiotic division in the megaspore. This hypothesis was corroborated by molecular marker analysis for clementine (Citrus clementina Hort. ex Tan.) 35,36 . The method proposed by Cuenca et al. 37 was successfully applied in populations of 2n ovules of 'Fortune' mandarin and 'Nules' clementine, and it was concluded that SDR was the main restitution mechanism and that partial chromosome interference occurs 36,37 . By contrast, Chen et al. 38 proposed that 2n eggs of sweet orange (C. sinensis (L.) Osb.) resulted from first meiotic division restitution.
The origin of 2n gamete formation greatly impacts the gametic structures and, therefore, the polyploid populations and the efficiency of breeding strategies. Under FDR, non-sister chromatids retain parental heterozygosity from the centromere to the first crossover point. Under SDR, the two sister chromatids are homozygous between the centromere and the first crossover point (Figure 1) 5 . As a consequence, several studies based on genetic markers indicate that FDR gametes transmit 70-80% of the parental heterozygosity, but SDR gametes transmit only 30-40% 9,19,39-42 . Thus, a tighter distribution is expected in FDR-derived populations than in SDR ones because a higher percentage of the parental genome is transferred intact, resulting in a more uniform gamete production 43 . Therefore, insights into the meiotic nuclear restitution mechanisms that produce unreduced gametes are crucial for the optimisation of breeding strategies based on sexual polyploidisation 44 .
The identification of the mechanisms driving the formation of 2n gametes is complex. However, the use of cytological or marker analysis on polyploid progeny provides accurate or additional information on these mechanisms 9,19,45 . Molecular cytological approaches have been used successfully, including the unequivocal identification of genomes and recombinant segments in the sexual polyploid progenies 11,14,45-47 . Molecular marker analysis is also a valuable tool for the estimation of parental heterozygosity restitution (HR) through diploid gametes to polyploid progenies and, therefore, for the identification of the mechanisms underlying unreduced gamete formation 22,35,38,39,41,48,49 . Several previously developed methods are based on the analysis of HR rates for randomly chosen unmapped markers 38 . These methods require the analysis of a large set of molecular markers to encounter, by chance, the loci with HR lower than 50% that are only found under SDR 50 . However, when HR over 50% is observed for all loci, no definitive conclusion can be reached without prior knowledge of their location relative to a centromere. Significant FDR conclusions are therefore difficult to obtain with such non-mapped markers. Half-tetrad analysis (HTA) 51 , based on multiple linked loci, is a powerful method for mapping centromeres or for determining the mode(s) of 2n gamete formation. Tavoletti et al. 10 developed a multilocus maximum-likelihood method of HTA that permits the estimation of both the relative frequencies of FDR and SDR 2n gametes and the centromere location within a linkage group without relying on previously identified centromeric markers. The models described therein are all based on population analysis and assume complete chiasma interference.
Cuenca et al. 37 proposed an approach that takes into account different models of chromosome interference (i.e., no interference, partial interference or complete chiasma interference) when testing for FDR and SDR, and for mapping centromeres to linkage groups. This approach is based on functions of heterozygosity restitution (HR) at the population level along a chromosome in relation to locus-centromere distance (d) 52 . Indeed, under FDR or SDR, HR is a direct function of the crossing over frequency between the considered locus and the centromere. It is, therefore, possible to implement the function HR = f(d) according to the FDR and SDR hypotheses while also taking into account different models of chromosome interference (Figure 2).
In the present work, we propose a maximum-likelihood approach to test the SDR/FDR mechanism based on the HR of unlinked markers located close to the centromere of different chromosomes. This approach can be applied at the individual or population level. We simulated 2n gamete populations arising from FDR or SDR. This enabled us to identify the number of independent markers necessary to test in order to draw significant conclusions at the individual level in relation to marker/centromere distances, as well as the minimum population size necessary to be able to draw significant conclusions when analysing a defined number of unlinked markers.
As a concrete application, this new method has been used to investigate unreduced gamete formation in citrus. Taking advantage of the centromere locations 36 within the nine linkage groups of the clementine reference genetic map 53 , we selected centromeric markers and used the proposed maximum-likelihood method to (i) check the potential variability of origin between individuals for two genotypes in which SDR was proposed to be the predominant polyploidisation mechanism as determined by population analysis ('Fortune' mandarin 37 and clementine 35,36 ), and (ii) shed light on the mechanism leading to unreduced gamete formation in a range of mandarin genotypes used as female parents in 2x × 2x triploid breeding programs.

Results
Statistical method for the identification of meiotic restitution mechanism. Identification of the restitution mechanism at an individual level. For loci heterozygous for the parent producing the 2n gamete, the probabilities of a 2n gamete being heterozygous or homozygous as a consequence of FDR or SDR mechanisms are direct functions of the marker-centromere distance.
To estimate such probabilities, the function relating HR rate and locus-centromere distance 37 , derived from the Cx(Co)^4 partial chiasma interference model developed by Zhao and Speed 52 and Foss et al. 54 , could be used. Indeed, Cuenca et al. 37 showed that this model fits the 'Fortune' mandarin data (SDR mechanism) better than the total or no interference models. However, since the selected markers are located close to centromeres (as explained above), for our data the Cx(Co)^4 model and the total interference model are equivalent (Figure 2). To simplify the calculation of probabilities, the total interference model was used. Marker-centromere distances (d), in Morgan units, were estimated from the centromere locations 36 in the clementine reference genetic map 53 . Under the total interference model, the probabilities of a locus being heterozygous (He) or homozygous (Ho) in the 2n gamete are

$$P_{FDR}(M_{He}) = 1 - d, \qquad P_{FDR}(M_{Ho}) = d, \qquad P_{SDR}(M_{He}) = 2d, \qquad P_{SDR}(M_{Ho}) = 1 - 2d.$$

For each restitution model, the probability of a single unreduced gamete [P(G)] presenting the observed allelic configuration for i unlinked markers ($M_i$) is the product of the probabilities of the observed genotype at each locus, and therefore the LOD value comparing the SDR and FDR models is the sum of the LOD values at each locus:

$$P(G) = \prod_i P_{M_i}, \qquad LOD = \sum_i LOD_{M_i},$$

where $P_{M_i}$ and $LOD_{M_i}$ are the probability and the LOD value of the observed genotype at locus i, respectively. As an example, if three unlinked loci ($M_1$, $M_2$ and $M_3$) were heterozygous, homozygous and homozygous, respectively, the probabilities of observing such a gamete [P(G); ($M_{1He}$-$M_{2Ho}$-$M_{3Ho}$)] under SDR and under FDR are, respectively,

$$P_{SDR}(G) = 2d_1\,(1 - 2d_2)\,(1 - 2d_3), \qquad P_{FDR}(G) = (1 - d_1)\,d_2\,d_3.$$

The LOD value used to compare the probabilities of the SDR and FDR models is

$$LOD = \log_{10}\frac{P_{SDR}(G)}{P_{FDR}(G)} = \log_{10}\frac{2d_1\,(1 - 2d_2)\,(1 - 2d_3)}{(1 - d_1)\,d_2\,d_3},$$

where $d_i$ is the distance from locus i to its centromere.
LOD scores greater than 3 (the probability of the observed gamete is more than 1000-fold higher under the SDR model than under the FDR one; LOD3) or greater than 2 (the probability of the observed gamete is more than 100-fold higher under the SDR model than under the FDR one; LOD2) were considered as thresholds indicating that SDR was the mechanism involved in the formation of the single unreduced gamete, whereas LODs below −3 (or −2) indicate that FDR was the underlying mechanism; for LOD scores between −3 and 3 (or between −2 and 2), we considered that the mechanism could not be determined significantly.
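The individual-level test can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`locus_lod`, `gamete_lod`, `classify`) are hypothetical, and the per-locus probabilities are those of the total interference model (heterozygous: 2d under SDR, 1 − d under FDR; homozygous: 1 − 2d under SDR, d under FDR), with d in Morgans.

```python
import math

def locus_lod(heterozygous, d):
    """log10(P_SDR / P_FDR) for one locus at distance d (Morgans) from its centromere.

    Assumes the total interference model, so d must lie strictly between 0 and 0.5.
    """
    if not 0 < d < 0.5:
        raise ValueError("d must lie strictly between 0 and 0.5 Morgan")
    if heterozygous:
        return math.log10(2 * d / (1 - d))
    return math.log10((1 - 2 * d) / d)

def gamete_lod(genotypes, distances):
    """Sum of per-locus LODs for one 2n gamete typed at unlinked loci.

    genotypes: iterable of booleans (True = heterozygous, False = homozygous)
    distances: marker-centromere distances in Morgans, in the same order
    """
    return sum(locus_lod(h, d) for h, d in zip(genotypes, distances))

def classify(lod, threshold=3.0):
    """Apply the symmetric LOD threshold (3 for LOD3, 2 for LOD2)."""
    if lod > threshold:
        return "SDR"
    if lod < -threshold:
        return "FDR"
    return "undetermined"

# Illustrative gamete: heterozygous, homozygous, homozygous at three loci,
# each placed at a made-up distance of 1 cM (0.01 Morgan).
example_lod = gamete_lod([True, False, False], [0.01, 0.01, 0.01])
print(round(example_lod, 2), classify(example_lod, 3.0), classify(example_lod, 2.0))
```

With these made-up distances the gamete stays undetermined at LOD3 but is assigned to SDR at LOD2, which shows how a single heterozygous centromeric locus can cancel most of the evidence from one homozygous locus.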
Identification of the restitution mechanism at population level.
Considering an infinite population of 2n gametes and a single locus, the probabilities of observing a sample of gametes [P(Pop)] with j heterozygous and k homozygous individuals under the SDR and FDR models are, respectively,

$$P_{SDR}(Pop) = C\,(2d)^{j}\,(1 - 2d)^{k}, \qquad P_{FDR}(Pop) = C\,(1 - d)^{j}\,d^{k},$$

where C is a combinatorial coefficient that is constant for the observed sample. Therefore,

$$LOD = \log_{10}\frac{P_{SDR}(Pop)}{P_{FDR}(Pop)} = j\,\log_{10}\frac{2d}{1 - d} + k\,\log_{10}\frac{1 - 2d}{d}.$$

If i independent loci are analysed, the probabilities of the observed sample of gametes occurring under the SDR [$P_{SDR}(Pop)$] or FDR [$P_{FDR}(Pop)$] models are the products of the probabilities of the observed sample at each locus,

$$P(Pop) = \prod_i C_i\,P(M_{iHe})^{j_i}\,P(M_{iHo})^{k_i},$$

and therefore

$$LOD = \sum_i \left[\, j_i\,\log_{10}\frac{2d_i}{1 - d_i} + k_i\,\log_{10}\frac{1 - 2d_i}{d_i} \right],$$

where $P(M_{iHe})$, $P(M_{iHo})$, $j_i$, $k_i$ and $d_i$ are, respectively, the probability of heterozygous individuals, the probability of homozygous individuals, the number of heterozygous individuals, the number of homozygous individuals and the distance to the centromere for locus i. At the population level, LOD scores greater than 3 were considered to indicate that SDR was the mechanism involved in unreduced gamete formation, whereas LODs below −3 indicated that FDR was the underlying mechanism. When LOD scores between −3 and 3 were obtained, we considered that the mechanism could not be significantly determined.
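As a minimal sketch of the population-level score, the following Python function sums the per-locus contributions from heterozygous and homozygous counts. The function name `population_lod` and the example counts are hypothetical; the combinatorial coefficient is the same under both models and cancels in the ratio, so it does not appear.

```python
import math

def population_lod(counts, distances):
    """log10(P_SDR(Pop) / P_FDR(Pop)) for a sample of 2n gametes.

    counts:    list of (j_i, k_i) pairs, the numbers of heterozygous and
               homozygous gametes observed at each independent locus
    distances: marker-centromere distances d_i in Morgans (0 < d_i < 0.5),
               under the total interference model
    """
    lod = 0.0
    for (j, k), d in zip(counts, distances):
        lod += j * math.log10(2 * d / (1 - d))   # heterozygous gametes
        lod += k * math.log10((1 - 2 * d) / d)   # homozygous gametes
    return lod

# Made-up sample: one locus at 5 cM, 1 heterozygous and 9 homozygous gametes.
print(round(population_lod([(1, 9)], [0.05]), 2))
```

A positive value above 3 would be read as SDR and a value below −3 as FDR, following the thresholds stated in the text.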
Studies to check the power of the method. We assessed the power of our method using simulated samples of diploid gametes arising from either the FDR or SDR mechanisms. From a theoretical infinite population with heterozygous and homozygous genotype frequencies directly defined by the considered locus-centromere distances [$P_{FDR}(M_{He}) = 1 - d$; $P_{FDR}(M_{Ho}) = d$; $P_{SDR}(M_{He}) = 2d$; $P_{SDR}(M_{Ho}) = 1 - 2d$, as explained above], individual gametes with information for nine markers (the haploid number of chromosomes in Citrus) were randomly generated. Then, the LOD values of these gametes were calculated as described above. We estimated the proportion of gametes with significant solutions at LOD3 (LOD > 3 or < −3) and LOD2 (LOD > 2 or < −2) when analysing 1-9 markers mapped at the same centromere distance, but on different chromosomes, and for distances ranging from 0 to 20 cM.
Gamete populations were also generated in order to estimate the theoretical number of hybrids that would need to be analysed to obtain significant conclusions for a mechanism, depending on the number of markers used and the marker-centromere distances. From each theoretical population (FDR and SDR populations), 200 replicates of populations (with 1-100 gametes/population) were randomly generated. The generated population LODs were calculated as described above and, for each number of considered markers at a given centromere distance, we identified the minimum number of gametes needed in order to be able to reach a true significant conclusion for at least 99% of the generated populations (99% of replicates with LOD > 3 for SDR or LOD < −3 for FDR).
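The minimum sample size can also be approximated without random replicates. The sketch below swaps the simulation described above for an exact binomial calculation: under SDR the number of heterozygous genotyping points follows a binomial distribution with probability 2d, so the probability that the population LOD exceeds the threshold can be summed directly. The function names are hypothetical, only the SDR case is shown, and all markers are assumed to lie at the same distance d.

```python
import math

def prob_significant_sdr(n_points, d, threshold=3.0):
    """Exact P(LOD > threshold) for n_points independent genotyping points
    drawn from an SDR population, all at distance d (Morgans)."""
    het_lod = math.log10(2 * d / (1 - d))
    hom_lod = math.log10((1 - 2 * d) / d)
    p_het = 2 * d
    total = 0.0
    for j in range(n_points + 1):              # j = number of heterozygous points
        lod = j * het_lod + (n_points - j) * hom_lod
        if lod > threshold:
            total += math.comb(n_points, j) * p_het**j * (1 - p_het)**(n_points - j)
    return total

def min_points_sdr(d, power=0.99, threshold=3.0, max_points=500):
    """Smallest number of genotyping points reaching the requested power,
    or None if max_points is not enough."""
    for n in range(1, max_points + 1):
        if prob_significant_sdr(n, d, threshold) >= power:
            return n
    return None

for cm in (1, 5, 20):
    print(cm, "cM:", min_points_sdr(cm / 100))
```

Because this is an exact calculation rather than 200 random replicates, the values it returns may differ slightly from simulated ones, but they should be of the same order.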
From 1000 randomly selected gametes with nine independent markers (at the same distance from their respective centromere) from a theoretical SDR and FDR infinite population, we analysed the percentage of replicates with significant LOD value (i.e., LOD3 and LOD2) at a given distance considering the data from 1-9 markers.
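The individual-level power study can be sketched as a small Monte Carlo simulation. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `simulate_power`, the fixed seed and the default of 1000 gametes are choices made here, and all markers are placed at the same distance d from their centromeres.

```python
import math
import random

def simulate_power(mechanism, n_markers, d, threshold=3.0, n_gametes=1000, seed=0):
    """Fraction of simulated 2n gametes whose LOD correctly identifies `mechanism`.

    mechanism: "SDR" or "FDR" (the true mechanism used to generate gametes)
    d:         common marker-centromere distance in Morgans (0 < d < 0.5)
    """
    if mechanism not in ("SDR", "FDR"):
        raise ValueError("mechanism must be 'SDR' or 'FDR'")
    rng = random.Random(seed)
    # Probability that a locus is heterozygous under the generating model.
    p_het = 2 * d if mechanism == "SDR" else 1 - d
    het_lod = math.log10(2 * d / (1 - d))
    hom_lod = math.log10((1 - 2 * d) / d)
    correct = 0
    for _ in range(n_gametes):
        lod = 0.0
        for _ in range(n_markers):
            lod += het_lod if rng.random() < p_het else hom_lod
        if mechanism == "SDR" and lod > threshold:
            correct += 1
        elif mechanism == "FDR" and lod < -threshold:
            correct += 1
    return correct / n_gametes

# Proportion of correct significant answers at LOD3 for markers at 5 cM.
for m in (4, 5, 6):
    print(m, simulate_power("SDR", m, 0.05), simulate_power("FDR", m, 0.05))
```

Changing `threshold` to 2.0 gives the LOD2 curves, and looping over distances reproduces the shape of a power curve for each number of markers.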
Curves corresponding to a significant true answer are shown in Figure 3. All curves display a vertical drop to 0, corresponding to the distance when the maximum theoretical LOD score (when all considered markers are in the most favourable combination for the model) is below the considered threshold. Compared with LOD3, the LOD2 threshold allows maintenance of the progressive decrease of the significant answer with increasing distance. As distance increases, more markers are needed to maintain a high level of significance.
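The position of the vertical drop follows from the most favourable genotype combination. Assuming the total interference probabilities, the maximum LOD with m markers at distance d is m·log10((1 − 2d)/d) for SDR (all loci homozygous) and m·log10((1 − d)/(2d)) in absolute value for FDR (all loci heterozygous). Solving each expression for the threshold T gives a cutoff distance of 1/(10^(T/m) + 2) for SDR and 1/(2·10^(T/m) + 1) for FDR. The short sketch below evaluates these expressions; the function name is hypothetical.

```python
def cutoff_distance_cm(mechanism, n_markers, threshold=3.0):
    """Distance (cM) at which the best possible LOD with n_markers equals the threshold.

    Beyond this distance no gamete can reach significance for `mechanism`,
    assuming all markers sit at the same distance and total interference holds.
    """
    ratio = 10 ** (threshold / n_markers)
    if mechanism == "SDR":
        d = 1 / (ratio + 2)        # solves (1 - 2d) / d = ratio
    elif mechanism == "FDR":
        d = 1 / (2 * ratio + 1)    # solves (1 - d) / (2d) = ratio
    else:
        raise ValueError("mechanism must be 'SDR' or 'FDR'")
    return 100 * d

for m in (1, 3, 9):
    print(m, round(cutoff_distance_cm("SDR", m), 2), round(cutoff_distance_cm("FDR", m), 2))
```

For a single marker at LOD3, both cutoffs fall at or below about 0.1 cM, and at LOD2 the SDR cutoff is close to 1 cM.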
At LOD3, the usefulness of only one marker is null for both the SDR ( Figure 3a) and the FDR (Figure 3b) models at a very low marker distance from the centromere (0.1 cM). At 5 cM, at least five (for SDR) and six (for FDR) markers are necessary to maintain a 90% true significant identification of the mechanism. When all markers were at least 10 cM from centromeres, nine markers were necessary to provide a 90% true significant answer for the SDR population, but only 78% significant true answers were obtained with nine markers for a FDR population. At 15 cM and nine markers, the true identification rates fall to 44% and 24% for SDR and FDR, and, at 20 cM, to 6.6% and 0%, respectively.
If the LOD2 threshold is considered, a single marker was informative in the first cM interval for the SDR model ( Figure 3c) but significant replicate number decreases very quickly for FDR (Figure 3d). At 5 cM, at least four and five markers were necessary to provide 90% of true significant identification for SDR and FDR populations, respectively. With all markers at 10 cM from centromeres, at least eight markers were necessary to provide 90% true significant answers with an SDR or FDR population. For nine markers, the rate of true significant identification is improved for the SDR population at 15 cM and 20 cM (70% and 19%, respectively) as well as for the FDR population (59% and 14%, respectively) when compared with LOD3.
The rate of false identification (FDR significant conclusion [i.e., LOD < −3 or LOD < −2] for an SDR population, or reciprocally) is very low for both models (SDR or FDR), whatever the centromere distance and the number of considered loci. At LOD3, it is under 0.1% for all conditions and it remains below 1% for the LOD2 threshold (Figure S1).
At the population level ( Figure 4), due to the probabilities of the 2n gamete genotypic structure under FDR and SDR models becoming similar as the distance to centromere rises, the number of hybrids needed to obtain significant conclusions for a mechanism increases as an exponential function and is more pronounced when analysing a single marker only.
For a given locus-centromere distance, the number of hybrids needed (h_m) is related to the number of markers analysed as h_m = h_1/m, where h_1 is the number of hybrids needed for one marker and m is the number of markers analysed. For example, for an SDR population model, at 20 cM, 58 hybrids are necessary if analysing only one marker, 29 are necessary for two markers, and 20 are necessary for three markers. The number of hybrids needed to provide the same level of conclusive answer is slightly lower for FDR (50 hybrids for one marker at 20 cM). With these population sizes, no false mechanism identification occurred for the generated populations.
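The scaling rule can be checked against the figures quoted in the text. In this small sketch the function name `hybrids_needed` is hypothetical, and rounding up to the next whole hybrid is an assumption that matches the quoted values (58, 29 and 20).

```python
import math

def hybrids_needed(h1, n_markers):
    """Hybrids required with n_markers, given h1 hybrids required with one marker.

    Rounds up, since a fraction of a hybrid cannot be genotyped.
    """
    return math.ceil(h1 / n_markers)

# SDR population at 20 cM, using the single-marker value reported in the text.
print([hybrids_needed(58, m) for m in (1, 2, 3)])
```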
Inference of allelic configuration of triploid hybrids and corresponding 2n gametes. Assignment of allelic configuration in heterozygous triploid hybrids was performed using the MAC-PR method for SSR markers 55 (Figure S2) adapted for Citrus by Cuenca et al. 37 . However, this method uses a 1:1 dosage correction from the relative allele signals for heterozygous diploid parents (A1 = A2, A1 = A3 or A3 = A4). Therefore, for markers displaying the A1A2 × A1A3 configuration in the parents, among the heterozygous triploid hybrids only the A1A2A2/A1A1A2 or A1A3A3/A1A1A3 configurations can be determined using these methods, while no direct allele dosage estimation can be obtained for a triploid with A2/A3 heterozygosity without a reference for the relative A2/A3 allele signal. Similarly, for markers displaying the A1A2 × A3A4 configuration, it is not possible to directly estimate allele dosage for the heterozygous triploid hybrids. In these situations, it is possible to use a 1:1 dosage correction between A1 and A3 (for example) from the peak ratios of A1A2A3 triallelic hybrids observed in the same family.
A concrete example is the genotype assignment for the 'Ellendale × Fortune' population (Additional file 1) and the mCrCIR07F11 marker. ' shows 160/162/164 allele configuration for the same marker, and therefore, allows using a 1:1 dosage correction for relative allele signals between 160/164 and 162/164. All these 1:1 dosage corrections allow inferring the allele dosage for this marker in the remaining hybrids within this population.
Identification of the unreduced gamete parental origin. For each hybrid, determination of the 2n gamete origin was carried out by identifying the parent that passed double genetic information to the hybrid. For markers displaying A1A2 × A1A1 or A1A2 × A1A3 configurations, the identification of A1A2A2 or A2A2A3 (i.e., double dosage of A2, the allele specific to the female parent) configurations in the hybrid would imply a female origin of the 2n gamete. For the second combination, the observation of A1A3A3 or A2A3A3 (i.e., double dosage of A3, the allele specific to the male parent) would indicate a male origin. For markers displaying A1A2 × A3A3 configurations in the parents, the identification of A1A2A3, A1A1A3 or A2A2A3 configurations in the hybrid indicates a maternal origin of the unreduced gamete, while A1A3A3 or A2A3A3 indicates a paternal origin.
For markers with the A1A2 × A3A4 parental configuration, the identification of the genotypes (A1A1A3, A1A1A4, A1A2A3, A1A2A4, A2A2A3, A2A2A4) and (A1A3A3, A2A3A3, A1A3A4, A2A3A4, A1A4A4, A2A4A4) implied, respectively, female and male origin of the 2n gamete.
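The rules above can be expressed as a brute-force enumeration: a triploid genotype is compatible with a given parental origin if it can be assembled from any two alleles of the 2n-gamete parent (identical or different) plus one allele of the other parent. The following Python sketch illustrates that logic; the function name `possible_origins` and the allele labels are hypothetical, and dosage is assumed to have been resolved already.

```python
from itertools import combinations_with_replacement

def possible_origins(female, male, hybrid):
    """Return the set of parents ('female', 'male') that could have supplied
    the 2n gamete, given diploid parental genotypes and a triploid hybrid.

    female, male: tuples of two allele labels
    hybrid:       tuple of three allele labels (dosage already resolved)
    """
    target = tuple(sorted(hybrid))
    origins = set()
    for label, donor_2n, donor_n in (("female", female, male), ("male", male, female)):
        # A 2n gamete carries two alleles of its parent, identical or different.
        for pair in combinations_with_replacement(sorted(set(donor_2n)), 2):
            for single in set(donor_n):
                if tuple(sorted(pair + (single,))) == target:
                    origins.add(label)
    return origins

# A1A2 x A1A3 parents: a double dose of A2 points to the female parent.
print(possible_origins(("A1", "A2"), ("A1", "A3"), ("A1", "A2", "A2")))
# A1A1A2 can be built either way, so this locus alone is not informative.
print(possible_origins(("A1", "A2"), ("A1", "A3"), ("A1", "A1", "A2")))
```

A locus returning both parents is uninformative on its own, which is why fully differentiated parental configurations are the most useful for this step.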
Once the parental origin of the 2n gamete was identified, the inference of the allelic configurations of the unreduced gametes from triploid hybrid genotyping was carried out as previously described by Cuenca et al. 37 . A summary of triploid genotypes allowing inference of the 2n gamete genotype and origin, either directly or by inferring allele doses from diploid parents or reference triploid hybrids, is given in Additional Table S1. Loci with complete differentiation between the parents (A1A2 × A3A4 or A1A2 × A3A3) are by far the best configurations as they allow unequivocal identification of the 2n gamete parent and unambiguous determination of 2n gamete structure. When the parental origin of a 2n gamete has been determined by triploid patterns at other loci, the 2n gamete structure can be inferred for all triploid hybrids for the loci sharing a single allele between the two parents.
Following the previous example for the 'Ellendale × Fortune' population (Additional file 1), hybrid #1 shows a 152/160/162 allele configuration for the mCrCIR07F11 marker. This situation allows the unequivocal identification of the maternal parent as the 2n gamete producer for this hybrid. Similarly, the observed configurations for the rest of the hybrids within this population (152/160, 152/162, 160/164 and 162/164) allow the identification of the maternal parent as the 2n gamete producer for all hybrids with information for this marker. Once the female parent has been identified as the 2n gamete producer for a hybrid, for example hybrid #1, we can infer the 2n female gamete and male gamete configurations from the allelic and dosage observations for the other markers. Where it was not possible to infer the 2n gamete producer (hybrids #30, #36, #57 and #69), additional markers were analysed.
In this work, 543 citrus triploid hybrids were analysed and allelic patterns of the markers (Additional file 1) allowed unequivocal identification of the origin of the double dosage for each analysed triploid hybrid. Female parents were the unreduced gamete producers leading to triploid hybrids for all studied parental combinations. No triploid hybrid arising from unreduced pollen was found. It was therefore possible to infer the maternal 2n gamete genotypes for all hybrids and loci.
Identification of the restitution mechanism at the individual level in citrus. Between 4 and 7 SSR and InDel markers were used to analyse all 543 triploid hybrids. Allelic segregation for homozygous diploid gametes was analysed within each family by a chi-squared test. Some markers deviated from the expected 1:1 ratio in populations with a reduced number of hybrids. Considering populations with more than 20 hybrids, only the mCrCIR06B05 marker in 'Fortune'-derived populations (χ² = 5.531; p-value = 0.018) and the CF-ACA01 and CI07C07 markers in the 'Hernandina × Nadorcott' population (χ² = 9.524; p-value = 0.002 and χ² = 6.737; p-value = 0.009, respectively) showed significant segregation distortions.
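A 1:1 segregation test of this kind can be sketched with the standard library alone. For one degree of freedom, the chi-squared tail probability equals erfc(sqrt(χ²/2)), which avoids any external dependency. The function names and the example counts below are hypothetical (the per-allele counts behind the reported statistics are not given in the text); only the reported χ² values are taken from the text.

```python
import math

def chi2_one_to_one(n_a, n_b):
    """Chi-squared statistic for two homozygous classes against a 1:1 ratio."""
    expected = (n_a + n_b) / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected

def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Made-up counts: 30 gametes homozygous for one allele, 10 for the other.
stat = chi2_one_to_one(30, 10)
print(stat, round(p_value_1df(stat), 4))

# p-values recomputed from the chi-squared statistics quoted in the text.
for reported in (5.531, 9.524, 6.737):
    print(reported, round(p_value_1df(reported), 3))
```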
Heterozygosity restitution ranged between 0% and 100% for the analysed 2n gametes, with a mean value of 14.87%, whereas for markers, HR ranged between 0% and 54%, with a mean value of 15.49%. The distribution of HR for both hybrids and markers is clearly biased to values near 0% (Table S3).
The LOD score testing the SDR/FDR hypothesis was estimated for each individual 2n gamete from its inferred genotype, as described in the statistical method section. Positive LODs were found for 523 hybrids of the 543 analysed (Figure 5), suggesting a large global predominance of the SDR mechanism. The LOD distribution for clementine 2n gametes is displaced to higher values when compared with the distribution for 'Fortune' and other mandarin 2n gametes. Fifty-seven diploid gametes occur with LOD between 9 and 10, and these correspond mostly to the 'Fina' clementine progeny (Figure 5).
When using LOD2 as the threshold, the percentage of gametes with unidentified origins decreased to 9%. Gametes attributed to SDR increased to 90.1%, with significance achieved for an additional three clementine gametes, another ten from 'Fortune' and an extra 11 from other mandarins. No additional 2n gametes arising from FDR were identified.
Identification of the restitution mechanism at population level in citrus. At the population level, all LOD scores were greater than 3, even for small populations with fewer than five hybrids. Therefore, SDR was identified as the preeminent restitution mechanism producing 2n megagametophytes for all female parents analysed (Table 1).

Discussion
A powerful maximum-likelihood method to compare FDR and SDR hypotheses at the individual and population level has been developed. In sexual polyploidisation, polyploids are generated by the formation of unreduced diploid gametes. From the cytogenetic point of view, two types of meiotic nuclear restitution leading to 2n gamete formation are considered, FDR and SDR 5,9,56,57 .
The identification of the meiotic restitution mechanisms driving the formation of unreduced gametes is complex. However, molecular marker analysis is useful in such identification, and several methods, generally assuming complete chiasma interference, have been developed previously. The method proposed by Cuenca et al. 37 , based on the HR curve along a linkage group, allows simultaneous identification of the restitution mechanism, raw centromere location, and comparison of several chromosome interference models. This approach is based on the analysis of genotype frequency in relatively large populations and provides global results of the preeminent mechanism; however, determination of the potential coexistence of the two mechanisms in the same progeny was not possible.
In this study, a maximum-likelihood approach based on marker HR with centromeric loci was developed and successfully applied both at the individual and population levels. Knowledge of marker-centromere distances greatly improves the statistical power of the comparison between the SDR and FDR hypotheses. For example, in this study, the restitution mechanism was identified in 'Fortune' as SDR at the population level with a LOD(SDR/FDR) of 933, whereas for the same population using 12 markers without information regarding marker-centromere distance, but with HR values under 50% 37 , the mechanism was identified as SDR with a LOD value of only 6.8. With the method proposed in the present paper, conclusions at the population level could therefore be obtained from smaller numbers of progeny and fewer markers than with non-located markers. The theoretical limits of our method were assessed by the simulation of populations arising from FDR or SDR mechanisms. At the population level, considering that the independent markers used are at the same distance from their respective centromeres, the power of the statistical test was directly linked to the product of the number of markers and the number of individuals. That means that the efficiency would be the same for n individuals with m markers as for 2n individuals with m/2 markers. Moreover, the necessary number of genotyping points (n·m) increases exponentially with increasing distance of the marker to the centromere. For example, to obtain a true significant answer for at least 99% of populations, an n·m value higher than fifty-seven would be necessary for markers at 20 cM, while n·m values higher than eight and four would be sufficient for markers at 5 cM and 1 cM, respectively. The selection of markers as close as possible to their centromere is therefore a key element for successful analysis when low numbers of individuals and markers are used.
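The dependence on the product n·m follows from the additivity of the LOD over independent genotyping points. As a rough illustration, assuming the total interference probabilities and an SDR population, the expected LOD contributed by one genotyping point at distance d is 2d·log10(2d/(1 − d)) + (1 − 2d)·log10((1 − 2d)/d). The sketch below (hypothetical function names) evaluates this quantity. It describes the average evidence per point only, not the 99% criterion used in the simulations.

```python
import math

def expected_lod_per_point_sdr(d):
    """Expected LOD(SDR/FDR) of one genotyping point from an SDR population
    at marker-centromere distance d (Morgans, 0 < d < 0.5)."""
    return (2 * d * math.log10(2 * d / (1 - d))
            + (1 - 2 * d) * math.log10((1 - 2 * d) / d))

def expected_population_lod_sdr(n_individuals, n_markers, d):
    """Expected population LOD: additive over the n x m genotyping points."""
    return n_individuals * n_markers * expected_lod_per_point_sdr(d)

# Same expected evidence for n individuals x m markers as for 2n x m/2.
print(expected_population_lod_sdr(10, 4, 0.05), expected_population_lod_sdr(20, 2, 0.05))

# Evidence per point falls quickly as markers move away from the centromere.
for cm in (1, 5, 20):
    print(cm, "cM:", round(expected_lod_per_point_sdr(cm / 100), 3))
```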
In the study of citrus 2n gamete progenies, significant results were obtained for all analysed populations, even for populations of fewer than five individuals.
One major improvement of our approach over existing methods is that it allows the identification of the restitution mechanism for each individual unreduced gamete. Simulation studies indicated that the proximity of markers to the centromeres is a key factor. With markers closer than 5 cM, five markers are sufficient to result in 95% significant answers, but significance diminishes to less than 78% and 0% for nine markers at 10 cM and 20 cM from their centromeres, respectively.
The importance of selecting markers very close to the centromere to obtain significant conclusions at the individual level is illustrated by the results of our citrus analysis. Indeed, a very high percentage of significant results at the individual level (95.4%) and with high LODs were obtained for the 'Fina' clementine progeny analysed with markers closer to centromeres than the other progenies.
Mechanisms other than meiotic restitution that also lead to unreduced gamete formation have been described, such as pre-meiotic and post-meiotic genome doubling. However, both these mechanisms have only rarely been documented in plants 4 . Moreover, the genetic configurations of the resultant unreduced gametes would differ from those of FDR or SDR gametes.
In animals, pre-meiotic genome doubling leads to parthenogenesis 58 . The doubled chromosome number is reduced through meiosis, and the resulting daughter chromosomes pair in the first meiotic prophase with their genetically identical counterpart. As a result, the genotype of the parent is passed on to the offspring unchanged. When analysing centromeric markers, this situation could be confused with the FDR mechanism if all markers were heterozygous in the offspring. However, this situation was observed for only one of the 543 citrus diploid gametes analysed in the present work.
In the case of post-meiotic doubling, meiotically formed haploid spores undergo an extra round of genome duplication and consequently yield fully homozygous 2n gametes. The same pattern could also arise under SDR if all analysed centromeric markers were homozygous in the offspring. In the present work, 268 unreduced gametes were fully homozygous, but some heterozygous loci were observed in other unreduced gametes within the same populations, ruling out a complete post-meiotic doubling model at the population level. At the individual level, the analysis of telomeric markers allows testing whether homozygosity is maintained along the chromosome arm, and therefore concluding whether the diploid gametes resulted from post-meiotic doubling or SDR. As an example, out of the 87 diploid gametes of 'Fina' clementine analysed in the present study, 58 were totally homozygous for the 6 centromeric loci analysed. However, for the same population analysed with 104 markers including centromeric and telomeric loci, the HR at the individual level ranged between 25% and 65% 36 . This broader marker study totally discards the pre- and post-meiotic doubling mechanisms at the individual level. Similarly, additional marker information for the other families (data not shown) discarded the pre- and post-meiotic doubling hypotheses.
2n megagametophytes arising from SDR are the preeminent source of triploid occurrence in 2x × 2x hybrid populations using mandarin-like parents. Spontaneous occurrences of citrus triploid hybrids arising from the union of 2n megagametophytes with haploid pollen have been noted since the seventies 34,32,59 . However, the frequency of such events is generally low 32,60 and extensive breeding programs based on this type of hybridisation require very effective methodologies for embryo rescue and ploidy evaluation of large progenies mandarins 32 . To date, very few cases of citrus triploid hybrid occurrence in 2x × 2x crosses from unreduced pollen have been reported 35,38 (and our unpublished results).
In this study, the mechanism leading to triploid formation in 2x × 2x crosses was elucidated, both at the individual and population level, for nineteen varieties used as female parents.
When using the LOD3 threshold, SDR was identified as the restitution mechanism for 85.3% of the analysed triploid hybrids, no significant conclusions were obtained for 14.1% of the hybrids, and 0.6% of the analysed triploids were derived from FDR (one triploid hybrid arising from 'Ellendale' and two arising from 'Fortune'). When the LOD2 threshold was considered, the percentage of individuals with unidentified origin decreased to 9% and SDR levels increased to 90.1%. Moreover, we conducted individual level analysis of previously studied 'Fortune' mandarin progeny 37 and the progeny arising from 'Fina' 36 , and we confirmed SDR at the individual level for most hybrids, which concurs with the global-level conclusions proposed in these two studies. In the current study, six clementine genotypes were also analysed to discover their unreduced gamete formation mechanism. Results indicate that SDR is the most probable mechanism in the clementine group, in agreement with previous conclusions of Luro et al. 35 . For the other mandarin varieties, SDR was also the most probable mechanism at the individual level and, therefore, also at the population level. Taken together, our data and those of others suggest that SDR is the major mechanism underlying unreduced megagametophyte formation in most mandarin genotypes.
The mechanism leading to unreduced eggs or pollen was previously elucidated for several plant species 4,12 . Bretagnolle and Thompson 5 reported that both FDR and SDR are responsible for 2n pollen formation, while SDR is more frequent in the formation of 2n eggs. In potato, 2n pollen arises predominantly by FDR 16 , while 2n megagametophytes arise most frequently by SDR 61 , although a mixture of SDR and FDR in the formation of 2n eggs has also been found 62 . Bilateral sexual polyploidisation can arise from either FDR or SDR in Lilium 8,47,63 and alfalfa 22 . Moreover, other examples of plant species in which FDR and SDR may occur simultaneously have been described 5 , underlining the influence of genotype and environment on the expression of meiotic abnormality factors 64,65 .
Implications for citrus triploid breeding. The genetic and phenotypic consequences of FDR and SDR gametes are highly divergent, and are of potential importance for breeding applications, due to the different parental heterozygosity rate that each mechanism transmits to the polyploid progeny 4 .
Under FDR, the resulting 2n gametes are heterozygous from the centromere to the first crossover point, and hence the gametes retain most parental heterozygosity and epistatic interactions. With the SDR mechanism, the resulting 2n gametes are homozygous from the centromere to the first crossover point, but retain parental heterozygosity in the telomeric regions 12 . As a result, SDR-2n gametes confer a lower level of heterozygosity than FDR-2n gametes and show a correspondingly greater loss of parental epistasis 5 . If an SDR origin of 2n gametes is assumed for most mandarins, sexual polyploidisation may lead to a reduced average HR and, therefore, to a loss of epistatic interactions. Consequently, when compared with interploid crosses using doubled diploids 67,68 , the sexual polyploidisation strategy should produce more polymorphic progeny by creating a larger number of new multilocus allelic combinations 4 . This provides the opportunity to select innovative products within the perspective of market segmentation as a commercial strategy.
Consequences of the SDR restitution mechanism would be clearly apparent for a character controlled by a single gene. If the gene is heterozygous in the female parent, most unreduced gametes will be homozygous for that gene if it is located near the centromere, but most gametes will be heterozygous for the gene if it is telomere-proximal (partial interference model; 37 ). Recently, Cuenca et al. 69 analysed the inheritance of resistance to Alternaria brown-spot fungal disease in citrus triploid progenies arising from crosses between diploid parents. They demonstrated that the resistance was controlled as a recessive trait by a single locus located near a centromere (10.5 cM from the centromere of chromosome 3). If a susceptible female parent is heterozygous, the SDR mechanism leads to approximately 80% homozygous unreduced gametes, half of them carrying two resistance alleles. As Alternaria resistance is a major selective trait, when heterozygous maternal parents are used, sexual polyploidisation is a more effective strategy than the use of interploid crosses, which will result in only 16.7-22.5% of progeny being resistant.
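The comparison above can be made explicit with a short worked calculation. The sketch below takes the 80% homozygosity figure from the text and, for the interploid alternative, uses the textbook expectation for a duplex (AAaa) doubled diploid under random chromosome segregation, which gives the 16.7% lower bound quoted above. It assumes that the pollen parent always contributes a resistance allele, so that gamete and progeny frequencies coincide; this simplification and the function names are illustrative assumptions.

```python
from fractions import Fraction

def sdr_recessive_gamete_freq(homozygous_fraction):
    """Homozygous 2n gametes split equally between the two parental alleles,
    so half of them carry two copies of the recessive (resistance) allele."""
    return homozygous_fraction / 2

def duplex_nulliplex_gamete_freq():
    """Frequency of aa gametes from an AAaa tetraploid under random
    chromosome segregation: 6 possible chromosome pairs, 1 of which is aa."""
    return Fraction(1, 6)

sdr = sdr_recessive_gamete_freq(0.80)        # 80% homozygosity, from the text
dd = float(duplex_nulliplex_gamete_freq())   # doubled-diploid parent

print(f"SDR 2n gametes with two resistance alleles: {sdr:.1%}")
print(f"Doubled-diploid gametes with two resistance alleles: {dd:.1%}")
print(f"Relative advantage of sexual polyploidisation: {sdr / dd:.1f}x")
```

Under these assumptions, roughly 40% of SDR 2n gametes carry two resistance alleles, against about one in six gametes from a doubled diploid.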
For dominant traits controlled by a single centromeric locus, interploid crosses should be more interesting than 2x × 2x crosses. For characters controlled by major loci more distant than 30 cM from the centromere, the efficiency of the two triploid breeding strategies would be relatively similar. This information is now being used routinely in the mandarin triploid breeding program carried out in Spain 70 .

Methods
Plant materials. Analyses were performed using 543 triploid hybrids derived from 19 different mandarin genotypes used as female parents in 2x × 2x cross populations (Table 2). The mandarin genotypes include six clementines and 13 hybrid mandarins. Triploid hybrids were grown at the 'Instituto Valenciano de Investigaciones Agrarias' orchards in Moncada, Valencia, Spain. Practical details for the establishment of triploid populations from 2x × 2x crosses by embryo rescue and triploid selection by flow cytometry can be found in Aleza et al. 32 . All triploid genotypes in the present study were selected after ascertaining their hybrid nature by molecular marker analysis (data not shown). Taxonomic information about both female and male parental accessions is given in additional table S2 according to the standard classification system for the Citrus genus 71,72 .
Selection of centromeric markers for the analysis of 2n gamete origin and formation mechanisms. Triploid citrus hybrids obtained in 2x × 2x hybridisations arise from unreduced megagametophytes [32][33][34][35]59,60 . Therefore, markers heterozygous for the female parent and displaying polymorphism between the two parents were primarily selected for the molecular characterisation of triploid hybrids and analysis of 2n gamete origin.
Centromere positions in all nine clementine chromosomes are known 36 . Molecular markers within 20 cM of the centromere were used in this study because centromere-proximal markers are more informative with regard to the mechanisms of 2n gamete formation than centromere-distal markers 53 . Within this range, the lowest expected HR is greater than 80% for FDR, while the highest HR for SDR is 40% (Figure 2). Twenty-five markers were selected for genotyping the triploid progeny. Between four and seven of these centromeric markers were used for genotyping each population (Table 3).
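The dependence of expected HR on marker-centromere distance can be sketched under one simple model. The example below assumes complete chiasma interference (at most one crossover between the centromere and the marker), in which case a crossover occurs with probability 2d for a marker d Morgans from the centromere. An SDR gamete is then heterozygous only if a crossover occurred, and an FDR gamete is heterozygous unless a crossover occurred and the two recombinant-compatible chromatids segregated together (half of the crossover cases). This model is an assumption chosen for illustration; Figure 2 may be based on other mapping functions. The function names are hypothetical.

```python
def hr_sdr(distance_cm):
    """Expected heterozygosity restitution under SDR, complete interference.
    Valid for 0 <= distance <= 50 cM in this simple model."""
    if not 0 <= distance_cm <= 50:
        raise ValueError("model only defined for 0-50 cM")
    d = distance_cm / 100.0          # centimorgans to Morgans
    return 2.0 * d                   # heterozygous only after a crossover

def hr_fdr(distance_cm):
    """Expected heterozygosity restitution under FDR, complete interference."""
    if not 0 <= distance_cm <= 50:
        raise ValueError("model only defined for 0-50 cM")
    d = distance_cm / 100.0
    return 1.0 - d                   # lost in half of the crossover cases

for cm in (0, 5, 10, 20):
    print(f"{cm:>2} cM  SDR HR = {hr_sdr(cm):.2f}  FDR HR = {hr_fdr(cm):.2f}")
```

At the 20 cM boundary this simple model gives an HR of 0.40 for SDR and 0.80 for FDR, in line with the bounds quoted above, and the gap between the two hypotheses widens as the marker approaches the centromere.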
Genotyping of triploid hybrids. DNA extraction. Leaf DNA of triploid hybrids and their parents was isolated using the DNeasy Plant kit from Qiagen Inc. (Valencia, CA, USA), following the manufacturer's protocol.