Comparative linkage mapping of diploid, tetraploid, and hexaploid Avena species suggests extensive chromosome rearrangement in ancestral diploids

The genus Avena (oats) contains diploid, tetraploid and hexaploid species that evolved through hybridization and polyploidization. Four genome types (named A through D) are generally recognized. We used GBS markers to construct linkage maps of A genome diploid (Avena strigosa x A. wiestii, 2n = 14), and AB genome tetraploid (A. barbata 2n = 28) oats. These maps greatly improve coverage from older marker systems. Seven linkage groups in the tetraploid showed much stronger homology and synteny with the A genome diploids than did the other seven, implying an allopolyploid hybrid origin of A. barbata from distinct A and B genome diploid ancestors. Inferred homeologies within A. barbata revealed that the A and B genomes are differentiated by several translocations between chromosomes within each subgenome. However, no translocation exchanges were observed between A and B genomes. Comparison to a consensus map of ACD hexaploid A. sativa (2n = 42) revealed that the A and D genomes of A. sativa show parallel rearrangements when compared to the A genomes of the diploids and tetraploids. While intergenomic translocations are well known in polyploid Avena, our results are most parsimoniously explained if translocations also occurred in the A, B and D genome diploid ancestors of polyploid Avena.


Description of Supplementary Data (Accompanying .xlsx file)
GBS raw sequence read data are deposited at the NCBI short read archive under project numbers PRJNA517481 for the diploid A. strigosa X wiestii mapping population, and PRJNA517323 for A. barbata.
However, we also make available the genotype calls, error calls, and homologs identified in this study, since these are likely to be more accessible to researchers wishing to extend or examine our work.

1) GBS Markers mapped in Avena barbata
For each biallelic locus and Presence-Absence Variant, we give the sequence of the GBS tags (alleles), the inferred position on the linkage map, and the genotype calls for all RILs.

2) GBS Markers mapped in Avena strigosa X wiestii
As above for the diploid mapping population.

3-5) Genotyping Error Calls
For each of the stringently filtered loci that we used to construct the framework map, we screened the map for likely genotyping errors as described in Supplementary Methods. The data file lists the genotype calls for each RIL before and after error checking, and compares the inferred map positions for the corrected vs uncorrected linkage maps.

6) Homologous Markers across maps
Sorted by species, with the lower ploidy mapping population on the left. Gives the name and map location for each of the inferred homologous tag pairs, and the number of base pair mismatches.

7) Paralogs within A. barbata
As above for Homologs between species. Sorted by map position, with the lower-numbered linkage group (e.g. Ab_1 vs Ab_12) on the left.

Supplementary Methods
We present here some illustrations and details of our analysis. Note that figures referred to in these supplementary methods are labelled A-D, to distinguish them from figures S1, S2, etc. cited in the main text.

MSTMap
The first step of the MSTMap algorithm separates loci into linkage groups (prior to ordering the markers) based upon the number of recombination events observed between them. Since loci on separate chromosomes may show less than 50% recombination by chance, a threshold number of observed recombinations is set based on the probability of observing a given number of recombinations between unlinked loci. Linkage groups are then formed such that each locus within a group has observed recombination below the threshold with at least one other member of the group. The threshold is set by the researcher through a parameter, e, which defines the probability of observing a given recombination level by chance between unlinked markers (through a calculation given in Wu et al., 2014; ref. 45). A high value of e gives a single linkage group with numerous gaps of ~50 cM between adjacent markers. We gradually reduced e in successive runs until the correct number of linkage groups was obtained. Further reductions of e would eventually begin to break up linkage groups at some of the longer gaps between markers (such as the 20 cM gap on Ab_2).
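The grouping step described above can be illustrated with a minimal sketch. This is not the MSTMap code: the function names are hypothetical, and the threshold uses a Hoeffding-type bound, P(d <= delta) <= exp(-2(n/2 - delta)^2 / n), as an assumed stand-in for the calculation given in the cited reference. Genotypes are assumed to be complete strings of parental calls ("A"/"B"), one character per RIL.

```python
import math
from itertools import combinations


def recombination_threshold(n_lines, e):
    """Recombinant count below which two loci are treated as linked.

    Assumption: solves exp(-2 * (n/2 - delta)**2 / n) = e for delta.
    """
    delta = n_lines / 2 - math.sqrt(-n_lines * math.log(e) / 2)
    return max(delta, 0.0)


def count_recombinants(calls_a, calls_b):
    """Number of lines whose parental calls differ between two loci."""
    return sum(1 for a, b in zip(calls_a, calls_b) if a != b)


def linkage_groups(genotypes, e):
    """Single-linkage grouping: a locus joins a group if it falls below
    the threshold with at least one member of that group."""
    names = list(genotypes)
    n_lines = len(genotypes[names[0]])
    delta = recombination_threshold(n_lines, e)

    parent = {name: name for name in names}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for m1, m2 in combinations(names, 2):
        if count_recombinants(genotypes[m1], genotypes[m2]) < delta:
            parent[find(m1)] = find(m2)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    toy = {
        "m1": "A" * 10 + "B" * 10,
        "m2": "A" * 9 + "B" * 11,
        "m3": "AB" * 10,
        "m4": "AB" * 9 + "BA",
    }
    for e in (0.99, 0.01, 1e-6):
        print(e, linkage_groups(toy, e))
```

In this toy data set, a high e merges all four loci into one group, an intermediate e recovers two groups, and a very small e leaves every locus on its own, mirroring the behaviour of e described in the text.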
Haplotag Clusters, Paralogs and Presence/Absence Variants (PAVs)
The Haplotag pipeline (Tinker et al., 2016; ref. 44) treats the entire GBS tag sequence as a haplotype which may represent a segregating allele at a locus. This is in contrast to the SNP-based approach, which considers each SNP as a separate (though of course possibly tightly linked) locus. Figure A shows an example of Haplotag output for a cluster of seven tags with similar sequences but with more than one SNP. These haplotypes can be resolved into three segregating biallelic loci and one segregating presence-absence variant. Loci 1 and 2 appear to be tandem repeats: they show very tight linkage disequilibrium and map to the same position (Fig B). Locus 3 is unlinked to loci 1 and 2. Haplotag identifies these loci by eliminating tag pairings that could not be alleles at a single locus. For example, a model that posits that tags 140113 and 140115 (mesic alleles at locus 2 and 3) are alleles at a single locus would be rejected because it would be missing from too many taxa as well as exceeding the threshold for heterozygosity. The remaining tag, 140117, shows fixed differences between the parents (it is only present in the xeric ecotype) and shows Mendelian segregation. It also shows strong association with the xeric allele at loci 1 and 2 in the recombinants and maps near to those loci (Fig B).

Mapping Secondary Loci
Figure B illustrates our method for placing secondary loci (less stringently filtered), PAVs, and RFLP and AFLP markers on the linkage map. The recombination fraction is computed for each marker with each bin of the framework map. Recombination fractions are roughly 0.5 for most bins, indicating no linkage. Map positions are assigned to the bin with the lowest recombination fraction.
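The placement rule for secondary loci can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the "-" missing-data symbol, and the bin labels are hypothetical, and each framework bin is assumed to be represented by a single string of parental calls across the RILs.

```python
def recombination_fraction(calls_a, calls_b, missing="-"):
    """Fraction of lines whose calls differ, skipping lines where
    either call is missing. Returns None if no line is scored."""
    scored = [(a, b) for a, b in zip(calls_a, calls_b)
              if a != missing and b != missing]
    if not scored:
        return None
    return sum(a != b for a, b in scored) / len(scored)


def place_marker(marker_calls, framework_bins):
    """Return (bin_name, rf) for the framework bin with the lowest
    recombination fraction to the marker."""
    best = None
    for bin_name, bin_calls in framework_bins.items():
        rf = recombination_fraction(marker_calls, bin_calls)
        if rf is not None and (best is None or rf < best[1]):
            best = (bin_name, rf)
    return best


if __name__ == "__main__":
    # Hypothetical framework bins and one secondary marker.
    bins = {
        "LG1_0cM": "AABBAABBAABB",
        "LG1_5cM": "AABBAABBAABA",
        "LG2_0cM": "ABABABABABAB",
    }
    print(place_marker("AABBAAB-AABA", bins))
```

A trace of the per-bin recombination fractions, as plotted in Figure B, would come from calling `recombination_fraction` on every bin rather than keeping only the minimum.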
The traces in Figure B show the loci and PAV from the cluster of tags shown in Fig A. Loci 1 and 2 (red and yellow) and the PAV (green) map very close to each other. Loci 1 and 2 have identical traces, but the increased error in the PAV gives a slightly higher minimum RF. Locus 3 from that cluster maps to a different chromosome (light blue). Most loci showed close linkage (98% of rf <= 0.1, Fig C) to one bin of the framework map and extensive recombination (rf ~ 0.5) to most other bins.

Error Filtering
During our map construction, the stringently filtered loci were further screened for likely genotyping errors after an initial map was inferred. Fig D shows genotype data for selected RILs over a short region of LG 1 in the A. strigosa X wiestii map. The GBS markers between the solid black lines were originally inferred to cover 6.7 cM. However, each recombination event in this region produced a double recombination around a single marker, an unlikely occurrence. Moreover, each of these putative double recombinants was found in the heterozygous state, which is also highly unlikely in a selfing organism. When the original genotype calls (left) are thinned for errors (by removing any double recombinant involving fewer than 3 markers within less than 2 cM), no recombination events remain among these markers, placing them all within a single "bin".

Figure A (excerpt of Haplotag output): A_barbata_Map_Output, GBS passport file for tag cluster: RGLP53537
Best model #2 fits 179 genotypes, with 1% heterozygotes. Consensus: TGCAGGGCCTGGCRCGAGGCCGTCGACTCYTGCGGCCTCCTACGGGCCGACCTGCTCCCGCTCT
Best model #3 fits 176 genotypes, with 1% heterozygotes. Consensus: TGCAGGGCCTGGCGCGAGGCCGTCGACTCSTGCGGCCTCCTGCGGGCCGACCTGCTCCCGCTCT

Table S3: Statistical significance of the number of homologs/paralogs between pairs of linkage groups. Given the locations of all markers that had a homolog (or paralog in Table S3B), the pairings between these markers were randomized 10,000X. For each pair of LGs, the fraction of randomized data sets which exceeded the observed number of homologs for that pair is given. Pairs with a fraction < 0.05 (i.e. pairs whose homologs occurred significantly more often than expected by chance alone) are highlighted in green.

Figure S2 (panel labels): "Non-Reciprocal" Transloc.; B Genome Arrangement; "X" Genome, "Y" Genome, "Z" Genome. (To be complete, block "E" becomes inverted here.)

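The thinning rule described under Error Filtering (removing any double recombinant involving fewer than 3 markers within less than 2 cM) can be sketched for a single RIL as below. This is a hedged illustration, not the procedure's actual code: the function name and the "-" missing symbol are hypothetical, "within less than 2 cM" is assumed to mean the distance between the two flanking markers, and the input is assumed to be ordered by map position with no missing calls.

```python
def thin_double_recombinants(positions, calls, max_markers=3,
                             max_span_cm=2.0, missing="-"):
    """Mask short runs of calls that imply a double recombinant.

    A run of identical calls is masked when the markers on either side
    carry the same call as each other (so the run requires two
    crossovers), the run has fewer than `max_markers` markers, and the
    flanking markers lie less than `max_span_cm` apart (assumption).
    """
    cleaned = list(calls)
    n = len(calls)
    start = 0
    while start < n:
        end = start
        while end + 1 < n and calls[end + 1] == calls[start]:
            end += 1  # extend the run of identical calls
        if start > 0 and end < n - 1:
            left, right = calls[start - 1], calls[end + 1]
            run_length = end - start + 1
            span = positions[end + 1] - positions[start - 1]
            if (left == right and run_length < max_markers
                    and span < max_span_cm):
                for k in range(start, end + 1):
                    cleaned[k] = missing
        start = end + 1
    return cleaned


if __name__ == "__main__":
    # One RIL with an isolated heterozygous call ("H") inside a block of "A".
    positions = [0.0, 0.4, 0.8, 1.2, 1.6, 5.0, 9.0]
    calls = list("AAHAABB")
    print("".join(thin_double_recombinants(positions, calls)))
```

Masking the suspect call rather than flipping it to the flanking genotype is a design choice made here for illustration; either way, the apparent recombination around the single marker no longer inflates the map distance.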
Figure A: Example output of the Haplotag pipeline for a cluster of GBS tags that resolve into three segregating loci and one Presence-Absence Variant (PAV). A sample of genotype calls is given on the next page.

Fig A (cont'd): Best selected models are on the left; haplotypes excluded from the selected locus model(s) are on the right.

Figure S1. Single vs Double restriction digest GBS linkage maps in A. barbata. The linkage map presented in the text (Fig. 2) used the double digest GBS library preparation protocol of Poland et al. (ref. 42). The linkage map from the single digest GBS protocol (ref. 43) gave a closely similar map. For each LG, the double digest map is shown on the left while the single digest map is on the right. Loci detected by both protocols were relatively rare (Supplemental Table S2), but reveal a strong similarity between the two maps (Blue diamonds with connector lines).

Figure S2. Hypothesized possible evolutionary transition between A and B genome karyotypes.

Table S3A. Homologies between A. barbata and A. strigosa X A. wiestii

Table S3B. Paralogs within A. barbata

Table S3C. Homologies between A. barbata and A. sativa

Table S3D. Homologies between A. sativa and A. strigosa X A. wiestii
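The randomization test behind the Table S3 values (pairings between markers shuffled 10,000 times, then the fraction of shuffled data sets exceeding the observed count for each pair of linkage groups) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, argument layout, and seed are hypothetical, and only LG pairs with at least one observed homolog are reported.

```python
import random
from collections import Counter


def lg_pair_fractions(lg_a, lg_b, n_perm=10000, seed=0):
    """Permutation test on homolog counts per pair of linkage groups.

    lg_a[i] and lg_b[i] are the linkage groups of the two members of
    homolog (or paralog) pair i. Returns, for each observed LG pair,
    the fraction of permutations whose count exceeded the observed one.
    """
    rng = random.Random(seed)
    observed = Counter(zip(lg_a, lg_b))
    exceed = Counter()
    shuffled = list(lg_b)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize which markers are paired
        counts = Counter(zip(lg_a, shuffled))
        for pair, obs in observed.items():
            if counts[pair] > obs:
                exceed[pair] += 1
    return {pair: exceed[pair] / n_perm for pair in observed}


if __name__ == "__main__":
    # Hypothetical, perfectly syntenic toy data: every homolog of an
    # Ab_1 marker lies on As_1, and every Ab_2 homolog lies on As_2.
    lg_a = ["Ab_1"] * 10 + ["Ab_2"] * 10
    lg_b = ["As_1"] * 10 + ["As_2"] * 10
    for pair, fraction in lg_pair_fractions(lg_a, lg_b, n_perm=1000).items():
        print(pair, fraction)
```

Shuffling one column while holding the other fixed preserves the number of markers per linkage group on both maps, so differences in marker density between LGs do not by themselves produce small fractions.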