Genomic distances reveal relationships of wild and cultivated beets

Cultivated beets (Beta vulgaris ssp. vulgaris), including sugar beet, rank among the most important crops. The wild ancestor of beet crops is the sea beet Beta vulgaris ssp. maritima. Species and subspecies of wild beets are readily crossable with cultivated beets and are thus available for crop improvement. To study genomic relationships in the genus Beta, we sequence and analyse 606 beet genomes, encompassing sugar beet, sea beet, B. v. adanensis, B. macrocarpa, and B. patula. We observe two genetically distinct groups of sea beets, one from the Atlantic coast and the other from the Mediterranean area. Genomic comparisons based on k-mers identify sea beets from Greece as the closest wild relatives of sugar beet, suggesting that domestication of the ancestors of sugar beet may be traced to this area. Our work provides comprehensive insight into the phylogeny of wild and cultivated beets and establishes a framework for classification of further accessions of unknown (sub-)species assignment.


Supplementary Note 1. Testing the impact of organellar DNA
We evaluated whether the presence of organellar sequences in the data had an impact on phylogeny calculation. To this end, we removed organelle-derived sequencing reads from the set of 47 samples (described in the section "Assessing the discriminatory power of Mash" of the main text) by mapping them against the genomes of the sugar beet chloroplast 1 and mitochondrion 2.
Reads not matching the organellar genomes (about 93%) were used to create k-mer sketches, from which we determined Mash distances for tree calculation. The distance matrices before and after removal of organelle-derived reads were perfectly correlated (r = 1, Mantel test 3), and the resulting tree was identical to the tree in Fig. 2. We concluded that organellar DNA did not affect our phylogenies. The genome sketches generated by Mash for distance calculation each comprise 10,000 21-mers of the input data, namely those with the smallest hash values (single-copy k-mers omitted). Because the organellar genomes are small compared with the nuclear genome, organellar k-mers make up only a small fraction of the distinct k-mers in the sequencing data, so we expect them to be rarely represented in the sketches despite their high copy number.
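To illustrate how such sketches and distances arise, the following is a minimal sketch of the bottom-s MinHash scheme on which Mash is based: each sample is reduced to the s canonical k-mers with the smallest hash values, the Jaccard index j is estimated from two sketches, and it is converted to the Mash distance D = -(1/k) ln(2j/(1+j)). The function names (`sketch`, `mash_distance`) are hypothetical, and the hash function is an assumption: Mash itself uses MurmurHash3, whereas BLAKE2b is substituted here, so the values will not match real Mash output. The `min_copies=2` filter mimics omitting single-copy k-mers.

```python
import hashlib
import math
from collections import Counter

# Illustrative re-implementation of the bottom-s MinHash idea behind Mash.
# Assumption: BLAKE2b stands in for Mash's MurmurHash3, so hashes differ from Mash.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ALPHABET = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (stand-in for Mash's MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(reads, k=21, size=10_000, min_copies=2):
    """Keep the `size` smallest hashes of canonical k-mers seen >= `min_copies` times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _ALPHABET:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    # Drop low-copy k-mers (e.g. single-copy, often sequencing errors), then take bottom-s.
    hashes = {kmer_hash(km) for km, c in counts.items() if c >= min_copies}
    return sorted(hashes)[:size]


def mash_distance(s1, s2, k=21, size=10_000):
    """Estimate the Jaccard index j from two sketches and convert it to a Mash distance."""
    union_bottom = sorted(set(s1) | set(s2))[:size]
    if not union_bottom:
        raise ValueError("both sketches are empty")
    shared = set(s1) & set(s2)
    j = sum(h in shared for h in union_bottom) / len(union_bottom)
    if j == 0:
        return 1.0  # no shared hashes: maximal distance
    return math.log((1 + j) / (2 * j)) / k  # D = -(1/k) ln(2j / (1 + j))
```

Because only the smallest hash values are retained, which k-mers enter a sketch depends on their hash values, not on how often they occur; copy number matters only through the `min_copies` threshold.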

Supplementary Note 2. Comparison of genetic distances obtained from two approaches
We compared distances calculated from single-nucleotide polymorphisms assessed by microarray-based genotyping with Mash distances, using a subset of 255 B. v. maritima accessions. To avoid intra-accession heterogeneity, the same DNA samples were used to generate both data sets. The genotyping data were obtained using a custom microarray that resolves variant positions identified in the RefBeet sugar beet reference genome in comparison with other sugar beet lines 4. Illumina sequences of the same set of accessions were used to calculate Mash distances. To evaluate the correlation between the two distance matrices, we performed a Mantel test 3, which showed that the matrices were highly correlated (r = 0.7448). Furthermore, we generated phylogenetic trees with identical settings from the two distance matrices of the 255 accessions, one obtained from microarray-based genotyping and one from Mash sketches. Although the two distance matrices had been computed in different ways, the resulting trees were very similar and their overall topologies agreed. To assess the extent of similarity between the trees, we calculated quartet distances 5 between them. This method compares, between the trees, the topology induced on every possible set of four leaves. As a benchmark, we first determined the relatedness between two random trees with 255 leaves, which contain 172 million quartets. The random trees shared 57 million quartets and hence displayed a quartet distance of 67%. In contrast, the comparison of the trees generated from microarray-based genotyping data and from Mash sketches yielded 156 million shared quartets, resulting in a quartet distance of 12%, indicating that their similarity is unlikely to be due to chance alone.
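The Mantel test used above correlates two distance matrices and assesses significance by permuting the sample labels of one matrix. The sketch below is a minimal, stdlib-only version assuming Pearson correlation, a one-sided test, and a freely chosen number of permutations; it is not necessarily the implementation behind reference 3, and the function name `mantel` is hypothetical.

```python
import math
import random


def _upper_triangle(d, order):
    """Entries above the diagonal of matrix d after relabelling rows/columns by `order`."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=None):
    """One-sided permutation Mantel test for two symmetric distance matrices.

    Returns (r, p): the Pearson correlation of the pairwise distances and the
    fraction of permutations (plus the observed one) with a correlation >= r.
    """
    n = len(d1)
    identity = list(range(n))
    x = _upper_triangle(d1, identity)  # each pair of accessions counted once
    r_obs = _pearson(x, _upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute accession labels of d2 (rows and columns together)
        if _pearson(x, _upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together because the unit of observation is the accession, not the individual matrix cell; pairwise distances are not independent of one another, which is why an ordinary correlation test would be inappropriate here.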
By adapting the settings for phylogenetic tree calculation, we could further increase the similarity between the trees. The options "global rearrangements" (all possible placements of subtrees are tested) and multiple rounds of "jumbling" (entering accessions in random order) decreased the quartet distance to 9% (see Supplementary Table 1). As the two data sets were derived from two independent methods, we reasoned that the closer the two resulting trees are to each other, the more likely it is that their shared topology is correct.
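The quartet distance described in this note can be illustrated with a naive implementation. The sketch below assumes trees given as nested Python tuples of leaf labels (a hypothetical input format) and checks all four-leaf subsets, so it runs in O(n^4) time and is only practical for small trees; dedicated tools use much faster algorithms. It also shows where the benchmark numbers come from: 255 leaves yield C(255, 4) quartets, and for two independent random binary trees each quartet agrees with probability 1/3 (one of three possible resolutions), so the expected quartet distance is about 2/3, in line with the 67% observed for the random trees.

```python
from itertools import combinations
from math import comb


def collect_clades(tree, out):
    """Append the leaf set below every internal node of a nested-tuple tree to `out`."""
    if isinstance(tree, str):  # a leaf label
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def quartet_topology(clades, a, b, c, d):
    """Return the resolved split {{x, y}, {z, w}} induced on a quartet, or None."""
    for pair, other in (({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c})):
        for s in clades:
            # A clade separates the quartet if it holds one pair and none of the other.
            if (pair <= s and not other & s) or (other <= s and not pair & s):
                return frozenset([frozenset(pair), frozenset(other)])
    return None  # unresolved (polytomy)


def quartet_distance(t1, t2):
    """Fraction of 4-leaf subsets whose induced topology differs between two trees."""
    c1, c2 = [], []
    leaves = collect_clades(t1, c1)
    if leaves != collect_clades(t2, c2):
        raise ValueError("trees must have the same leaf set")
    if len(leaves) < 4:
        raise ValueError("at least four leaves are required")
    quartets = list(combinations(sorted(leaves), 4))
    differing = sum(
        quartet_topology(c1, *q) != quartet_topology(c2, *q) for q in quartets
    )
    return differing / len(quartets)


# Number of quartets for 255 leaves, as in the comparison above.
print(comb(255, 4))  # 172061505
```

For example, swapping B and C inside the (A, B, C) clade of a five-leaf tree changes two of its five quartets, giving a quartet distance of 0.4.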

Supplementary Note 3. Sugar beet phylogeny details
In the tree comprising 290 B. v. vulgaris accessions and two outgroups (Supplementary Fig. 5), closer inspection of the accessions to the right of the node marked in yellow revealed that four of these accessions (PI 285592 through PI 285595) had a description mentioning "crassa" (indicating fodder beet) in the USDA/GRIN database and that another one (PI 502293), labelled "Beta sp.", had the information "fodder beet" in the Plant Inventory report linked to its passport data. These five accessions (brown bracket in Supplementary Fig. 5) clustered together close to the root of the tree. One accession (PI 357367, pale orange arrow) was labelled "cicla" (indicating leaf beet) in the database, and another (PI 222769, purple arrow) had the description "used as a vegetable", probably indicating either leaf beet or table beet. Studying the relationships between beet crop types would require a larger number of accessions.
The accessions to the left of, and close to, the node marked in yellow seem not far removed in breeding history from the accessions to the right of the "green" node, but they also seem heterogeneous. They include a self-fertile type, C500 (PI 663878, marked by a blue arrow in Supplementary Fig. 5), which is an annual O-type tester (see passport data in the USDA GRIN database). Material from Fargo's breeding program (F1042 = PI 674103, exceptionally sensitive to Cercospora), winter-hardy selections from Kiel, Germany (EL-A1200027) and NSCR (EL-A1402163), and some introgression lines between Western and Eastern US materials are among these accessions (blue circle).
The subtree to the left of the "green" node (more Western US-like) contains a strong cluster of accessions that trace, as a group, to Utah releases (PI 610265 and PI 633945 through PI 633950; blue bracket), a group of narrowly related materials from a set of California genetic analysis lines for Fusarium resistance (EL-A1200035 through EL-A1200037; blue bracket), and sibling releases extracted from populations that were selected and evaluated for sugar beet cyst nematode resistance and derived from germplasm contributions selected at Salinas, California (PI 664914, PI 664916, PI 664922 and EL-A1600008; blue bracket). This subtree also encompasses the lines C869 (PI 628755, blue arrow) and US H20 (PI 631354, blue arrow). US H20 is a very successful legacy hybrid whose seed parent is from Utah but whose pollen parent, SP22 (here given as EL-A015030 = PI 615525, blue arrow), is of Eastern US origin. Both SF 'B' (EL-A1200026) and SF 'A' (EL-A029686), shaded in light blue, derive from populations related to the use of the self-fertility allele donated by C869 (PI 628755).
In the Eastern US-enriched subtree to the right of the "green" node, there is a group of lines whose breeding history includes EL50/2 (PI 664912, highly tolerant to Cercospora leaf spot; blue arrow) as a parent, e.g. lines EL-A1600024, EL-A027154, EL-A1600020, EL-A1402162, and EL-A1600025 (grey circle). Many others in this subtree seem to be products of the introgression of the smooth-root (SR) phenotype from table beet in the 1940s. Part of the goal was to re-adapt these SR types for Great Lakes growers, and SR98 (EL-A027149 = PI 655951, blue arrow) epitomizes that effort.
It is of interest that some lines from China are grouped together with breeding material from the USA (e.g. BETA 165, PI 518157, PI 518165; blue squares). These lines may be derivatives of accessions that were sourced from US materials, bred in China for a time, and then re-deposited in seed banks as new material.
Sugar beet accession SLC101 (PI 610268, blue arrow) is of interest as the donor of the monogermy trait used in every sugar beet hybrid in the world; being self-fertile, it has low genetic diversity 6. Another self-compatible sugar beet line is EL57 (PI 663212, blue arrow), which is multigerm 7. The two lines are quite different but share the self-fertility trait, which may be the reason why they were grouped together in our analysis.
The most distant accession on the sugar beet phylogenetic tree is EL-A024961 (placed next to the outgroup species at the root, green arrow). This breeding line was derived from a screen of ca. 50 PI accessions of unadapted material for germination in 100 mM NaCl: the few that germinated were inter-crossed, planted, and selected in the field, and then inter-crossed once more. The accessions contributing to the EL-A024961 population were sugar beet accessions PI 232889, PI 355963, PI 518160, and PI 266100 (green arrows). Apparently, the addition and/or recombination of these disparate accessions reduced its similarity to the other sugar beet accessions, resulting in the observed phylogenetic placement.
USDA germplasm enhancement efforts since the 1980s, including the East Lansing program, have increasingly focused on pollen parents adapted to each of the major beet-growing areas in the US, with an emphasis on broadening the genetic base of sugar beet 8. Provenance from these locations can be ascribed even with very low marker density 6, and doing so has revealed instances where shared germplasm is still recognizable in the genetic signatures (e.g. substantially derived accessions). Recent East Lansing pollen parents are generally derived through population improvement approaches, which fix favorable alleles more slowly than strict inbreeding does and therefore retain higher residual heterozygosity. All of the East Lansing germplasm tested here is targeted at pollen parents, with primary selection for relative fecundity and field performance in test plots in Michigan, in the Great Lakes growing region of the US.
Operationally, seed was harvested from multi-line, inter-pollinated mother roots of two to 20 more or less diverse accessions, with 5 to 40 individuals per accession, and the harvested seed was kept separate by seed-parent accession (these lots were often designated 'mixing lines').
Most East Lansing breeding material appears highly heterozygous 9. Genetic distances of East Lansing breeding populations are therefore likely obscured by the breeding process and by ascertainment uncertainties, perhaps to a greater extent than those of the wild accessions tested.
Still, some known relationships from donor germplasm are consistent with the placement of the accessions on the phylogenetic trees (e.g. four hitherto unsuspected related accessions from Utah, PI 66394x, and three from a narrow California base selected for Fusarium tolerance, EL-A120003x). Interestingly, each of these examples resolves to a different higher-order subtree. We assert that the distinctness of pollen parents is driven in large part by selection on mostly oligogenic traits originally derived from one or a few donor accessions, and that genetic distance reflects how such traits, wholly or partially, are combined within the accumulated genetic constitution. In this respect, the accessions tested here were bred toward various targeted goals, such as Cercospora leaf spot resistance and smooth root, and their placement at more closely related nodes can be interpreted as consistent with their derivation. This information is useful to breeders, and finer discrimination could prove very useful for trait-gene discovery.

Supplementary Table 1. Phylogenetic trees of a set of 255 beet accessions, calculated from distances based on genotyping data and on k-mer comparisons (Mash), respectively, were compared using quartet distances. "Random Topology" refers to a tree with arbitrarily arranged nodes. The Fitch algorithm provided the most reliable results, in particular when using both the "Jumble" option and "Global rearrangement" (GR), at the expense of increased computing time (single-core).

Supplementary Figure 2. Phylogenetic tree of sugar beet and sea beet accessions based on Mash distances. The 457 randomly sorted input accessions were divided into two subsets, one of which is shown in this figure and the other in Fig. 3. For comparability, 19 randomly selected accessions as well as the two outgroups spinach (Spi olera) and Patellifolia procumbens (Pat pro) are present in both trees.