The genome of the recently domesticated crop plant sugar beet (Beta vulgaris)

Journal name: Nature
Volume: 505
Pages: 546–549
DOI: 10.1038/nature12817

Sugar beet (Beta vulgaris ssp. vulgaris) is an important crop of temperate climates that provides nearly 30% of the world’s annual sugar production and is a source for bioethanol and animal feed. The species belongs to the order Caryophyllales, is diploid with 2n = 18 chromosomes, has an estimated genome size of 714–758 megabases (ref. 1) and shares an ancient genome triplication with other eudicot plants (ref. 2). Leafy beets have been cultivated since Roman times, but sugar beet is one of the most recently domesticated crops. It arose in the late eighteenth century when lines accumulating sugar in the storage root were selected from crosses made with chard and fodder beet (ref. 3). Here we present a reference genome sequence for sugar beet as the first non-rosid, non-asterid eudicot genome, advancing comparative genomics and phylogenetic reconstructions. The genome sequence comprises 567 megabases, of which 85% could be assigned to chromosomes. The assembly covers a large proportion of the repetitive sequence content, which was estimated (ref. 4) to be 63%. We predicted 27,421 protein-coding genes supported by transcript data and annotated them on the basis of sequence homology. Phylogenetic analyses provided evidence for the separation of Caryophyllales before the split of asterids and rosids, and revealed lineage-specific gene family expansions and losses. We sequenced spinach (Spinacia oleracea), another Caryophyllales species, and validated features that separate this clade from rosids and asterids. Intraspecific genomic variation was analysed based on the genome sequences of sea beet (Beta vulgaris ssp. maritima; progenitor of all beet crops) and four additional sugar beet accessions. We identified seven million variant positions in the reference genome, and also large regions of low variability, indicating artificial selection. The sugar beet genome sequence enables the identification of genes affecting agronomically relevant traits, supports molecular breeding and maximizes the plant’s potential in energy biotechnology.
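
As a rough cross-check of the figures quoted in the abstract, the short Python sketch below derives the chromosome-assigned portion of the assembly and the share of the estimated genome size that the assembly represents. Only the numbers (567 Mb, 85%, 714–758 Mb) come from the text; the function and variable names are illustrative assumptions.

```python
# Figures quoted in the abstract (sizes in megabases).
ASSEMBLY_MB = 567
ESTIMATED_GENOME_MB = (714, 758)   # lower and upper genome size estimate
ASSIGNED_FRACTION = 0.85           # share of the assembly placed on chromosomes


def assembly_summary(assembly_mb, estimated_range_mb, assigned_fraction):
    """Return derived assembly statistics (hypothetical helper, not from the paper)."""
    low, high = estimated_range_mb
    return {
        # Megabases of the assembly assigned to chromosomes.
        "assigned_mb": assembly_mb * assigned_fraction,
        # Share of the estimated genome covered, using the larger estimate.
        "coverage_min": assembly_mb / high,
        # Share of the estimated genome covered, using the smaller estimate.
        "coverage_max": assembly_mb / low,
    }


summary = assembly_summary(ASSEMBLY_MB, ESTIMATED_GENOME_MB, ASSIGNED_FRACTION)
print(f"Assigned to chromosomes: {summary['assigned_mb']:.1f} Mb")
print(f"Share of estimated genome: {summary['coverage_min']:.1%} to {summary['coverage_max']:.1%}")
```

Under these inputs the assembly corresponds to roughly 75–79% of the estimated genome size, with about 482 Mb placed on chromosomes.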

At a glance

Figures

  1. Genomic features of RefBeet chromosome 1.
    Figure 1: Genomic features of RefBeet chromosome 1.

    For chromosomes 2–9 see Extended Data Figs 1 and 2. a, Positions of genetic markers in the genetic map (ref. 2) and the RefBeet assembly. b, Distribution of coding sequence (CDS) and repetitive sequence of the Gypsy type (LTR retrotransposons), the SINE type (non-LTR retrotransposons), the En/Spm type (DNA transposons), and three classes of satellite DNA (intercalary, centromeric, subtelomeric). c, Distribution of mapped small RNAs of 21 and 24 nucleotides (nt). The large peak of 21 nt reads (about 327,000 reads mapped) corresponds to the highly expressed microRNA MIR166. d, Distribution of genomic variants in four sugar beet accessions and sea beet (DeKBm) compared to RefBeet. Shared and individual low-variation regions per accession are visible (for example, region 30–31 Mb is shared among the sugar beet accessions KDHBv, UMSBv, YTiBv, YMoBv).

  2. Fluorescent in situ hybridization (FISH) analyses of Beta vulgaris chromosomes at early metaphase.
    Figure 2: Fluorescent in situ hybridization (FISH) analyses of Beta vulgaris chromosomes at early metaphase.

    a, Chromosomes were stained with 4′,6-diamidino-2-phenylindole (DAPI, blue); large blocks of heterochromatin are visible (arrows). b, In situ hybridization using the major satellites pBV (centromeric, red), pEV (intercalary, green) and pAV (subtelomeric, orange). c, Overlaid images of a and b show the coverage of chromosomes by satellite DNA. Scale bar, 10 μm.

  3. Phylogenetic relationship of 10 sequenced plant species and comparative gene analysis.
    Figure 3: Phylogenetic relationship of 10 sequenced plant species and comparative gene analysis.

    Species tree based on maximum-likelihood analysis of a concatenated alignment of 110 widespread single-copy protein sequences (left). The upper bar per species (right with scale on the top) indicates the number of widespread genes that are found in at least 9 of the 10 species (green); eudicot-specific genes that are found in at least 7 of the 8 eudicot species (yellow); species-specific genes with no homologues in other species of this tree (light grey); and remaining genes (brown). The slim bars per species (scale on the bottom) represent the percentage of genes with at least one paralogue (blue) and the percentage of sugar beet genes that have homologues in a given species (pink), respectively.

  4. Genomic features of RefBeet chromosomes 2–5.
    Extended Data Fig. 1: Genomic features of RefBeet chromosomes 2–5.

    Section 1 shows the positions of genetic markers in the genetic map (ref. 2) and RefBeet. Section 2 shows the distribution (stacked area graphs) of predicted coding sequence (CDS) and repetitive sequence of the Gypsy type (LTR retrotransposon), the SINE type (non-LTR retrotransposon), the En/Spm type (DNA transposon), and three classes of satellite DNA (intercalary, centromeric, subtelomeric). The number of bases per feature is displayed in windows of 500 kb (shifted by 300 kb). Section 3 shows the distribution (stacked area graphs) of mapped small RNAs of length 21 and 24 nt in adjacent bins of 500 kb. For reads mapping at multiple locations, one random location was selected. Reads matching within predicted rRNA loci were ignored. Positions with more than 10,000 mapped 21 nt sequences were labelled with the corresponding non-coding RNA prediction, if available, including the number of matching reads. Section 4 shows the chromosome-wide distribution of genomic variants in four sugar beet accessions and sea beet compared to RefBeet. Substitutions and deletions were detected by read-mapping with up to three variants per 100 nt read in 50 kb windows shifted by 25 kb. Shared and individual low-variation regions per accession are visible.

  5. Genomic features of RefBeet chromosomes 6–9.
    Extended Data Fig. 2: Genomic features of RefBeet chromosomes 6–9.

    For details see Extended Data Fig. 1.

  6. K-mer distribution and read coverage.
    Extended Data Fig. 3: K-mer distribution and read coverage.

    a, Number of 17-mers at different coverages. b, c, Correlation of read coverage and GC content of reads generated from a PCR-amplified library (b) and a PCR-free library (c). Read data sets in b and c were aligned against RefBeet. The GC content and the number of aligned bases were computed in sliding windows of 500 bases shifted by 100 bases. To reduce the number of data points, only chromosome 1 scaffold 1 (Bvchr1.sca001, 8 Mb) was plotted.

  7. Genetic vs physical distances.
    Extended Data Fig. 4: Genetic vs physical distances.

    a, Genetic and physical positions of 983 genetic markers in the genetic map of sugar beet (ref. 2) and the RefBeet assembly, respectively. The expected physical distance in sugar beet had been reported as 855 kb per 1 cM, with deviations of up to 50-fold (ref. 2). In RefBeet only 5% of marker pairs showed the expected physical distance (855 kb ± 20%), suggesting strict partitioning of the genome into regions favouring or disfavouring recombination events.

  8. Annotation of repeats and non-coding RNA genes.
    Extended Data Fig. 5: Annotation of repeats and non-coding RNA genes.

    a, Repeat content of the sugar beet genome assembly. A total of 252 Mb (42.3%) of the genome assembly consists of repetitive DNA, with retrotransposons as the most abundant repeat fraction. All major superfamilies of DNA transposons were represented, showing a dispersed or slightly centromere-enriched distribution along the chromosomes. Microsatellites and minisatellites were well represented owing to flanking heterogeneous sequences, which allowed their assembly. The remaining repetitive sequences (‘Unknown’) in 459 families represent potentially new repeats, most likely rearranged or truncated retrotransposons. b, Summary of non-coding RNA gene annotations. For different classes (miRNA, microRNA; rRNA, ribosomal RNA; snRNA, spliceosomal RNA; snoRNA, small nucleolar RNA; tRNA, transfer RNA) the number of predictions, the number and percentage of predictions with overlapping small RNA reads, and the number of families/subtypes are listed. c, Proportion of annotated tRNAs by amino acid for the five species studied (At, Arabidopsis thaliana; Pt, Populus trichocarpa; Bv, Beta vulgaris; Vv, Vitis vinifera; Zm, Zea mays). d, Absolute numbers of annotated tRNAs by amino acid. Except for pseudogenes, the proportion of tRNAs is relatively constant among all species (species names as in c). e, Number of annotated tRNAs by amino acid and species (as predicted by tRNAscan-SE). The total is computed without the last two rows containing pseudogenes and presumably defunct tRNAs with undetermined anticodon. f, Number and size of Beta vulgaris gene clusters of at least 10 members representing expanded gene families. A total of 1,274 genes are contained in 97 clusters.

  9. Analysis of paralogous and orthologous genes.
    Extended Data Fig. 6: Analysis of paralogous and orthologous genes.

    a, Number and percentage of detected orthologues between Beta vulgaris and nine other plant species. Orthology relationships with 10 or more proteins for any of the species were discarded in order to avoid biases introduced by species-specific gene family expansions. In total, 18,927 sugar beet genes had orthologues in at least one of nine plants, and 16,062 paralogous sugar beet genes appeared in 14,852 trees. b, Number of duplication events detected in gene trees grouped into three age classes. The duplication ratio was calculated as the number of age class-specific duplication events divided by the total number of trees containing duplication events. c, d, Collinear blocks of protein-coding genes in Beta vulgaris as dotplot (c) or circular plot (d). Each dot or connecting line represents one gene pair, respectively. Shown are 34 collinear blocks containing 7–35 gene pairs. A triplicated region is visible on chromosomes 1, 3 and 5 (arrows). e, Histogram of Ks values for Beta vulgaris protein-coding gene pairs in collinear blocks. Ks values mainly scatter between 1.2 and 1.8 and show peaks at 1.2 and 1.7.

  10. Phylogenetic trees.
    Extended Data Fig. 7: Phylogenetic trees.

    a, Phylogenetic tree of 44 sucrose transporter protein sequences in higher plants including Beta vulgaris. The reliability for internal branches is indicated in red ranging from 0 = unreliable to 1 = highly reliable (aLRT statistics). At, Arabidopsis thaliana; Bv, Beta vulgaris; Dc, Daucus carota; Hb, Hevea brasiliensis; Hv, Hordeum vulgare; Le, Lycopersicum esculentum renamed Solanum lycopersicum; Lj, Lotus japonicus; Lp, Lolium perenne; Nt, Nicotiana tabacum; Os, Oryza sativa; Ps, Pisum sativum; Sh, Saccharum Hybrid Cultivar Q117; St, Solanum tuberosum; Sb, Sorghum bicolor; Ta, Triticum aestivum; Zm, Zea mays. b, Intraspecific relationship of five Beta vulgaris accessions based on the alignment of 2,112 shared genes.

  11. Intraspecific variation.
    Extended Data Fig. 8: Intraspecific variation.

    a, Number of variants inferred from read mapping (black) and sequence identity of matching scaffolds (blue) along RefBeet chromosome 7. The variation profiles of five different accessions including the reference accession (RefBv) are shown. Regions with a high number of read-mapping variants showed a higher density of scaffolds of low sequence identity. However, low-identity scaffolds were also present in low-variation regions of mapped reads. b, Detailed view of the distribution of read-mapping variants and read coverage (green) in Beta vulgaris accession KDHBv compared to RefBeet. The secondary y axis on the right side indicates the percentage of positions per window covered in the alignment. Low-variation regions were generally well covered. c, Fraction of variation deserts of different lengths along RefBeet based on read-mapping of genomic data sets and alignment of assembled scaffolds. The six different genotypes include the reference, four other Beta vulgaris accessions, and one Beta maritima accession. Variation deserts were found in all chromosomes. The variation deserts of non-reference sugar beet accessions contained 49% (179 Mb in KDH) to 58% (217 Mb in YMo) of all covered RefBeet positions. d, Intersection of variant deserts. Starting from RefBv, the size of shared variant deserts decreased by including additional Beta vulgaris accessions. e, Sequence conservation comparison of three groups of genes. Genes with GO term enrichment localized within variation deserts shown in red; genes without GO term enrichment localized within variation deserts shown in blue; genes localized outside of variation deserts shown in green. For each of the three groups 17 randomly selected genes with confirmed exon–intron structure were aligned to 24 additional sugar beet accessions. The sequence conservation was determined from the identity of the sequence alignment. Genes with GO term enrichment localized within variation deserts had the highest fraction of high-identity gene alignments, followed by genes without GO term enrichment localized within variation deserts. f, Length distribution of insertions and deletions in coding sequences. Apart from one-base indels, indels of length three or multiples of three (3n) were overrepresented. Of all genes affected by indels, 49.1% had a single 3n indel and 5.0% had more than one indel (any length) with bases summing up to 3n.
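
The frame-preserving indel classes described in panel f of the last caption can be illustrated with a minimal Python sketch. It assumes each gene is represented by a list of signed, non-zero indel lengths in bases (insertions positive, deletions negative), a representation chosen here for illustration and not taken from the paper; the function name and category labels are likewise hypothetical.

```python
def classify_gene_indels(indel_lengths):
    """Classify a gene by the indels found in its coding sequence.

    indel_lengths: signed, non-zero indel lengths in bases
    (assumed convention: insertions > 0, deletions < 0).
    """
    if not indel_lengths:
        return "no_indel"
    if len(indel_lengths) == 1:
        # A single indel keeps the reading frame only if its length is 3n.
        return "single_3n" if indel_lengths[0] % 3 == 0 else "single_frameshift"
    # Several indels: the frame is restored downstream if the net
    # length change is a multiple of three.
    net_change = sum(indel_lengths)
    return "multiple_sum_3n" if net_change % 3 == 0 else "multiple_frameshift"


# Toy input with made-up gene names and indel lengths.
example_genes = {
    "gene_a": [3],       # one 3-base insertion
    "gene_b": [1, 2],    # two insertions adding up to 3 bases
    "gene_c": [-1],      # one single-base deletion
}
for name, lengths in example_genes.items():
    print(name, classify_gene_indels(lengths))
```

Counting genes per category and dividing by the number of genes with at least one indel would give percentages comparable in kind to those quoted in the caption.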

Tables

  1. Sequencing data, assembly results and plant species
    Extended Data Table 1: Sequencing data, assembly results and plant species

Accession codes

Referenced accessions

BioProject: PRJNA41497

Sequence Read Archive: SRP023136
References

  1. Arumuganathan, K. & Earle, E. D. Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9, 208–218 (1991)
  2. Dohm, J. C. et al. Palaeohexaploid ancestry for Caryophyllales inferred from extensive gene-based physical and genetic mapping of the sugar beet genome (Beta vulgaris). Plant J. 70, 528–540 (2012)
  3. Fischer, H. E. Origin of the ‘Weisse Schlesische Rübe’ (white Silesian beet) and resynthesis of sugar beet. Euphytica 41, 75–80 (1989)
  4. Flavell, R. B., Bennett, M. D., Smith, J. B. & Smith, D. B. Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem. Genet. 12, 257–269 (1974)
  5. Biancardi, E., McGrath, J. M., Panella, L., Lewellen, R. & Stevanato, P. in Root Tuber Crops Vol. 7 (ed. Bradshaw, J. E.) 173–219 (Springer, 2010)
  6. Stevens, P. Angiosperm Phylogeny Website (2012) http://www.mobot.org/MOBOT/research/APweb/
  7. Paesold, S., Borchardt, D., Schmidt, T. & Dechyeva, D. A sugar beet (Beta vulgaris L.) reference FISH karyotype for chromosome and chromosome-arm identification, integration of genetic linkage groups and analysis of major repeat family distribution. Plant J. 72, 600–611 (2012)
  8. Dohm, J. C., Lange, C., Reinhardt, R. & Himmelbauer, H. Haplotype divergence in Beta vulgaris and microsynteny with sequenced plant genomes. Plant J. 57, 14–26 (2009)
  9. Huerta-Cepas, J., Dopazo, H., Dopazo, J. & Gabaldón, T. The human phylome. Genome Biol. 8, R109 (2007)
  10. The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141, 399–436 (2003)
  11. Moore, M. J., Soltis, P. S., Bell, C. D., Burleigh, J. G. & Soltis, D. E. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc. Natl Acad. Sci. USA (2010)
  12. Hunger, S. et al. Isolation and linkage analysis of expressed disease-resistance gene analogues of sugar beet (Beta vulgaris L.). Genome 46, 70–82 (2003)
  13. Tian, Y., Fan, L., Thurau, T., Jung, C. & Cai, D. The absence of TIR-type resistance gene analogues in the sugar beet (Beta vulgaris L.) genome. J. Mol. Evol. 58, 40–53 (2004)
  14. Schneider, K. et al. Analysis of DNA polymorphisms in sugar beet (Beta vulgaris L.) and development of an SNP-based map of expressed genes. Theor. Appl. Genet. 115, 601–615 (2007)
  15. Biancardi, E., Panella, L. W. & Lewellen, R. T. Beta maritima: The Origin of Beets (Springer, 2012)
  16. Pin, P. A. et al. The role of a pseudo-response regulator gene in life cycle adaptation and domestication of beet. Curr. Biol. 22, 1095–1101 (2012)
  17. Schnable, P. S. & Springer, N. M. Progress toward understanding heterosis in crop plants. Annu. Rev. Plant Biol. 64, 71–88 (2013)
  18. Hohmann, U. et al. A bacterial artificial chromosome (BAC) library of sugar beet and a physical map of the region encompassing the bolting gene B. Mol. Genet. Genomics 269, 126–136 (2003)
  19. Lange, C., Holtgräwe, D., Schulz, B., Weisshaar, B. & Himmelbauer, H. Construction and characterization of a sugar beet (Beta vulgaris) fosmid library. Genome 51, 948–951 (2008)
  20. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)
  21. Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011)
  22. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
  23. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)
  24. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002)
  25. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005)
  26. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999)
  27. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008)
  28. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997)
  29. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007)
  30. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
  31. Burge, S. W. et al. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res. 41, D226–D232 (2013)
  32. Brown, J. W. S. et al. Plant snoRNA database. Nucleic Acids Res. 31, 432–435 (2003)
  33. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank: update. Nucleic Acids Res. 32, D23–D26 (2004)
  34. Wang, B.-B. & Brendel, V. The ASRG database: identification and survey of Arabidopsis thaliana genes involved in pre-mRNA splicing. Genome Biol. 5, R102 (2004)
  35. Wehe, A., Bansal, M. S., Burleigh, J. G. & Eulenstein, O. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24, 1540–1541 (2008)
  36. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010)
  37. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012)
  38. Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426 (1986)
  39. Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001)
  40. UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 40, D71–D75 (2012)
  41. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012)
  42. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003)
  43. Kim, J. et al. A genome-wide comparison of NB-LRR type of resistance gene analogs (RGA) in the plant kingdom. Mol. Cells 33, 385–392 (2012)
  44. Hunter, S. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 40, D306–D312 (2012)
  45. Young, N. D. et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520–524 (2011)
  46. Kühn, C. & Grof, C. P. L. Sucrose transporters of higher plants. Curr. Opin. Plant Biol. 13, 287–297 (2010)

Author information

  1. These authors contributed equally to this work.

    • Juliane C. Dohm &
    • André E. Minoche

Affiliations

  1. Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany

    • Juliane C. Dohm,
    • André E. Minoche,
    • Hans Lehrach &
    • Heinz Himmelbauer
  2. Centre for Genomic Regulation (CRG), C. Dr. Aiguader 88, 08003 Barcelona, Spain

    • Juliane C. Dohm,
    • André E. Minoche,
    • Salvador Capella-Gutiérrez,
    • Toni Gabaldón &
    • Heinz Himmelbauer
  3. Universitat Pompeu Fabra (UPF), C. Dr. Aiguader 88, 08003 Barcelona, Spain

    • Juliane C. Dohm,
    • André E. Minoche,
    • Salvador Capella-Gutiérrez,
    • Toni Gabaldón &
    • Heinz Himmelbauer
  4. Bielefeld University, CeBiTec and Department of Biology, Universitätsstraße 25, 33615 Bielefeld, Germany

    • Daniela Holtgräwe,
    • Oliver Rupp,
    • Thomas Rosleff Sörensen,
    • Ralf Stracke,
    • Alexander Goesmann &
    • Bernd Weisshaar
  5. TU Dresden, Department of Biology, Zellescher Weg 20b, 01217 Dresden, Germany

    • Falk Zakrzewski &
    • Thomas Schmidt
  6. University of Leipzig, Department of Computer Science, Härtelstraße 16-18, 04107 Leipzig, Germany

    • Hakim Tafer &
    • Peter F. Stadler
  7. Max Planck Genome Centre Cologne, Carl-von-Linné-Weg 10, 50829 Köln, Germany

    • Richard Reinhardt
  8. Syngenta, Box 302, 26123 Landskrona, Sweden

    • Thomas Kraft
  9. KWS SAAT AG, Grimsehlstraße 31, 37574 Einbeck, Germany

    • Britta Schulz
  10. Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain

    • Toni Gabaldón

Contributions

H.H., B.W. and J.C.D. conceived the study; H.H., D.H., T.R.S., R.R. and B.W. prepared sequencing data; J.C.D., A.E.M., D.H., S.C.-G., F.Z., H.T., O.R., R.S., A.G., B.S., T.K., P.F.S., T.S. and T.G. designed experiments and analysed the data; H.L. participated in project design; J.C.D., H.H. and A.E.M. wrote the paper with input from all other authors. All authors have read and approved the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

Raw sequencing data (genomic and transcript sequences) have been submitted to the Sequence Read Archive (SRA) under the study accession number SRP023136. The NCBI BioProject accession is PRJNA41497. The whole-genome shotgun assemblies have been deposited at DDBJ/EMBL/GenBank under the accessions AYZS00000000–AYZY00000000. The GenBank accession numbers KG026656–KG039419 were assigned to BAC end sequences and JY274675–JY473858 to fosmid end sequences generated in this study. Plant material for Beta vulgaris genotype KWS2320 and Beta maritima 9W_2101 (DeKBm) is available as seeds upon signing a material transfer agreement (MTA). A sugar beet website including a genome browser has been set up at http://bvseq.molgen.mpg.de, providing access to assemblies, annotations, gene models and variation data. The sugar beet phylome can be accessed at http://phylomeDB.org.

Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: Genomic features of RefBeet chromosomes 2–5. (658 KB)

  2. Extended Data Figure 2: Genomic features of RefBeet chromosomes 6–9. (664 KB)

    For details see Extended Data Fig. 1.

  3. Extended Data Figure 3: K-mer distribution and read coverage. (93 KB)

  4. Extended Data Figure 4: Genetic vs physical distances. (223 KB)

  5. Extended Data Figure 5: Annotation of repeats and non-coding RNA genes. (279 KB)

  6. Extended Data Figure 6: Analysis of paralogous and orthologous genes. (215 KB)

  7. Extended Data Figure 7: Phylogenetic trees. (289 KB)

  8. Extended Data Figure 8: Intraspecific variation. (359 KB)

Extended Data Tables

  1. Extended Data Table 1: Sequencing data, assembly results and plant species (247 KB)

Supplementary information

PDF files

  1. Supplementary Information (1.9 MB)

    This file contains Supplementary Methods, Supplementary Tables 1-16, Supplementary Notes and additional references.

Excel files

  1. Supplementary Data 1 (110 KB)

    Repeat families as detected by RepeatModeler along with the combined automatic and manual classification.

  2. Supplementary Data 2 (8.4 MB)

    Predicted RefBeet genes and their functional annotation based on database searches and transfer from orthologs.

  3. Supplementary Data 3 (69 KB)

    List of 715 putative resistance gene analogs (RGA). Beta vulgaris (Bv) genes were classified based on the presence of RGA domains (columns A+B). In 30 additional Bv genes (column D) these domains were missing in exon parts, but the genes showed sequence homology with known RGAs from other plants.
