A chromosome conformation capture ordered sequence of the barley genome

Journal name:
Nature
Volume:
544
Pages:
427–433
DOI:
doi:10.1038/nature22043

Abstract

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

At a glance

Figures

  1. Figure 1: Characteristics of genomic compartments in barley chromosomes.

    a, The distribution of genomic features in 4 Mb windows is plotted along chromosome 1H. Analogous panels for the other chromosomes are found in Extended Data Fig. 5a. The left column in the legend refers to the background shading in the top panel; the right column indicates the colour code for lines in both panels. CDS, predicted coding sequences; cM, centimorgans. b, Enrichment of Gene Ontology (GO) terms in genomic compartments. Coloured rectangles indicate enrichment factors ranging from −2 (dark blue) to 2 (dark red). Numbers inside the rectangles indicate −log10-transformed P values.

  2. Figure 2: Chromosome conformation capture analysis.

    a, Distance-dependent decay of contact probability. b, Intrachromosomal contact matrix. The intensity of pixels represents the normalized count of Hi-C links between 1 Mb windows on chromosome 1H on a logarithmic scale. c, Schematic model of the Rabl configuration of interphase chromosomes. Centromeres and telomeres are represented by red and green circles, respectively. d, Leaf interphase nucleus of barley. Chromatin was stained blue with 4′,6-diamidino-2-phenylindole (DAPI). Fluorescence in situ hybridization was performed with probes specific for centromeres (red) and telomeres (green). Scale bar, 5 μm. e, Interchromosomal contact matrix. The intensity of pixels represents the normalized count of Hi-C links between 1 Mb windows on chromosomes 1H (x axis) and 2H (y axis) on a logarithmic scale. f, A principal component analysis of the normalized contact matrix of chromosome 1H at 1 Mb resolution was conducted. The first and second eigenvectors are plotted against each other. Each point represents a 1 Mb window. Closer proximity to the centromere is indicated by a darker colour. Windows from the short and long arms are coloured blue and red, respectively.
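
    As an illustration of panel a, the sketch below shows one simple way to summarize distance-dependent decay from a binned contact matrix: average the counts over all window pairs separated by the same distance. The function name `distance_decay`, the toy matrix and its values are assumptions made for this example; the caption does not describe how the authors computed the curve.

```python
def distance_decay(contacts, bin_size_mb=1.0):
    """Return (distance_mb, mean_contact_count) pairs, one per bin separation."""
    n = len(contacts)
    decay = []
    for d in range(1, n):
        # All window pairs (i, i + d) lie on the d-th off-diagonal.
        values = [contacts[i][i + d] for i in range(n - d)]
        decay.append((d * bin_size_mb, sum(values) / len(values)))
    return decay


# Toy symmetric matrix whose counts fall off as 1 / distance (made-up numbers).
n_bins = 6
toy = [[120.0 / max(abs(i - j), 1) for j in range(n_bins)] for i in range(n_bins)]
decay = distance_decay(toy)

for distance_mb, mean_count in decay:
    print(f"{distance_mb:.0f} Mb: {mean_count:.1f}")
```

    Plotting these pairs on logarithmic axes gives a decay curve of the kind the panel describes; real data would first need the normalization mentioned in panels b and e.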

  3. Figure 3: The genomic context of repetitive elements.

    a, Abundance of key genomic features, different transposon superfamilies and common Pfam domains across chromosome 1H. Analogous panels for the other chromosomes are found in Extended Data Fig. 5b. The colour scale of the heatmaps ranges from blue (0) to yellow (maximum across all chromosomes per track). Minimum and maximum values are indicated to the right of each track. MITEs, miniature inverted-repeat transposable elements; LINEs, long interspersed elements; fl, full-length; PR, protease; CH, chromodomain; RT, reverse transcriptase; NBS, NB-ARC; Pkin, protein kinase. b, Transposable elements up- and downstream of genes. Coding sequences of high-confidence genes were used as anchor points. Transposable element composition was determined 10 kb up- and downstream of each gene. The x axis indicates the position relative to the gene, while the y axis indicates how many genes had a transposable element of the respective superfamily at the respective position in their upstream/downstream region.
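
    The counting scheme described for panel b can be sketched as follows. This is a minimal illustration under stated assumptions: the function `te_profile`, the 1 kb sampling step, the coordinate convention and the toy annotations are all hypothetical, and only the 10 kb flank length comes from the caption.

```python
from collections import defaultdict

FLANK = 10_000  # bp examined on each side of a gene, as stated in the caption


def te_profile(genes, elements, step=1_000):
    """Count genes that have a TE of each superfamily at each relative position.

    genes: list of (start, end, strand) coding-sequence anchors, inclusive.
    elements: list of (start, end, superfamily) on the same chromosome.
    Returns {superfamily: {relative_position: number_of_genes}}; negative
    positions are upstream of the gene, positive positions are downstream.
    """
    profile = defaultdict(lambda: defaultdict(int))
    for g_start, g_end, strand in genes:
        for offset in range(step, FLANK + 1, step):
            for rel in (-offset, offset):
                # Upstream lies at lower coordinates on '+', higher on '-'.
                if (rel < 0) == (strand == "+"):
                    pos = g_start - offset
                else:
                    pos = g_end + offset
                hits = {fam for s, e, fam in elements if s <= pos <= e}
                for fam in hits:
                    profile[fam][rel] += 1
    return profile


# Toy annotations (made-up coordinates and labels, for illustration only).
genes = [(20_000, 21_000, "+"), (50_000, 51_000, "-")]
elements = [
    (17_500, 18_500, "Mariner"),
    (52_500, 53_500, "Mariner"),
    (23_000, 30_000, "Gypsy"),
]
profile = te_profile(genes, elements)
print(dict(profile["Mariner"]))
```

    In this toy case both genes have a Mariner element 2 kb upstream once strand is taken into account, so the count at position −2,000 is 2.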

  4. Figure 4: Expansion of agronomically important gene families.

    a, OrthoMCL clustering of the barley high-confidence gene complement with B. distachyon, rice, sorghum and Arabidopsis thaliana genes. Numbers in the sections of the Venn diagram correspond to numbers of clusters (gene groups). The first number below the species name denotes the total number of proteins that were included in the OrthoMCL analysis for each species. The second number indicates the number of genes in clusters for a species. b, Phylogenetic tree of 68 full-length α-amylase protein sequences derived from amy genes identified in the genomes of barley, hexaploid wheat, B. distachyon, rice, sorghum and maize. Each wheat subgenome was considered separately to facilitate the comparison of gene copy numbers and duplication events across species. Note that for the amy4 subfamily, two to three genes per genome were identified in all genomes. These genes are located on distinct chromosomes and hence most probably did not originate from tandem gene duplications. While most species further contain only a single amy3 gene copy per genome, moderate copy number extension was observed in sorghum and rice, where a potential tandem gene duplication resulted in two amy3 gene copies. Three genes of the amy2 subfamily were identified on chromosome 7H in barley and on chromosomes 7A, 7B and 7D in wheat. No similar copy number extension was observed in B. distachyon, Sorghum bicolor or Oryza sativa. In maize, two amy2 genes were identified. The amy1 subfamily shows the highest level of copy number extension. Tandem duplications are present in sorghum and rice. Two to three full-length genes were identified per genome in hexaploid wheat on group 6 chromosomes and five full-length amy1 genes on chromosome 6H and unanchored scaffolds in barley. Notably, four of these barley genes share 99.8–100% sequence identity at the protein and nucleotide levels, indicating very recent duplication events. T. aestivum, Triticum aestivum; Z. mays, Zea mays.
c, Expression of the SWEET11 gene subfamily in the developing barley grains. Left, expression profiles of SWEET11a and SWEET11b as determined by quantitative real-time PCR (qPCR) on total RNA isolated from micro-dissected developing grains. Right, localization of SWEET11a and SWEET11b expression in cross-section of immature seeds by RNA in situ hybridization. Hybridizations with sense probes are shown as negative controls in Extended Data Fig. 7a. Scale bars, 100 μm.

  5. Figure 5: Distribution of genetic diversity across the barley genome.

    Ninety-six elite barley cultivars, including 48 from the winter gene pool (blue line) and 48 from the spring gene pool (red line), were used. Diversity (unbiased heterozygosity, y axis) is plotted as the rolling average of 100 adjacent SNPs along each chromosome. For improved visualization, all chromosomes have been normalized to a standard length. a, Patterns of diversity on chromosomes 1H–7H (top to bottom). The distance between adjacent SNPs has been normalized (that is, this does not show genetic distance). The number of SNPs included on each chromosome is given at the bottom right of each plot. b, The same diversity values normalized according to physical distance. Extensive pericentromeric regions of very low diversity in the spring gene pool are highlighted in green and those of low diversity in the winter gene pool in purple. Regions with similar levels of diversity in both gene pools are highlighted in orange. Coloured dots show the position of eight loci previously identified as being differentiated between the winter and spring gene pools.
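
    The diversity statistic plotted here can be illustrated with a short sketch. It assumes Nei's unbiased estimator, n/(n − 1) × (1 − Σp²), applied to one allele call per line, followed by a plain moving average; the caption names the statistic and the 100-SNP window but not the exact formula or software settings, so the functions and toy data below are assumptions.

```python
from collections import Counter


def unbiased_heterozygosity(alleles):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    sum_p2 = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_p2)


def rolling_mean(values, window):
    """Mean of every run of `window` adjacent values (no padding at the ends)."""
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one SNP: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# One allele call per line at each SNP (toy data; lines treated as haploid).
snps = [
    ["A", "A", "A", "A"],  # monomorphic
    ["A", "A", "G", "G"],  # balanced
    ["C", "T", "T", "T"],  # one minor allele
]
diversity = [unbiased_heterozygosity(s) for s in snps]
smoothed = rolling_mean(diversity, window=2)  # the figure uses a window of 100
print(diversity, smoothed)
```

    A monomorphic SNP scores 0, so long runs of such SNPs pull the rolling average towards zero, which is how the low-diversity pericentromeric regions appear in the plots.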

  6. Extended Data Fig. 1: Gene annotation pipeline.

    a, The gene annotation pipeline combined gene evidence information from four data sources. Open reading frames were then predicted for 83,105 gene candidates. b, Gene candidates were classified into high-confidence (HC) and low-confidence (LC) genes on the basis of homology to reference proteins and alignment to a library of repeat elements. Additional filtering procedures were applied before defining the final gene sets. Arrows between boxes with counts of high-confidence and low-confidence genes in each step indicate re-classifications (high-confidence to low-confidence, or low-confidence to high-confidence).

  7. Extended Data Fig. 2: Assembly validation.

    a, Conserved gene order between barley (y axis) and B. distachyon (x axis). b, Completeness of the gene annotation as assessed by BUSCO. c, Representation of repetitive k-mers in reads and assemblies. d, Representation of full-length LTR retrotransposons in sequence assemblies of plant genomes with different sizes (represented by black points). The map-based reference sequence of barley reported in the present paper is shown in blue. Red dots correspond to shotgun assemblies of the barley genome (ref. 7) and wheat chromosome 3B (ref. 99).

  8. Extended Data Fig. 3: Hi-C contact matrices.

    a, Intrachromosomal contacts. b, Interchromosomal contacts. Darker red indicates a higher contact probability.

  9. Extended Data Fig. 4: Global patterns in Hi-C contact matrices.

    a, Principal component analysis of intrachromosomal Hi-C contact matrices. The eigenvectors of the first three principal components are plotted. Centromere positions are marked with a red line. b, Proportion of variance explained by linear models, incorporating positional information in the linear genome, fitted to the Hi-C contact matrices. c, Hi-C link counts in Morex × Barke F1 hybrids within the same chromosome, between homologous chromosomes and between non-homologous chromosomes.

  10. Extended Data Fig. 5: Distributions of genomic features and the context of repetitive elements.

    a, b, Panels a and b are analogous to Figs 1a and 3a, respectively. Grey vertical connector bars between sub-panels and dashed lines inside sub-panels indicate the centromere position of each chromosome.

  11. Extended Data Fig. 6: Experimental strategy to distinguish individual amy1_1 copies by PCR from genomic DNA through polymorphisms in the extended promoter regions of amy1_1 full-length copies.

    a, Experimental strategy. Primers CD52_amy1fw and CD53_amy1rc bind in the extended promoter region of all full-length amy1_1 copies (expected amplicon sizes are 225 bp for amy1_1a, 299 bp for amy1_1b and amy1_1d and 336 bp for amy1_1c). Forward primers CD54_fw1a, CD55_fw1b and CD56_fw1c are designed to specifically amplify copies amy1_1a, amy1_1b and amy1_1c, respectively, when used with reverse primer CD58_amy1rc, which binds in the coding region of all amy1_1 copies. Expected amplicon sizes are 1,024 bp (amy1_1a), 1,026 bp (amy1_1b) and 757 bp (amy1_1c). Primer pair CD55_fw1b–CD58_amy1rc also binds to copy amy1_1d: here, sequences of the expected amplicons contain sufficient polymorphisms to distinguish these copies from each other. Positions of selected sequence polymorphisms and deleted regions suitable to distinguish single copies are indicated as black vertical bars and gaps, respectively. Numbering is given with respect to copy amy1_1b. b, PCR amplification of amy1_1 promoter regions in six barley cultivars and landraces. As expected, a PCR for cultivar Morex, using universal primers CD52_amy1fw and CD53_amy1rc, resulted in three amplicons of the expected sizes 225, 299 and 336 bp (compare a), which was confirmed by Sanger sequencing. Furthermore, primers CD52_amy1fw and CD53_amy1rc were used to amplify the amy1_1 extended promoter region in various barley cultivars. These experiments indicate polymorphic variation in, or even absence of, single promoters of amy1_1 in the different cultivars. The cultivars analysed differ in row type (six-rowed: cultivars Morex, Masan Naked 1, Akashinriki, Etincel; two-rowed: cultivars Barke, Bowman), growth habit (spring barley: cultivars Morex, Barke, Bowman, Masan Naked 1, Akashinriki; winter barley: cultivar Etincel) and geographic origin (North America: cultivars Morex, Bowman; Europe: cultivars Barke, Etincel; Asia: cultivars Masan Naked 1, Akashinriki). The cultivars Masan Naked 1 and Akashinriki represent landraces used for food, Bowman was classified as non-malting barley, while Morex, Barke and Etincel represent modern malting barley. c, Copy-specific PCR amplification of amy1_1 extended promoter regions. PCR amplification and Sanger sequencing identified three amy1_1 copies in barley cultivar Morex: amy1_1a (CD54_fw1a–CD58_amy1rc), amy1_1b (CD55_fw1b–CD58_amy1rc) and amy1_1c (CD56_fw1c–CD58_amy1rc). Additionally, sequencing revealed two polymorphic sites in PCR amplicon amy1_1b (CD55_fw1b–CD58_amy1rc) at positions 721 bp (T/C) and 1175 bp (C/T) (see a), indicating the presence of one or two additional amy1_1b-like copies in the genome of the analysed individual. The presence of copy amy1_1d could not be confirmed. This might be due to sequence deviations between the cultivar Morex accession used for BAC library construction and that used for the presented experiments, or to differences in PCR efficiency for amplification of copies amy1_1b and amy1_1d.

  12. Extended Data Fig. 7: SWEET gene expression.

    a, Control experiment for mRNA in situ hybridizations shown in Fig. 4c. In situ hybridization with sense probes for SWEET11a (top) and SWEET11b (bottom). Scale bars, 100 μm. b, Expression of SWEET11a and SWEET11b. Results of qPCR in different plant organs and in the developing grains at 7 days after flowering (DAF).

  13. Extended Data Fig. 8: Haplotype blocks in sets of 48 samples each of elite two-row spring barley lines (top half of each chromosome’s figure) and winter barley lines (bottom half), separately for each chromosome.

    We restricted the number of SNPs per chromosome by randomly choosing 3,500 to fit with the maximum permitted by the software. The red and green plots in the centre of each chromosome figure represent whole-canvas dumps produced with the Flapjack software (ref. 97). Markers are arranged in columns in linear order along the chromosome; red pixels represent reference alleles, while green pixels represent alternative alleles. Each row represents a barley cultivar; these have been sorted top to bottom by year of introduction (ascending). The Flapjack plots are framed by cropped linkage disequilibrium plots generated with the HaploView software (ref. 96). Colour intensity conveys the extent of linkage between pairs of markers (red, highest). Approximate centromere positions are indicated by semi-opaque grey squares. The triangles with the thin black outline represent haplotype blocks as computed by HaploView. In some regions, extensive stretches exist where no blocks were detected (for example, chr2H, spring lines in top half, near centromere). These generally represent highly monomorphic regions where there is no evidence for multiple haplotypes, and consequently blocks were not called.
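
    The marker restriction described at the start of this legend amounts to a random subsample that keeps the markers in chromosome order. The sketch below shows one way to do this; the function `subsample_markers`, the fixed seed and the toy positions are assumptions for illustration, and only the limit of 3,500 comes from the legend.

```python
import random


def subsample_markers(markers, limit=3500, seed=None):
    """Randomly keep at most `limit` markers, preserving their original order."""
    if len(markers) <= limit:
        return list(markers)
    rng = random.Random(seed)
    # Sample indices without replacement, then sort them to restore the order.
    keep = sorted(rng.sample(range(len(markers)), limit))
    return [markers[i] for i in keep]


positions = list(range(0, 1_000_000, 100))  # 10,000 toy SNP positions
chosen = subsample_markers(positions, limit=3500, seed=1)
print(len(chosen), chosen[:5])
```

    Sorting the sampled indices matters here because both the genotype display and the linkage disequilibrium plots rely on markers being in linear order along the chromosome.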

Tables

  1. Extended Data Table 1: Hi-C and optical map datasets for chromosome-scale assembly
  2. Extended Data Table 2: Statistics on gene annotation and genomic compartments
  3. Extended Data Table 3: Repeat annotation statistics
  4. Extended Data Table 4: Information on gene families associated with malting quality

References

  1. van Zeist, W. & Bakker-Heeres, J. A. H. Archaeological studies in the Levant 1. Neolithic sites in the Damascus basin: Aswad, Ghoraifé, Ramad. Palaeohistoria 24, 165–256 (1985)
  2. Riehl, S., Zeidi, M. & Conard, N. J. Emergence of agriculture in the foothills of the Zagros Mountains of Iran. Science 341, 65–67 (2013)
  3. Dietrich, O., Heun, M., Notroff, J., Schmidt, K. & Zarnkow, M. The role of cult and feasting in the emergence of Neolithic communities. New evidence from Göbekli Tepe, south-eastern Turkey. Antiquity 86, 674–695 (2012)
  4. Hayden, B., Canuel, N. & Shanse, J. What was brewing in the Natufian? An archaeological assessment of brewing technology in the Epipaleolithic. J. Archaeol. Method Theory 20, 102–150 (2013)
  5. Wang, J. et al. Revealing a 5,000-y-old beer recipe in China. Proc. Natl Acad. Sci. USA 113, 6444–6448 (2016)
  6. Zohary, D., Hopf, M. & Weiss, E. Domestication of Plants in the Old World: The Origin and Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin (Oxford Univ. Press, 2012)
  7. International Barley Genome Sequencing Consortium. A physical, genetic and functional sequence assembly of the barley genome. Nature 491, 711–716 (2012)
  8. Yang, P. et al. PROTEIN DISULFIDE ISOMERASE LIKE 5-1 is a susceptibility factor to plant viruses. Proc. Natl Acad. Sci. USA 111, 2104–2109 (2014)
  9. Pourkheirandish, M. et al. Evolution of the grain dispersal system in barley. Cell 162, 527–539 (2015)
  10. Russell, J. et al. Exome sequencing of geographically diverse barley landraces and wild relatives gives insights into environmental adaptation. Nat. Genet. 48, 1024–1030 (2016)
  11. Künzel, G., Korzun, L. & Meister, A. Cytologically integrated physical restriction fragment length polymorphism maps for the barley genome based on translocation breakpoints. Genetics 154, 397–412 (2000)
  12. Beier, S. et al. Multiplex sequencing of bacterial artificial chromosomes for assembling complex plant genomes. Plant Biotechnol. J. 14, 1511–1522 (2016)
  13. Muñoz-Amatriaín, M. et al. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome. Plant J. 84, 216–227 (2015)
  14. Ounit, R., Wanamaker, S., Close, T. J. & Lonardi, S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236 (2015)
  15. Colmsee, C. et al. BARLEX - the Barley Draft Genome Explorer. Mol. Plant 8, 964–966 (2015)
  16. Ariyadasa, R. et al. A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms. Plant Physiol. 164, 412–423 (2014)
  17. Mascher, M. et al. Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ). Plant J. 76, 718–727 (2013)
  18. Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 30, 771–776 (2012)
  19. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011)
  20. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009)
  21. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013)
  22. Beier, S. et al. Construction of a map-based reference genome sequence for barley, Hordeum vulgare L. Sci. Data 4, 170044 (2017)
  23. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
  24. Fuchs, J., Houben, A., Brandes, A. & Schubert, I. Chromosome ‘painting’ in plants - a feasible technique? Chromosoma 104, 315–320 (1996)
  25. Grob, S., Schmid, M. W. & Grossniklaus, U. Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol. Cell 55, 678–693 (2014)
  26. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010)
  27. Tiang, C. L., He, Y. & Pawlowski, W. P. Chromosome organization and dynamics during interphase, mitosis, and meiosis in plants. Plant Physiol. 158, 26–34 (2012)
  28. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014)
  29. Houben, A. et al. Methylation of histone H3 in euchromatin of plant chromosomes depends on basic nuclear DNA content. Plant J. 33, 967–973 (2003)
  30. Flavell, R. B., Bennett, M. D., Smith, J. B. & Smith, D. B. Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem. Genet. 12, 257–269 (1974)
  31. SanMiguel, P. et al. Nested retrotransposons in the intergenic regions of the maize genome. Science 274, 765–768 (1996)
  32. Wicker, T., Matthews, D. E. & Keller, B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 7, 561–562 (2002)
  33. Choulet, F. et al. Structural and functional partitioning of bread wheat chromosome 3B. Science 345, 1249721 (2014)
  34. Bureau, T. E. & Wessler, S. R. Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6, 907–916 (1994)
  35. Bureau, T. E. & Wessler, S. R. Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc. Natl Acad. Sci. USA 91, 1411–1415 (1994)
  36. Malik, H. S. & Eickbush, T. H. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73, 5186–5190 (1999)
  37. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004)
  38. Huang, N., Sutliff, T. D., Litts, J. C. & Rodriguez, R. L. Classification and characterization of the rice α-amylase multigene family. Plant Mol. Biol. 14, 655–668 (1990)
  39. Muthukrishnan, S., Gill, B. S., Swegle, M. & Chandra, G. R. Structural genes for α-amylases are located on barley chromosomes 1 and 6. J. Biol. Chem. 259, 13637–13639 (1984)
  40. Khursheed, B. & Rogers, J. C. Barley α-amylase genes. Quantitative comparison of steady-state mRNA levels from individual members of the two different families expressed in aleurone cells. J. Biol. Chem. 263, 18953–18960 (1988)
  41. Melkus, G. et al. Dynamic 13C/1H NMR imaging uncovers sugar allocation in the living seed. Plant Biotechnol. J. 9, 1022–1037 (2011)
  42. Chen, L. Q. et al. Sucrose efflux mediated by SWEET proteins as a key step for phloem transport. Science 335, 207–211 (2012)
  43. Tran, V., Weier, D., Radchuk, R., Thiel, J. & Radchuk, V. Caspase-like activities accompany programmed cell death events in developing barley grains. PLoS ONE 9, e109426 (2014)
  44. Comadran, J. et al. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat. Genet. 44, 1388–1392 (2012)
  45. Schmalenbach, I., Léon, J. & Pillen, K. Identification and verification of QTLs for agronomic traits using wild barley introgression lines. Theor. Appl. Genet. 118, 483–497 (2009)
  46. Han, F. et al. Dissection of a malting quality QTL region on chromosome 1 (7H) of barley. Mol. Breed. 14, 339–347 (2004)
  47. Schnable, P. S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009)
  48. Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Preprint at http://biorxiv.org/content/early/2016/07/26/066100 (2016)
  49. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015)
  50. Hirsch, C. et al. Draft assembly of elite inbred line PH207 provides insights into genomic and transcriptome diversity in maize. Plant Cell 28, 2700–2714 (2016)
  51. Chevreux, B., Wetter, T. & Suhai, S. Genome sequence assembly using trace signals and additional sequence information. In Computer Science and Biology: Proc. 99th German Conference on Bioinformatics (eds Hofestädt, R. et al.) 45–56 (GCB, 1999)
  52. Steuernagel, B. et al. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. BMC Genomics 10, 547 (2009)
  53. Taudien, S. et al. Sequencing of BAC pools by different next generation sequencing platforms and strategies. BMC Res. Notes 4, 411 (2011)
  54. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)
  55. Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009)
  56. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
  57. Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011)
  58. Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012)
  59. Staňková, H. et al. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol. J. 14, 1523–1531 (2016)
  60. Lysák, M. A. et al. Flow karyotyping and sorting of mitotic chromosomes of barley (Hordeum vulgare L.). Chromosome Res. 7, 431–444 (1999)
  61. Šimková, H., Číhalíková, J., Vrána, J., Lysák, M. & Doležel, J. Preparation of HMW DNA from plant nuclei and chromosomes isolated from root tips. Biol. Plant. 46, 369–373 (2003)
  62. Cao, H. et al. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience 3, 34 (2014)
  63. Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000)
  64. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Syst. 1695 (2006)
  65. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011)
  66. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013)
  67. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
  68. Hu, M. et al. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131–3133 (2012)
  69. R Core Team. R: a language and environment for statistical computing (R Foundation for Statistical Computing, 2015)
  70. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011)
  71. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011)
  72. Aliyeva-Schnorr, L. et al. Cytogenetic mapping with centromeric bacterial artificial chromosomes contigs shows that this recombination-poor region comprises more than half of barley chromosome 3H. Plant J. 84, 385–394 (2015)
  73. Hudakova, S. et al. Sequence organization of barley centromeres. Nucleic Acids Res. 29, 5029–5035 (2001)
  74. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005)
  75. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010)
  76. Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009)
  77. Matsumoto, T. et al. Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries. Plant Physiol. 156, 20–28 (2011)
  78. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012)
  79. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
  80. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000)
  81. Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44 (D1), D1141–D1147 (2016)
  82. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008)
  83. Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011)
  84. SanMiguel, P., Gaut, B. S., Tikhonov, A., Nakajima, Y. & Bennetzen, J. L. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20, 43–45 (1998)
  85. Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinformatics 10, 645–656 (2013)
  86. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 32, D138–D141 (2004)
  87. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011)
  88. Lupas, A., Van Dyke, M. & Stock, J. Predicting coiled coils from protein sequences. Science 252, 1162–1164 (1991)
  89. Li, L., Stoeckert, C. J. Jr & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003)
  90. Bolser, D., Staines, D. M., Pritchard, E. & Kersey, P. Ensembl Plants: integrating tools for visualizing, mining, and analyzing plant genomics data. Methods Mol. Biol. 1374, 115–140 (2016)
  91. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004)
  92. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of Gene Ontology terms. PLoS ONE 6, e21800 (2011)
  93. Radchuk, V., Weier, D., Radchuk, R., Weschke, W. & Weber, H. Development of maternal seed tissue in barley is mediated by regulated cell expansion and cell disintegration and coordinated with endosperm growth. J. Exp. Bot. 62, 1217–1227 (2011)
  94. Mascher, M. et al. Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond. Plant J. 76, 494–505 (2013)
  95. Milne, I. et al. Tablet—next generation sequence assembly visualization. Bioinformatics 26, 401–402 (2010)
  96. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005)
  97. Milne, I. et al. Flapjack—graphical genotype visualization. Bioinformatics 26, 3133–3134 (2010)
  98. Peakall, R. & Smouse, P. E. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update. Bioinformatics 28, 2537–2539 (2012)
  99. The International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014)

Author information

  1. These authors contributed equally to this work.

    • Martin Mascher &
    • Heidrun Gundlach

Affiliations

  1. Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany

    • Martin Mascher,
    • Axel Himmelbach,
    • Sebastian Beier,
    • Volodymyr Radchuk,
    • Christian Colmsee,
    • Thomas Schmutzer,
    • Lala Aliyeva-Schnorr,
    • Ljudmilla Borisjuk,
    • Andreas Houben,
    • Uwe Scholz &
    • Nils Stein
  2. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany

    • Martin Mascher
  3. PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, 85764 Neuherberg, Germany

    • Heidrun Gundlach,
    • Sven O. Twardziok,
    • Georg Haberer,
    • Klaus F. X. Mayer &
    • Manuel Spannagl
  4. Department of Plant and Microbial Biology, University of Zurich, 8008 Zurich, Switzerland

    • Thomas Wicker
  5. Carlsberg Research Laboratory, 1799 Copenhagen, Denmark

    • Christoph Dockter,
    • Anna Chailyan &
    • Ilka Braumann
  6. The James Hutton Institute, Dundee DD2 5DA, UK

    • Pete E. Hedley,
    • Joanne Russell,
    • Micha Bayer,
    • Luke Ramsay,
    • Hui Liu &
    • Robbie Waugh
  7. School of Veterinary and Life Sciences, Murdoch University, Murdoch, WA6150, Australia

    • Xiao-Qi Zhang,
    • Penghao Wang,
    • Gaofeng Zhou &
    • Chengdao Li
  8. Australian Export Grains Innovation Centre, South Perth, WA6151, Australia

    • Qisen Zhang
  9. Centre for Comparative Genomics, Murdoch University, WA6150, Murdoch, Australia

    • Roberto A. Barrero,
    • Brett Chapman,
    • John K. McCooke,
    • Cong Tan &
    • Matthew I. Bellgard
  10. Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, Minnesota, USA

    • Lin Li &
    • Gary J. Muehlbauer
  11. Leibniz Institute on Aging - Fritz Lipmann Institute (FLI), 07745 Jena, Germany

    • Stefan Taudien,
    • Marco Groth,
    • Marius Felder &
    • Matthias Platzer
  12. BioNano Genomics Inc., San Diego, CA 92121, California, USA

    • Alex Hastie &
    • Saki Chan
  13. Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, 78371 Olomouc, Czech Republic

    • Hana Šimková,
    • Helena Staňková,
    • Jan Vrána &
    • Jaroslav Doležel
  14. Department of Botany & Plant Sciences, University of California, Riverside, Riverside, CA 92521, California, USA

    • María Muñoz-Amatriaín,
    • Steve Wanamaker &
    • Timothy J. Close
  15. Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521, California, USA

    • Rachid Ounit &
    • Stefano Lonardi
  16. European Molecular Biology Laboratory - The European Bioinformatics Institute, Hinxton CB10 1SD, UK

    • Daniel Bolser &
    • Paul Kersey
  17. Department of Agricultural and Environmental Sciences, University of Udine, 33100 Udine, Italy

    • Stefano Grasso
  18. Green Technology, Natural Resources Institute (Luke), Viikki Plant Science Centre, and Institute of Biotechnology, University of Helsinki, 00014, Helsinki, Finland

    • Jaakko Tanskanen &
    • Alan H. Schulman
  19. Earlham Institute, Norwich NR4 7UH, UK

    • Dharanya Sampath,
    • Darren Heavens,
    • Leah Clissold,
    • Sarah Ayling,
    • Matthew D. Clark &
    • Mario Caccamo
  20. BGI-Shenzhen, Shenzhen, 518083, China

    • Sujie Cao,
    • Hua Li,
    • Xuan Li,
    • Chongyun Lin &
    • Songbo Wang
  21. College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China

    • Fei Dai,
    • Yong Han,
    • Shuya Yin &
    • Guoping Zhang
  22. Kansas State University, Wheat Genetics Resource Center, Department of Plant Pathology and Department of Agronomy, Manhattan, KS 66506, Kansas, USA

    • Jesse A. Poland
  23. School of Agriculture, University of Adelaide, Urrbrae, SA5064, Australia

    • Peter Langridge
  24. Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, Minnesota, USA

    • Gary J. Muehlbauer
  25. School of Environmental Sciences, University of East Anglia, Norwich NR4 7TJ, UK

    • Matthew D. Clark
  26. National Institute of Agricultural Botany, Cambridge CB3 0LE, UK

    • Mario Caccamo
  27. Wissenschaftszentrum Weihenstephan (WZW), Technical University Munich, 85354 Freising, Germany

    • Klaus F. X. Mayer
  28. Department of Biology, Lund University, 22362 Lund, Sweden

    • Mats Hansson
  29. Department of Agriculture and Food, Government of Western Australia, South Perth WA 6151, Australia

    • Chengdao Li
  30. Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Jingzhou, Hubei, 434023, China

    • Chengdao Li
  31. School of Life Sciences, University of Dundee, Dundee DD2 5DA, UK

    • Robbie Waugh
  32. School of Plant Biology, University of Western Australia, Crawley, WA6009, Australia

    • Nils Stein

Contributions

Project coordination: M.S., I.B., C. Li, R.W. (co-leader), N.S. (leader); BAC sequencing and assembly (1H, 3H, 4H): S.B., A. Himmelbach, S.T., M.F., M.G., M.M., U.S. (co-leader), M.P. (co-leader), N.S. (leader); BAC sequencing and assembly (2H, unassigned): D.S., D.H., S.A. (co-leader), M.D.C. (co-leader), M.C. (co-leader), R.W. (leader); BAC sequencing and assembly (5H, 7H): X.Z., R.A.B., Q.Z., C.T., J.K.M., B.C., G. Zhou, F.D., Y.H., S.Y., S. Cao, S. Wang, X.L., M.I.B., P.L., G. Zhang (co-leader), C. Li (leader); BAC sequencing and assembly (6H): S.B., S. Wang, C. Lin, H. Li, U.S., M.H. (co-leader), I.B. (leader); BAC sequencing (gene-bearing): M.M.-A., R.O., S. Wanamaker, S.L. (co-leader), T.J.C. (leader); optical mapping: A. Hastie, H.Š., J.T., H.S., J.V., S. Chan, M.M., N.S., J.D., A.H.S. (leader); data integration: M.M. (leader), S.B., C.C., D.B., L.L., T.S., J.A.P., P.K., N.S., U.S. (co-leader); transcriptome sequencing and analysis: P.E.H., M.B., J.R., H. Liu, S.T., M.F., M.G., M.P., R.W. (leader); annotation of transcribed regions: S.O.T., G.H., R.A.B., L.L., G.J.M., K.F.X.M. (co-leader), M.S. (leader); repetitive DNA analysis: T.W. (co-leader), J.T., K.F.X.M., A.H.S., H.G. (leader); gene family analysis: Q.Z., M.S., V.R., C.D., G.H., A.C., D.B., P.W., L.B., N.S., P.K., C. Li (co-leader), I.B. (leader); chromosome conformation capture: A. Himmelbach, S.G., L.A.-S., A. Houben, M.M. (co-leader), N.S. (leader); resequencing and diversity analysis: J.R., M.B., P.E.H., L.R., L.C., R.W. (leader); writing: M.M. (co-leader), M.S., A.H.S., G.J.M., R.W., N.S. (leader). All authors read and commented on the manuscript.

Competing financial interests

A. Hastie is an employee of Bionano Genomics.

Corresponding authors

Correspondence to:

Reviewer Information Nature thanks M. Bevan, B. Keller and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: Gene annotation pipeline. (187 KB)

    a, The gene annotation pipeline combined gene evidence from four data sources. Open reading frames were then predicted for 83,105 gene candidates. b, Gene candidates were classified into high-confidence (HC) and low-confidence (LC) genes on the basis of homology to reference proteins and alignment to a library of repeat elements. Additional filtering procedures were applied before defining the final gene sets. Arrows between boxes with counts of high-confidence and low-confidence genes in each step indicate re-classifications (high-confidence to low-confidence, or low-confidence to high-confidence).

  2. Extended Data Figure 2: Assembly validation. (369 KB)

    a, Conserved gene order between barley (y axis) and B. distachyon (x axis). b, Completeness of the gene annotation as assessed by BUSCO. c, Representation of repetitive k-mers in reads and assemblies. d, Representation of full-length LTR retrotransposons in sequence assemblies of plant genomes with different sizes (represented by black points). The map-based reference sequence of barley reported in the present paper is shown in blue. Red dots correspond to shotgun assemblies of the barley genome (ref. 7) and wheat chromosome 3B (ref. 99).

  3. Extended Data Figure 3: Hi-C contact matrices. (545 KB)

    a, Intrachromosomal contacts. b, Interchromosomal contacts. Darker red indicates a higher contact probability.

  4. Extended Data Figure 4: Global patterns in Hi-C contact matrices. (484 KB)

    a, Principal component analysis of intrachromosomal Hi-C contact matrices. The eigenvectors of the first three principal components are plotted. Centromere positions are marked with a red line. b, Proportion of variance explained by linear models, incorporating positional information in the linear genome, fitted to the Hi-C contact matrices. c, Hi-C link counts in Morex × Barke F1 hybrids within the same chromosome, between homologous chromosomes and between non-homologous chromosomes.

  5. Extended Data Figure 5: Distributions of genomic features and the context of repetitive elements. (887 KB)

    a, b, Panels a and b are analogous to Figs 1a and 2a. For each chromosome, grey vertical connector bars between sub-panels and dashed lines inside sub-panels indicate centromere positions.

  6. Extended Data Figure 6: Experimental strategy to distinguish individual amy1_1 copies by PCR from genomic DNA through polymorphisms in the extended promoter regions of amy1_1 full-length copies. (216 KB)

    a, Experimental strategy. Primers CD52_amy1fw and CD53_amy1rc bind in the extended promoter region of all full-length amy1_1 copies (expected amplicon sizes are 225 bp for amy1_1a, 299 bp for amy1_1b and amy1_1d and 336 bp for amy1_1c). Forward primers CD54_fw1a, CD55_fw1b and CD56_fw1c are designed to specifically amplify copies amy1_1a, amy1_1b and amy1_1c, respectively, when used with reverse primer CD58_amy1rc, which binds in the coding region of all amy1_1 copies. Expected amplicon sizes are 1,024 bp (amy1_1a), 1,026 bp (amy1_1b) and 757 bp (amy1_1c). The primer pair CD55_fw1b–CD58_amy1rc also binds to copy amy1_1d; here, the sequences of the expected amplicons contain sufficient polymorphisms to distinguish copies amy1_1b and amy1_1d from each other. Positions of selected sequence polymorphisms and deleted regions suitable to distinguish single copies are indicated as black vertical bars and gaps, respectively. Numbering is relative to copy amy1_1b. b, PCR amplification of amy1_1 promoter regions in six barley cultivars and landraces. As expected, PCR on cultivar Morex with the universal primers CD52_amy1fw and CD53_amy1rc yielded three amplicons of the expected sizes of 225, 299 and 336 bp (compare a), which was confirmed by Sanger sequencing. The same primers were used to amplify the amy1_1 extended promoter region in the other barley cultivars. These experiments indicate polymorphic variation in, or even absence of, single promoters of amy1_1 in the different cultivars. The cultivars analysed differ in row type (six-rowed: cultivars Morex, Masan Naked 1, Akashinriki, Etincel; two-rowed: cultivars Barke, Bowman), growth habit (spring barley: cultivars Morex, Barke, Bowman, Masan Naked 1, Akashinriki; winter barley: cultivar Etincel) and geographic origin (North America: cultivars Morex, Bowman; Europe: cultivars Barke, Etincel; Asia: cultivars Masan Naked 1, Akashinriki). The cultivars Masan Naked 1 and Akashinriki represent landraces used for food, Bowman was classified as non-malting barley, and Morex, Barke and Etincel represent modern malting barley. c, Copy-specific PCR amplification of amy1_1 extended promoter regions. PCR amplification and Sanger sequencing identified three amy1_1 copies in barley cultivar Morex: amy1_1a (CD54_fw1a–CD58_amy1rc), amy1_1b (CD55_fw1b–CD58_amy1rc) and amy1_1c (CD56_fw1c–CD58_amy1rc). Additionally, sequencing revealed two polymorphic sites in PCR amplicon amy1_1b (CD55_fw1b–CD58_amy1rc) at positions 721 bp (T/C) and 1,175 bp (C/T) (see a), indicating the presence of one or two additional amy1_1b-like copies in the genome of the analysed individual. The presence of copy amy1_1d could not be confirmed. This might be due to sequence differences between the cultivar Morex accession used for BAC library construction and that used for the experiments presented here, or to differences in PCR efficiency for amplification of copies amy1_1b and amy1_1d.

  7. Extended Data Figure 7: SWEET gene expression. (227 KB)

    a, Control experiment for mRNA in situ hybridizations shown in Fig. 3c. In situ hybridization with sense probes for SWEET11a (top) and SWEET11b (bottom). Scale bars, 100 μm. b, Expression of SWEET11a and SWEET11b. Results of qPCR in different plant organs and in the developing grains at 7 days after flowering (DAF).

  8. Extended Data Figure 8: Haplotype blocks in sets of 48 samples each of elite two-row spring barley lines (top half of each chromosome’s figure) and winter barley lines (bottom half), separately for each chromosome. (1,525 KB)

    We restricted the number of SNPs per chromosome by randomly choosing 3,500 to fit with the maximum permitted by the software. The red and green plots in the centre of each chromosome figure represent whole-canvas dumps produced with the Flapjack software (ref. 97). Markers are arranged in columns in linear order along the chromosome; red pixels represent reference alleles, while green pixels represent alternative alleles. Each row represents a barley cultivar; these have been sorted top to bottom by year of introduction (ascending). The Flapjack plots are framed by cropped linkage disequilibrium plots generated with the HaploView software (ref. 96). Colour intensity conveys the extent of linkage between pairs of markers (red, highest). Approximate centromere positions are indicated by semi-opaque grey squares. The triangles with the thin black outline represent haplotype blocks as computed by HaploView. In some regions, extensive stretches exist where no blocks were detected (for example, chr2H, spring lines in top half, near centromere). These generally represent highly monomorphic regions where there is no evidence for multiple haplotypes, and consequently blocks were not called.
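
    The random SNP thinning mentioned in the caption of Extended Data Figure 8 could be sketched roughly as follows. This is an illustration only, not the authors' code: the function name thin_snps, the (position, marker id) tuple layout and the fixed seed are assumptions made for this sketch. Only the cap of 3,500 markers per chromosome and the linear ordering of markers along the chromosome come from the caption.

```python
import random

MAX_SNPS = 3500  # per-chromosome cap stated in the caption


def thin_snps(snps, max_snps=MAX_SNPS, seed=0):
    """Randomly keep at most max_snps markers, returned in chromosome order.

    snps: list of (position_bp, marker_id) tuples for one chromosome.
    This layout is hypothetical and chosen only for the sketch.
    """
    if len(snps) <= max_snps:
        # Nothing to thin; just make sure markers are in linear order.
        return sorted(snps)
    rng = random.Random(seed)  # assumed fixed seed, for reproducibility
    # Sample without replacement, then restore order along the chromosome.
    return sorted(rng.sample(snps, max_snps))


# Usage with synthetic data: 10,000 evenly spaced markers on one chromosome.
demo = [(pos, f"snp{pos}") for pos in range(0, 100000, 10)]
kept = thin_snps(demo)
```

    Sorting after sampling keeps the retained markers in the column order that the plots rely on; a fixed seed is one way to make the same subset reappear across runs.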

Extended Data Tables

  1. Extended Data Table 1: Hi-C and optical map datasets for chromosome-scale assembly (208 KB)
  2. Extended Data Table 2: Statistics on gene annotation and genomic compartments (184 KB)
  3. Extended Data Table 3: Repeat annotation statistics (322 KB)
  4. Extended Data Table 4: Information on gene families associated with malting quality (500 KB)

Supplementary information

PDF files

  1. Supplementary Information (3.3 MB)

    This file contains Supplementary Notes 1-5 and Supplementary References – see contents page for details.

Zip files

  1. Supplementary Data (24.9 MB)

    This zipped file contains Supplementary Tables 4.1-4.5, 4.7 and 5.1.

Additional data