Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
- Archaeological studies in the Levant 1. Neolithic sites in the Damascus basin: Aswad, Ghoraifé, Ramad. Palaeohistoria 24, 165–256 (1985)
- Emergence of agriculture in the foothills of the Zagros Mountains of Iran. Science 341, 65–67 (2013)
- The role of cult and feasting in the emergence of Neolithic communities. New evidence from Göbekli Tepe, south-eastern Turkey. Antiquity 86, 674–695 (2012)
- What was brewing in the Natufian? An archaeological assessment of brewing technology in the Epipaleolithic. J. Archaeol. Method Theory 20, 102–150 (2013)
- Revealing a 5,000-y-old beer recipe in China. Proc. Natl Acad. Sci. USA 113, 6444–6448 (2016) et al.
- Domestication of Plants in the Old World: The Origin and Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin (Oxford Univ. Press, 2012)
- International Barley Genome Sequencing Consortium. A physical, genetic and functional sequence assembly of the barley genome. Nature 491, 711–716 (2012)
- PROTEIN DISULFIDE ISOMERASE LIKE 5-1 is a susceptibility factor to plant viruses. Proc. Natl Acad. Sci. USA 111, 2104–2109 (2014) et al.
- Evolution of the grain dispersal system in barley. Cell 162, 527–539 (2015) et al.
- Exome sequencing of geographically diverse barley landraces and wild relatives gives insights into environmental adaptation. Nat. Genet. 48, 1024–1030 (2016) et al.
- Cytologically integrated physical restriction fragment length polymorphism maps for the barley genome based on translocation breakpoints. Genetics 154, 397–412 (2000)
- Multiplex sequencing of bacterial artificial chromosomes for assembling complex plant genomes. Plant Biotechnol. J. 14, 1511–1522 (2016) et al.
- Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome. Plant J. 84, 216–227 (2015) et al.
- CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236 (2015)
- BARLEX - the Barley Draft Genome Explorer. Mol. Plant 8, 964–966 (2015) et al.
- A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms. Plant Physiol. 164, 412–423 (2014) et al.
- Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ). Plant J. 76, 718–727 (2013) et al.
- Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 30, 771–776 (2012) et al.
- Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011)
- Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009) et al.
- Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013) et al.
- Construction of a map-based reference genome sequence for barley, Hordeum vulgare L. Sci. Data 4, 170044 (2017) et al.
- BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
- Chromosome ‘painting’ in plants - a feasible technique? Chromosoma 104, 315–320 (1996)
- Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol. Cell 55, 678–693 (2014)
- A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010) et al.
- Chromosome organization and dynamics during interphase, mitosis, and meiosis in plants. Plant Physiol. 158, 26–34 (2012)
- A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014) et al.
- Methylation of histone H3 in euchromatin of plant chromosomes depends on basic nuclear DNA content. Plant J. 33, 967–973 (2003) et al.
- Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem. Genet. 12, 257–269 (1974)
- Nested retrotransposons in the intergenic regions of the maize genome. Science 274, 765–768 (1996) et al.
- TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 7, 561–562 (2002)
- Structural and functional partitioning of bread wheat chromosome 3B. Science 345, 1249721 (2014) et al.
- Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6, 907–916 (1994)
- Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc. Natl Acad. Sci. USA 91, 1411–1415 (1994)
- Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73, 5186–5190 (1999)
- The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004) et al.
- Classification and characterization of the rice α-amylase multigene family. Plant Mol. Biol. 14, 655–668 (1990)
- Structural genes for α-amylases are located on barley chromosomes 1 and 6. J. Biol. Chem. 259, 13637–13639 (1984)
- Barley α-amylase genes. Quantitative comparison of steady-state mRNA levels from individual members of the two different families expressed in aleurone cells. J. Biol. Chem. 263, 18953–18960 (1988)
- Dynamic 13C/1H NMR imaging uncovers sugar allocation in the living seed. Plant Biotechnol. J. 9, 1022–1037 (2011) et al.
- Sucrose efflux mediated by SWEET proteins as a key step for phloem transport. Science 335, 207–211 (2012) et al.
- Caspase-like activities accompany programmed cell death events in developing barley grains. PLoS ONE 9, e109426 (2014)
- Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat. Genet. 44, 1388–1392 (2012) et al.
- Identification and verification of QTLs for agronomic traits using wild barley introgression lines. Theor. Appl. Genet. 118, 483–497 (2009)
- Dissection of a malting quality QTL region on chromosome 1 (7H) of barley. Mol. Breed. 14, 339–347 (2004) et al.
- The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009) et al.
- Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Preprint at http://biorxiv.org/content/early/2016/07/26/066100 (2016) et al.
- Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015) et al.
- Draft assembly of elite inbred line PH207 provides insights into genomic and transcriptome diversity in maize. Plant Cell 28, 2700–2714 (2016) et al.
- Genome sequence assembly using trace signals and additional sequence information. In Computer Science and Biology: Proc. 99th German Conference on Bioinformatics (eds Hofestädt, R. et al.) 45–56 (GCB, 1999)
- De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. BMC Genomics 10, 547 (2009) et al.
- Sequencing of BAC pools by different next generation sequencing platforms and strategies. BMC Res. Notes 4, 411 (2011) et al.
- SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012) et al.
- ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009) et al.
- Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
- Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011)
- Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012) et al.
- BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol. J. 14, 1523–1531 (2016) et al.
- Flow karyotyping and sorting of mitotic chromosomes of barley (Hordeum vulgare L.). Chromosome Res. 7, 431–444 (1999) et al.
- Preparation of HMW DNA from plant nuclei and chromosomes isolated from root tips. Biol. Plant. 46, 369–373 (2003)
- Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience 3, 34 (2014) et al.
- A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000)
- The igraph software package for complex network research. InterJournal Complex Syst. 1695 (2006)
- Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011)
- Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013)
- BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
- HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131–3133 (2012) et al.
- R Core Team. R: a language and environment for statistical computing (R Foundation for Statistical Computing, 2015)
- A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011)
- A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011) et al.
- Cytogenetic mapping with centromeric bacterial artificial chromosomes contigs shows that this recombination-poor region comprises more than half of barley chromosome 3H. Plant J. 84, 385–394 (2015) et al.
- Sequence organization of barley centromeres. Nucleic Acids Res. 29, 5029–5035 (2001) et al.
- International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005)
- International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010)
- The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009) et al.
- Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries. Plant Physiol. 156, 20–28 (2011) et al.
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012) et al.
- Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
- Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000)
- PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44 (D1), D1141–D1147 (2016) et al.
- LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008)
- Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011)
- The paleontology of intergene retrotransposons of maize. Nat. Genet. 20, 43–45 (1998)
- GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinformatics 10, 645–656 (2013)
- The Pfam protein families database. Nucleic Acids Res. 32, D138–D141 (2004) et al.
- SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011)
- Predicting coiled coils from protein sequences. Science 252, 1162–1164 (1991)
- OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003)
- Ensembl Plants: integrating tools for visualizing, mining, and analyzing plant genomics data. Methods Mol. Biol. 1374, 115–140 (2016)
- Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004) et al.
- REVIGO summarizes and visualizes long lists of Gene Ontology terms. PLoS ONE 6, e21800 (2011)
- Development of maternal seed tissue in barley is mediated by regulated cell expansion and cell disintegration and coordinated with endosperm growth. J. Exp. Bot. 62, 1217–1227 (2011)
- Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond. Plant J. 76, 494–505 (2013) et al.
- Tablet—next generation sequence assembly visualization. Bioinformatics 26, 401–402 (2010) et al.
- Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005)
- Flapjack—graphical genotype visualization. Bioinformatics 26, 3133–3134 (2010) et al.
- GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update. Bioinformatics 28, 2537–2539 (2012) &
- The International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Gene annotation pipeline.
a, The gene annotation pipeline combined gene evidence from four data sources. Open reading frames were then predicted for 83,105 gene candidates. b, Gene candidates were classified into high-confidence (HC) and low-confidence (LC) genes on the basis of homology to reference proteins and alignment to a library of repeat elements. Additional filtering procedures were applied before defining the final gene sets. Arrows between boxes with counts of high-confidence and low-confidence genes in each step indicate re-classifications (high-confidence to low-confidence, or low-confidence to high-confidence).
- Extended Data Figure 2: Assembly validation.
a, Conserved gene order between barley (y axis) and B. distachyon (x axis). b, Completeness of the gene annotation as assessed by BUSCO. c, Representation of repetitive k-mers in reads and assemblies. d, Representation of full-length LTR retrotransposons in sequence assemblies of plant genomes with different sizes (represented by black points). The map-based reference sequence of barley reported in the present paper is shown in blue. Red dots correspond to shotgun assemblies of the barley genome (ref. 7) and wheat chromosome 3B (ref. 99).
- Extended Data Figure 3: Hi-C contact matrices.
a, Intrachromosomal contacts. b, Interchromosomal contacts. Darker red indicates a higher contact probability.
- Extended Data Figure 4: Global patterns in Hi-C contact matrices.
a, Principal component analysis of intrachromosomal Hi-C contact matrices. The eigenvectors of the first three principal components are plotted. Centromere positions are marked with a red line. b, Proportion of variance explained by linear models, fitted to the Hi-C contact matrices, that incorporate positional information in the linear genome. c, Hi-C link counts in Morex × Barke F1 hybrids within the same chromosome, between homologous chromosomes and between non-homologous chromosomes.
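The kind of quantity reported in panel b can be illustrated with a minimal sketch. The caption does not specify the model, so the regression below (log contact count against log bin distance) and the toy contact matrix are assumptions made purely for illustration, not the authors' procedure.

```python
import math

def r_squared(x, y):
    """Proportion of variance in y explained by a simple least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Toy symmetric contact matrix (hypothetical values): counts decay with the
# distance between bins, plus a small deterministic perturbation.
n_bins = 6
contacts = [
    [round(1000 / (1 + abs(i - j))) + (i * j) % 3 for j in range(n_bins)]
    for i in range(n_bins)
]

# Use each off-diagonal bin pair once (upper triangle).
log_distance = []
log_count = []
for i in range(n_bins):
    for j in range(i + 1, n_bins):
        log_distance.append(math.log(j - i))
        log_count.append(math.log(contacts[i][j]))

explained = r_squared(log_distance, log_count)
print(f"variance explained by linear distance: {explained:.3f}")
```

A value close to 1 would mean that position along the linear genome accounts for most of the variation in contact frequency in this toy matrix.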
- Extended Data Figure 5: Distributions of genomic features and the context of repetitive elements.
- Extended Data Figure 6: Experimental strategy to distinguish individual amy1_1 copies by PCR from genomic DNA through polymorphisms in the extended promoter regions of amy1_1 full-length copies.
a, Experimental strategy. Primers CD52_amy1fw and CD53_amy1rc bind in the extended promoter region of all full-length amy1_1 copies (expected amplicon sizes are 225 bp for amy1_1a, 299 bp for amy1_1b and amy1_1d, and 336 bp for amy1_1c). Forward primers CD54_fw1a, CD55_fw1b and CD56_fw1c are designed to specifically amplify copies amy1_1a, amy1_1b and amy1_1c, respectively, when used with reverse primer CD58_amy1rc, which binds in the coding region of all amy1_1 copies. Expected amplicon sizes are 1,024 bp (amy1_1a), 1,026 bp (amy1_1b) and 757 bp (amy1_1c). The primer pair CD55_fw1b–CD58_amy1rc also binds to copy amy1_1d; here, the sequences of the expected amplicons contain sufficient polymorphisms to distinguish the two copies from each other. Positions of selected sequence polymorphisms and deleted regions suitable to distinguish single copies are indicated as black vertical bars and gaps, respectively. Numbering is relative to copy amy1_1b. b, PCR amplification of amy1_1 promoter regions in six barley cultivars and landraces. As expected, a PCR for cultivar Morex using universal primers CD52_amy1fw and CD53_amy1rc resulted in three amplicons of the expected sizes 225, 299 and 336 bp (compare a), which was confirmed by Sanger sequencing. In addition, primers CD52_amy1fw and CD53_amy1rc were used to amplify the amy1_1 extended promoter region in various barley cultivars. These experiments indicate polymorphic variation in, or even absence of, single promoters of amy1_1 in the different cultivars. The cultivars analysed differ in row type (six-rowed: cultivars Morex, Masan Naked 1, Akashinriki, Etincel; two-rowed: cultivars Barke, Bowman), growth habit (spring barley: cultivars Morex, Barke, Bowman, Masan Naked 1, Akashinriki; winter barley: cultivar Etincel) and geographic origin (North America: cultivars Morex, Bowman; Europe: cultivars Barke, Etincel; Asia: cultivars Masan Naked 1, Akashinriki). The cultivars Masan Naked 1 and Akashinriki represent landraces used for food, Bowman was classified as non-malting barley, while Morex, Barke and Etincel represent modern malting barley. c, Copy-specific PCR amplification of amy1_1 extended promoter regions. PCR amplification and Sanger sequencing identified three amy1_1 copies in barley cultivar Morex: amy1_1a (CD54_fw1a–CD58_amy1rc), amy1_1b (CD55_fw1b–CD58_amy1rc) and amy1_1c (CD56_fw1c–CD58_amy1rc). Additionally, sequencing revealed two polymorphic sites in PCR amplicon amy1_1b (CD55_fw1b–CD58_amy1rc) at positions 721 bp (T/C) and 1,175 bp (C/T) (see a), indicating the presence of one or two additional amy1_1b-like copies in the genome of the analysed individual. The presence of copy amy1_1d could not be confirmed. This might be due to sequence deviations between the cultivar Morex accession used for BAC library construction and the one used for the presented experiments, or to differences in PCR efficiency for amplification of copies amy1_1b and amy1_1d.
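As a small illustration of how the expected amplicon sizes in panel a could be used to interpret a gel band obtained with the universal primers, the sketch below looks up candidate amy1_1 copies for an observed band size. The amplicon sizes come from the caption; the function name and the 5 bp tolerance are hypothetical choices for this example only.

```python
# Expected amplicon sizes (bp) with universal primers CD52_amy1fw / CD53_amy1rc,
# as stated in the caption.
EXPECTED_UNIVERSAL_AMPLICONS = {
    "amy1_1a": 225,
    "amy1_1b": 299,
    "amy1_1c": 336,
    "amy1_1d": 299,
}

def candidate_copies(band_bp, tolerance_bp=5):
    """Return the copies whose expected amplicon lies within tolerance of a band.

    tolerance_bp is an illustrative value, not taken from the source.
    """
    return sorted(
        copy
        for copy, size in EXPECTED_UNIVERSAL_AMPLICONS.items()
        if abs(size - band_bp) <= tolerance_bp
    )

for band in (225, 300, 336, 400):
    print(band, candidate_copies(band))
```

Because amy1_1b and amy1_1d share the same expected size with the universal primers, a band near 299 bp cannot separate them, which is consistent with the caption's reliance on sequence polymorphisms to tell these two copies apart.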
- Extended Data Figure 7: SWEET gene expression.
a, Control experiment for mRNA in situ hybridizations shown in Fig. 3c. In situ hybridization with sense probes for SWEET11a (top) and SWEET11b (bottom). Scale bars, 100 μm. b, Expression of SWEET11a and SWEET11b. Results of qPCR in different plant organs and in the developing grains at 7 days after flowering (DAF).
- Extended Data Figure 8: Haplotype blocks in sets of 48 samples each of elite two-row spring barley lines (top half of each chromosome’s figure) and winter barley lines (bottom half), separately for each chromosome.
We restricted the number of SNPs per chromosome by randomly choosing 3,500, so as to stay within the maximum permitted by the software. The red and green plots in the centre of each chromosome figure represent whole-canvas dumps produced with the Flapjack software (ref. 97). Markers are arranged in columns in linear order along the chromosome; red pixels represent reference alleles, while green pixels represent alternative alleles. Each row represents a barley cultivar; these have been sorted top to bottom by year of introduction (ascending). The Flapjack plots are framed by cropped linkage disequilibrium plots generated with the HaploView software (ref. 96). Colour intensity conveys the extent of linkage between pairs of markers (red, highest). Approximate centromere positions are indicated by semi-opaque grey squares. The triangles with the thin black outline represent haplotype blocks as computed by HaploView. In some regions, extensive stretches exist where no blocks were detected (for example, chr2H, spring lines in top half, near centromere). These generally represent highly monomorphic regions where there is no evidence for multiple haplotypes, and consequently blocks were not called.
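The random restriction to 3,500 SNPs per chromosome described above can be sketched as follows. The toy SNP positions, the fixed seed and the function name are assumptions for illustration; the caption does not say how the random choice was implemented.

```python
import random

def subsample_snps(positions, max_markers=3500, seed=0):
    """Randomly keep at most max_markers SNP positions, in chromosome order."""
    if len(positions) <= max_markers:
        return sorted(positions)
    rng = random.Random(seed)  # fixed seed so the toy example is reproducible
    return sorted(rng.sample(positions, max_markers))

# Hypothetical SNP positions for two chromosomes.
snps_by_chrom = {
    "chr1H": list(range(1000, 9_001_000, 1000)),  # 9,000 toy positions
    "chr2H": list(range(500, 1_000_500, 500)),    # 2,000 toy positions
}

subsampled = {
    chrom: subsample_snps(positions) for chrom, positions in snps_by_chrom.items()
}

for chrom, positions in subsampled.items():
    print(chrom, len(positions))
```

Sorting after sampling keeps the markers in linear order along the chromosome, matching the column arrangement described for the Flapjack plots.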