A chromosome conformation capture ordered sequence of the barley genome

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
The International Barley Genome Sequencing Consortium reports sequencing and assembly of a reference genome for barley, Hordeum vulgare.
Triticeae grasses, which include barley, wheat and rye, are widely cultivated plants with particularly complex genomes and evolutionary histories. Sequencing of the barley genome has been particularly challenging owing to its large size and particular genomic features, such as an abundance of repetitive elements. Nils Stein and colleagues of the International Barley Genome Sequencing Consortium report sequencing and assembly of a reference genome for barley (Hordeum vulgare L.). They use a combined approach of hierarchical shotgun sequencing of bacterial artificial chromosomes, genome mapping on nanochannel arrays and chromosome-scale scaffolding with Hi-C sequencing.
This brings the first comprehensive, completely ordered assembly of the pericentromeric regions of a Triticeae genome. The authors also sequenced and examined genetic diversity in the exomes of 96 European elite barley lines with a spring or winter growth habit, and highlight the utility of this resource for cereal genomics and breeding programs.

Barley remains dated to the dawn of agriculture have been found at several archaeological sites 1,2 . In addition to indications that barley was an important food crop, recent excavations have fuelled speculation that beverages from fermented grains may have motivated early Neolithic hunter-gatherers to erect some of humankind's oldest monuments 3,4 . Moreover, brewing beer may also have played a role in the eastward spread of the crop after its initial domestication in the Fertile Crescent 5,6 .
Since 2012, both genetic research and crop improvement in barley have benefited from a partly ordered draft sequence assembly 7 . This community resource has underpinned gene isolation 8,9 and population genomic studies 10 . However, these and other efforts have also revealed limitations of the current draft assembly. The limitations are often direct consequences of two characteristic genomic features: the extreme abundance of repetitive elements, and the severely reduced frequency of meiotic recombination in pericentromeric regions 11 . These factors have limited the contiguity of whole-genome assemblies to kilobase-sized sequences originating from low-copy regions of the genome. Thus, a detailed investigation of the composition of the repetitive fraction of the genome-including expanded gene families-and of the distribution of targets of selection and crop improvement in (genetically defined) pericentromeric regions has been beyond reach.
Here we present a map-based reference sequence of the barley genome including the first comprehensively ordered assembly of the pericentromeric regions of a Triticeae genome. The resource highlights a conspicuous distinction between distal and proximal regions of chromosomes that is reflected by the intranuclear chromatin organization. Moreover, chromosomal compartments are differentiated by an exponential gradient of gene density and recombination rate, striking contrasts in the distribution of retrotransposon families, and distinct patterns of genetic diversity.

A chromosome-scale assembly of the barley genome
We adopted a hierarchical approach to generate a high-quality reference genome sequence of the barley cultivar Morex, a US spring six-row malting barley. First, a total of 87,075 bacterial artificial chromosomes (BACs) were sequenced, mainly using Illumina paired-end and mate-pair technology, and assembled individually from 4.5 terabases of raw sequence data 12-14 (Supplementary Note 1). In a second step, overlaps between adjacent clones 15 were detected and validated by physical map information 16 , a genetic linkage map 17 and a highly contiguous optical map 18 to construct super-scaffolds composed of merged assemblies of individual BACs (Table 1 and Extended Data Table 1). This increased the contiguity as measured by the N50 value (the scaffold size above which 50% of the total length of the sequence was included in the assembly) from 79 kb to 1.9 Mb. Scaffolds were assigned to chromosomes using a population sequencing (POPSEQ) genetic map 17 . Finally, we used three-dimensional proximity information obtained by chromosome conformation capture sequencing 19-21 (Hi-C) to order and orient BAC-based super-scaffolds (Supplementary Note 2 and ref. 22). The final chromosome-scale assembly of the barley genome consists of 6,347 ordered super-scaffolds composed of merged assemblies of individual BACs, representing 4.79 Gb (~95%) of the genomic sequence content, of which 4.54 Gb have been assigned to precise chromosomal locations in the Hi-C map (Table 1). Mapping of transcriptome data and reference protein sequences from other plant species to the assembly identified 83,105 putative gene loci including protein-coding genes, non-coding RNAs, pseudogenes and transcribed transposons (Table 2 and Supplementary Note 3). These loci were filtered further and divided into 39,734 high-confidence genes (with four different sub-categories) and 41,949 low-confidence genes on the basis of sequence homology to related species (Methods and Supplementary Note 3.4).
Moreover, we predicted 19,908 long non-coding RNAs (Supplementary Note 3.7) and 792 microRNA precursor loci (Supplementary Note 3.8). The high co-linearity between the Hi-C-based pseudomolecules and linkage and cytogenetic maps 22 , as well as the conserved order of syntenic genes in pericentromeric regions compared with the model grass Brachypodium distachyon (Extended Data Fig. 2a), corroborated the quality of the assembly. Extrapolating from a set of conserved eukaryotic core genes 23 , we estimate that the predicted gene models represent 98% of the cultivar Morex barley gene complement (Extended Data Fig. 2b).
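The N50 statistic quoted above can be illustrated with a few lines of Python. This is a minimal sketch of the conventional definition, not the pipeline used for the assembly; the function name `n50` and the toy scaffold lengths are assumptions made for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sorting from longest to shortest, N50 is the length of the sequence at
    which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy scaffold lengths in kb (illustrative values, not from the assembly).
toy_scaffolds = [1900, 1200, 800, 400, 300, 200, 100, 79]
print(n50(toy_scaffolds))
```

Merging many short BAC assemblies into fewer, longer super-scaffolds moves the half-way point of the running total onto a longer sequence, which is why the reported N50 rises after the overlap step.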

Organization of chromatin
Barley has served as a model for traditional cytogenetics 11 ; but relating chromosomal features to unique sequences has been challenging, requiring the cloning of repeat-free probes 24 . The reference sequence allowed us to employ the Hi-C data to interrogate the three-dimensional organization of chromatin in the nucleus. As in other eukaryotes 20,25,26 , the spatial proximity of genomic loci as measured by Hi-C link frequency is highly dependent on their distance in the linear genome (Fig. 2a). However, we observed an elevated link frequency at distances above 200 Mb and a pronounced anti-diagonal pattern in the intrachromosomal Hi-C contact matrices (Fig. 2b and Extended Data Fig. 3a), indicating an increased adjacency of regions on different chromosome arms. We interpret this pattern as reflective of the so-called Rabl configuration 27 of interphase nuclei, where individual chromosomes fold back to juxtapose the long and short arms, with centromeres and telomeres of all chromosomes clustering at opposite poles of the nucleus (Fig. 2c and Supplementary Fig. 2.2). Fluorescence in situ hybridization (Fig. 2d) supported this hypothesis. Principal component analysis of the intrachromosomal proximity matrix showed that the first three principal components cumulatively explained ~70% of the variation and differentiated (1) distal from proximal regions, (2) interstitial from both distal and proximal regions and (3) the long arms from the short arms (Fig. 2f and Extended Data Fig. 4a). A linear model taking into account the genomic distance between two loci, as well as their relative distance from the centromere, accounted for 79% of the variation (Extended Data Fig. 4b) in the intrachromosomal proximity matrix at 1 Mb resolution. Contacts between loci on different chromosomes followed a similar pattern (Fig. 2e and Extended Data Fig. 3b): a prominent cross pattern supporting a juxtaposition of long and short arms.
In contrast to intrachromosomal matrices, contact probabilities between loci on, for instance, the short arm of one chromosome are equal for loci on both the short and the long arm on another chromosome having the same relative distance to the centromeres: that is, facing each other in the interphase nucleus. We also observed a higher contact frequency between telomere-near regions, as has been observed in Arabidopsis 25 .
To test whether pairs of homologous chromosomes are positioned closer to each other than to non-homologues, we performed diploid Hi-C 28 on leaf tissue from F1 hybrids between the cultivars Morex and Barke, and assigned the resultant Hi-C links to the haplotypes of both inbred parents by mapping reads to a diploid reference. We did not observe any preferential interaction between homologues. Rather, contacts between the maternal and paternal copies of the same chromosome occurred as frequently as between non-homologues (Extended Data Fig. 4c).
We conclude that the frequency with which loci juxtapose in three-dimensional space is predominantly determined by their position in the linear genome. This is in sharp contrast to the organization of chromatin in human nuclei where two compartments corresponding to open and closed chromatin domains are evident at megabase resolution 20 , but is consistent with cytogenetic mapping of histone marks associated with heterochromatin in large, repeat-rich genomes 29 .
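The two signals discussed in this section, the decay of link frequency with linear distance and the anti-diagonal enrichment attributed to the Rabl configuration, can be read off a binned contact matrix. The sketch below is illustrative only: it assumes a small, already normalized, symmetric matrix held as nested Python lists, and the function names and toy values are hypothetical rather than taken from the analysis in the paper.

```python
def mean_contacts_by_distance(matrix):
    """Average the entries of a square contact matrix along each diagonal.

    Entry k of the result is the mean link count between bins that are
    k bins apart in the linear genome.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("contact matrix must be square")
    means = []
    for k in range(n):
        values = [matrix[i][i + k] for i in range(n - k)]
        means.append(sum(values) / len(values))
    return means


def anti_diagonal_mean(matrix):
    """Mean link count between bins mirrored around the chromosome midpoint."""
    n = len(matrix)
    values = [matrix[i][n - 1 - i] for i in range(n) if i != n - 1 - i]
    if not values:
        raise ValueError("matrix too small to have an anti-diagonal")
    return sum(values) / len(values)


# Toy 4-bin chromosome: contacts decay with distance but rise again
# between the two ends, mimicking arms that fold back on each other.
toy = [
    [10, 4, 1, 3],
    [4, 10, 5, 1],
    [1, 5, 10, 4],
    [3, 1, 4, 10],
]
print(mean_contacts_by_distance(toy))
print(anti_diagonal_mean(toy))
```

In the toy matrix the mean at the largest separation exceeds the mean at the intermediate one, which is the qualitative shape of the elevated link frequency reported at distances above 200 Mb.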

The genomic context of repetitive elements
Large plant genomes consist mainly of highly similar copies of repetitive elements such as long terminal repeat (LTR) retrotransposons and DNA transposons 30,31 . Our hierarchical sequencing strategy reduced the algorithmic complexity of assembling a highly repetitive genome from short reads. Instead of resolving complex repeat structures on the whole-genome level, we reconstructed the sequences of 100-150 kb BACs. This allowed us to disentangle nearly identical copies of highly abundant repetitive elements, as evidenced by the good representation of both mathematically defined repeats and retrotransposon families (Extended Data Fig. 2c, d). Homology-guided repeat annotation with a Triticeae-specific repeat library 32 identified 3.7 Gb (80.8%) of the assembled sequence as derived from transposable elements (Table 1, Fig. 1a and Extended Data Table 3), most of which were present as truncated and degenerated copies, with only 10% of mobile elements intact and potentially active.
Median 20-mer frequencies were used to partition the seven barley chromosomes into three zones (Fig. 1 and Extended Data Fig. 5a), reminiscent of the three compartments of wheat chromosome 3B 33 . The distal zone 1 was characterized by an enrichment of low-copy regions, a high gene content and frequent meiotic recombination. Zone 2, occupying the interstitial regions of chromosomes, had the highest 20-mer frequencies and intermediate gene density. Surprisingly, the abundance of repetitive 20-mers decreased in the proximal zone 3, where older mobile elements with diverged, and thus unique, sequences predominated (Fig. 1). The three zones also differed in the composition of the gene space (Extended Data Table 2b and Supplementary Note 3). For example, genes involved in defence response and reproductive processes were preferentially found in distal regions, while proximal regions contained more genes related to housekeeping processes, such as photosynthesis and respiration, compared with other parts of the genome (Fig. 1b).
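The idea of a median k-mer frequency per genomic window can be sketched as follows. This is a simplified illustration under stated assumptions: it counts k-mers on one strand only, uses k = 3 on a toy string instead of k = 20 on a genome, and the function names and window size are hypothetical, not those of the published analysis.

```python
from collections import Counter
from statistics import median


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a sequence (forward strand only)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def median_kmer_frequency(sequence, counts, k, window):
    """Median sequence-wide count of the k-mers starting in each window.

    Returns a list of (window_start, median_count) tuples for consecutive,
    non-overlapping windows.
    """
    last_start = len(sequence) - k + 1
    result = []
    for start in range(0, last_start, window):
        end = min(start + window, last_start)
        freqs = [counts[sequence[i:i + k]] for i in range(start, end)]
        result.append((start, median(freqs)))
    return result


# Toy sequence: a tandem repeat followed by a unique stretch.
toy_sequence = "ACGACGACGACG" + "TTGCAAGTCCAT"
toy_counts = count_kmers(toy_sequence, k=3)
print(median_kmer_frequency(toy_sequence, toy_counts, k=3, window=12))
```

The repeat-rich first window yields a higher median count than the unique second window, mirroring how high-copy interstitial regions are separated from low-copy regions along a chromosome.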
Transposable element groups exhibited pronounced variation in their insertion site preferences (Fig. 3a and Extended Data Fig. 5b). On a global scale, most miniature inverted-repeat transposable elements and long interspersed elements were found in gene-rich distal regions, as has been reported in other grass species 34,35 . By contrast, zone 3 was populated by Gypsy retrotransposons, while Copia elements favoured zones 1 and 2. These differences in the relative abundance of retrotransposon families were reflected by distinct distributions of functional domains. For example, sequences encoding the chromodomain (PF00385) are concentrated in the vicinity of the centromere and may be involved in the target specificity through incorporation in the integrase of Gypsy elements 36 (Fig. 3a and Extended Data Fig. 5b).
At a local scale, different types of elements also occupy different niches in the proximity of genes (Fig. 3b). Mariner transposons preferably reside within 1 kb up- or downstream of the coding regions of genes, while Harbinger and long interspersed elements are found further away. The observed distribution of different types of transposable elements around genes may reflect selective pressures, allowing only the smallest elements, namely Mariners, to be tolerated closest to genes. Intriguingly, Helitrons as well as elements of the Harbinger superfamily have a clear preference for promoter regions, while long interspersed elements have a preference for downstream regions (Fig. 3b). At greater distances from genes, large elements such as LTR retrotransposons and CACTA elements dominate.

Expansion of gene families
The barley reference sequence enabled us to disentangle complex gene duplications that may shed light on gene family expansion specific to barley or the Triticeae. A total of 29,944 genes belonged to families with multiple members (Fig. 4a and Supplementary Note 4.1). Gene families expanded in barley were tested for overrepresentation of Gene Ontology 37 terms compared with sorghum, rice, Brachypodium and Arabidopsis. Among the most significant results were terms related to defence response and disease resistance (NBS-LRR and thionin genes), as well as thioredoxin genes (Supplementary Note 4.1).
In the following, we focused on a detailed analysis of gene families having particular importance for malting quality. Germinating barley grains possess high diastatic power: that is, the combined ability of a complex of enzymes to mobilize fermentable sugars from starch. Key diastatic enzymes include α-amylases. The genome of barley cultivar Morex contains 12 α-amylase (amy) family sequences (Supplementary Note 4.2 and Extended Data Table 4a), which can be classified into four subfamilies 38 . Gene duplication events have occurred in the subfamilies amy1 and amy2 (Fig. 4b), located on chromosomes 6H and 7H, respectively. The existence of these duplications had been speculated earlier, but could not be analysed further because of high sequence similarity between the copies. The reference assembly contained five full-length amy1 subfamily genes, four of which, here designated as amy1_1a-d, shared >99.8% identity at the nucleotide level including introns. Locus-specific PCR confirmed earlier suggestions 39,40 of multiple, highly similar amy1_1 genes (Extended Data Fig. 6 and Supplementary Note 4.2). Given the relevance of α-amylase activity to the brewing process, the high variability of the amy1_1 multiple gene locus (Extended Data Fig. 6) observed in landraces and elite lines, including modern malting cultivars, is remarkable.
The accumulation of fermentable carbohydrates in the grain depends on the transfer of sugars from maternal tissue into the developing seeds. In contrast to the two routes of nutrient transfer in rice seeds (the nucellar projection and the nucellar epidermis), delivery of assimilates into barley grains occurs predominantly via the nucellar projection 41 and requires active transporters. The family of SUGARS WILL EVENTUALLY BE EXPORTED TRANSPORTER (SWEET) transmembrane proteins mediating sugar efflux 42 consists of 23 members in barley (Extended Data Table 4b and Supplementary Note 4.3). There is a small expansion of the sugar-transporting SWEET11, SWEET13, SWEET14 and SWEET15 subfamilies, with two or more genes for each subgroup compared with only a single orthologue in rice and Arabidopsis (Extended Data Table 4b). Duplication of SWEET11 was most likely followed by neofunctionalization, as evidenced by divergent expression patterns. Both SWEET11a and SWEET11b were highly expressed in maternal seed tissue, but differed in the distribution of expression domains (Fig. 4c and Extended Data Fig. 7). Genes encoding a family of vacuolar processing enzymes, which are essential for programmed cell death in maternal tissue 43 ... These examples of genes involved in sugar transport and metabolism illustrate that the high-quality reference genome sequence can serve as a springboard for the in-depth analysis of the evolutionary history of gene duplications, their relation to morphological and physiological innovations, and their impact on crop performance.

Molecular diversity and haplotype analysis
To explore how the new barley genome assembly could be exploited for genetics and breeding, we generated exome sequence data from 96 European elite barley lines, half with a spring growth habit, half with a winter one (Supplementary Table 5.1). We investigated the extent and partitioning of molecular variation within and between these groups using 71,285 single-nucleotide polymorphisms (SNPs). Plotting diversity values in 100 SNP windows both in linear order (Fig. 5a) and according to physical distance (Fig. 5b) revealed marked contrasts in the levels and distribution of diversity both within and between gene pools. In spring types, extensive regions on chromosomes 1H, 2H and 7H were virtually devoid of diversity, as was a large region on 5H in the winter gene pool. For these chromosomes, this results in a single gene-pool-specific haplotype across the extensive pericentromeric regions. Chromosomes 3H, 4H and 6H maintain higher diversity across these regions owing to the presence of multiple similarly extensive haplotypes. This is even more evident when diversity is plotted on a physical scale (Fig. 5b). We presume that the lack of observed variation in elite germplasm is a signature of intense selection during breeding for different end-use sectors (principally malting versus feed barley), and the virtual absence of allelic re-assortment during meiosis owing to restricted recombination in the pericentromeric regions.
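A windowed diversity scan of the kind described above can be sketched in a few lines. The text does not state which diversity statistic was plotted, so the example assumes unbiased expected heterozygosity per SNP, averaged over consecutive non-overlapping SNP windows; the 0/1 allele coding for inbred lines, the function names and the toy data are likewise assumptions for illustration.

```python
def snp_diversity(alleles):
    """Unbiased expected heterozygosity of one biallelic SNP.

    `alleles` holds one call per inbred line: 0, 1, or None if missing.
    """
    calls = [a for a in alleles if a is not None]
    n = len(calls)
    if n < 2:
        return 0.0
    p = sum(calls) / n
    return 2 * p * (1 - p) * n / (n - 1)


def windowed_diversity(snps, window=100):
    """Mean per-SNP diversity in consecutive, non-overlapping SNP windows.

    `snps` is a list of (position, alleles) tuples sorted by position.
    Returns (first_position, last_position, mean_diversity) per window.
    """
    out = []
    for i in range(0, len(snps), window):
        chunk = snps[i:i + window]
        values = [snp_diversity(alleles) for _, alleles in chunk]
        out.append((chunk[0][0], chunk[-1][0], sum(values) / len(values)))
    return out


# Toy data: four lines, four SNPs, windows of two SNPs (not real genotypes).
toy_snps = [
    (100, [0, 0, 1, 1]),
    (250, [0, 1, 0, 1]),
    (900, [0, 0, 0, 0]),
    (1500, [1, 1, 1, None]),
]
print(windowed_diversity(toy_snps, window=2))
```

Because each window records the first and last SNP position it spans, the same values can be plotted either in SNP order or on a physical scale, the two views contrasted in Fig. 5a and Fig. 5b.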
Crosses between spring and winter barleys are rarely performed as they are considered to disrupt the gene-pool-specific gene complexes required for general performance (such as phenological adaptations) and end-use quality. Contrasting local patterns of diversity outside the pericentromeric regions therefore also most likely reflect the outcome of selection within alternative gene pools. We explored this further by comparing diversity in eight characterized genes whose variant alleles are important for conditioning barley's seasonal growth habit (Supplementary Note 5). Of the eight genes, HvCEN is uniquely 'locked' in the pericentromeric region of chromosome 2H where alternative alleles at a single SNP confer both differences in days-to-heading 44 and strong latitudinal differentiation 10 . The extensive pericentromeric haplotype in spring barleys (Fig. 5) may stem from selection for this single HvCEN SNP. While strong selection for other favourable alleles locked in the same region in spring barley cannot be ruled out, the virtual absence of recombination severely restricts exploitation of diversity across the entire region. Despite our focus here on life-history traits, strong selection for other traits mapping to pericentromeric regions 45,46 , including good malting quality in the spring gene pool on chromosomes 1H and 7H, would probably also reduce diversity in these regions. Interestingly, we are unaware of any phenotypic trait in the winter gene pool that would ...
Figure 4 | ... The first number below the species name denotes the total number of proteins that were included into the OrthoMCL analysis for each species. The second number indicates the number of genes in clusters for a species. b, Phylogenetic tree of 68 full-length α-amylase protein sequences derived from amy genes identified in the genomes of barley, hexaploid wheat, B. distachyon, rice, sorghum and maize.
Each wheat subgenome was considered separately to facilitate the comparison of gene copy numbers and duplication events across species. Note that for the amy4 subfamily, two to three genes per genome were identified in all genomes. These genes are located on distinct chromosomes and hence most probably did not originate from tandem gene duplications. While most species further contain only a single amy3 gene copy per genome, moderate copy number extension was observed in sorghum and rice where a potential tandem gene duplication resulted in two amy3 gene copies.
Three genes of the amy2 subfamily were identified on chromosome 7H in barley and on chromosomes 7A, 7B, 7D in wheat. No similar copy number extension was observed in B. distachyon, Sorghum bicolor or Oryza sativa. In maize, two amy2 genes were identified. The amy1 subfamily shows the highest level of copy number extension. Tandem duplications are present in sorghum and rice. Two to three full-length genes were identified per genome in hexaploid wheat on group 6 chromosomes and five full-length amy1 genes on chromosome 6H and unanchored scaffolds in barley. Notably, four of these barley genes share 99.8-100% sequence identity at the protein and nucleotide levels, indicating very recent duplication events. T. aestivum, Triticum aestivum; Z. mays, Zea mays. c, Expression of the SWEET11 gene subfamily in developing barley grains. Left, expression profiles of SWEET11a and SWEET11b as determined by quantitative real-time PCR (qPCR) on total RNA isolated from micro-dissected developing grains. Right, localization of SWEET11a and SWEET11b expression in cross-sections of immature seeds by RNA in situ hybridization. Hybridizations with sense probes are shown as negative controls in Extended Data Fig. 7a. Scale bars, 100 μm.
We next explored patterns of linkage disequilibrium across the entire genome. As expected for two highly inbred and elite crop gene pools, we observed extensive linkage disequilibrium on all chromosomes in both spring and winter barleys (Extended Data Fig. 8). The number of discrete haplotype blocks in this germplasm set varied from 86 to 161 per chromosome (Extended Data Fig. 8). Surprisingly, the two-row spring gene pool, generally considered to be narrowest owing to intense selection for malting quality, exhibited a greater number of haplotype blocks than the winter lines for most chromosomes.
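Pairwise linkage disequilibrium and a simple block count can be illustrated as below. The section does not specify how haplotype blocks were delimited, so this sketch makes two explicit assumptions: LD is measured as r² between biallelic SNPs scored 0/1 in inbred lines, and blocks are formed greedily from adjacent SNPs whose r² stays at or above a hypothetical threshold of 0.8. All names and data are illustrative.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    `a` and `b` hold 0/1 allele calls for the same inbred lines, in the
    same order; lines with a missing call (None) at either SNP are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    p_a = sum(x for x, _ in pairs) / n
    p_b = sum(y for _, y in pairs) / n
    p_ab = sum(1 for x, y in pairs if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_blocks(snps, threshold=0.8):
    """Group consecutive SNP indices while adjacent pairs keep r^2 >= threshold."""
    if not snps:
        return []
    blocks = [[0]]
    for i in range(1, len(snps)):
        if r_squared(snps[i - 1], snps[i]) >= threshold:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


# Toy genotypes for six lines at four ordered SNPs (not real data).
toy_genotypes = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
print(greedy_blocks(toy_genotypes))
```

Under this scheme, a gene pool with long runs of perfectly correlated SNPs yields few large blocks, whereas more historical recombination or admixture breaks the runs and raises the block count.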

Discussion
To assemble a highly contiguous reference genome sequence for barley, we combined hierarchical shotgun sequencing, a strategy previously used for assembling large and complex plant genomes 33,47 , with novel technologies such as optical mapping 18 and chromosome-scale scaffolding with Hi-C 21 . The latter technology was key to resolving the linear order of sequence scaffolds in pericentromeric regions. We anticipate the adoption of Hi-C-based genome mapping in other Triticeae species, such as bread and durum wheat and their wild relatives. Now that the quality of whole-genome shotgun assemblies is on a par with map-based assemblies 48,49 , we believe that the barley genome project will be one of the last such efforts to follow the laborious BAC-by-BAC approach.
The barley reference genome sequence constitutes an important community resource for cereal genetics and genomics. It will facilitate positional cloning, provide a better contextualization of population genomic datasets and enable comparative genomic analysis with other Triticeae in non-recombining regions that have been inaccessible to analysis of gene collinearity until now. The exciting methodological advances in sequence assembly and genome mapping have enabled even large and repeat-rich genomes to be unlocked 48,50 and hold the promise of constructing reference-quality genome sequences, not only for a single cultivar, but also for representatives of major germplasm groups.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.  Supplementary Information is available in the online version of the paper.

METHODS
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Sequencing and assembly of individual BAC clones. Barley genome sequencing relied exclusively on shotgun sequencing of 88,731 BAC clones using high-throughput next-generation sequencing-by-synthesis 22 . This comprised 15,661 so-called gene-bearing BAC clones, preselected mainly by overgo-probe hybridization for the presence of transcribed genes and fingerprinted for definition of a minimum tiling path of the barley gene space. These gene-space minimum tiling path BAC clones were sequenced as combinatorial pools by Illumina short-read technology and, after quality trimming of de-convoluted reads, were assembled using Velvet version 1.2.09 as previously described 13 . The remaining 73,070 BACs were selected from a minimum tiling path representing the physical map of the barley genome 16 . Minimum tiling path BAC clones assigned to different barley chromosomes were sequenced at one of four sequencing centres, relying on highly multiplexed paired-end and mate-pair sequencing libraries using either the Roche 454 Titanium or the Illumina MiSeq, HiSeq2000 and HiSeq2500 platforms (Supplementary Note 1 and ref. 51). In brief, sequencing reads were de-convoluted on the basis of the BAC-specific barcode sequence tags used and assembled with sequencing centre-specific assembly pipelines. BAC clones sequenced on the Roche 454 Titanium platform were assembled with MIRA 51 according to previously described procedures 52,53 . Illumina HiSeq2000 paired-end sequencing data (2 × 100 nucleotides) of BAC clones were assembled either with CLC Assembly Cell version 4.0.6 beta (http://www.clcbio.com/products/clc-assembly-cell/) set to default parameters 12 , SOAPdenovo version 2.01 (ref. 54) or the ABySS assembler (version 1.5.1) 55 .
Sequence contigs of the de novo BAC assemblies larger than 500 base pairs (bp) were scaffolded using mate-pair sequencing information generated either from BAC DNA-derived 8 kbp insert mate-pair sequencing libraries or from 2 kbp, 5 kbp or 10 kbp genomic DNA-derived mate-pair libraries. This was achieved either by using BWA mem version 0.7.4 (ref. 56) with default parameters for read mapping, followed by scaffolding individual BACs using SSPACE version 3.0 Standard 57 , or by using SOAPaligner/soap2 version 2.21 and the SOAPdenovo 54 scaffolder version 2.01.
Genome-wide three-dimensional chromatin conformation capture sequencing. To generate physical scaffolding information for the BAC-sequence-based genome assembly, as proposed in ref. 21, Hi-C and tethered conformation capture (TCC) sequencing data were generated from 7-day-old leaf tissue of greenhouse-grown barley plantlets by adapting previously published procedures (Supplementary Note 2). In brief, for Hi-C, freshly harvested leaves were cut into 2 cm pieces and vacuum infiltrated in nuclei isolation buffer supplemented with 2% formaldehyde. Crosslinking was stopped by adding glycine and additional vacuum infiltration. Fixed tissue was frozen in liquid nitrogen and ground to powder before re-suspending in nuclei isolation buffer to obtain a suspension of nuclei. About 10^7 purified nuclei were digested with 400 units of HindIII as described previously 58 . Digested chromatin was marked by incubating with biotin-14-dCTP and Klenow enzyme using a fill-in reaction 20 resulting in blunt-ended repaired DNA strands. Biotin-14-dCTP from non-ligated DNA ends was removed owing to the exonuclease activity of T4 DNA polymerase, followed by phenol-chloroform extraction and washing of the precipitated DNA as described 20 . As an alternative to Hi-C, the TCC protocol was also adapted for barley.
Nuclei were prepared from barley leaf tissue as described above for Hi-C, before biotinylating the isolated chromatin using EZlink Iodoacetyl-PEG2-Biotin. The samples were neutralized with SDS, and DNA was digested with HindIII, dialysed, followed by immobilization to low surface coverage using streptavidin-coated magnetic beads 19 . Open DNA ends were labelled with biotin-14-dCTP using Klenow enzyme, and blunt-ended, labelled DNA products were collected from the magnetic beads by reversing the formaldehyde crosslink using proteinase K 19 . Biotin-14-dCTP from non-ligated DNA ends was removed by using Exonuclease III 19 . Hi-C and TCC products were mechanically sheared to fragment sizes of 200-300 bp by applying ultrasound using a Covaris S220 device followed by size-fractionation using AMPure XP beads. DNA fragments in the range between 150 and 300 bp were blunt-end repaired and A-tailed before purification through biotin-streptavidin-mediated pull-down 58 . Illumina paired-end adapters were ligated to the Hi-C and TCC products, respectively, followed by PCR amplification, pooling of PCR products and purification with AMPure XP beads before quantification of Hi-C/TCC libraries by qPCR for Illumina HiSeq2500 PE100 sequencing 20 .
Nanochannel-based genome mapping. Long-range scaffolding of genome sequence assemblies was facilitated by BioNano genome maps generated by nanochannel electrophoresis of fluorescently labelled high-molecular mass DNA obtained from flow-sorted chromosomes 59 . High-molecular mass DNA was prepared from 3.5 × 10^6 purified chromosomes (whole genome) of barley cultivar Morex essentially following published procedures 60,61 . The purified chromosomes were embedded in agarose miniplugs to achieve approximate concentrations of 1 million chromosomes per 40 μl volume before being treated with proteinase K as described previously 61 .
DNA was labelled at Nt.BspQI nicking sites (GCTCTTC) by incorporation of fluorescent dUTP nucleotide analogues using Taq polymerase as described previously (ref. 59). The labelled DNA was analysed on the Irys platform (BioNano Genomics) in 191 cycles in total, generating 243 Gb of data from molecules exceeding 150 kb. On the basis of the label positions on single DNA molecules, de novo assembly was performed by pairwise comparison of all single molecules and graph building (ref. 62). The parameter set for large genomes was used for assembly with the IrysView software. A P value threshold of 10⁻⁹ was used during the pairwise assembly, 10⁻¹⁰ for extension and refinement steps and 10⁻¹⁴ for merging contigs. A whole-genome map of 4.3 Gb was obtained (Extended Data Table 1).

Data integration for constructing pseudomolecules. The construction of pseudomolecules representing the seven barley chromosomes followed an iterative, mainly automated procedure that integrated the following major datasets: (1) sequence assemblies of 87,075 unique, successfully sequenced and assembled BAC clones; (2) BAC assembly information from a genome-wide physical map of barley (ref. 16); (3) 571,814 end-sequences of BAC clones (ref. 7); (4) a dense linkage map assigning genetic positions to 791,177 contigs of a whole-genome shotgun assembly of barley cultivar Morex (ref. 17); (5) Hi-C/TCC sequence information; and (6) the optical map of the genome of barley cultivar Morex. A schematic outline of the procedure is presented elsewhere (ref. 22). In the first step, overlaps between individual BAC assemblies were searched with Megablast (ref. 63), applying either 'stringent' or 'permissive' alignment criteria (ref. 22), and combined with the high-density genetic map information. On the basis of this initial analysis, a BAC overlap graph was constructed with the R package igraph (ref. 64), considering the above-listed additional datasets in subsequent iterative steps.
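To illustrate the idea of a BAC overlap graph, the following minimal sketch builds an undirected graph from pairwise overlaps and groups BACs into connected clusters. It is not the published procedure (which used igraph in R and is described in ref. 22): the plain-Python data structures, the function names, the BAC identifiers and the use of connected components as a grouping step are all assumptions made for illustration.

```python
from collections import defaultdict


def build_overlap_graph(overlaps, allowed=("stringent",)):
    """Return an undirected adjacency map of BAC overlaps.

    overlaps: iterable of (bac_a, bac_b, criteria) tuples (hypothetical format).
    allowed:  criteria labels whose overlaps are accepted as edges.
    """
    graph = defaultdict(set)
    for bac_a, bac_b, criteria in overlaps:
        if criteria in allowed:
            graph[bac_a].add(bac_b)
            graph[bac_b].add(bac_a)
    return graph


def connected_clusters(graph):
    """Group BACs into connected components with an iterative depth-first search."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, cluster = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical overlaps; the BAC names are placeholders, not real clones.
overlaps = [
    ("BAC_A", "BAC_B", "stringent"),
    ("BAC_B", "BAC_C", "permissive"),
    ("BAC_D", "BAC_E", "stringent"),
]

# First pass: stringent overlaps only.
stringent_clusters = connected_clusters(build_overlap_graph(overlaps))
# Second pass: also accept permissive overlaps.
all_clusters = connected_clusters(
    build_overlap_graph(overlaps, allowed=("stringent", "permissive"))
)
```

In this toy input, accepting the permissive overlap joins BAC_C to the cluster containing BAC_A and BAC_B. In the real procedure, such an edge would be accepted only when supported by the additional datasets.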
Building the overlap graph focused first on overlaps obtained under 'stringent' search criteria between BACs within individual physical map contigs (FP contigs) and then on overlaps between independent FP contigs. Subsequently, overlaps obtained under 'permissive' criteria were evaluated by checking for cumulative evidence from the additional datasets supporting the overlap information (ref. 22). Ordering and orienting of the resultant sequence scaffolds were achieved by integrating the overlap graph with Hi-C/TCC data (ref. 22). Before the construction of pseudomolecules, we (1) identified genes incomplete or missing in the non-redundant sequence, but represented by (a) BAC sequence that had been excluded from the construction of the non-redundant sequence, or by (b) Morex WGS contigs, and (2) performed a final scan for contaminant sequences. Then a single FASTA file was constructed containing one entry for each barley chromosome (a 'pseudomolecule') and an additional entry combining all sequences not anchored to chromosomes (ref. 22).

Three-dimensional chromatin conformation analysis. Mapping of Hi-C/TCC reads and assignment to restriction fragments were performed as described elsewhere (ref. 22). Briefly, raw reads were trimmed with cutadapt (ref. 65). Trimmed Hi-C reads were mapped to the barley pseudomolecule sequence with BWA mem version 0.7.12 (ref. 66). Duplicate removal and sorting were performed with NovoSort (http://www.novocraft.com/products/novosort/). Mapped reads were assigned to restriction fragments with BEDtools (ref. 67), tabulated with custom AWK scripts and imported into R (https://www.r-project.org/). Raw counts of Hi-C links were aggregated in 1 Mb bins and normalized separately for intra- and interchromosomal contacts using HiCNorm (ref. 68). Contact probability matrices were plotted using standard R functions (ref. 69). Principal component analysis was performed with the R function prcomp() on the matrix of log-transformed normalized Hi-C link counts between 1 Mb bins.
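The aggregation of Hi-C links into 1 Mb bins and the log transformation that precedes the principal component analysis can be sketched as follows. This is an illustrative outline only: the input format (pairs of positions on one chromosome), the function names and the pseudocount of 1 are assumptions, and the HiCNorm normalization and the PCA itself are omitted.

```python
import math
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the text


def bin_links(links, bin_size=BIN_SIZE):
    """Count Hi-C links per pair of bins.

    links: iterable of (pos1, pos2) base-pair positions on the same chromosome
           (hypothetical input format).
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for pos1, pos2 in links:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


def log_matrix(counts, n_bins, pseudocount=1.0):
    """Build a symmetric matrix of log10(count + pseudocount).

    The pseudocount is an assumption made here to avoid log10(0);
    the source does not state how empty bins were handled.
    """
    matrix = [[math.log10(pseudocount)] * n_bins for _ in range(n_bins)]
    for (i, j), n in counts.items():
        value = math.log10(n + pseudocount)
        matrix[i][j] = value
        matrix[j][i] = value
    return matrix


# Four hypothetical links on a 3 Mb toy chromosome.
links = [
    (150_000, 2_400_000),
    (900_000, 2_100_000),
    (1_200_000, 1_800_000),
    (2_500_000, 300_000),
]
counts = bin_links(links)
matrix = log_matrix(counts, n_bins=3)
```

The resulting symmetric matrix has one row per bin, which is the shape that a function such as prcomp() expects as input.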
We fitted the linear model log10(nl) ~ log10(dist) + abs(cen_dist1 - cen_dist2) + arm1:arm2 + apos1:apos2 using the R function lm(). Here, nl is the normalized link count between two 1 Mb bins, dist is their distance in the linear genome, cen_dist1 and cen_dist2 are the relative distances of the two loci from the centromere, arm1 and arm2 are the chromosome arm assignments of the two loci, and apos1 and apos2 are the relative distances of the two loci from the ends of the chromosome arm (that is, apos1 is close to zero if locus 1 is near either the centromere or the telomere, and close to one if locus 1 resides in an interstitial region).

TCC reads of Morex × Barke F1 hybrids were mapped to a synthetic reference representing the parental genomes. An in silico Barke assembly was created from SNPs discovered by aligning Barke WGS reads to the Morex reference assembly with BWA mem (ref. 66) and calling variants with SAMtools (ref. 70). The SNPs were then inserted into the Morex reference using the FastaAlternateReferenceMaker of GATK (ref. 71). TCC reads of the hybrid were then mapped to the synthetic reference as described above. Only uniquely alignable read pairs were considered. Hi-C link counts were tabulated at the level of chromosomes.
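Conceptually, creating the in silico Barke sequence amounts to substituting alternate alleles into the Morex reference at the discovered SNP positions. The sketch below shows that substitution on a toy sequence. It does not reproduce the GATK tool; the function name, the tuple format for SNPs and the example sequence are assumptions made for illustration.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with alternate alleles substituted.

    reference: nucleotide string.
    snps:      iterable of (position, ref_allele, alt_allele) with 1-based
               positions and single-base alleles (hypothetical format).
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only single-nucleotide substitutions are supported")
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Toy example: a 10 bp stand-in for the reference and two SNPs.
morex_like = "ACGTACGTAC"
snps = [(2, "C", "T"), (9, "A", "G")]
barke_like = apply_snps(morex_like, snps)
```

Because only substitutions are applied, the two sequences keep the same length and coordinate system, so positions in the synthetic parental sequences remain directly comparable.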