Chromosome-scale scaffolding of the black raspberry (Rubus occidentalis L.) genome based on chromatin interaction data

Jibran, Rubina; Dzierzon, Helge; Bassil, Nahla; Bushakra, Jill M.; Edger, Patrick P.; Sullivan, Shawn; Finn, Chad E.; Dossett, Michael; Vining, Kelly J.; VanBuren, Robert; Mockler, Todd C.; Liachko, Ivan; Davies, Kevin M.; Foster, Toshi M.; Chagné, David

doi:10.1038/s41438-017-0013-y

Download PDF

Article
Open access
Published: 07 February 2018

Chromosome-scale scaffolding of the black raspberry (Rubus occidentalis L.) genome based on chromatin interaction data

Rubina Jibran¹,
Helge Dzierzon¹,
Nahla Bassil²,
Jill M. Bushakra²,
Patrick P. Edger³,
Shawn Sullivan⁴,
Chad E. Finn⁵,
Michael Dossett⁶,
Kelly J. Vining⁶,
Robert VanBuren ORCID: orcid.org/0000-0003-2133-2760⁷,
Todd C. Mockler ORCID: orcid.org/0000-0002-0462-5775⁸,
Ivan Liachko^4,9,
Kevin M. Davies¹⁰,
Toshi M. Foster¹⁰ &
…
David Chagné¹⁰

Horticulture Research volume 5, Article number: 8 (2018) Cite this article

4595 Accesses
39 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Black raspberry (Rubus occidentalis L.) is a niche fruit crop valued for its flavor and potential health benefits. The improvement of fruit and cane characteristics via molecular breeding technologies has been hindered by the lack of a high-quality reference genome. The recently released draft genome for black raspberry (ORUS 4115-3) lacks assembly of scaffolds to chromosome scale. We used high-throughput chromatin conformation capture (Hi-C) and Proximity-Guided Assembly (PGA) to cluster and order 9650 out of 11,936 contigs of this draft genome assembly into seven pseudo-chromosomes. The seven pseudo-chromosomes cover ~97.2% of the total contig length (~223.8 Mb). Locating existing genetic markers on the physical map resolved multiple discrepancies in marker order on the genetic map. Centromeric regions were inferred from recombination frequencies of genetic markers, alignment of 303 bp centromeric sequence with the PGA, and heat map showing the physical contact matrix over the entire genome. We demonstrate a high degree of synteny between each of the seven chromosomes of black raspberry and a high-quality reference genome for strawberry (Fragaria vesca L.) assembled using only PacBio long-read sequences. We conclude that PGA is a cost-effective and rapid method of generating chromosome-scale assemblies from Illumina short-read sequencing data.

High-quality reference genome and annotation aids understanding of berry development for evergreen blueberry (Vaccinium darrowii)

Article Open access 01 November 2021

A draft chromosome-scale genome assembly of a commercial sugarcane

Article Open access 28 November 2022

Chromosome-level genome assembly of the Japanese sawyer beetle Monochamus alternatus

Article Open access 13 February 2024

Introduction

Despite recent advances in DNA sequencing technologies and computational approaches, the de novo assembly of high-quality reference genomes in plant species using solely short-read sequencing data remains difficult. One of the biggest challenges is scaffolding contigs into chromosome-scale sequences, as construction of short gun libraries with large inserts (>15 kb) is extremely difficult and de novo assembly algorithms tend to not perform well across repetitive sequences. Chromosome-scale reference genomes have only been obtained for a few selected horticultural species including strawberry¹, peach², and apple³ largely with the aid of available genetic maps to anchor scaffolds.

Several methods can be used for scaffolding. Bacterial artificial chromosome (BAC)-end sequencing is one approach; however, it is expensive, tedious and time consuming. High-density genetic maps can be used for scaffolding; however, it is often difficult to obtain sufficient numbers of markers to anchor and orientate small contigs and the mapping algorithms used to build genetic maps can sometimes place markers at incorrect locations⁴. Furthermore, there is not a direct relationship between centiMorgan (cM) on linkage maps, which is a measure of recombination frequency, and physical distance expressed as megabases (Mb) of sequence data.

De novo scaffolding has been demonstrated in both haploid and diploid organisms using chromosome conformation capture (3C) followed by Hi-C analysis^5,6. Hi-C is a novel strategy combining capture of chromatin interaction within the nucleus and next-generation sequencing (NGS). New bioinformatics methods such as proximity-guided assembly (PGA) can be used for developing near-complete pseudo-chromosome assemblies of complex genomes using Hi-C data. Hi-C relies on the folding of chromosomes inside the nucleus so that fragments sequenced from the same chromosome are in close three-dimensional proximity, whereas fragments from different chromosomes are more distant. The method involves an initial chemical cross-linking of chromatin to capture chromosome fragments that are in close physical proximity, digestion of the captured fragments with restriction enzymes to create cuts close to the fixed area, ligating the cut ends and then sequencing the junctions using NGS⁷. A major benefit of this method is that the cross-linking process is random and independent of DNA sequence. The probability of finding fragments that were in close proximity on the same chromosome within paired-end NGS reads decreases with physical distance and decreases even more drastically between fragments from different chromosomes. This method has been used in prokaryotes and some eukaryotes, including humans, to dramatically improve genome assemblies, infer the location of previously unanchored contigs, and create chromosome length haplotypes^5,8,9.

Because of their economic importance, several members of the Rosaceae family have fully or partially assembled genomes. The genomes of apple (Malus × domestica Borkh.), peach (Prunus persica [L.] Batsch) and diploid strawberry (Fragaria vesca L.) have been assembled as pseudo-chromosomes^1,3,10,11. Asian and European pear (Pyrus pyrifolia [Burm.] Nak. and P. communis L., respectively), Chinese plum (Prunus mume Siebold & Zucc.), and China rose (Rosa chinensis Jacq.) all have partially anchored draft genomes^12,13,14,15. Diploid black raspberry (Rubus occidentalis L.) has been recently sequenced and assembled into draft genome spanning 82% of the genome¹⁶. There are 11,936 black raspberry scaffolds and the scaffold N50 length is 49,488 bp. We performed Hi-C sequencing of the same genotype (ORUS 4115-3) sequenced by VanBuren et al.¹⁶ to order these scaffolds into pseudo-chromosomes.

Methods

Genome sequence

The assembled genome of ORUS 4115-3 (Rubus_occidentalis_v1.0.a1) was downloaded from the Genomic Database for Rosaceae (www.rosaceae.org)¹⁶. ORUS 4115-3 was a selection made in Corvallis, OR (USDA) from a population collected near Rich Mountain, SC (USA) and is clonally maintained by the USDA-ARS, National Clonal Germplasm Repository as PI 672644 (Germplasm Resources Information Network, 2016)¹⁵. The input assembly has 11,936 contigs of a total length of 230,199,469 bp and an N50 of 49,488 bp.

Proximity-guided assembly

Hi-C scaffolding was performed by Phase Genomics Ltd. (Seattle, WA, USA) Proximo Hi-C genome scaffolding service. A Hi-C library was prepared from 0.2 g of fresh R. occidentalis ORUS 4115-3 leaf tissue¹⁷. The Hi-C library was sequenced on the Illumina NextSeq platform (Illumina, La Jolla, CA, USA), generating 54.4 million Hi-C read pairs, which were provided as input to the Proximo Hi-C scaffolding pipeline. The Hi-C reads were aligned to the draft assembly using the “bwa aln” algorithm¹⁸ with the “-n 0” option, requiring reads to map without any mismatches, as has been suggested¹⁹. After filtering erroneous mappings (mapQ = 0) on the alignments, 16,521,894 read pairs were discarded because they mapped to the same contig and were therefore uninformative for scaffolding purposes. In total 5,220,846 read pairs remained where one read mapped to one contig and the other mapped to another contig, and were used in the scaffolding pipeline, an average of 437 informative Hi-C read pairs. Only read pairs in which read 1 (R1) and read 2 (R2) aligned to different contigs are useful for Hi-C scaffolding, because these pairs gives information about the structure of the genome. Read pairs that either mapped to the same contig or did not map cleanly enough were discarded in clustering analysis.

The Proximo Hi-C scaffolding pipeline performed chromosome assignment and scaffold ordering and orientation on the contigs using the Hi-C data as described previously^5,6. Proximo includes an enhanced version of the LACHESIS algorithm¹⁹ (http://shendurelab.github.io/LACHESIS/), along with a scaffold optimization algorithm, extra reports and QC steps. This method places contigs into chromosome groups based on the strength of their Hi-C interactions, and then arranges and orientates them into a linear order such that the amount of Hi-C interactions among adjacent and nearby contigs is maximized. The results of this process were optimized over the course of 40,000 independent scaffolding attempts, to maximize the number of Hi-C links within scaffolds and the log-likelihood of the chosen placement and orientation of a contig having generated the observed Hi-C data relative to alternatives.

The post-clustering heat maps constructed made by the Proximo Hi-C pipeline as one of the report steps. Once scaffolding was complete, the contigs were placed into a linear order by scaffold, and then by position on the scaffold. These x-/y-coordinates were used in the heat map. To develop the “by_length” heat map, larger contigs were given some extra space in the x–y plane by allowing multiple coordinates in a row. The Hi-C link density between each pair of contigs was recorded, using the x-/y-coordinates we identified above to order them. “Link density” is the number of Hi-C links between a given pair of contigs normalized for the number of restriction sites in those contigs. The link density was plotted across 3000 bins for each pseudomolecule with bin size varying by pseudomolecule length. The heat map was plotted with an R script that uses the link density and scaffold boundary data. This script is the same one published with LACHESIS, with minor modifications that do not affect the analysis or primary figure content (e.g., programmatic interface tweaks, slightly different figure title).

Rubus occidentalis gene models¹⁶ were mapped to the Rubus_occidentalis_v1.1 assembly by using the default pipeline FLO (https://github.com/wurmlab/flo), which is based on the University of California, Santa Cruz (UCSC)-liftOver Toolkit²⁰.

Physical ordering of genetic markers and comparison of the PGA black raspberry assembly with parental genetic maps

The genotyping by sequencing (GBS)-based single nucleotide polymorphism (SNP) markers (Supplementary Tables S2 and S3)¹⁶ and centromere sequences were mapped and aligned to the reference genome assembly of ORUS 4115-3¹⁶ using the Genomic Mapping and Alignment Program (GMAP) with default parameters (gmap/2016-06-09)²¹. In total, 244 (for ORUS 4153–1) and 224 (for ORUS 3021–2) SNP markers, evenly distributed across the 7 chromosomes, were chosen for the comparison of the parental genetic maps and the PGA guided assembly genome (Supplementary Tables S2 and S3). We chose to focus on GBS SNP-based markers so we could apply a more systematic method to retrieve the markers within the assembly contigs based on their flanking sequences. The selected markers from each linkage group were aligned against the 7 pseudo-chromosomes of Rubus_occidentalis_v1.0.pga using GMAP (Supplementary Figures S1–7).

Trans:cis quantitation

In order to examine the distribution of local (cis) and between chromosomes interactions (trans) for each pseudo-chromosome in the genome, the Hi-C data were processed using the HiCUP mapping program (www.bioinformatics.babraham.ac.uk/projects/hicup/).

The HiCUP Binary Alignment Map (BAM) file along with a file of restriction enzyme Sau3AI restriction fragments was imported into SeqMonk https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/. A suitable set of probes was generated using Sau3AI restriction fragments. Fifty consecutive Sau3AI probes were grouped together to identify significant interactions (BoxWhisker filter) in the larger genomic regions. The quantitation was adjusted to subtract the median trans percentage from each chromosome overall value (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/), to identify the location variation in trans percentage and to view the trans:cis ratio in a more reliable way. SeqMonk was used to calculate the trans:cis ratio of the Hi-C data. This method counts the numbers of ends involved in a cis or a trans interaction in each probe and then quantifies the probes with percentage of trans reads. The Hi-C probes were generated around Sau3AI digestion fragments. The raw counts of trans:cis ratios can be misleading as the probes located on small chromosome will have more trans interaction and the probes of a longer chromosome will have larger cis reads. The randomly ligated fragments will be picked up as trans interactions and the percentage of these interactions is directly proportional to chromosome length. Therefore the median trans percentage values were extracted from overall values.

Comparative genomics

Syntenic dot plots were generated with SynMap²² using LAST for whole-genome comparisons. A minimum of five synteny genes with a maximum distance of 50 kb between adjacent genes was used to seed a syntenic block. Syntenic gene pairs were colored based on their K_s values in order to differentiate orthologous and out-paralog syntenic blocks. Circularized syntenic plots were constructed using Circos²³ with each connecting line representing orthologous gene sequences.

Results

Proximity-guided assembly of R. occidentalis (ORUS 4115-3) genome

In total, 52.2 million Hi-C read pairs were aligned to the 11,936 R. occidentalis contigs spanning 230,199,469 bp (97.3% of total sequence length). By clustering, ordering, and orienting groups of contigs that were likely to derive from a common chromosome, seven pseudo-chromosomes were defined (Fig. 1a). Of the clustered 9813 contigs, 9650 were ordered and oriented, spanning a total of 223,838,710 bp (97.2% of total sequence length) (Fig. 1b, c). The shortest and longest assembled pseudo-molecules were chromosomes 1 and 6 with 25,907,393 and 39,497,411 bp, respectively. Half of the scaffolded pseudo-chromosomes were longer than 31,759,000 bp, representing a dramatic increase in the N50 49,488 bp obtained in the original assembly, prior to PGA¹⁶. An additional 163 contigs were clustered into a 205,944 bp scaffold. However they could not be placed into that scaffold’s linear order with sufficient confidence to warrant their inclusion, likely due to the small size of those contigs (average size: 1263 bp). Our new R. occidentalis assembled genome of ORUS 4115-3 obtained by PGA is now referred to as Rubus_occidentalis_v1.1 and is available at the Genome Database for Rosaceae (https://www.rosaceae.org/24).

**Fig. 1: Chromatin conformation capture-based improved assembly of *Rubus occidentalis* genome of ORUS 4115-3 (Rubus_occidentalis_v1.0.pga).**

We aligned the predicted Rubus gene annotations¹⁶ to the new Rubus_occidentalis_v1.1 genome by using the FLO pipeline with default parameters. Out of the 28,005 protein-coding genes, 27,541 (98.3%) were mapped successfully. The structural information of the aligned mRNA sequences is described in Supplementary Table S1A–G. Chromosomes 1 to 7 have 3101, 4094, 4134, 3677, 3872, 5167, and 3496 protein-coding genes, respectively.

Comparison of the PGA black raspberry assembly with parental genetic maps

We performed a comparison between the order of markers in the parental linkage maps of ORUS 4153–1 x ORUS 3021–2 (Supplementary Tables S2 and S3)¹⁶ and the PGA guided genome assembly of Rubus_occidentalis_v1.1 to identify any conflicts arising from a shift in the genetic map position because of inaccuracies in recombination frequencies. The physical positions of 90% of ORUS 4153–1 and 78% of ORUS 3021–2 genetic markers were correctly identified in the Rubus_occidentalis_v1.0.pga genome. When the genetic map positions of the markers were compared with their physical orders (Fig. 2; Supplementary Figures S1–S8), the physical and genetic orders of ~85% of the markers were found to be consistent in both parents. However, there was inconsistency in the orders of 32 (13%) and 27 (12%) markers between the physical order and the ORUS 4153–1 and ORUS 3021–2 genetic maps, respectively (Table 1).

**Fig. 2: Comparison between the black raspberry genome assembly (Rubus_occidentalis_v1.0.pga) and genetic map of female parent ORUS 4153-115-3.**

Table 1 Descriptive statistics of comparison between ORUS 4153–1 and ORUS 3021–2 genetic and physical maps

Full size table

Location of centromeres in the black raspberry genome

The likely locations of centromeres in the black raspberry genome were inferred using three independent methods, and the results were largely consistent. First, the probability of crossovers in different regions of the genome was analyzed by comparing physical distances in million base pairs (Mb) with the genetic distances of the markers (Fig. 3a). The average physical distance ranged from 2 to 4 cM/Mb, which varied with position along and between the chromosomes. Eight regions of suppressed genetic recombination that are likely to be associated with the repetitive regions of the genome and centromeres were identified on the seven chromosomes (Fig. 3a). For example, clear reduced recombination frequencies were detected on chromosomes 1 and 5 at positions 10.9 and 7.5 Mb, respectively. On chromosome 7 the recombination frequency gradually decreased towards the end of the chromosome. Secondly, centromeric sequences were analyzed in the Rubus_occidentalis_v1.0.pga assembly, using the 303 bp conserved region described by VanBuren et al.¹⁶ (Supplementary Figure S9).

**Fig. 3: Identification of centromeres in the Rubus_occidentalis_v1.0.pga genome assembly.**

This conserved sequence was identified in all seven pseudo-chromosomes using GMAP. Finally, the Hi-C interaction data were used to validate the candidate positions of centromeres, as these regions are expected to display reduced intrachromosomal interactions in the Hi-C heat map (Fig. 3c). Figure 3a shows that such regions in the Hi-C heat map coincide with the candidate centromere locations identified by GMAP alignments and the recombination distribution patterns. For example, in chromosome 7 the reduced recombination frequency and the physical centromere location (identified by aligning 303 bp centromere sequence with the Hi-C assembly) were observed around position 25 Mb. However, some discrepancies were observed. For example the crossover events for chromosome 1 show two distinct dips at positions 10.9 and 22 Mb. Similarly, the recombination suppression is observed at position 7.5 Mb in chromosome 5, whereas the identified centromere location is around 30 Mb.

Chromosomes can be classified into one of six types based on the ratio of the long arm to the short arm²⁴. Chromosomes 1, 5, 6, and 7 of black raspberry were classified as chromosomes with terminal (t) centromeres, chromosome 2 was classified as chromosome with submedian centromere (sm), and chromosomes 3 and 4 were classified as chromosomes with subterminal (st) centromeres (Fig. 3c).

Interchromosomal and intrachromosomal interactions

Trans:cis ratios of Hi-C reads can be useful in determining genome regions with potential true trans and cis interactions. The distribution of corrected trans percentage across the genome calculated using SeqMonk (Fig. 4) clearly shows genomic regions that have either high or low trans percentage, indicating either distant or local interactions, respectively. The trans:cis ratio decreases when a genomic region strongly interacts with itself (represented by blue bars) and the ratio increases (represented by red bars) for long-range interchromosomal interactions. The information on the placement of these chromatin domains provides an insight into their general behaviors in the genome.

**Fig. 4: Distribution of *trans* percentages across the black raspberry genome.**

Comparison of the PGA black raspberry assembly with Fragaria vesca

Alignment of our new black raspberry genome with the high-quality diploid strawberry genome (F. vesca V4)²⁵ showed that there is a high degree of synteny between these genomes. Figure 5 depicts their positional relationships with circularly arranged ideograms and illustrates that there are only a few potential small transversions between the genomes of black raspberry and strawberry.

**Fig. 5: Circos plot showing macro-synteny between the black raspberry and strawberry genomes.**

Discussion

We have assembled the genome of black raspberry ORUS 4115-3¹⁶ to the chromosome scale using the Hi-C technique followed by PGA. We have confirmed and resolved the positons of genetic markers and identified centromeres on each of the black raspberry chromosomes. The distribution of trans:cis ratios over the genome identify regions of local and interchromosomal interactions. The high degree of collinearity between the new black raspberry assembly and the strawberry genome validates the quality of our assembly on both the scaffold and chromosome scale²⁵.

The Hi-C chromatin conformation capture technology has both improved our understanding of genome packaging and facilitated the assembly of physical maps. An increasing number of studies have used the chromosome conformation capture technique to achieve high resolution chromosome-scale assemblies^19,26,27. Here we have clearly demonstrated that Hi-C analysis has vastly improved the black raspberry genome assembly of ORUS 4115-3, yielding a N50 contig size for the proximity-guided assembly of 31,759,000 bp versus the N50 scaffold size of 48,488 bp for the previously assembled genome of VanBuren et al.¹⁶ Furthermore, our Rubus_occidentalis_v1 assembly showed a high degree of alignment with genetic maps based on GBS data, demonstrating the capability of Hi-C to correctly organize scaffolds into pseudo-chromosomes.

Despite some errors in the order of GBS markers, the majority of the genetic markers tested in this study were collinear with the physical map obtained using Hi-C. The misplaced markers could be due to the observed variable recombination rates across the genome, which often depends on GC content as well as the relative positions of markers compared to centromeres and telomeres²⁸. It has been observed previously that eukaryotic genomes have regions of high and low frequency of recombination, making it difficult to establish a direct correlation between physical and genetic maps²⁹. For example, Chen et al.³⁰, found supressed recombination in centromeric regions and on the short arms of chromosomes 4 and 10 in rice genome.

Consistent with this work, many studies have reported inconsistencies in marker’s orders of genetic and physical maps. For example, Dewan et al.³¹, had identified that ~11% of genetic markers from chromosomes 3 and 21 in human genome were inconsistent with the physical map. Similarly, a comparison between a soybean sequence-based physical map and a genetic map, built from a cross between G. max and G. soja, revealed huge discrepancies in the markers orders of 26 genomic regions³². These anomalies were found to be due to the errors in genome assembly³². Furthermore, the evidence that genetic markers are often located at different physical position is highlighted by Bustamante et al.³³ in barley. In this study, the authors investigated the positions and orders of genetically identified markers of chromosome 3H with a physical map, derived from fingerprinted BAC contigs. Their study showed that long genetic distances at subterminal chromosomal regions were translated into short physical distances indicating the presence of hot spots for recombination at distal regions of chromosome 3H.

Each eukaryotic chromosome has a centromere composed of highly repeated DNA sequences involved in the control of chromosome separation during meiosis. Genetic loci located near centromeres are difficult to map, due to supressed recombination frequencies at or near centromeres. Presumably, this prevents chromosome breakage, chromosome loss, and disruption in pericentric sister chromatid cohesion³⁴. The complex structures together with the difficulty in distinguishing centromere repeats from the flanking repeats make the identification of centromere locations a daunting task^35,36. For this reason, we used a combination of genetic maps, sequence search and chromosome interaction patterns derived from the Hi-C data to identify the raspberry centromere locations. We found that the centromere positions predicted by low recombination frequencies corresponded closely with those predicted by mapping the 303 bp black raspberry centromere sequence to the seven pseudo-chromosomes of the genome assembly. It should be noted that the 303 bp centromere sequence mapped to two separate locations, 4.831 and 12.686 Mb on chromosome 6, which could have been be due to errors in the genome assembly. To identify the correct position of the chromosome 6 centromere we investigated the flanking regions around the two mapped locations and compared them with the flanking sequences of the other six centromeric positions identified. This comparison suggests that the correct position of the centromere of chromosome 6 is 4.83 Mb rather than 12.686 Mb. Furthermore, chromosome 1 may has a cold spots for recombination at 22 Mb.

We also exploited the power of Hi-C analysis to identify genomic regions involved in local intrachromosomal and long-range interchromosomal interactions (Fig. 4). This information could be of significant value for gene expression studies in the future, because interchromosomal interactions can facilitate the identification of regulatory elements such as promoters and enhancers that interact with each other during chromatin folding to regulate gene expression³⁷. The number of interactions occurring between genomic regions of same chromosome (cis interactions) is higher that the numbers occurring between the elements located on different chromosomes (trans interactions). Interestingly, Hi-C heat map shows low intensity and discontinuous signals along the diagonal of chromosome 6 (Fig. 1a). This may represent the hetro-chromatin regions actively involved in trans interactions rather than cis interactions (also indicated by red bars in Fig. 4).

Conclusion

We present a high-quality reference genome for black raspberry with significantly longer scaffolds than the previously published version¹⁶. Our PGA-based Rubus genome will provide assistance in improving genetic maps for both diploid and polyploid species of the Rosoideae. This will accelerate the identification of DNA markers linked to desired traits thus facilitating the cost-effective and efficient development of new cultivars with these traits. Furthermore we have located the positions of each of the centromeres and mapped chromatin domains with cis and trans interactions, providing new knowledge that will have significant importance in developing an understanding of genome organization in black raspberry.

Our results provide a significant step towards improved solutions for map-based cloning of genes controlling crucial traits such as pest and disease resistance, plant architecture, and fruit form and quality.

References

Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet 43, 109–116 (2011).
Article CAS PubMed Google Scholar
Verde, I. et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45, 487–494 (2013).
Article CAS PubMed Google Scholar
Daccord, N. et al. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat. Genet. 49, 1099–1106 (2017).
Article CAS PubMed Google Scholar
Tang, H. et al. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 16, 3 (2015).
Article CAS PubMed PubMed Central Google Scholar
Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).
Article CAS PubMed Google Scholar
Peichel C. L., Sullivan S. T., Liachko I., & White M. A. Improvement of the threespine stickleback (Gasterosteus aculeatus) genome using a Hi-C-based proximity-guided assembly method. bioRxiv 068528 (2016). J Hered. 108, 693–700 (2017).
Belton, J.-M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Article CAS PubMed Google Scholar
Marie-Nelly, H. et al. High-quality genome (re) assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
Article CAS PubMed PubMed Central Google Scholar
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Article CAS PubMed PubMed Central Google Scholar
Velasco, R. et al. The genome of the domesticated apple (Malus×domestica Borkh.). Nat. Genet. 42, 833–839 (2010).
Article CAS PubMed Google Scholar
Verde, I. et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet 45, 487–494 (2013).
Article CAS PubMed Google Scholar
Foucher, F. et al. Towards the rose genome sequence and its use in research and breeding 167–175 (International Society for Horticultural Science (ISHS), Leuven, Belgium, 2015).
Chagné, D. et al. The draft genome sequence of European Pear (Pyrus communis L. ‘Bartlett’). PLoS ONE 9, e92644 (2014).
Article PubMed PubMed Central Google Scholar
Wu, J. et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 23, 396–408 (2013).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Q. et al. The genome of Prunus mume. Nat. Commun. 3, 1318 (2012).
Article PubMed PubMed Central Google Scholar
VanBuren, R. et al. The genome of black raspberry (Rubus occidentalis). Plant J. 87, 535–547 (2016).
Article CAS PubMed Google Scholar
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kuhn, R. M., Haussler, D. & Kent, W. J. The UCSC genome browser and associated tools. Brief Bioinformatics 14, 144–161 (2013).
Article CAS PubMed Google Scholar
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Article CAS PubMed Google Scholar
Lyons, E., Pedersen, B., Kane, J. & Freeling, M. The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Trop. Plant Biol. 1, 181–190 (2008).
Article CAS Google Scholar
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Article CAS PubMed PubMed Central Google Scholar
Naranjo, C. A., Poggio, L. & Brandham, P. E. A practical method of chromosome classification on the basis of centromere position. Genetica 62, 51–53 (1983).
Article Google Scholar
Edger, P. P. et al. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. GigaScience (2017).
Dekker, J., Marti-Renom, M. A. & Mirny, L. A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kaplan, N. & Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat. Biotechnol. 31, 1143–1147 (2013).
Article CAS PubMed PubMed Central Google Scholar
Yu, A. et al. Comparison of human genetic and sequence-based physical maps. Nature 409, 951–953 (2001).
Article CAS PubMed Google Scholar
Petes, T. D. Meiotic recombination hot spots and cold spots. Nat. Rev. Genet. 2, 360 (2001).
Article CAS PubMed Google Scholar
Chen, M. et al. An integrated physical and genetic map of the rice genome. Plant Cell 14, 537–545 (2002).
Article PubMed PubMed Central Google Scholar
DeWan, A. T., Parrado, A. R., Matise, T. C. & Leal, S. M. The map problem: a comparison of genetic and sequence-based physical maps. Am. J. Hum. Genet 70, 101–107 (2002).
Article CAS PubMed Google Scholar
Lee, W. K. et al. Dynamic genetic features of chromosomes revealed by comparison of soybean genetic and sequence-based physical maps. Theor. Appl. Genet 126, 1103–1119 (2013).
Article PubMed Google Scholar
Bustamante, F. O., Aliyeva-Schnorr, L., Fuchs, J., Beier, S. & Houben, A. Correlating the genetic and physical map of barley chromosome 3H revealed limitations of the FISH-based mapping of nearby single-copy probes caused by the dynamic structure of metaphase chromosomes. Cytogenet. Genome Res. 152, 90–96 (2017).
Article CAS PubMed Google Scholar
Talbert, P. B. & Henikoff, S. Centromeres convert but don’t cross. PLoS Biol. 8, e1000326 (2010).
Article PubMed PubMed Central Google Scholar
Heslop-Harrison, J. P. & Schwarzacher, T. Nucleosomes and centromeric DNA packaging. Proc. Natl Acad. Sci. USA 110, 19974–19975 (2013).
Article CAS PubMed PubMed Central Google Scholar
Henikoff, S., Ahmad, K. & Malik, H. S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001).
Article CAS PubMed Google Scholar
Spilianakis, C. G., Lalioti, M. D., Town, T., Lee, G. R. & Flavell, R. A. Interchromosomal associations between alternatively expressed loci. Nature 435, 637 (2005).
Article CAS PubMed Google Scholar
Jung, S. et al. GDR (G enome D atabase for R osaceae): integrated web resources for Rosaceae genomics and genetics research. BMC Bioinformatics 5, 130 (2004).
Article PubMed PubMed Central Google Scholar

Download references

Author information

Authors and Affiliations

The New Zealand Institute for Plant & Food Research Limited, Private Bag 11600, Palmerston North, 4474, New Zealand
Rubina Jibran & Helge Dzierzon
USDA-ARS National Clonal Germplasm Repository, 33447 Peoria Road, Corvallis, OR, 97333, USA
Nahla Bassil & Jill M. Bushakra
Department of Horticulture, Michigan State University, East Lansing, MI, 48824-2604, USA
Patrick P. Edger
Phase Genomics, Seattle, WA, 98195, USA
Shawn Sullivan & Ivan Liachko
USDA-ARS Horticultural Crops Research Unit, Corvallis, OR, 97330, USA
Chad E. Finn
B.C. Blueberry Council (in Partnership with Agriculture and Agri-Food Canada) Agassiz Food Research Centre, Agassiz, BC, V0M 1A0, Canada
Michael Dossett & Kelly J. Vining
Department of Horticulture, Michigan State University, East Lansing, MI, 48824-2604, USA
Robert VanBuren
The Donald Danforth Plant Science Center, St. Louis, MO, 63132, USA
Todd C. Mockler
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
Ivan Liachko
The New Zealand Institute for Plant & Food Research Limited, Private Bag 11600, Palmerston North, 4474, New Zealand
Kevin M. Davies, Toshi M. Foster & David Chagné

Authors

Rubina Jibran
View author publications
You can also search for this author in PubMed Google Scholar
Helge Dzierzon
View author publications
You can also search for this author in PubMed Google Scholar
Nahla Bassil
View author publications
You can also search for this author in PubMed Google Scholar
Jill M. Bushakra
View author publications
You can also search for this author in PubMed Google Scholar
Patrick P. Edger
View author publications
You can also search for this author in PubMed Google Scholar
Shawn Sullivan
View author publications
You can also search for this author in PubMed Google Scholar
Chad E. Finn
View author publications
You can also search for this author in PubMed Google Scholar
Michael Dossett
View author publications
You can also search for this author in PubMed Google Scholar
Kelly J. Vining
View author publications
You can also search for this author in PubMed Google Scholar
Robert VanBuren
View author publications
You can also search for this author in PubMed Google Scholar
Todd C. Mockler
View author publications
You can also search for this author in PubMed Google Scholar
Ivan Liachko
View author publications
You can also search for this author in PubMed Google Scholar
Kevin M. Davies
View author publications
You can also search for this author in PubMed Google Scholar
Toshi M. Foster
View author publications
You can also search for this author in PubMed Google Scholar
David Chagné
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to David Chagné.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Supplementary table 1

Supplementary Table S2

Supplementary Table S3

Supplementary Figure S1

Supplementary Figure S2

Supplementary Figure S3

Supplementary Figure S4

Supplementary Figure S5

Supplementary Figure S6

Supplementary Figure S7

Supplementary Figure S8

Supplementary Figure S9

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jibran, R., Dzierzon, H., Bassil, N. et al. Chromosome-scale scaffolding of the black raspberry (Rubus occidentalis L.) genome based on chromatin interaction data. Hortic Res 5, 8 (2018). https://doi.org/10.1038/s41438-017-0013-y

Download citation

Received: 17 September 2017
Revised: 06 December 2017
Accepted: 10 December 2017
Published: 07 February 2018
DOI: https://doi.org/10.1038/s41438-017-0013-y

This article is cited by

Genome-Wide Characterization and Expression of the bZIP Family in Black Raspberry
- Yaqiong Wu
- Xin Huang
- Wenlong Wu
Journal of Plant Growth Regulation (2024)
A chromosome-scale assembly of the quinoa genome provides insights into the structure and dynamics of its subgenomes
- Elodie Rey
- Peter J. Maughan
- David E. Jarvis
Communications Biology (2023)
A chromosome-level reference genome of the wax gourd (Benincasa hispida)
- Wenlong Luo
- Jinqiang Yan
- Biao Jiang
Scientific Data (2023)
Agrobacterium-mediated transformation of purple raspberry (Rubus occidentalis × R. idaeus) with the PtFIT (FER-like iron deficiency–induced transcription factor 1) gene
- Changhyeon Kim
- Wenhao Dai
In Vitro Cellular & Developmental Biology - Plant (2022)
Chromosome-level genome assembly of Ophiorrhiza pumila reveals the evolution of camptothecin biosynthesis
- Amit Rai
- Hideki Hirakawa
- Mami Yamazaki
Nature Communications (2021)