Abstract
Genome-wide chromatin conformation capture assays provide formidable insights into the spatial organization of genomes. However, due to the complexity of the data structure, their integration in multi-omics workflows remains challenging. We present data structures, computational methods and visualization tools available in Bioconductor to investigate Hi-C, micro-C and other 3C-related data, in R. An online book (https://bioconductor.org/books/OHCA/) further provides prospective end users with a number of workflows to process, import, analyze and visualize any type of chromosome conformation capture data.
Introduction
Chromosome conformation capture methods (3C, 4C, Hi-C, micro-C, …) have become a prevalent approach to investigate the interplay between DNA-related metabolic processes and the 3D folding of chromosomes (e.g. gene regulation, chromosome compaction, DNA repair and rearrangements1,2,3,4,5). Computational processing of Hi-C data has also provided powerful and elegant solutions to several genomic limitations, in particular allowing for robust genome scaffolding6,7,8,9. Furthermore, the application of Hi-C directly to complex microbial communities allows the characterization of whole microorganism genomes, the identification of mobile genetic elements such as viruses and plasmids and their assignment to their respective hosts10,11,12,13, and characterization of prophage activity14. International consortia have emerged to orchestrate efforts to characterize chromosome conformation and nuclear organization across cell types, tissue samples and species8,15.
Genome-wide chromatin conformation capture assays, such as Hi-C, micro-C or DNase-C16,17,18, yield lists of pairs of interacting genomic loci at a base-pair resolution (stored in pairs files, in which each record describes a single measured contact between two genomic loci), which can be further binned to a window of chosen size and stored in symmetric sparse matrix files (where consecutive columns/rows correspond to consecutive genomic bins). Hi-C data specificities thus largely differ from the typical 1D genomic file formats (e.g. ‘.bigwig’ or ‘.bed’ files). While a single ‘.pairs’ file format has been formally defined by the NIH 4D Nucleome Network19, three file formats have been independently proposed to store binned matrix files, each generated by a specific processing software. HiC-Pro generates (i) sparse matrices as three-column (bin_i / bin_j / count_ij) text files and (ii) region files describing genomic coordinates for each bin20, Juicer produces ‘.hic’ files21 and ‘.(m)cool’ files are generated by the distiller pipeline22. The ‘.hic’ and ‘.(m)cool’ formats are binary, multi-resolution, highly compressed and indexed files and can rely on companion libraries (respectively straw and cooler) to perform random access to a subset of the data. Software tools have also been developed to manipulate files in these formats (Juicer tools, Juicebox and Juicebox Assembly Toolbox for ‘.hic’ files and HiGlass, cooltools and coolpuppy for ‘.(m)cool’ files23,24,25,26,27, FAN-C for ‘.hic’ and ‘.(m)cool’ files28). These computational solutions provide a dedicated shell command line interface (CLI) or a Python API. However, they are not embedded in a larger, genomics-centric ecosystem. Other software packages, such as HiCExplorer, GENOVA, mariner or HiCUP, also provide additional toolkits for Hi-C exploratory data analysis (EDA)29,30,31.
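To make the binning step concrete, the following Python sketch turns a handful of pairs records into the three-column sparse representation described above. It is a language-agnostic illustration under simplifying assumptions, not the code used by any of the cited tools: the function name bin_pairs is hypothetical, contacts are reduced to two positions on a single chromosome, and only the upper triangle of the symmetric matrix is stored.

```python
from collections import Counter


def bin_pairs(pairs, bin_size):
    """Bin same-chromosome contacts into sparse (bin_i, bin_j, count) triplets.

    `pairs` is an iterable of (pos1, pos2) base-pair coordinates. Real pairs
    records also carry chromosome names and strands, ignored here.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        # The matrix is symmetric, so only the upper triangle (i <= j) is kept.
        if i > j:
            i, j = j, i
        counts[(i, j)] += 1
    # Sorted records, mirroring a three-column text file.
    return [(i, j, n) for (i, j), n in sorted(counts.items())]


pairs = [(120, 2450), (2300, 180), (950, 990), (5100, 5200), (1500, 4999)]
triplets = bin_pairs(pairs, bin_size=1000)
print(triplets)  # [(0, 0, 1), (0, 2, 2), (1, 4, 1), (5, 5, 1)]
```

Only non-empty bin pairs are stored, which is why sparse formats remain compact even at fine resolutions where most of the matrix is empty.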
The Bioconductor project focuses on the development of R packages to provide classes, methods and functions dedicated to genomic datasets32,33,34,35. For this reason, Bioconductor has become the reference ecosystem for in-depth genomics investigation (encompassing most genome sequencing methodologies, genome annotations, single cell omics, multi-omics, etc). However, although core methods exist to represent genomic interactions in R (defined by the GInteractions class in the InteractionSet package36), and a few packages exist to perform statistical analyses on Hi-C data (e.g. comparing chromatin interaction frequency between samples with HiCcompare37), no standard chromosome conformation capture data structure has been defined so far in R. Furthermore, data import methods to parse Hi-C processed files in R are still lacking. Overall, the lack of a unified methodology surrounding Hi-C data has limited their integration in the powerful genomics-centric Bioconductor ecosystem, particularly compared to other omics approaches. To address these limitations, we formally defined a set of classes to represent chromosome conformation capture data with Bioconductor, and developed a set of tools to process, parse, analyze and visualize this type of data in R. Compared to existing solutions, this approach allows the end user to leverage existing, powerful genomics-centric methodology already implemented in Bioconductor. Here, we cover the key aspects of the chromosome conformation capture analysis workflow and describe the packages used at each step as well as interoperability features in R. We also present an online book (https://bioconductor.org/books/OHCA/) introducing the end user to the installation of the required packages, their specific functionalities and several examples of complete chromosome conformation capture analysis workflows.
Results
Data representation
The HiCExperiment package implements the ContactFile class (encompassing the CoolFile, HicFile and HicproFile classes) to connect to a contact matrix stored on disk in one of these three formats (Fig. 1, Fig. S1), supporting Hi-C, micro-C and other 3C-related data binned at a fixed resolution. A ContactFile instance also lists the resolutions available in the matrix file and metadata relevant for biological analysis. The import method provides random access to a ContactFile, to only import relevant chunks of data from large Hi-C matrix files. This instantiates a HiCExperiment object containing binned genomic contacts of interest stored as GInteractions (Fig. 1). Raw counts and normalized contact frequencies (if available) are automatically imported and stored in a list of scores. Additional methods are available to move around the Hi-C map (refocus), dynamically change its resolution (zoom), subset interactions or add qualitative or quantitative metrics (using the standard ‘[’ subsetting operator and the ‘$’ column accessor), and set/get general information related to the contact matrix (e.g. seqinfo, anchors, bins, topologicalFeatures, metadata). The HiCExperiment package also defines a PairsFile class to efficiently import ‘.pairs’ files in R as GInteractions.
All the classes implemented in HiCExperiment directly extend core Bioconductor classes and generic methods, including BiocFile, GenomeInfoDb and GenomicRanges, ensuring seamless parsing, manipulation and genomic representation of locally stored Hi-C contact matrices in R. Importantly, a HiCExperiment object can be seamlessly coerced as a GInteractions, a data.frame tabular object or a (optionally sparse) matrix. This facilitates its interoperability and the integration of Hi-C processed data with other pre-existing packages.
Thanks to ever-decreasing sequencing costs and improving technology, the average size of chromosome conformation capture datasets is continuously increasing, both in sequencing depth and in resolution. HDF5-derived ‘.(m)cool’ and binary ‘.hic’ files both efficiently store such large-scale data, and HiCExperiment objects instantiated from these file formats benefit from efficient parsing libraries based on C code, optimized for speed. Furthermore, because random access is supported for these file formats, contact matrices can be partially imported in R, allowing manipulation of large datasets, such as deeply sequenced micro-C datasets, even on personal laptops with standard hardware configuration (e.g. 4 CPUs and 8–16 GB RAM).
Data processing
Integrated workflows such as nf-core/hic, open2c/distiller-nf or Juicer20,21,38,39 efficiently use high-performance computing (HPC) environments to process chromosome conformation capture data. The large number of indirect operations they perform (e.g. container caching, sanity checks and additional quality controls) results in a significant overhead and an increase in storage and memory requirements. These large workflows are therefore less suitable for processing on local workstations, where setting up dependencies is often cumbersome. The HiCool package was developed with these limitations in mind. HiCool is an R package that automatically sets up a basilisk-managed conda environment40 linked to hicstuff, a multipurpose lightweight Hi-C processing Python library41. This environment allows HiCool to align paired-end sequencing data to a genome reference, parse them into a standard ‘.pairs’ file, filter out invalid pairs42 and PCR duplicates, bin them into multi-resolution balanced ‘.mcool’ and ‘.hic’ matrix files and automatically generate an HTML report of the processing (Fig. 2a, b). The implementation of HiCool as a Bioconductor package enables its efficient integration with other local Hi-C analysis packages (Fig. S1), and unlocks access to genomic databases, e.g. to automatically retrieve and cache cloud-hosted pre-built genome reference indexes by using a genome ID string (e.g. “mm10”, “GRCz10”, …), accelerating local Hi-C data pre-processing and direct import in R.
Hi-C visualization
The HiCExperiment object inherits methods from the core GInteractions and GRanges classes to provide a flexible representation of Hi-C data in R. The HiContacts package leverages these inheritances to explore HiCExperiment objects, focusing on four main topics: Hi-C visualization, contact matrix-centric analysis, interactions-centric analysis and structural feature annotation (Fig. 2a, list of functions in Table S1, interoperability illustrated in Fig. S1).
Hi-C exploratory data analysis is instrumental in generating hypotheses, discovering patterns, and directly answering biological questions. A generic plotMatrix function is provided in the HiContacts package and can operate on HiCExperiment, GInteractions or standard matrix objects, with extensive Hi-C-related customization options (Fig. S2). These include single matrix visualization (Fig. S2A, B), side-by-side comparison of two matrices (Fig. S2C), visualization of ratio, observed vs. expected (O/E) and correlation matrices (Fig. S2E–G), support for horizontal Hi-C maps (Fig. S2G), annotation of structural features and alignment with genomic tracks (Fig. S2F), and visualization of aggregated matrices (Fig. S2H). All visualization functions provided by HiContacts return ggplot objects that can be easily customized, e.g. to change the scaling, range or hue of the color map or to add additional details or labels and generate publication-ready figures.
Contact matrix-centric analysis
In a Hi-C analysis workflow, a preliminary requirement is the normalization of contact matrices. A well-established approach for matrix normalization is the matrix balancing approach42,43. By default, HiCool processing performs such normalization automatically, but the end user may need to manually normalize existing contact matrices. The HiContacts package implements the balancing of a HiCExperiment object, calculating weight scores for each bin and adding a new normalized score metric to each genomic interaction (Fig. 2a).
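The principle of matrix balancing can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the HiContacts or HiCool implementation: the function name balance is hypothetical, the matrix is small, dense and symmetric with no empty rows, and a square-root damped update is used for numerical stability, which differs in detail from published iterative correction procedures. Each bin receives a weight w_i such that all rows of the balanced matrix w_i × M_ij × w_j have the same sum.

```python
def balance(matrix, n_iter=200, tol=1e-9):
    """Balance a dense symmetric matrix whose rows all have a non-zero sum.

    Returns (weights, balanced), where balanced[i][j] equals
    weights[i] * matrix[i][j] * weights[j] and all row sums are equal.
    """
    n = len(matrix)
    weights = [1.0] * n
    for _ in range(n_iter):
        # Row sums of the matrix balanced with the current weights.
        sums = [
            weights[i] * sum(matrix[i][j] * weights[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(sums) / n
        # Stop once every row sum is within `tol` (relative) of the mean.
        if max(abs(s / mean - 1.0) for s in sums) < tol:
            break
        # Damped update: divide each weight by the square root of its
        # relative coverage.
        weights = [w / (s / mean) ** 0.5 for w, s in zip(weights, sums)]
    balanced = [
        [weights[i] * matrix[i][j] * weights[j] for j in range(n)]
        for i in range(n)
    ]
    return weights, balanced


raw = [[10, 4, 1], [4, 8, 2], [1, 2, 6]]
weights, balanced = balance(raw)
print([round(sum(row), 6) for row in balanced])  # three identical row sums
```

The per-bin weights are what a balanced contact matrix file typically stores alongside the raw counts, so that normalized scores can be recomputed on the fly for any imported subset.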
Several basic matrix operations can be applied to HiCExperiment objects. HiContacts defines operators to subset or merge Hi-C maps, or to subtract, divide or sum two Hi-C maps. HiContacts also provides a random subsampling method of Hi-C interactions that preserves sample-specific distance-dependent interaction frequencies.
Other calculations can be performed on HiCExperiment instances. HiContacts can estimate the overall expected signal for an imported contact matrix and compute the ratio of observed vs. expected (O/E) interaction frequency (Fig. S2F). HiContacts can also compute correlation matrices, to reveal a stereotypical plaid pattern in which interactions are enriched between chromosome segments belonging to the same compartment (AA or BB)18 (Fig. S2G).
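As an illustration of the observed vs. expected calculation, the Python sketch below estimates the expected signal as the mean of each diagonal of a dense intra-chromosomal matrix, then divides each entry by the expected value at its genomic distance. This is one common way of defining the expected signal and is given as an assumption; the function names are hypothetical and the approach used in HiContacts may differ.

```python
def expected_by_distance(matrix):
    """Mean contact frequency for each bin distance d (the d-th diagonal)."""
    n = len(matrix)
    return [
        sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(n)
    ]


def observed_over_expected(matrix):
    """Divide every entry by the expected signal at its genomic distance."""
    n = len(matrix)
    expected = expected_by_distance(matrix)
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


contacts = [
    [8, 3, 2, 1],
    [3, 8, 6, 4],
    [2, 6, 8, 3],
    [1, 4, 3, 8],
]
print(expected_by_distance(contacts))          # [8.0, 4.0, 3.0, 1.0]
print(observed_over_expected(contacts)[1][2])  # 1.5
```

In this toy matrix, bins 1 and 2 interact 1.5 times more often than expected for adjacent bins, which is the kind of local enrichment that an O/E map makes visible once the distance-dependent decay is removed.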
Finally, an operation frequently performed when analyzing Hi-C matrices is the aggregation of matrix snippets, e.g. matrix subsets centered at all topological domain boundaries or all chromatin loops44. The AggrHiCExperiment class stores and averages the Hi-C signal across a set of snippets of interest. Because AggrHiCExperiment is an extension of the core HiCExperiment class, it inherits all the methods available to HiCExperiment instances, including visualization functionalities (Fig. S2H).
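The aggregation of snippets can be sketched in a few lines of Python. The example assumes a dense matrix and a list of (row, column) bin coordinates at which snippets are centered. The function name aggregate_snippets and the choice to skip snippets overlapping the matrix edge are illustrative assumptions, not a description of the AggrHiCExperiment internals.

```python
def aggregate_snippets(matrix, centers, flank):
    """Average square snippets of side 2 * flank + 1 centered on `centers`."""
    n = len(matrix)
    size = 2 * flank + 1
    total = [[0.0] * size for _ in range(size)]
    used = 0
    for ci, cj in centers:
        # Skip snippets that would extend beyond the matrix.
        if min(ci, cj) - flank < 0 or max(ci, cj) + flank >= n:
            continue
        for a in range(size):
            for b in range(size):
                total[a][b] += matrix[ci - flank + a][cj - flank + b]
        used += 1
    if used == 0:
        raise ValueError("no snippet fits entirely within the matrix")
    return [[value / used for value in row] for row in total]


# Toy symmetric matrix in which the signal at (i, j) is simply i + j.
toy = [[i + j for j in range(6)] for i in range(6)]
pileup = aggregate_snippets(toy, centers=[(0, 0), (2, 2), (3, 3)], flank=1)
print(pileup[1][1])  # 5.0, the mean of toy[2][2] and toy[3][3]
```

The snippet centered at (0, 0) is discarded because it overlaps the edge, so the average is taken over the two remaining snippets.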
Interactions-centric analysis
The proportion of cis (intra-chromosomal) and trans (inter-chromosomal) interactions per chromosome can be calculated with HiContacts to investigate the propensity of chromosomes to form chromosomal territories with limited intermixing45 (Fig. S2I).
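A minimal Python sketch of this calculation is given below, assuming contacts are summarized as (chromosome 1, chromosome 2, count) records. The function name cis_fraction is hypothetical, and each trans contact is attributed to both chromosomes involved, which is one possible convention among several.

```python
from collections import defaultdict


def cis_fraction(contacts):
    """Fraction of cis contacts per chromosome from (chr1, chr2, count) records."""
    cis = defaultdict(int)
    trans = defaultdict(int)
    for chrom1, chrom2, count in contacts:
        if chrom1 == chrom2:
            cis[chrom1] += count
        else:
            # A trans contact involves both chromosomes.
            trans[chrom1] += count
            trans[chrom2] += count
    chromosomes = set(cis) | set(trans)
    return {c: cis[c] / (cis[c] + trans[c]) for c in chromosomes}


contacts = [
    ("I", "I", 80), ("I", "II", 20),
    ("II", "II", 60), ("II", "III", 20),
    ("III", "III", 80),
]
print(sorted(cis_fraction(contacts).items()))
# [('I', 0.8), ('II', 0.6), ('III', 0.8)]
```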
The chromosome-wide distance-dependent interaction frequency (a.k.a. P(s)) and its slope are valuable metrics that can be used to infer physical properties of individual chromosomes18,46. For example, in yeast entering mitosis, there is a significant decrease in interactions in the 20–30 kb range and a downward shift in the P(s) slope beyond this range (Fig. S2J). These features have been successfully used to accurately model the reorganization of the nuclear genomic content into mitotic condensed chromosomes47,48,49. Distance-dependent interaction frequency can also be summarized in scalograms (Fig. S2K): the median interaction genomic distance (±25%, or other quantiles specified by the end user) can be plotted along a linear axis representing a chromosome segment. This is often useful for deciphering the behavior of chromatin interactions along chromosomes50.
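For readers unfamiliar with P(s), the Python sketch below shows one way to estimate it from a list of contact distances, together with the local slope in log-log space. It relies on assumptions made for illustration only: distances are positive integers in base pairs, bins are log2-spaced, and the function names distance_law and local_slopes are hypothetical rather than taken from any package.

```python
import math
from collections import Counter


def distance_law(distances):
    """Estimate P(s) from integer genomic distances (bp) in log2-spaced bins.

    Returns sorted (bin_start, density) tuples, where density is the fraction
    of contacts falling in the bin divided by the bin width.
    """
    # floor(log2(d)) is obtained exactly from the integer bit length.
    counts = Counter(d.bit_length() - 1 for d in distances if d > 0)
    total = sum(counts.values())
    law = []
    for k in sorted(counts):
        width = 2 ** k  # the bin [2^k, 2^(k+1)) spans 2^k base pairs
        law.append((2 ** k, counts[k] / (total * width)))
    return law


def local_slopes(law):
    """Slope of log P(s) versus log s between consecutive bins."""
    return [
        (math.log(p2) - math.log(p1)) / (math.log(s2) - math.log(s1))
        for (s1, p1), (s2, p2) in zip(law, law[1:])
    ]


# Equal numbers of contacts in bins of doubling width give a slope close to -1.
distances = [1500] * 8 + [3000] * 8 + [6000] * 8
law = distance_law(distances)
print([start for start, _ in law])            # [1024, 2048, 4096]
print([round(s, 6) for s in local_slopes(law)])  # [-1.0, -1.0]
```

Dividing by the bin width is what turns raw counts into a density, so that slopes can be compared across distance ranges despite the logarithmic binning.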
Finally, on a smaller scale, one may also be interested in studying interactions between a discrete viewpoint (a.k.a. bait) locus (e.g. a single regulatory element or a cluster of them) and neighboring genomic features (e.g. other regulatory elements, gene bodies, repeats, etc). Profiles of contacts between such a viewpoint and the rest of the genome, sometimes referred to as virtual 4C plots, can be computed with HiContacts (Fig. S2L). This feature is an efficient way to summarize and compare interactions between a number of different loci at once.
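Conceptually, a virtual 4C profile is the row of the contact matrix corresponding to the viewpoint. The Python sketch below extracts such a profile from sparse upper-triangle triplets. The function name virtual_4c and the single-bin viewpoint are simplifying assumptions for illustration, not the HiContacts interface.

```python
def virtual_4c(triplets, viewpoint_bin, n_bins):
    """Contact profile between one viewpoint bin and every other bin.

    `triplets` holds (bin_i, bin_j, count) records of a symmetric matrix,
    each pair of bins being listed once.
    """
    profile = [0] * n_bins
    for i, j, count in triplets:
        if i == viewpoint_bin:
            profile[j] += count
        elif j == viewpoint_bin:
            profile[i] += count
    return profile


triplets = [(0, 0, 5), (0, 2, 3), (1, 2, 7), (2, 2, 9), (2, 4, 1), (3, 4, 6)]
print(virtual_4c(triplets, viewpoint_bin=2, n_bins=5))  # [3, 7, 9, 0, 1]
```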
Structural feature annotations
A key step in using chromosome conformation capture data to explore the functional organization of chromosomes is the annotation of structural features. HiContacts implements methods to identify A/B chromosome compartments using eigenvector decomposition18, topologically associated domains (TADs51) using a diamond insulation score52, and chromatin loops using computer vision53 (Table S1). It is nonetheless advised to investigate structural features using a range of different methods. For instance, A/B compartments can be identified in R with the HiTC and HiCDOC packages, while finer sub-compartments can be annotated using CALDER54,55,56. To allow end users to use the best-suited existing R packages, HiCExperiment objects can be coerced into specific data structures such as matrices, data frames or GInteractions.
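The idea behind an insulation score can be conveyed with a simplified Python sketch. For each bin, the mean contact frequency is computed in a square window linking the bins immediately upstream to the bins starting at that position, and a local minimum marks a candidate domain boundary. The function name insulation_scores, the window placement and the absence of log-normalization are assumptions made for illustration. The published method and the HiContacts implementation differ in their details.

```python
def insulation_scores(matrix, window):
    """Mean contacts between bins [i - window, i) and bins [i, i + window)."""
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        values = [
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        ]
        scores[i] = sum(values) / len(values)
    return scores


# Toy map with two domains of four bins: 10 contacts within a domain, 1 between.
domain = [0, 0, 0, 0, 1, 1, 1, 1]
toy = [[10 if domain[i] == domain[j] else 1 for j in range(8)] for i in range(8)]
scores = insulation_scores(toy, window=2)
print(scores)                          # {2: 10.0, 3: 5.5, 4: 1.0, 5: 5.5, 6: 10.0}
print(min(scores, key=scores.get))     # 4, the first bin of the second domain
```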
Data integration
Hi-C has gained traction in several fields related to genome biology, and several consortia have developed large-scale programs based on this technique. The NIH 4D Nucleome Program hosts a data portal that lists > 500 chromosome conformation capture experiment sets performed in humans and a variety of model organisms15. For each experiment set, Hi-C contact matrices, pairs files, coverage tracks and downstream analysis files are publicly available. The DNA Zoo consortium is using Hi-C scaffolding to generate genome references for hundreds of animals, plants, fungi and microorganisms. The polished genome sequences and corresponding Hi-C contact maps are directly accessible on a dedicated website8. Two R packages, fourDNData and DNAZooData, provide gateways to these databases. A list of available experiment sets and their metadata is provided within each package, and the actual data files (‘.mcool’, ‘.hic’, ‘.bw’, ‘.bed’, ‘.pairs’ and ‘.fasta’ files) associated with an experiment of interest can be seamlessly downloaded by providing a sample ID, and are locally cached for reuse across independent R sessions. For example, the mouse and chicken Hi-C data presented in Fig. S2 and Fig. 4 were retrieved directly in R using the fourDNData package. Providing programmatic access to existing databases will accelerate investigation in genome biology and open new avenues of research.
Interoperability between Hi-C packages
The HiCExperiment class provides interoperability between Hi-C packages. To illustrate this point, we present here a typical workflow to analyze seven Hi-C yeast datasets obtained from WT cells or mutants of the cohesin complex, synchronized in either G1 or G2/M, a stage in which chromosomes are compacted into arrays of loops57. For each library, Hi-C reads are aligned to the yeast reference genome and binned into contact maps using HiCool. Visual inspection of Hi-C maps suggests an increase in contact over longer distances in G2/M vs G1, enhanced in the absence of wpl1 and wpl1/eco1 (Fig. 3a), and this is confirmed by the P(s) curves (Fig. 3b). That the replicates are coherent is demonstrated by both the strong overlap of the P(s) curves (Fig. 3b) and by the stratum-adjusted correlation coefficients (SCC) calculated using HiCRep58. Here, we use the HiCExperiment coercion methods to convert the imported Hi-C maps into dense matrices, the input class required by HiCRep. SCC scores show that replicates for WT G2/M and wpl1 are overall correlated, while the two replicates for WT G1 are slightly more divergent (Fig. 3c). The stratum-dependent correlation between G1 and G2/M replicates decreases dramatically at short distance (10–30 kb), corresponding to the range of cohesin-mediated chromatin loops along G2/M chromosomes in yeast (Fig. 3d). In contrast, stratum-dependent correlation with wpl1 single mutant and wpl1/eco1 double mutant decreases at mid-range (50–100 kb) and mid-to-long-range (50–200 kb) respectively, consistent with the independent roles of Eco1 and Wpl1 factors in chromatin loop formation. We took advantage of the replicates to perform differential interaction (DI) analysis using the multiHiCcompare package59. Using HiCExperiment, we imported wpl1 and WT replicates chromosome XI Hi-C data and seamlessly coerced them into the multiHiCcompare-specific tabular format. 
The contact frequency fold-change and adjusted p-values computed by multiHiCcompare are injected back into the original HiCExperiment objects to visually represent these metrics in Hi-C maps (Fig. 3e) or as volcano plots, separating inter-arm and intra-arm interactions over chrXI (Fig. 3f). This analysis highlights that the increase in contact frequencies over longer distances occurs specifically for intra-arm contacts in wpl1 compared to WT, while contacts spanning the acrocentric chrXI centromere decrease. Interoperability with other packages in R is further illustrated in the following page of the companion online book: https://bioconductor.org/books/devel/OHCA/pages/interoperability.html.
Delivering new biological insights using Hi-C
To illustrate how HiCExperiment can be leveraged to raise new biological hypotheses, we investigated a time-course Hi-C dataset of chicken cells released from a G2 block into mitosis60. We used the fourDNData gateway package to retrieve the data processed by the 4DN consortium, and HiContacts to annotate compartments. Hi-C maps of chr3 (Fig. 4a) illustrate the progressive loss of compartment organization following G2 release, resulting in a rod-like polymer organization as early as 10 min after release when cells are in prophase and followed by the emergence of a second broader diagonal corresponding to helical coiling of chromosomes60. We generated saddle plots for each time point with HiContacts and noted that although AA and BB interactions are comparably lost from 10 min of release onwards, at 5 min BB interactions seem to be specifically depleted compared to AA ones (Fig. 4b). Correlation matrices over a magnified section of chr4 at G2, and 5 min and 30 min after release further confirmed that at this locus AA interactions are retained at 5 min while BB interactions disappear (Fig. 4c). We quantified the average contact frequency between pairs of genomic loci at these three time points, revealing a similar trend genome-wide (Fig. 4d). These results suggest that within minutes after G2 release, the B compartment, corresponding to heterochromatin, is affected faster than the A compartment. This is consistent with the model whereby H3 S10 phosphorylation, occurring in late G2 first at chromocenters, initially induces HP1 eviction from H3K9me361 and heterochromatin dissolution62, and then spreads across entire chromosomes to allow for mitotic condensation63.
Discussion
Over the past decade, dozens of Bioconductor-hosted packages have led to widely adopted functional data classes for the generation, parsing and analysis of emerging genomic technologies. These developments allow advanced multi-omics analyses in R to an extent unmatched in other programming languages. Yet, manipulating chromosome conformation capture standard file formats in R remains particularly cumbersome. Here, we present the implementation of a flexible HiCExperiment class built on the robust Bioconductor core infrastructure. The HiCExperiment class facilitates chromosome conformation capture data integration (from Hi-C, micro-C, …) into existing genomic analysis workflows in R, reducing redundancy and improving interoperability. The companion HiContacts package provides the essential toolkit to compare, aggregate and further investigate HiCExperiment objects. A detailed introduction and extensive examples of Hi-C data analysis workflows are provided in the companion website https://bioconductor.org/books/OHCA/.
This rich ecosystem has several advantages compared to existing chromosome conformation capture libraries: (1) it is embedded in the genomics-focused Bioconductor ecosystem, ensuring a rational genomic representation of C data and evolvability (an extension of HiCExperiment to support single-cell Hi-C data is currently in development); (2) it extends pre-existing generic methods used by a large community, facilitating the intuitive integration of the C data in existing genomics workflows; (3) it supports quantitative and qualitative analysis of C data, represented as a numerical matrix or as a set of genomic interactions; (4) the daily building/testing infrastructure maintained by Bioconductor ensures reproducibility of chromosome conformation capture analyses. For developers, a Docker image with preinstalled development versions of HiCExperiment-related packages is available here: https://github.com/users/js2264/packages/container/package/ohca.
A tight integration of HiCExperiment within the Bioconductor ecosystem unlocks future development opportunities for Hi-C data analysis. First, HiCExperiment could adopt the “tidy” grammar recently adapted to omics data investigation64, a project spearheaded by the Bioconductor community. This would make Hi-C data wrangling more intuitive and accessible to new investigators. Secondly, Bioconductor supports a DelayedMatrix framework and a block processing mechanism, which could be used to improve summarization of Hi-C data over multiple loci. Finally, Bioconductor is currently making efforts to deploy HiCExperiment functionalities within the AnVIL cloud computing platform, a project powered by Terra to facilitate collaborative data investigation between biomedical researchers65. We hope that this will accelerate the use of Hi-C in biomedical research, e.g. to shed light on genomic rearrangement events often identified through Hi-C66,67.
Methods
All analyses were performed using R 4.3.0 with Bioconductor 3.18. Further details of how each analysis was performed can be found in the Code Availability section.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data presented in this manuscript have already been published. Yeast Hi-C data come from ref. 57 and fastq files were obtained from the SRA repository (SRA accession numbers: SRR8769554, SRR10687276, SRR8769549, SRR10687281, SRR8769551, SRR10687278, SRR8769555) or directly obtained through HiContactsData. Yeast ChIP-seq data come from ref. 68 and processed data were obtained from GEO (GSM6703614). Chicken Hi-C data come from ref. 60 and were directly imported from the 4DN data portal with fourDNData (ExperimentSet accession numbers: 4DNES9LEZXN7, 4DNESNWWIFZU, 4DNESGDXKM2I, 4DNESIR416OW, 4DNESS8PTK6F). Micro-C data generated from HFFc6 cells69 were also imported from the 4DN data portal (ExperimentSet accession number: 4DNESWST3UBH).
Code availability
All the analysis steps are extensively described as dedicated workflows in the companion website: https://bioconductor.org/books/OHCA/. Additional examples are also available from the following documentation webpages: Importing Hi-C data (https://js2264.github.io/HiCExperiment/reference/HiCExperiment.html#ref-examples), Arithmetic with Hi-C data (https://js2264.github.io/HiContacts/reference/arithmetics.html#examples) and Plotting Hi-C matrices (https://js2264.github.io/HiContacts/reference/plotMatrix.html). HiCExperiment is freely available on Bioconductor (https://bioconductor.org/packages/HiCExperiment), and the source code is hosted on a GitHub repository (https://github.com/js2264/HiCExperiment). HiContacts, HiCool, fourDNData and DNAZooData packages are also freely provided as Bioconductor packages (https://bioconductor.org/packages) and publicly hosted on GitHub. hicstuff is publicly available as a standalone Python package from bioconda.
References
Serizay, J. & Ahringer, J. Genome organization at different scales: nature, formation and function. Curr. Opin. Cell Biol. 52, 145–153 (2018).
Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).
Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv. 5, eaaw1668 (2019).
Misteli, T. The self-organizing genome: principles of genome architecture and function. Cell 183, 28–45 (2020).
Mirny, L. & Dekker, J. Mechanisms of chromosome folding and nuclear organization: their interplay and open questions. Cold Spring Harb. Perspect. Biol. 14, a040147 (2022).
Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
Kaplan, N. & Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat. Biotechnol. 31, 1143–1147 (2013).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Baudry, L. et al. instaGRAAL: chromosome-level quality scaffolding of genomes using a proximity ligation-based scaffolder. Genome Biol. 21, 148 (2020).
Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2, e415 (2014).
Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife 3, e03318 (2014).
Marbouty, M., Baudry, L., Cournac, A. & Koszul, R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci. Adv. 3, e1602105 (2017).
Marbouty, M., Thierry, A., Millot, G. A. & Koszul, R. MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. Elife 10, e60608 (2021).
Lamy-Besnier, Q. et al. Chromosome folding and prophage activation reveal specific genomic architecture for intestinal bacteria. Microbiome 11, 111 (2023).
Reiff, S. B. et al. The 4D nucleome data portal as a resource for searching and visualizing curated nucleomics data. Nat. Commun. 13, 2365 (2022).
Hsieh, T.-H. S. et al. Mapping nucleosome resolution chromosome folding in yeast by micro-C. Cell 162, 108–119 (2015).
Ramani, V. et al. Mapping 3D genome architecture through in situ DNase Hi-C. Nat. Protoc. 11, 2104–2121 (2016).
Lieberman Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Lee, S., Bakker, C. R., Vitzthum, C., Alver, B. H. & Park, P. J. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics 38, 1729–1731 (2022).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Dudchenko, O. et al. The juicebox assembly tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv https://doi.org/10.1101/254797 (2018).
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 19, 125 (2018).
Open2C et al. Cooltools: enabling high-resolution Hi-C analysis in Python. bioRxiv https://doi.org/10.1101/254547 (2022).
Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. Bioinformatics 36, 2980–2985 (2020).
Kruse, K., Hug, C. B. & Vaquerizas, J. M. FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data. Genome Biol. 21, 303 (2020).
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
van der Weide, R. H. et al. Hi-C analyses with GENOVA: a case study with cohesin variants. NAR Genom. Bioinform. 3, lqab040 (2021).
Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).
Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020).
Lun, A. T. L., Perry, M. & Ing-Simmons, E. Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments. F1000Res. 5, 950 (2016).
Stansfield, J. C., Cresswell, K. G., Vladimirov, V. I. & Dozmorov, M. G. HiCcompare: an R-package for joint normalization and comparison of Hi-C datasets. BMC Bioinform. 19, 279 (2018).
Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 38, 276–278 (2020).
Goloborodko, A. et al. Open2c/Distiller https://github.com/open2c/distiller-nf (2022).
Lun, A. T. L. basilisk: a Bioconductor package for managing Python environments. J. Open Source Softw. 7, 4742 (2022).
Matthey-Doret, C. et al. Normalization of chromosome contact maps: matrix balancing and visualization. Methods Mol. Biol. 2301, 1–15 (2022).
Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R. & Mozziconacci, J. Normalization of a chromosomal contact map. BMC Genom. 13, 436 (2012).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Cremer, T. & Cremer, M. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat. Rev. Genet. 2, 292–301 (2001).
Akgol Oksuz, B. et al. Systematic evaluation of chromosome conformation capture assays. Nat. Methods 18, 1046–1055 (2021).
Lazar‐Stefanita, L. et al. Cohesins and condensins orchestrate the 4D dynamics of yeast chromosomes during the cell cycle. EMBO J. 36, 2684–2697 (2017).
Kakui, Y., Rabinowitz, A., Barry, D. J. & Uhlmann, F. Condensin-mediated remodeling of the mitotic chromatin landscape in fission yeast. Nat. Genet. 49, 1553–1557 (2017).
Schalbetter, S. A. et al. SMC complexes differentially compact mitotic chromosomes according to genomic context. Nat. Cell Biol. 19, 1071–1080 (2017).
Lioy, V. S. et al. Multiscale structuring of the E. coli chromosome by nucleoid-associated and condensin proteins. Cell 172, 771–783.e18 (2018).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Matthey-Doret, C. et al. Computer vision for pattern detection in chromosome contact maps. Nat. Commun. 11, 5795 (2020).
Servant, N. et al. HiTC: exploration of high-throughput ‘C’ experiments. Bioinformatics 28, 2843–2844 (2012).
Kurylo, C., Zytnicki, M., Foissac, S. & Maigné, E. HiCDOC (Bioconductor, 2022).
Liu, Y. et al. Systematic inference and comparison of multi-scale chromatin sub-compartments connects spatial organization to cell phenotypes. Nat. Commun. 12, 2439 (2021).
Dauban, L. et al. Regulation of cohesin-mediated chromosome folding by Eco1 and other partners. Mol. Cell 77, 1279–1293.e4 (2020).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916–2923 (2019).
Gibcus, J. H. et al. A pathway for mitotic chromosome formation. Science 359, eaao6135 (2018).
Fischle, W. et al. Regulation of HP1–chromatin binding by histone H3 methylation and phosphorylation. Nature 438, 1116–1122 (2005).
Peng, Q. et al. Coordinated histone modifications and chromatin reorganization in a single cell revealed by FRET biosensors. Proc. Natl Acad. Sci. USA 115, E11681–E11690 (2018).
Hendzel, M. J. et al. Mitosis-specific phosphorylation of histone H3 initiates primarily within pericentromeric heterochromatin during G2 and spreads in an ordered fashion coincident with mitotic chromosome condensation. Chromosoma 106, 348–360 (1997).
Hutchison, W. J. et al. The tidyomics ecosystem: enhancing omic data analyses. bioRxiv https://doi.org/10.1101/2023.09.10.557072 (2023).
Schatz, M. C. et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genom. 2, 100085 (2022).
Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015).
Melo, U. S. et al. Hi-C identifies complex genomic rearrangements and TAD-shuffling in developmental diseases. Am. J. Hum. Genet. 106, 872–884 (2020).
Chapard, C. et al. Exogenous chromosomes reveal how sequence composition drives chromatin assembly, activity, folding and compartmentalization. bioRxiv https://doi.org/10.1101/2023.09.10.596134 (2022).
Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565.e7 (2020).
Acknowledgements
We thank all our colleagues from the laboratory Régulation spatiale des génomes for fruitful discussions. This research was supported by funding from the European Research Council under the Horizon 2020 Program (grant agreement 771813), the Q-life program and the Agence Nationale pour la Recherche to R.K. (ANR-22-CE12-0013-01; ANR-19-CE13-0027-02). J.S. is the recipient of a Postdoctoral ARC fellowship.
Author information
Contributions
Conceptualization: J.S.; Methodology: J.S., C.M.-D.; Software: J.S., C.M.-D., L.B., A.B.; Formal analysis: J.S.; Investigation: J.S.; Resources: J.S., C.M.-D.; Writing—Original Draft: J.S.; Writing—other versions: J.S., R.K.; Visualization: J.S.; Supervision: R.K.; Project administration: R.K.; Funding acquisition: R.K.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Serizay, J., Matthey-Doret, C., Bignaud, A. et al. Orchestrating chromosome conformation capture analysis with Bioconductor. Nat Commun 15, 1072 (2024). https://doi.org/10.1038/s41467-024-44761-x
DOI: https://doi.org/10.1038/s41467-024-44761-x