The ability to identify regulatory interactions that mediate gene expression changes through distal elements, such as risk loci, is transforming our understanding of how genomes are spatially organized and regulated. Capture Hi-C (CHi-C) is a powerful tool to delineate such regulatory interactions. However, primary analysis and downstream interpretation of CHi-C profiles remain challenging and rely on disparate tools with ad hoc input/output formats and specific assumptions for statistical modeling. Here we present a data processing and interaction calling toolkit (CHiCANE), specialized for the analysis and meaningful interpretation of CHi-C assays. In this protocol, we demonstrate applications of CHiCANE to region capture Hi-C (rCHi-C) and promoter capture Hi-C (pCHi-C) libraries, followed by quality assessment of interaction peaks, as well as downstream analysis specific to rCHi-C and pCHi-C to aid functional interpretation. For a typical rCHi-C/pCHi-C dataset this protocol takes up to 3 d for users with a moderate understanding of R programming and statistical concepts, although this is dependent on dataset size and compute power available. CHiCANE is freely available at https://cran.r-project.org/web/packages/chicane.
Baitmaps for both sets of CHi-C libraries used in this study (rCHi-C T-47D (Baxter et al., 2018) and pCHi-C MK (Javierre et al., 2016)), hg38 HindIII in silico digest, HiCUP reports, CHiCANE’s unfiltered interactions, filtered interaction peaks (q-value < 0.05), and negative binomial model fit plots and statistics are available at https://doi.org/10.5281/zenodo.4073433.
The CHiCANE R package is freely available through CRAN: https://cran.r-project.org/web/packages/chicane.
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Schmitt, A. D., Hu, M. & Ren, B. Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. 17, 743–755 (2016).
Dryden, N. H. et al. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 24, 1854–1868 (2014).
Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
Davies, J. O. et al. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat. Methods 13, 74–80 (2016).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
Baxter, J. S. et al. Capture Hi-C identifies putative target genes at 33 breast cancer risk loci. Nat. Commun. 9, 1028 (2018).
Jager, R. et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun. 6, 6178 (2015).
Martin, P. et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 6, 10069 (2015).
Orlando, G. et al. Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer. Nat. Genet. 50, 1375–1380 (2018).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
Kleiber, C. & Zeileis, A. Visualizing count data regressions using rootograms. Am. Stat. 70, 296–303 (2016).
Ben Zouari, Y., Molitor, A. M., Sikorska, N., Pancaldi, V. & Sexton, T. ChiCMaxima: a robust and simple pipeline for detection and visualization of chromatin looping in Capture Hi-C. Genome Biol. 20, 102 (2019).
Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127 (2016).
Mifsud, B. et al. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS ONE 12, e0174744 (2017).
Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).
Rigby, R. & Stasinopoulos, D. Generalized additive models for location, scale and shape. Applied Statistics 54, 507–554 (2005).
Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
Ay, F. & Noble, W. S. Analysis methods for studying the 3D architecture of the genome. Genome Biol. 16, 183 (2015).
Kong, S. & Zhang, Y. Deciphering Hi-C: from 3D genome to function. Cell Biol. Toxicol. 35, 15–32 (2019).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Haider, S. et al. A bedr way of genomic interval processing. Source Code Biol. Med. 11, 14 (2016).
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Li, D., Hsu, S., Purushotham, D., Sears, R. L. & Wang, T. WashU Epigenome Browser update 2019. Nuc. Acids Res. 47, W158–W165 (2019).
Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Koster, J. & Rahmann, S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 34, 3600 (2018).
Ghoussaini, M. et al. Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation. Nat. Commun. 4, 4999 (2014).
Fudenberg, G., Getz, G., Meyerson, M. & Mirny, L. A. High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. Nat. Biotechnol. 29, 1109–1113 (2011).
De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 29, 1103–1108 (2011).
Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).
Brodie, A., Azaria, J. R. & Ofran, Y. How far from the SNP may the causative genes be? Nuc. Acids Res. 44, 6046–6054 (2016).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. in Statistical Genomics: Methods and Protocols 335–351 (Springer Science+Business Media, 2016).
Lawrence, M., Daujat, S. & Schneider, R. Lateral thinking: how histone modifications regulate gene expression. Trends Genet. 32, 42–56 (2016).
Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).
Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1897 (2016).
Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv. 5, eaaw1668 (2019).
Dowen, J. M. et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387 (2014).
Servant, N., Varoquaux, N., Heard, E., Barillot, E. & Vert, J. P. Effective normalization for copy number variation in Hi-C data. BMC Bioinformatics 19, 313 (2018).
Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).
We thank Breast Cancer Now for funding this work as part of Programme Funding to The Breast Cancer Now Toby Robins Research Centre. This study makes use of data generated by the PCHI-C Consortium. A full list of the investigators who contributed to the generation of the data is available in Javierre et al. (2016), which was funded by the National Institute for Health Research of England, UK Medical Research Council (MR/L007150/1) and UK Biotechnology and Biological Research Council (BB/J004480/1). We also thank D. Li from the WashU Epigenome Browser team for implementing support for CHiCANE’s standard format in the Epigenome Browser.
The authors declare no competing interests.
Peer review information: Nature Protocols thanks Peter Robinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Baxter, J. S. et al. Nat. Commun. 9, 1028 (2018): https://doi.org/10.1038/s41467-018-03411-9
Dryden, N. H. et al. Genome Res. 24, 1854–1868 (2014): https://doi.org/10.1101/gr.175034.114
Image from WashU Epigenome Browser showing non-bait to bait interaction peaks (q-value < 0.05) called at known breast cancer risk loci by CHiCANE in the Baxter T-47D libraries at a, 16q12.2 locus and b, 14q24.1 locus. Yellow boxes show the captured regions.
Representative examples of hanging rootograms depicting the negative binomial model fits on the Javierre MK library. Observed counts are shown as histogram bins (gray bars) while the CHiCANE fitted expected counts distribution is in red. The y-axis represents square root transformed density estimates of observed (gray bars) and expected (red line) counts. For observed counts, the height of the bars is shifted to align the top of the bar with the expected counts fit. Bars above and below the reference line (x-axis) indicate over- and under-prediction by the CHiCANE model, respectively.
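The construction of a hanging rootogram described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not CHiCANE's plotting code: the function names, the mean/dispersion parameterization of the negative binomial (variance = mu + mu²/theta) and the toy inputs are all assumptions made for the example. Each bar hangs from the square root of the expected density, so the position of its lower edge relative to zero shows where the model over- or under-predicts.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial PMF with mean mu and dispersion theta.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hanging_rootogram(counts, fitted_means, theta):
    """Return one row per count value 0..max(counts).

    counts       -- observed interaction counts (hypothetical input)
    fitted_means -- per-observation fitted means from some model
    theta        -- shared dispersion parameter
    """
    n = len(counts)
    rows = []
    for k in range(max(counts) + 1):
        # Observed density: share of observations with exactly k reads.
        observed = sum(1 for c in counts if c == k) / n
        # Expected density: average of the fitted PMFs at k.
        expected = sum(nb_pmf(k, mu, theta) for mu in fitted_means) / n
        top = math.sqrt(expected)            # bar hangs from the fitted curve
        bottom = top - math.sqrt(observed)   # > 0: over-prediction, < 0: under-prediction
        rows.append({"count": k, "observed": observed,
                     "expected": expected, "top": top, "bottom": bottom})
    return rows


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    for row in hanging_rootogram([0, 0, 1, 2, 5], [1.6] * 5, theta=2.0):
        print(row["count"], round(row["top"], 3), round(row["bottom"], 3))
```

Plotting the `top` and `bottom` values as bar edges against `count`, with a horizontal line at zero, gives the layout described in the caption.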
Examples of interpretation of interaction calling on Javierre MK library. a, Bar plots showing the proportion of interaction peaks (q-value < 0.05) by type (cis interactions include bait-to-bait interactions). b, Bar plots showing the number of interaction peaks (q-value < 0.05) across distance bins. c, Bar plots showing breakdown of region 1–10 Mb shown in (b).
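The summaries plotted in these panels (share of significant peaks by interaction type, and number of peaks per distance bin) can be approximated with a short Python sketch. The field names `type`, `distance` and `q_value`, the bin edges and the toy records are hypothetical choices for this example and are not taken from CHiCANE's output format or from the figure.

```python
from collections import Counter


def summarise_peaks(peaks, q_threshold=0.05,
                    bin_edges=(0, 1e5, 1e6, 1e7, float("inf"))):
    """Summarise significant peaks by type and by distance bin.

    peaks     -- list of dicts with hypothetical keys 'type',
                 'distance' (bp, or None for trans) and 'q_value'
    bin_edges -- illustrative half-open bin boundaries in bp
    """
    significant = [p for p in peaks if p["q_value"] < q_threshold]

    # Panel a: proportion of significant peaks per interaction type.
    type_counts = Counter(p["type"] for p in significant)
    total = len(significant)
    proportions = {t: c / total for t, c in type_counts.items()} if total else {}

    # Panel b: number of significant peaks per distance bin.
    bin_counts = [0] * (len(bin_edges) - 1)
    for p in significant:
        distance = p["distance"]
        if distance is None:  # trans interactions have no linear distance
            continue
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= distance < bin_edges[i + 1]:
                bin_counts[i] += 1
                break
    return proportions, bin_counts


if __name__ == "__main__":
    # Toy records, made up purely for illustration.
    toy = [
        {"type": "cis", "distance": 5e4, "q_value": 0.01},
        {"type": "cis", "distance": 2e6, "q_value": 0.04},
        {"type": "trans", "distance": None, "q_value": 0.001},
        {"type": "cis", "distance": 3e5, "q_value": 0.20},
    ]
    print(summarise_peaks(toy))
```

Passing finer `bin_edges` restricted to 1–10 Mb gives the kind of breakdown shown in panel c.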
Somatic mutations overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library, annotated (target fragments only) with PCAWG SNV/MNV data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.
INDELs overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library, annotated (target fragments only) with PCAWG INDEL data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.
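The overlap annotation behind these two tables can be illustrated with a pure-Python analogue of an interval intersection. This is only a sketch of the idea, roughly comparable to reporting both records for each overlap as `bedtools intersect -wa -wb` does. It assumes 0-based, half-open BED-style coordinates, and the tuple layout, function name and toy coordinates are hypothetical rather than taken from the study data.

```python
def intersect(fragments, variants):
    """Return (fragment, variant) pairs whose intervals overlap.

    Both inputs are lists of (chrom, start, end, name) tuples using
    0-based, half-open coordinates (an assumption of this sketch).
    """
    # Index variants by chromosome so each fragment is only compared
    # against variants on the same chromosome.
    by_chrom = {}
    for variant in variants:
        by_chrom.setdefault(variant[0], []).append(variant)

    hits = []
    for fragment in fragments:
        chrom, start, end = fragment[:3]
        for variant in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if variant[1] < end and variant[2] > start:
                hits.append((fragment, variant))
    return hits


if __name__ == "__main__":
    # Toy coordinates, made up purely for illustration.
    target_fragments = [
        ("chr2", 1000, 2000, "fragment_A"),
        ("chr2", 5000, 6000, "fragment_B"),
    ]
    somatic_variants = [
        ("chr2", 1500, 1501, "variant_1"),  # inside fragment_A
        ("chr2", 2000, 2001, "variant_2"),  # starts where fragment_A ends: no overlap
        ("chr3", 5500, 5501, "variant_3"),  # different chromosome
    ]
    for fragment, variant in intersect(target_fragments, somatic_variants):
        print(fragment[3], variant[3])
```

For genome-scale inputs a sorted or tree-based index would be preferable to this nested loop, which is why a dedicated tool such as bedtools is the practical choice.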
Cite this article
Holgersen, E.M., Gillespie, A., Leavy, O.C. et al. Identifying high-confidence capture Hi-C interactions using CHiCANE. Nat Protoc 16, 2257–2285 (2021). https://doi.org/10.1038/s41596-021-00498-1