Capture Hi-C is widely used to obtain high-resolution profiles of chromosomal interactions involving, at least on one end, regions of interest such as gene promoters. Signal detection in Capture Hi-C data is challenging and cannot be adequately accomplished with tools developed for other chromosome conformation capture methods, including standard Hi-C. Capture Hi-C Analysis of Genomic Organization (CHiCAGO) is a computational pipeline developed specifically for Capture Hi-C analysis. It implements a statistical model accounting for biological and technical background components, as well as bespoke normalization and multiple testing procedures for this data type. Here we provide a step-by-step guide to the CHiCAGO workflow that is aimed at users with basic experience of the command line and R. We also describe more advanced strategies for tuning the key parameters for custom experiments and provide guidance on data preprocessing and downstream analysis using companion tools. In a typical experiment, CHiCAGO takes ~2–3 h to run, although pre- and postprocessing steps may take much longer.
All of the figures for this paper were produced using publicly available data from Ray-Jones et al. (ref. 36), Montefiori et al. (ref. 33) and Choy et al. (ref. 34). We provide downsampled FASTQ files and all intermediate file types (.bam, .chinput, .Rds) from Ray-Jones et al. on the OSF repository (https://osf.io/kt67f) to allow readers to test either the full pipeline or specific analysis steps.
The Chicago and PCHiCdata R packages are available from Bioconductor and from the Bitbucket repository: https://bitbucket.org/chicagoTeam/chicago. The chicagoTools suite of auxiliary scripts is available from the same Bitbucket repository. Full documentation and installation instructions for HiCUP are available from https://www.bioinformatics.babraham.ac.uk/projects/hicup/. The Peaky R package is available from the GitHub repository: http://github.com/cqgd/pky. The Chicdiff R package is available from the GitHub repository: https://github.com/RegulatoryGenomicsGroup/chicdiff. The code presented in the Procedure and the versions of the software used in this protocol are deposited on OSF: https://osf.io/kt67f/ (DOI 10.17605/OSF.IO/KT67F).
Schoenfelder, S. & Fraser, P. Long-range enhancer-promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).
Schmitt, A. D., Hu, M. & Ren, B. Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. 17, 743–755 (2016).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
van Berkum, N. L. et al. Hi-C: a method to study the three-dimensional architecture of genomes. J. Vis. Exp. https://doi.org/10.3791/1869 (2010).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
Schoenfelder, S. et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 25, 582–597 (2015).
Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
Sahlén, P. et al. Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution. Genome Biol. 16, 156 (2015).
Hughes, J. R. et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212 (2014).
Würtele, H. & Chartrand, P. Genome-wide scanning of HoxB1-associated loci in mouse ES cells using an open-ended chromosome conformation capture methodology. Chromosome Res. 14, 477–495 (2006).
Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).
Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127 (2016).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).
Rosa, A., Becker, N. B. & Everaers, R. Looping probabilities in model interphase chromosomes. Biophys. J. 98, 2410–2419 (2010).
Bohn, M. & Heermann, D. W. Diffusion-driven looping provides a consistent framework for chromatin organization. PLoS One 5, e12218 (2010).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol. 57, 289–300 (1995).
Genovese, C. R., Roeder, K. & Wasserman, L. False discovery control with p-value weighting. Biometrika 93, 509–524 (2006).
Ignatiadis, N., Klaus, B., Zaugg, J. B. & Huber, W. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat. Methods https://doi.org/10.1038/nmeth.3885 (2016).
Freire-Pritchett, P. et al. Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cells. eLife 6, e21926 (2017).
Novo, C. L. et al. Long-range enhancer interactions are prevalent in mouse embryonic stem cells and are reorganized upon pluripotent state transition. Cell Rep. 22, 2615–2627 (2018).
Chovanec, P. et al. Widespread reorganisation of pluripotent factor binding and gene regulatory interactions between human pluripotent states. Nat. Commun. 12, 2098 (2021).
Siersbæk, R. et al. Dynamic rewiring of promoter-anchored chromatin loops during adipocyte differentiation. Mol. Cell 66, 420–435.e5 (2017).
Rubin, A. J. et al. Lineage-specific dynamic and pre-established enhancer-promoter contacts cooperate in terminal differentiation. Nat. Genet. 49, 1522–1528 (2017).
Thiecke, M. J. et al. Cohesin-dependent and -independent mechanisms mediate chromosomal contacts between promoters and enhancers. Cell Rep. 32, 107929 (2020).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
Burren, O. S. et al. Chromosome contacts in activated T cells identify autoimmune disease candidate genes. Genome Biol. 18, 165 (2017).
Petersen, R. et al. Platelet function is modified by common sequence variation in megakaryocyte super enhancers. Nat. Commun. 8, 16058 (2017).
Litchfield, K. et al. Identification of 19 new risk loci and potential regulatory mechanisms influencing susceptibility to testicular germ cell tumor. Nat. Genet. 49, 1133–1140 (2017).
Montefiori, L. E. et al. A promoter interaction map for cardiovascular disease genetics. eLife 7, e35788 (2018).
Choy, M. K. et al. Promoter interactome of human embryonic stem cell-derived cardiomyocytes connects GWAS regions to cardiac gene networks. Nat. Commun. 9, 2526 (2018).
Joshi, O. et al. Dynamic reorganization of extremely long-range promoter-promoter interactions between two states of pluripotency. Cell Stem Cell 17, 748–757 (2015).
Ray-Jones, H. et al. Mapping DNA interaction landscapes in psoriasis susceptibility loci highlights KLF4 as a target gene in 9q31. BMC Biol. 18, 47 (2020).
Martin, P. et al. Chromatin interactions reveal novel gene targets for drug repositioning in rheumatic diseases. Ann. Rheum. Dis. 78, 1127–1134 (2019).
Ghavi-Helm, Y. et al. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. Nat. Genet. 51, 1272–1282 (2019).
Andrey, G. et al. Characterization of hundreds of regulatory landscapes in developing limbs reveals two regimes of chromatin folding. Genome Res. 27, 223–233 (2017).
Su, C. et al. Mapping effector genes at lupus GWAS loci using promoter Capture-C in follicular helper T cells. Nat. Commun. 11, 3294 (2020).
Chesi, A. et al. Genome-scale Capture C promoter interactions implicate effector genes at GWAS loci for bone mineral density. Nat. Commun. 10, 1260 (2019).
Anil, A., Spalinskas, R., Åkerborg, Ö. & Sahlén, P. HiCapTools: a software suite for probe design and proximity detection for targeted chromosome conformation capture applications. Bioinformatics 34, 675–677 (2018).
Ben Zouari, Y., Molitor, A. M., Sikorska, N., Pancaldi, V. & Sexton, T. ChiCMaxima: a robust and simple pipeline for detection and visualization of chromatin looping in Capture Hi-C. Genome Biol. 20, 102 (2019).
Eijsbouts, C. Q., Burren, O. S., Newcombe, P. J. & Wallace, C. Fine mapping chromatin contacts in capture Hi-C data. BMC Genomics 20, 77 (2019).
Cairns, J., Orchard, W. R., Malysheva, V. & Spivakov, M. Chicdiff: a computational pipeline for detecting differential chromosomal interactions in Capture Hi-C data. Bioinformatics 35, 4764–4766 (2019).
Holgersen, E. M. et al. Identifying high-confidence capture Hi-C interactions using CHiCANE. Nat. Protoc. 16, 2257–2285 (2021).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Thiecke, M. J. et al. Cohesin-dependent and -independent mechanisms mediate chromosomal contacts between promoters and enhancers. Cell Rep. 32, 107929 (2020).
Ay, F., Bailey, T. L. & Noble, W. S. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 24, 999–1011 (2014).
Heinz, S. et al. Transcription elongation can affect genome 3D structure. Cell 174, 1522–1536.e22 (2018).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Beccari, L. et al. Dbx2 regulation in limbs suggests inter-TAD sharing of enhancers. Dev. Dyn. https://doi.org/10.1002/dvdy.303 (2021).
Su, C., Pahl, M. C., Grant, S. F. A. & Wells, A. D. Restriction enzyme selection dictates detection range sensitivity in chromatin conformation capture-based variant-to-gene mapping approaches. Preprint at bioRxiv https://doi.org/10.1101/2020.12.15.422932 (2020).
Disney-Hogg, L., Kinnersley, B. & Houlston, R. Algorithmic considerations when analysing capture Hi-C data. Wellcome Open Res. 5, 289 (2020).
Feldmann, A., Dimitrova, E., Kenney, A., Lastuvkova, A. & Klose, R. J. CDK-Mediator and FBXL19 prime developmental genes for activation by promoting atypical regulatory interactions. Nucleic Acids Res. 48, 2942–2955 (2020).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhou, X. et al. The Human Epigenome Browser at Washington University. Nat. Methods 8, 989–990 (2011).
Zhou, X. et al. Exploring long-range genome interactions using the WashU Epigenome Browser. Nat. Methods 10, 375–376 (2013).
We thank all users of CHiCAGO and associated packages for providing test data and reporting issues. Research in the M.S. laboratory is supported by core funding from the UK’s Medical Research Council (MRC) (MC-A652-5QA20). S.W.W. acknowledges core support from the UK’s Biotechnology and Biological Sciences Research Council (BBSRC). C.W. is supported by the MRC (MC_UU_00002/4) and the Wellcome Trust (WT107881, 215097/Z/18/Z).
P.F.P. is currently an employee of Inivata Limited. J.C. is currently an employee of AstraZeneca and may or may not own stock options. M.S. is a cofounder of Enhanc3D Genomics Ltd. The rest of the authors declare no competing interests.
Peer review information Nature Protocols thanks Andrea M. Chiariello, Fulai Jin and Yun Li for their contribution to the peer review of this work.
Original methodological articles
Cairns, J. et al. Genome Biol. 17, 127 (2016): https://doi.org/10.1186/s13059-016-0992-2
Cairns, J. et al. Bioinformatics 35, 4764–4766 (2019): https://doi.org/10.1093/bioinformatics/btz450
Eijsbouts, C. et al. BMC Genomics 20, 77 (2019): https://doi.org/10.1186/s12864-018-5314-5
Wingett, S. et al. F1000Res. 4, 1310 (2015): https://doi.org/10.12688/f1000research.7334.1
Key data used in this protocol
Ray-Jones, H. et al. BMC Biol. 18, 47 (2020): https://doi.org/10.1186/s12915-020-00779-3
Choy, M. et al. Nat. Commun. 9, 2526 (2018): https://doi.org/10.1038/s41467-018-07399-0
Montefiori, L. et al. eLife 7, e35788 (2018): https://doi.org/10.7554/eLife.35788.001
Key articles using this protocol
Javierre, B. M. et al. Cell 167, 1369–1384.e19 (2016): https://doi.org/10.1016/j.cell.2016.09.037
Siersbæk, R. et al. Mol. Cell 66, 420–435.e5 (2017): https://doi.org/10.1016/j.molcel.2017.04.010
Rubin, A. et al. Nat. Genet. 49, 1522–1528 (2017): https://doi.org/10.1038/ng.3935
Orlando, G. et al. Nat. Genet. 50, 1375–1380 (2018): https://doi.org/10.1038/s41588-018-0211-z
Extended Data Fig. 1 Comparative analysis of PCHi-C data generated with a four- and a six-cutter restriction enzyme.
Three MboI PCHi-C replicates obtained from iPSC-derived cardiomyocytes (iPSC CMs; ref. 33) were processed by CHiCAGO either at the restriction fragment level, using standard 4 bp cutter settings, or in 5 kb bins, as described in the Procedure. Three HindIII PCHi-C replicates obtained from hESC-derived cardiomyocytes (hESC CMs; ref. 34) were processed using standard 6 bp cutter settings. Only genes baited in both iPSC CMs and hESC CMs were included in the comparative analysis. An interaction was considered shared when the middle of the significantly interacting fragments in the MboI data fell within the respective interacting fragments in the HindIII dataset (CHiCAGO score >5). When several interactions in MboI data overlapped with the same HindIII interaction, it was counted as a single shared interaction to avoid double-counting. a,b, Comparison between MboI and HindIII PCHi-C datasets in nonbinned mode (a) and binned mode (b). The violin plots show the distance distribution of significant interactions belonging to shared, MboI- and HindIII-specific groups. The number of significant interactions in each group is indicated in gray. The barplots show enrichment for regulatory histone marks (as a ratio between observed and expected) in each group of interactions.
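The shared-interaction rule in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Interaction` record, its field names and the `count_shared` helper are hypothetical, and matching interactions by a common `bait` key is an assumed way of restricting the comparison to genes baited in both datasets.

```python
from typing import NamedTuple, Sequence


class Interaction(NamedTuple):
    """Hypothetical record for one bait-to-other-end interaction."""
    bait: str      # identifier shared between datasets (assumed, e.g. gene name)
    chrom: str     # chromosome of the other-end fragment
    start: int     # other-end fragment start
    end: int       # other-end fragment end
    score: float   # CHiCAGO score


def count_shared(mboi: Sequence[Interaction],
                 hindiii: Sequence[Interaction],
                 cutoff: float = 5.0) -> int:
    """Count HindIII interactions hit by the midpoint of at least one MboI interaction.

    Only interactions with score > cutoff are considered on either side.
    Several MboI interactions falling in the same HindIII fragment are
    counted once, because matched HindIII indices are collected in a set.
    """
    significant_hindiii = [h for h in hindiii if h.score > cutoff]
    matched = set()
    for m in mboi:
        if m.score <= cutoff:
            continue
        midpoint = (m.start + m.end) // 2
        for index, h in enumerate(significant_hindiii):
            same_locus = h.bait == m.bait and h.chrom == m.chrom
            if same_locus and h.start <= midpoint <= h.end:
                matched.add(index)
    return len(matched)
```

Collecting matched HindIII indices in a set is what prevents double-counting when several short MboI fragments fall inside one longer HindIII fragment.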
MyLa CHi-C (ref. 36) replicate 1 was downsampled to 20 million raw read pairs and processed using HiCUP (ref. 19), as described in the Procedure. a, Truncation, alignment to GRCh37 and pairing results for read 1 (dark blue) and read 2 (light blue). The ~15 million paired reads are taken forward for filtering. b, Detection of valid Hi-C di-tags (dark blue) and removal of Hi-C artifacts such as religation products (turquoise) and di-tags falling outside the specified size range (orange). c, Size distribution of di-tags with limits shown as red lines. d, Interacting fragments are grouped into cis < 10 kb (dark blue), cis > 10 kb (light blue) and trans (green) for di-tags before removal of PCR duplicates (left) and after (right).
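The grouping shown in panel d can be illustrated with a short Python sketch. It is not HiCUP's implementation: the function names and category labels are hypothetical, di-tag ends are reduced to a single coordinate each, and assigning a separation of exactly 10 kb to the distal group is an assumption, since the legend does not say where that boundary case falls.

```python
from collections import Counter


def classify_ditag(chrom1: str, pos1: int, chrom2: str, pos2: int,
                   near_cutoff: int = 10_000) -> str:
    """Assign a di-tag to trans, cis < 10 kb or cis >= 10 kb."""
    if chrom1 != chrom2:
        return "trans"
    separation = abs(pos1 - pos2)
    # Assumption: a separation of exactly near_cutoff goes to the distal group.
    return "cis_lt_10kb" if separation < near_cutoff else "cis_ge_10kb"


def summarise(ditags) -> Counter:
    """Tally categories for an iterable of (chrom1, pos1, chrom2, pos2) tuples."""
    return Counter(classify_ditag(*ditag) for ditag in ditags)
```

Running `summarise` on the di-tags before and after duplicate removal would give the two sets of counts that the panel compares.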
Downsampled CHi-C datasets (ref. 36) were processed by CHiCAGO using both replicates per cell line as described in the Procedure. a, Barplot showing the scaling factors (s_i) computed for each pool of other ends for MyLa. b, Boxplots showing distribution of technical noise estimates for each pool of baits/viewpoints (top) and for each pool of other ends (bottom) for MyLa. c, Distance dependency of background counts and computed fit (red curve), plotted on a log–log scale for MyLa. d, Interaction profiles for the bait 670997, assigned to rs4141001, in MyLa (top) and HaCaT (bottom). High-scoring interactions detected by CHiCAGO (score ≥5) are shown in red, and subthreshold interactions (3 ≤ score < 5) are shown in blue. e, Number of overlaps between chromatin features of interacting fragments detected using CHiCAGO (yellow bars) versus number of overlaps from 100 random distance-matched subsets of HindIII fragments (blue bars) in MyLa (top) and HaCaT (bottom). Error bars represent 95% confidence intervals.
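The two score bands used for coloring panel d can be written as a small Python helper. This is only a sketch of the thresholds stated in the legend; the function name and the label "below_threshold" for scores under 3 are hypothetical and do not come from the Chicago package.

```python
def score_band(score: float, high: float = 5.0, low: float = 3.0) -> str:
    """Map a CHiCAGO score to the band used for plotting.

    score >= high        -> "high" (shown in red in panel d)
    low <= score < high  -> "subthreshold" (shown in blue in panel d)
    score < low          -> "below_threshold" (label assumed here)
    """
    if score >= high:
        return "high"
    if score >= low:
        return "subthreshold"
    return "below_threshold"
```

Keeping the cutoffs as parameters makes it easy to apply a different threshold when the default of 5 is not appropriate for a custom experiment.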
a, Dendrogram for downsampled HaCaT and MyLa samples (ref. 36) obtained from running getPeakMatrix as outlined in the Procedure. b, Chicdiff (ref. 45) bait profiles were generated for four loci as described in the Procedure. The plots show the raw read counts versus linear distance from the bait fragment as mirror images for HaCaT and MyLa. Other-end interacting fragments are pooled and color-coded by their adjusted weighted P-value.
The full MyLa CHi-C (ref. 36) data were processed by CHiCAGO using both replicates and then analyzed using Peaky (ref. 44). The top panel shows the distribution of raw read counts for other-end fragments for the bait 642001, with high-scoring interactions (CHiCAGO score ≥ 5) highlighted in blue. The second panel shows the CHiCAGO adjusted read counts with high-scoring interactions (CHiCAGO score ≥ 5) highlighted in blue and with the Peaky model fitted as a green line. The third panel shows CHiCAGO scores for those interactions with the blue dashed line showing the score cutoff of 5. In the bottom panel, the probability of each other-end fragment being a causal contact is quantified as the marginal posterior probability of contact (MPPC). Based on this metric, a number of fragments with CHiCAGO score ≥ 5 (points highlighted in blue) have MPPC very close to zero. After discounting these, a smaller subset of fine-mapped interactions may be identified.
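The final step described in this legend, discounting high-scoring fragments whose MPPC is close to zero, might look as follows in Python. This is a hedged sketch, not Peaky's interface: the dictionary keys, the `fine_mapped` function and especially the `mppc_min` value of 0.01 are assumptions made for illustration, because the legend does not state a numerical MPPC cutoff.

```python
def fine_mapped(fragments, score_cutoff: float = 5.0, mppc_min: float = 0.01):
    """Keep other-end fragments that pass both the score and the MPPC filter.

    fragments: iterable of dicts with keys "id", "score" and "mppc"
               (hypothetical layout for this sketch).
    mppc_min:  illustrative threshold only; choose it from your own data.
    """
    return [
        fragment for fragment in fragments
        if fragment["score"] >= score_cutoff and fragment["mppc"] >= mppc_min
    ]
```

Applied to one bait, the function returns the subset of CHiCAGO-significant fragments that also carry non-negligible posterior support, which is typically smaller than the set selected by score alone.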
Cite this article
Freire-Pritchett, P., Ray-Jones, H., Della Rosa, M. et al. Detecting chromosomal interactions in Capture Hi-C data with CHiCAGO and companion tools. Nat Protoc 16, 4144–4176 (2021). https://doi.org/10.1038/s41596-021-00567-5