Identifying high-confidence capture Hi-C interactions using CHiCANE

Holgersen, Erle M.; Gillespie, Andrea; Leavy, Olivia C.; Baxter, Joseph S.; Zvereva, Alisa; Muirhead, Gareth; Johnson, Nichola; Sipos, Orsolya; Dryden, Nicola H.; Broome, Laura R.; Chen, Yi; Kozin, Igor; Dudbridge, Frank; Fletcher, Olivia; Haider, Syed

doi:10.1038/s41596-021-00498-1

Protocol
Published: 09 April 2021

Nature Protocols volume 16, pages 2257–2285 (2021)

Abstract

The ability to identify regulatory interactions that mediate gene expression changes through distal elements, such as risk loci, is transforming our understanding of how genomes are spatially organized and regulated. Capture Hi-C (CHi-C) is a powerful tool to delineate such regulatory interactions. However, primary analysis and downstream interpretation of CHi-C profiles remain challenging and rely on disparate tools with ad hoc input/output formats and specific assumptions for statistical modeling. Here we present a data processing and interaction calling toolkit (CHiCANE), specialized for the analysis and meaningful interpretation of CHi-C assays. In this protocol, we demonstrate applications of CHiCANE to region capture Hi-C (rCHi-C) and promoter capture Hi-C (pCHi-C) libraries, followed by quality assessment of interaction peaks, as well as downstream analysis specific to rCHi-C and pCHi-C to aid functional interpretation. For a typical rCHi-C/pCHi-C dataset, this protocol takes up to 3 d for users with a moderate understanding of R programming and statistical concepts, although this depends on dataset size and the compute power available. CHiCANE is freely available at https://cran.r-project.org/web/packages/chicane.

**Fig. 2: Model fitting of count data.**

**Fig. 3: Interpretation of interaction peaks by distance.**

**Fig. 4: Replicate concordance by distance bins.**

**Fig. 6: Visualizing interaction peaks.**

**Fig. 7: Enrichment of enhancer marks.**

Data availability

Baitmaps for both sets of CHi-C libraries used in this study (rCHi-C T-47D¹⁰, pCHi-C MK⁹), hg38 HindIII in silico digest, HiCUP reports, CHiCANE’s unfiltered interactions, filtered interaction peaks (q-value < 0.05), and negative binomial model fit plots and statistics are available at https://doi.org/10.5281/zenodo.4073433.

Code availability

The CHiCANE R package is freely available through CRAN: https://cran.r-project.org/web/packages/chicane.

References

1. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
2. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
3. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
4. Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
5. Schmitt, A. D., Hu, M. & Ren, B. Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. 17, 743–755 (2016).
6. Dryden, N. H. et al. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 24, 1854–1868 (2014).
7. Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
8. Davies, J. O. et al. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat. Methods 13, 74–80 (2016).
9. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e1319 (2016).
10. Baxter, J. S. et al. Capture Hi-C identifies putative target genes at 33 breast cancer risk loci. Nat. Commun. 9, 1028 (2018).
11. Jager, R. et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun. 6, 6178 (2015).
12. Martin, P. et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 6, 10069 (2015).
13. Orlando, G. et al. Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer. Nat. Genet. 50, 1375–1380 (2018).
14. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
15. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
16. Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
17. Kleiber, C. & Zeileis, A. Visualizing count data regressions using rootograms. Am. Stat. 70, 296–303 (2016).
18. Ben Zouari, Y., Molitor, A. M., Sikorska, N., Pancaldi, V. & Sexton, T. ChiCMaxima: a robust and simple pipeline for detection and visualization of chromatin looping in Capture Hi-C. Genome Biol. 20, 102 (2019).
19. Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127 (2016).
20. Mifsud, B. et al. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS ONE 12, e0174744 (2017).
21. Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).
22. Rigby, R. & Stasinopoulos, D. Generalized additive models for location, scale and shape. Applied Statistics 54, 507–554 (2005).
23. Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
24. Ay, F. & Noble, W. S. Analysis methods for studying the 3D architecture of the genome. Genome Biol. 16, 183 (2015).
25. Kong, S. & Zhang, Y. Deciphering Hi-C: from 3D genome to function. Cell Biol. Toxicol. 35, 15–32 (2019).
26. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
27. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
28. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
29. Haider, S. et al. A bedr way of genomic interval processing. Source Code Biol. Med. 11, 14 (2016).
30. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
31. Li, D., Hsu, S., Purushotham, D., Sears, R. L. & Wang, T. WashU Epigenome Browser update 2019. Nuc. Acids Res. 47, W158–W165 (2019).
32. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).
33. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
34. Koster, J. & Rahmann, S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 34, 3600 (2018).
35. Ghoussaini, M. et al. Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation. Nat. Commun. 4, 4999 (2014).
36. Fudenberg, G., Getz, G., Meyerson, M. & Mirny, L. A. High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. Nat. Biotechnol. 29, 1109–1113 (2011).
37. De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 29, 1103–1108 (2011).
38. Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).
39. Brodie, A., Azaria, J. R. & Ofran, Y. How far from the SNP may the causative genes be? Nuc. Acids Res. 44, 6046–6054 (2016).
40. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
41. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. in Statistical Genomics: Methods and Protocols 335–351 (Springer Science+Business Media, 2016).
42. Cui, Y. et al. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics 32, 1740–1742 (2016).
43. Lawrence, M., Daujat, S. & Schneider, R. Lateral thinking: how histone modifications regulate gene expression. Trends Genet. 32, 42–56 (2016).
44. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).
45. Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1897 (2016).
46. Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv. 5, eaaw1668 (2019).
47. Dowen, J. M. et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387 (2014).
48. Servant, N., Varoquaux, N., Heard, E., Barillot, E. & Vert, J. P. Effective normalization for copy number variation in Hi-C data. BMC Bioinformatics 19, 313 (2018).
49. Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).

Acknowledgements

We thank Breast Cancer Now for funding this work as part of Programme Funding to The Breast Cancer Now Toby Robins Research Centre. This study makes use of data generated by the PCHI-C Consortium. A full list of the investigators who contributed to the generation of the data is available in Javierre et al.⁹, which was funded by the National Institute for Health Research of England, UK Medical Research Council (MR/L007150/1) and UK Biotechnology and Biological Research Council (BB/J004480/1). We also thank D. Li from the WashU Epigenome Browser team for implementing support for CHiCANE’s standard format in the Epigenome Browser.

Author information

These authors contributed equally: Erle M. Holgersen, Andrea Gillespie.

Authors and Affiliations

The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
Erle M. Holgersen, Andrea Gillespie, Joseph S. Baxter, Alisa Zvereva, Gareth Muirhead, Nichola Johnson, Orsolya Sipos, Nicola H. Dryden, Laura R. Broome, Olivia Fletcher & Syed Haider
Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
Olivia C. Leavy
Department of Health Sciences, University of Leicester, Leicester, UK
Olivia C. Leavy & Frank Dudbridge
Scientific Computing, The Institute of Cancer Research, London, UK
Yi Chen & Igor Kozin

Contributions

E.M.H., O.F. and S.H. designed the study. E.M.H., O.C.L., O.F., F.D. and S.H. designed CHiCANE. E.M.H. and S.H. implemented the R package. E.M.H., A.G., O.C.L., G.M., O.S., O.F., F.D. and S.H. performed statistical experiments and interpreted data. J.B., A.Z., N.J., N.D. and L.B. generated capture Hi-C data. E.M.H., A.G., O.F. and S.H. wrote the manuscript with contributions from all authors. Y.C. and I.K. implemented dissemination of processed data. O.F. and S.H. supervised the experiments.

Corresponding authors

Correspondence to Olivia Fletcher or Syed Haider.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Peter Robinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Visualizing interaction peaks in WashU Epigenome Browser.

Image from WashU Epigenome Browser showing non-bait to bait interaction peaks (q-value < 0.05) called at known breast cancer risk loci by CHiCANE in the Baxter T-47D libraries at a, 16q12.2 locus and b, 14q24.1 locus. Yellow boxes show the captured regions.

Extended Data Fig. 2 Model fitting of counts data (pCHi-C library).

Representative examples of hanging rootograms depicting the negative binomial model fits on the Javierre MK library. Observed counts are shown as histogram bins (gray bars) while the CHiCANE fitted expected counts distribution is in red. The y-axis represents square root transformed density estimates of observed (gray bars) and expected (red line) counts. For observed counts, the height of the bars is shifted to align the top of the bar with the expected counts fit. Bars above and below the reference line (x-axis) indicate over- and under-prediction by the CHiCANE model, respectively.
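
As a rough illustration of how the quantities in a hanging rootogram relate to one another, the Python sketch below computes square-root-scaled observed and expected frequencies for a negative binomial fit. It is not CHiCANE's code (CHiCANE is an R package): the function names, the toy counts, and the parameter values `mu` and `theta` are assumptions made for illustration, and it works with frequencies rather than the density estimates shown in the figure.

```python
import math
from collections import Counter


def nb_pmf(k, mu, theta):
    """Negative binomial probability of count k, with mean mu and dispersion theta."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hanging_rootogram(counts, mu, theta):
    """Return one (k, sqrt_expected, bar_bottom) tuple per count value k.

    Each bar hangs from the expected curve: its top sits at sqrt(expected)
    and its bottom at sqrt(expected) - sqrt(observed).
    """
    observed = Counter(counts)
    n = len(counts)
    rows = []
    for k in range(max(counts) + 1):
        sqrt_exp = math.sqrt(n * nb_pmf(k, mu, theta))
        sqrt_obs = math.sqrt(observed.get(k, 0))
        rows.append((k, sqrt_exp, sqrt_exp - sqrt_obs))
    return rows


# Toy counts and hypothetical fitted parameters, not data from the study
toy_counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 8]
rows = hanging_rootogram(toy_counts, mu=2.2, theta=1.5)
for k, sqrt_exp, bottom in rows:
    # A bar bottom above zero means the model expects more than was observed
    verdict = "over-predicted" if bottom > 0 else "under-predicted"
    print(f"count={k} sqrt(expected)={sqrt_exp:.2f} bar bottom={bottom:+.2f} ({verdict})")
```

A bar whose bottom ends above the zero line corresponds to a count value the model over-predicts, and one that drops below it to a count value the model under-predicts, matching the reading of the figure described above.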

Extended Data Fig. 3 Interpretation of interaction peaks by distance.

Examples of interpretation of interaction calling on Javierre MK library. a, Bar plots showing the proportion of interaction peaks (q-value < 0.05) by type (cis interactions include bait-to-bait interactions). b, Bar plots showing the number of interaction peaks (q-value < 0.05) across distance bins. c, Bar plots showing breakdown of region 1–10 Mb shown in (b).
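
The tallies behind panels like (b) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin edges, the labels, and the `(distance_bp, q_value)` record layout are hypothetical and are not taken from the protocol or from CHiCANE's output format.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in base pairs; the bins used in the figure may differ
BIN_EDGES = [10_000, 100_000, 1_000_000, 10_000_000]
BIN_LABELS = ["<10 kb", "10-100 kb", "100 kb-1 Mb", "1-10 Mb", ">=10 Mb"]


def distance_bin(distance_bp):
    """Map a bait-to-target distance to the label of the bin that contains it."""
    return BIN_LABELS[bisect_right(BIN_EDGES, distance_bp)]


def count_peaks_by_bin(peaks, q_threshold=0.05):
    """Count significant cis peaks per distance bin.

    peaks: iterable of (distance_bp, q_value); distance_bp is None for trans
    interactions, which have no linear distance and are skipped here.
    """
    tally = Counter()
    for distance_bp, q_value in peaks:
        if distance_bp is not None and q_value < q_threshold:
            tally[distance_bin(distance_bp)] += 1
    return tally


# Toy records, not data from the study
toy_peaks = [
    (5_000, 0.01),
    (50_000, 0.20),      # not significant, dropped
    (250_000, 0.03),
    (2_500_000, 0.001),
    (None, 0.01),        # trans interaction, dropped
    (4_000_000, 0.04),
]
tally = count_peaks_by_bin(toy_peaks)
for label in BIN_LABELS:
    print(f"{label}: {tally.get(label, 0)}")
```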

Supplementary information

Supplementary Table 1

Somatic mutations overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library annotated (target fragments only) with PCAWG SNV/MNV data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.

Supplementary Table 2

INDELs overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library annotated (target fragments only) with PCAWG INDEL data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.
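
The kind of overlap annotation used for both supplementary tables can be sketched in plain Python. This is a minimal stand-in for the idea behind an interval intersection, not bedtools itself and not the authors' pipeline; the function names, coordinates, and variant labels are made up, and intervals are assumed to be BED-style half-open `[start, end)`.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def annotate_fragments(fragments, variants):
    """Pair each fragment with the variants on the same chromosome that overlap it.

    fragments: list of (chrom, start, end)
    variants:  list of (chrom, start, end, info)
    """
    hits = []
    for f_chrom, f_start, f_end in fragments:
        for v_chrom, v_start, v_end, info in variants:
            if f_chrom == v_chrom and overlaps(f_start, f_end, v_start, v_end):
                hits.append(((f_chrom, f_start, f_end), info))
    return hits


# Toy coordinates, not data from the study
toy_fragments = [("chr2", 1000, 2000), ("chr2", 5000, 6000)]
toy_variants = [
    ("chr2", 1500, 1501, "snv_a"),    # inside the first fragment
    ("chr2", 2000, 2001, "snv_b"),    # starts exactly at the fragment end: no overlap
    ("chr3", 1500, 1501, "snv_c"),    # different chromosome
    ("chr2", 5990, 6010, "indel_d"),  # straddles the end of the second fragment
]
hits = annotate_fragments(toy_fragments, toy_variants)
for fragment, info in hits:
    print(fragment, info)
```

The nested loop compares every fragment with every variant, which is fine for a toy example; dedicated interval tools are the better choice for genome-scale inputs.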

About this article

Cite this article

Holgersen, E.M., Gillespie, A., Leavy, O.C. et al. Identifying high-confidence capture Hi-C interactions using CHiCANE. Nat Protoc 16, 2257–2285 (2021). https://doi.org/10.1038/s41596-021-00498-1

Received: 20 April 2020
Accepted: 12 January 2021
Published: 09 April 2021
Issue Date: April 2021
DOI: https://doi.org/10.1038/s41596-021-00498-1
