  • Protocol

Identifying high-confidence capture Hi-C interactions using CHiCANE

Abstract

The ability to identify regulatory interactions that mediate gene expression changes through distal elements, such as risk loci, is transforming our understanding of how genomes are spatially organized and regulated. Capture Hi-C (CHi-C) is a powerful tool to delineate such regulatory interactions. However, primary analysis and downstream interpretation of CHi-C profiles remain challenging and rely on disparate tools with ad hoc input/output formats and specific assumptions for statistical modeling. Here we present a data processing and interaction calling toolkit (CHiCANE), specialized for the analysis and meaningful interpretation of CHi-C assays. In this protocol, we demonstrate applications of CHiCANE to region capture Hi-C (rCHi-C) and promoter capture Hi-C (pCHi-C) libraries, followed by quality assessment of interaction peaks, as well as downstream analysis specific to rCHi-C and pCHi-C to aid functional interpretation. For a typical rCHi-C/pCHi-C dataset, this protocol takes up to 3 d for users with a moderate understanding of R programming and statistical concepts, although this depends on dataset size and available compute power. CHiCANE is freely available at https://cran.r-project.org/web/packages/chicane.

Fig. 1: CHiCANE protocol workflow.
Fig. 2: Model fitting of count data.
Fig. 3: Interpretation of interaction peaks by distance.
Fig. 4: Replicate concordance by distance bins.
Fig. 5: eQTL analysis.
Fig. 6: Visualizing interaction peaks.
Fig. 7: Enrichment of enhancer marks.
Fig. 8: MK locus plot with TADs.
Fig. 9: Testing different models.

Data availability

Baitmaps for both sets of CHi-C libraries used in this study (rCHi-C T-47D, ref. 10; pCHi-C MK, ref. 9), hg38 HindIII in silico digest, HiCUP reports, CHiCANE’s unfiltered interactions, filtered interaction peaks (q-value < 0.05), and negative binomial model fit plots and statistics are available at https://doi.org/10.5281/zenodo.4073433.

Code availability

The CHiCANE R package is freely available through CRAN: https://cran.r-project.org/web/packages/chicane.

References

  1. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).

  2. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).

  3. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).

  4. Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).

  5. Schmitt, A. D., Hu, M. & Ren, B. Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. 17, 743–755 (2016).

  6. Dryden, N. H. et al. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 24, 1854–1868 (2014).

  7. Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).

  8. Davies, J. O. et al. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat. Methods 13, 74–80 (2016).

  9. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).

  10. Baxter, J. S. et al. Capture Hi-C identifies putative target genes at 33 breast cancer risk loci. Nat. Commun. 9, 1028 (2018).

  11. Jager, R. et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun. 6, 6178 (2015).

  12. Martin, P. et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 6, 10069 (2015).

  13. Orlando, G. et al. Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer. Nat. Genet. 50, 1375–1380 (2018).

  14. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).

  15. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  16. Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).

  17. Kleiber, C. & Zeileis, A. Visualizing count data regressions using rootograms. Am. Stat. 70, 296–303 (2016).

  18. Ben Zouari, Y., Molitor, A. M., Sikorska, N., Pancaldi, V. & Sexton, T. ChiCMaxima: a robust and simple pipeline for detection and visualization of chromatin looping in Capture Hi-C. Genome Biol. 20, 102 (2019).

  19. Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127 (2016).

  20. Mifsud, B. et al. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS ONE 12, e0174744 (2017).

  21. Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).

  22. Rigby, R. & Stasinopoulos, D. Generalized additive models for location, scale and shape. Applied Statistics 54, 507–554 (2005).

  23. Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).

  24. Ay, F. & Noble, W. S. Analysis methods for studying the 3D architecture of the genome. Genome Biol. 16, 183 (2015).

  25. Kong, S. & Zhang, Y. Deciphering Hi-C: from 3D genome to function. Cell Biol. Toxicol. 35, 15–32 (2019).

  26. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  27. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

  28. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  29. Haider, S. et al. A bedr way of genomic interval processing. Source Code Biol. Med. 11, 14 (2016).

  30. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  31. Li, D., Hsu, S., Purushotham, D., Sears, R. L. & Wang, T. WashU Epigenome Browser update 2019. Nucleic Acids Res. 47, W158–W165 (2019).

  32. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).

  33. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).

  34. Koster, J. & Rahmann, S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 34, 3600 (2018).

  35. Ghoussaini, M. et al. Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation. Nat. Commun. 4, 4999 (2014).

  36. Fudenberg, G., Getz, G., Meyerson, M. & Mirny, L. A. High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. Nat. Biotechnol. 29, 1109–1113 (2011).

  37. De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 29, 1103–1108 (2011).

  38. Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).

  39. Brodie, A., Azaria, J. R. & Ofran, Y. How far from the SNP may the causative genes be? Nucleic Acids Res. 44, 6046–6054 (2016).

  40. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  41. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. in Statistical Genomics: Methods and Protocols 335–351 (Springer Science+Business Media, 2016).

  42. Cui, Y. et al. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics 32, 1740–1742 (2016).

  43. Lawrence, M., Daujat, S. & Schneider, R. Lateral thinking: how histone modifications regulate gene expression. Trends Genet. 32, 42–56 (2016).

  44. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).

  45. Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1897 (2016).

  46. Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv. 5, eaaw1668 (2019).

  47. Dowen, J. M. et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387 (2014).

  48. Servant, N., Varoquaux, N., Heard, E., Barillot, E. & Vert, J. P. Effective normalization for copy number variation in Hi-C data. BMC Bioinformatics 19, 313 (2018).

  49. Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).

Acknowledgements

We thank Breast Cancer Now for funding this work as part of Programme Funding to The Breast Cancer Now Toby Robins Research Centre. This study makes use of data generated by the PCHI-C Consortium. A full list of the investigators who contributed to the generation of the data is available in Javierre et al. (ref. 9), which was funded by the National Institute for Health Research of England, UK Medical Research Council (MR/L007150/1) and UK Biotechnology and Biological Sciences Research Council (BB/J004480/1). We also thank D. Li from the WashU Epigenome Browser team for implementing support for CHiCANE’s standard format in the Epigenome Browser.

Author information

Contributions

E.M.H., O.F. and S.H. designed the study. E.M.H., O.C.L., O.F., F.D. and S.H. designed CHiCANE. E.M.H. and S.H. implemented the R package. E.M.H., A.G., O.C.L., G.M., O.S., O.F., F.D. and S.H. performed statistical experiments and interpreted data. J.B., A.Z., N.J., N.D. and L.B. generated capture Hi-C data. E.M.H., A.G., O.F. and S.H. wrote the manuscript with contributions from all authors. Y.C. and I.K. implemented dissemination of processed data. O.F. and S.H. supervised the experiments.

Corresponding authors

Correspondence to Olivia Fletcher or Syed Haider.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Peter Robinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Baxter, J. S. et al. Nat. Commun. 9, 1028 (2018): https://doi.org/10.1038/s41467-018-03411-9

Dryden, N. H. et al. Genome Res. 24, 1854–1868 (2014): https://doi.org/10.1101/gr.175034.114

Extended data

Extended Data Fig. 1 Visualizing interaction peaks in WashU Epigenome Browser.

Image from WashU Epigenome Browser showing non-bait to bait interaction peaks (q-value < 0.05) called at known breast cancer risk loci by CHiCANE in the Baxter T-47D libraries at a, 16q12.2 locus and b, 14q24.1 locus. Yellow boxes show the captured regions.

Extended Data Fig. 2 Model fitting of counts data (pCHi-C library).

Representative examples of hanging rootograms depicting the negative binomial model fits on the Javierre MK library. Observed counts are shown as histogram bins (gray bars) while the CHiCANE fitted expected counts distribution is in red. The y-axis represents square root transformed density estimates of observed (gray bars) and expected (red line) counts. For observed counts, the height of the bars is shifted to align the top of the bar with the expected counts fit. Bars above and below the reference line (x-axis) indicate over- and under-prediction by the CHiCANE model, respectively.
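The construction of a hanging rootogram as described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not CHiCANE's implementation: the function names and the mean/dispersion (`mu`, `size`) parameterization are assumptions, and the fitted parameters are assumed to be supplied by the caller rather than estimated here.

```python
import math
from collections import Counter


def nb_pmf(k, mu, size):
    """Negative binomial probability of count k, given mean mu and dispersion size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def hanging_rootogram(counts, mu, size):
    """Return (count, bar_top, bar_bottom) on the square-root density scale.

    Each bar hangs from the expected curve: its top sits at sqrt(expected)
    and its bottom at sqrt(expected) - sqrt(observed).
    """
    n = len(counts)
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        top = math.sqrt(nb_pmf(k, mu, size))
        bottom = top - math.sqrt(observed.get(k, 0) / n)
        rows.append((k, top, bottom))
    return rows


# Hypothetical toy counts and parameters, for illustration only.
for k, top, bottom in hanging_rootogram([0, 0, 1, 2], mu=1.0, size=2.0):
    print(k, round(top, 3), round(bottom, 3))
```

A bar whose bottom ends above zero means the model expects more observations of that count than were seen (over-prediction); a bottom below zero means under-prediction.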

Extended Data Fig. 3 Interpretation of interaction peaks by distance.

Examples of interpretation of interaction calling on Javierre MK library. a, Bar plots showing the proportion of interaction peaks (q-value < 0.05) by type (cis interactions include bait-to-bait interactions). b, Bar plots showing the number of interaction peaks (q-value < 0.05) across distance bins. c, Bar plots showing breakdown of region 1–10 Mb shown in (b).
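The summaries plotted in these panels amount to filtering interactions at the q-value threshold and tallying them by type and by distance. The Python sketch below shows one way to do this under stated assumptions: the record keys (`bait_chr`, `target_chr`, `distance`, `q_value`) and the bin edges are hypothetical and are not taken from CHiCANE's output format or from the figure itself.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical distance bins in base pairs; adjust to the bins of interest.
BIN_EDGES = [0, 100_000, 1_000_000, 10_000_000]
BIN_LABELS = ["<100 kb", "100 kb-1 Mb", "1-10 Mb", ">10 Mb"]


def summarize_peaks(interactions, q_threshold=0.05):
    """Count significant interactions by type (cis/trans) and by distance bin."""
    by_type = Counter()
    by_distance = Counter()
    for rec in interactions:
        if rec["q_value"] >= q_threshold:
            continue  # keep only interaction peaks
        if rec["bait_chr"] != rec["target_chr"]:
            by_type["trans"] += 1  # no linear distance is defined for trans
            continue
        by_type["cis"] += 1
        # Index of the last bin edge that is <= distance (distance assumed >= 0).
        idx = bisect_right(BIN_EDGES, rec["distance"]) - 1
        by_distance[BIN_LABELS[idx]] += 1
    return by_type, by_distance


# Toy records, for illustration only.
example = [
    {"bait_chr": "chr1", "target_chr": "chr1", "distance": 50_000, "q_value": 0.01},
    {"bait_chr": "chr1", "target_chr": "chr1", "distance": 2_000_000, "q_value": 0.04},
    {"bait_chr": "chr1", "target_chr": "chr2", "distance": None, "q_value": 0.001},
    {"bait_chr": "chr1", "target_chr": "chr1", "distance": 500_000, "q_value": 0.2},
]
by_type, by_distance = summarize_peaks(example)
print(dict(by_type), dict(by_distance))
```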

Supplementary information

Supplementary Table 1

Somatic mutations overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library annotated (target fragments only) with PCAWG SNV/MNV data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.

Supplementary Table 2

INDELs overlapping with 2q35 target fragments. Example of interaction peaks called by CHiCANE from the 2q35 locus of the T-47D library annotated (target fragments only) with PCAWG INDEL data using bedtools intersect. The column ‘vcf_info’ contains information about the variant, including allelic fraction, number of reads supporting the variant and reference alleles (in the tumor sample) and the variant’s classification. The column ‘vcf_file’ contains the name of the VCF file, i.e., a unique patient ID recorded in the PCAWG study.
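The annotation step behind both supplementary tables is an interval overlap between target fragments and variant calls. The tables were produced with bedtools intersect; the Python sketch below is only a stand-in that illustrates the overlap logic. The record layout and function names are hypothetical, fragments are assumed to use 0-based half-open (BED-style) coordinates, and variants are assumed to use 1-based VCF positions.

```python
def variant_interval(pos, ref):
    """Convert a 1-based VCF position and REF allele to a 0-based half-open interval."""
    start = pos - 1
    return start, start + len(ref)


def annotate_fragments(fragments, variants):
    """Pair each target fragment with every variant that overlaps it.

    fragments: iterable of (chrom, start, end), 0-based half-open.
    variants: iterable of dicts with keys chrom, pos, ref, alt, vcf_file.
    """
    annotated = []
    for chrom, f_start, f_end in fragments:
        for var in variants:
            if var["chrom"] != chrom:
                continue
            v_start, v_end = variant_interval(var["pos"], var["ref"])
            # Two half-open intervals overlap when each starts before the other ends.
            if v_start < f_end and v_end > f_start:
                annotated.append({"fragment": (chrom, f_start, f_end), **var})
    return annotated


# Toy coordinates and file names, for illustration only.
fragments = [("chr2", 100, 200)]
variants = [
    {"chrom": "chr2", "pos": 100, "ref": "A", "alt": "T", "vcf_file": "patient_a"},
    {"chrom": "chr2", "pos": 101, "ref": "A", "alt": "T", "vcf_file": "patient_b"},
    {"chrom": "chr2", "pos": 99, "ref": "ACG", "alt": "A", "vcf_file": "patient_c"},
    {"chrom": "chr3", "pos": 150, "ref": "G", "alt": "C", "vcf_file": "patient_d"},
]
hits = annotate_fragments(fragments, variants)
print([hit["vcf_file"] for hit in hits])
```

The nested loop is quadratic and only suitable for small inputs; for genome-scale data, a sorted sweep or bedtools itself is the practical choice.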

About this article

Cite this article

Holgersen, E.M., Gillespie, A., Leavy, O.C. et al. Identifying high-confidence capture Hi-C interactions using CHiCANE. Nat Protoc 16, 2257–2285 (2021). https://doi.org/10.1038/s41596-021-00498-1

  • DOI: https://doi.org/10.1038/s41596-021-00498-1
