Mapping chromatin loops from noisy Hi-C heatmaps remains a major challenge. Here we present DeepLoop, which performs rigorous bias correction followed by deep-learning-based signal enhancement for robust chromatin interaction mapping from low-depth Hi-C data. DeepLoop enables loop-resolution, single-cell Hi-C analysis. It also achieves a cross-platform convergence between different Hi-C protocols and micrococcal nuclease (micro-C). DeepLoop allowed us to map the genetic and epigenetic determinants of allele-specific chromatin interactions in the human genome. We nominate new loci with allele-specific interactions governed by imprinting or allelic DNA methylation. We also discovered that, in the inactivated X chromosome (Xi), local loops at the DXZ4 ‘megadomain’ boundary escape X-inactivation but the FIRRE ‘superloop’ locus does not. Importantly, DeepLoop can pinpoint heterozygous single-nucleotide polymorphisms and large structural variants that cause allelic chromatin loops, many of which rewire enhancers with transcriptional consequences. Taken together, DeepLoop expands the use of Hi-C to provide loop-resolution insights into the genetics of the three-dimensional genome.
Accession numbers for third-party data used in this study can be found in Supplementary Table 1. The raw data of H9 Hi-C and 4C–seq generated in this study, and reanalyzed published data, can be found at accession no. GSE167200. The 40 Hi-C datasets analyzed by DeepLoop can be found at https://hiview.case.edu/public/DeepLoop/.
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Denker, A. & de Laat, W. The second decade of 3C technologies: detailed insights into nuclear organization. Genes Dev. 30, 1357–1382 (2016).
Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).
Lu, L. et al. Robust Hi-C maps of enhancer-promoter interactions reveal the function of non-coding genome in neural development and diseases. Mol. Cell 79, 521–534 (2020).
Schoenfelder, S. et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 25, 582–597 (2015).
Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Zhang, Y. et al. Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504, 306–310 (2013).
Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).
Fang, R. et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 26, 1345–1348 (2016).
Hu, M. et al. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131–3133 (2012).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Ay, F., Bailey, T. L. & Noble, W. S. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 24, 999–1011 (2014).
Xiong, K. & Ma, J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat. Commun. 10, 5069 (2019).
Zhang, Y. et al. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat. Commun. 9, 750 (2018).
Liu, T. & Wang, Z. HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data. Bioinformatics 35, 4222–4228 (2019).
Hong, H. et al. DeepHiC: a generative adversarial network for enhancing Hi-C data resolution. PLoS Comput. Biol. 16, e1007287 (2020).
Li, Z. & Dai, Z. SRHiC: a deep learning model to enhance the resolution of Hi-C data. Front. Genet. 11, 353 (2020).
Won, H. et al. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature 538, 523–527 (2016).
Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015).
Selvaraj, S., Dixon, J. R., Bansal, V. & Ren, B. Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat. Biotechnol. 31, 1111–1118 (2013).
Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Hawkins, R. D. et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6, 479–491 (2010).
Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120 (2012).
Lettice, L. A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003).
Li, Y. et al. CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS ONE 9, e114485 (2014).
Lupianez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015).
Smemo, S. et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 507, 371–375 (2014).
Won, H., Huang, J., Opland, C. K., Hartl, C. L. & Geschwind, D. H. Human evolved regulatory elements modulate genes involved in cortical expansion and neurodevelopmental disease susceptibility. Nat. Commun. 10, 2396 (2019).
de la Torre-Ubieta, L. et al. The dynamic landscape of open chromatin during human cortical neurogenesis. Cell 172, 289–304 (2018).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention 234–241 (Springer, 2015).
Jung, I. et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 51, 1442–1449 (2019).
Heidari, N. et al. Genome-wide map of regulatory interactions in the human genome. Genome Res. 24, 1905–1917 (2014).
Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Li, G., Chen, Y., Snyder, M. P. & Zhang, M. Q. ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis. Nucleic Acids Res. 45, e4 (2017).
Liu, T. & Wang, Z. HiCNN2: enhancing the resolution of Hi-C data using an ensemble of convolutional neural networks. Genes (Basel) 10, 862 (2019).
Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565.e7 (2020).
Reiff, S. B. et al. The 4D Nucleome Data Portal: a resource for searching and visualizing curated nucleomics data. Nat. Commun. 13, 2365 (2022).
Akgol Oksuz, B. et al. Systematic evaluation of chromosome conformation capture assays. Nat. Methods 18, 1046–1055 (2021).
Hsieh, T. S. et al. Resolving the 3D landscape of transcription-linked mammalian chromatin folding. Mol. Cell 78, 539–553 (2020).
Schmitt, A. D. et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 17, 2042–2059 (2016).
Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017).
Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127 (2016).
Lee, D. S. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat. Methods 16, 999–1006 (2019).
Splinter, E. et al. CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev. 20, 2349–2354 (2006).
Murrell, A., Heeson, S. & Reik, W. Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat. Genet. 36, 889–893 (2004).
Kurukuti, S. et al. CTCF binding at the H19 imprinting control region mediates maternally inherited higher-order chromatin conformation to restrict enhancer access to Igf2. Proc. Natl Acad. Sci. USA 103, 10684–10689 (2006).
Lleres, D. et al. CTCF modulates allele-specific sub-TAD organization and imprinted gene activity at the mouse Dlk1-Dio3 and Igf2-H19 domains. Genome Biol. 20, 272 (2019).
Kuleshov, V. et al. Whole-genome haplotyping using long reads and statistical methods. Nat. Biotechnol. 32, 261–266 (2014).
Barlow, D. P. & Bartolomei, M. S. Genomic imprinting in mammals. Cold Spring Harb. Perspect. Biol. 6, 952–965 (2014).
Kobayashi, S. et al. Human PEG1/MEST, an imprinted gene on chromosome 7. Hum. Mol. Genet. 6, 781–786 (1997).
Deng, X. et al. Bipartite structure of the inactive mouse X chromosome. Genome Biol. 16, 152 (2015).
Giorgetti, L. et al. Structural organization of the inactive X chromosome in the mouse. Nature 535, 575–579 (2016).
Minajigi, A. et al. A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349, aab2276 (2015).
Horakova, A. H., Moseley, S. C., McLaughlin, C. R., Tremblay, D. C. & Chadwick, B. P. The macrosatellite DXZ4 mediates CTCF-dependent long-range intrachromosomal interactions on the human inactive X chromosome. Hum. Mol. Genet. 21, 4367–4377 (2012).
Yang, F. et al. The lncRNA Firre anchors the inactive X chromosome to the nucleolus by binding CTCF and maintains H3K27me3 methylation. Genome Biol. 16, 52 (2015).
Fang, H. et al. Trans- and cis-acting effects of Firre on epigenetic features of the inactive X chromosome. Nat. Commun. 11, 6053 (2020).
Kriz, A. J., Colognori, D., Sunwoo, H., Nabet, B. & Lee, J. T. Balancing cohesin eviction and retention prevents aberrant chromosomal interactions, Polycomb-mediated repression, and X-inactivation. Mol. Cell 81, 1970–1987 (2021).
Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).
Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
Chaisson, M. J., Wilson, R. K. & Eichler, E. E. Genetic variation and the de novo assembly of human genomes. Nat. Rev. Genet. 16, 627–640 (2015).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
Kidd, J. M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
Puig, M., Casillas, S., Villatoro, S. & Caceres, M. Human inversions and their functional consequences. Brief. Funct. Genomics 14, 369–379 (2015).
Giner-Delgado, C. et al. Evolutionary and functional impact of common polymorphic inversions in the human genome. Nat. Commun. 10, 4222 (2019).
Schoenfelder, S. et al. Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells. Nat. Genet. 42, 53–61 (2010).
Di Giammartino, D. C. et al. KLF4 is involved in the organization and regulation of pluripotency-associated three-dimensional enhancer networks. Nat. Cell Biol. 21, 1179–1190 (2019).
Wei, Z. et al. Klf4 organizes long-range chromosomal interactions with the oct4 locus in reprogramming and pluripotency. Cell Stem Cell 13, 36–47 (2013).
Tarjan, D. R., Flavahan, W. A. & Bernstein, B. E. Epigenome editing strategies for the functional annotation of CTCF insulators. Nat. Commun. 10, 4258 (2019).
Jia, Z. et al. Tandem CTCF sites function as insulators to balance spatial chromatin contacts and topological enhancer-promoter selection. Genome Biol. 21, 75 (2020).
Krijger, P. H. L., Geeven, G., Bianchi, V., Hilvering, C. R. E. & de Laat, W. 4C-seq from beginning to end: a detailed protocol for sample preparation and data analysis. Methods 170, 17–32 (2020).
Gu, B. et al. Transcription-coupled changes in nuclear mobility of mammalian cis-regulatory elements. Science 359, 1050–1055 (2018).
Labuhn, M. et al. Refined sgRNA efficacy prediction improves large- and small-scale CRISPR-Cas9 applications. Nucleic Acids Res. 46, 1375–1385 (2018).
Stemmer, M., Thumberger, T., Del Sol Keyer, M., Wittbrodt, J. & Mateo, J. L. CCTop: an intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS ONE 10, e0124633 (2015).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Krueger, F. & Andrews, S. R. SNPsplit: allele-specific splitting of alignments between genomes with known SNP genotypes. F1000Res. 5, 1479 (2016).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Xiao, X. et al. Endogenous reprogramming of alpha cells into beta cells, induced by viral gene therapy, reverses autoimmune diabetes. Cell Stem Cell 22, 78–90 (2018).
Gondara, L. Medical image denoising using convolutional denoising autoencoders. In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) 241–246 (IEEE, 2016).
Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. Citeseer http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.569.2442&rep=rep1&type=pdf (2008).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
This work is supported by grants from the National Institutes of Health (nos. R01HG009658 to F.J. and R01DK113185 to Y.L.) and Mount Sinai Health Care Foundation (nos. OSA510113 to F.J. and OSA510114 to Y.L.). F.J. is also supported by a subaward from the University of Miami (no. NIH U01AG072579) and a Cancer Data Sciences pilot grant from Case Comprehensive Cancer Center Support Grant (no. NIH P30CA043703). J.L. is supported in part by National Science Foundation grant nos. CCF-2006780 and CCF-1815139. D.P. is supported by an NIH training grant (no. T32HL007567) and a fellowship from the Callahan Foundation. This work made use of the High-Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University.
The authors declare no competing interests.
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Detailed architecture of the LoopDenoise convolutional autoencoder, showing five convolution layers: two in the encoding path using eight 13 × 13 filters, two transpose convolution layers in the decoding path using eight 2 × 2 filters, and one final convolution layer using a single 13 × 13 filter. The matrix dimensions of each layer's output are also shown. Each layer is visualized by the filters used, the output of convolving the input with each filter, the result of applying ReLU activation and the result of max pooling. The convolution operation is denoted by *. b, Venn diagram showing the reproducible loop pixels among three human fetal brain replicates. The table shows the number of pixels in each part of the Venn diagram that overlap significant pixels in the pooled data. Pixels that are significant in both the pooled data and at least one of the three replicates are the training target of the LoopDenoise model (P < 0.05, negative binomial test). The significance of loop pixels comes from the negative binomial test wrapped in the HiCorr package. c, Pairwise pixel-level reproducibility between biological replicates of human fetal cortex Hi-C data, defined as the fraction of common loop pixels when the same number of loop pixels is called from the two datasets. d, Heatmap examples from seven loci in three human fetal brain replicates; the LoopDenoise output shows more reproducible contact patterns.
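The reproducibility measure described in panel c can be illustrated with a minimal Python sketch. It assumes each replicate is available as a mapping from a pixel (a pair of anchor indices) to a loop score; the function names, the tie-breaking rule and the toy scores are hypothetical and are not taken from the DeepLoop code.

```python
def top_pixels(scores, n):
    """Return the set of the n highest-scoring pixels.

    Ties are broken by the pixel key so the result is deterministic
    (an assumption of this sketch, not something the caption specifies).
    """
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of pixels")
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pixel for pixel, _ in ranked[:n]}


def pairwise_reproducibility(scores_a, scores_b, n):
    """Fraction of pixels shared when n loop pixels are called per dataset."""
    return len(top_pixels(scores_a, n) & top_pixels(scores_b, n)) / n


# Toy replicates: keys are (anchor1, anchor2) pixels, values are loop scores.
rep1 = {(10, 25): 9.1, (10, 40): 7.4, (30, 55): 6.8, (12, 90): 1.2}
rep2 = {(10, 25): 8.7, (30, 55): 7.9, (12, 90): 5.5, (10, 40): 0.9}

shared_fraction = pairwise_reproducibility(rep1, rep2, 3)
```

With these toy values, two of the three top pixels are shared between the replicates, so the measure is 2/3.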
a, Eight heatmap examples in GM12878; the highlighted row is the output from LoopDenoise. b, The distance distribution of the top 300K pixels in H1 (hESC), GM12878, IMR90 and mESC. Upper and lower limits of boxes indicate interquartile ranges, center lines indicate median values, whiskers indicate values within a maximum of 1.5 times the interquartile range and outliers indicate values beyond 1.5 times the interquartile range. c, The number of loop pixels, out of the top 300K pixels, with at least one anchor overlapping ChIP-seq peaks. d, Density plots showing the distribution of distances between loop anchors (top 100K loop pixels used) and their nearest ChIP-seq peaks in GM12878, IMR90, H1 (hESC) and mESC. e, Heatmap examples of six loci with known long-range gene regulation. The height of the browser tracks indicates the raw ChIP-seq counts.
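The count in panel c (loop pixels with at least one anchor overlapping a ChIP-seq peak) can be sketched as below. This is only an illustration under stated assumptions: anchors are treated as fixed-width bins on a single chromosome, intervals are half-open, and the bin size, coordinates and function names are all hypothetical.

```python
def overlaps_any(start, end, peaks):
    """True if the half-open interval [start, end) overlaps any peak."""
    return any(start < p_end and p_start < end for p_start, p_end in peaks)


def count_pixels_with_peak(pixels, peaks, bin_size):
    """Count pixels where at least one of the two anchors overlaps a peak."""
    count = 0
    for anchor1, anchor2 in pixels:
        if any(overlaps_any(a, a + bin_size, peaks) for a in (anchor1, anchor2)):
            count += 1
    return count


BIN = 5000  # assumed anchor width in bp, for illustration only

# Each pixel is (start of anchor 1, start of anchor 2) on one chromosome.
pixels = [(100000, 250000), (300000, 420000), (500000, 650000)]
# Each peak is a (start, end) interval on the same chromosome.
peaks = [(101200, 101600), (652000, 652400), (900000, 900300)]

n_supported = count_pixels_with_peak(pixels, peaks, BIN)
```

Here the first and third pixels each have one anchor that contains a peak, so `n_supported` is 2.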
a, Scatterplots showing the pixel-level correlation between CP and GZ samples in human fetal cortex before and after LoopDenoise. The R-square values are also shown in the plots. b, GO analyses of genes associated with GZ- or CP-specific loops. Fisher’s exact test was used to measure gene enrichment in annotation terms. c, The contact heatmaps of selected gene loci with top GZ- or CP-specific loop pixels. ATAC-seq tracks in CP (yellow) and GZ (blue) are also included for comparison. The height of the browser tracks indicates the raw ATAC-seq counts.
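As a rough illustration of the pixel-level comparison in panel a, the sketch below computes an R-square value for two vectors of matched pixel values. It assumes that R-square means the squared Pearson correlation, which the caption does not state explicitly; the function names and numbers are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of pixel values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_squared(x, y):
    """Squared Pearson correlation (assumed meaning of R-square here)."""
    return pearson_r(x, y) ** 2


# Toy values for the same four pixels in two samples.
sample_cp = [1.0, 2.0, 3.0, 4.0]
sample_gz = [1.0, 3.0, 2.0, 4.0]

r2 = r_squared(sample_cp, sample_gz)
```

For these toy vectors the Pearson correlation is 0.8, giving an R-square of 0.64.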
Extended Data Fig. 4 Comparison of the performance of different pipelines on 6-cutter and 4-cutter Hi-C data in GM12878 cells.
For 4-cutter Hi-C datasets, we chose a 94M down-sampled dataset (1/16 of the original depth) used in the HiCPlus, HiCNN2 and SRHiC studies, with the 1.35 billion full-depth dataset as reference. For 6-cutter Hi-C datasets, we chose a 50M down-sampled dataset, with the 380M full-depth dataset as reference. For locus chr5:87,964,000-88,764,000, the left side shows the contact heatmaps from 6-cutter (HindIII) GM12878 Hi-C processed by different pipelines (indicated by background color). The right side shows the 4-cutter (MboI) GM12878 Hi-C. The height of the browser tracks indicates the raw ChIP-seq counts.
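The caption does not say how the down-sampled datasets were produced. One common way to simulate lower depth is to keep each read pair independently with a fixed probability; the sketch below shows that idea on a toy contact table. The function name, the seed handling and the counts are hypothetical, and this is not presented as the procedure used in the cited studies.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Keep each read pair independently with probability `fraction`.

    `counts` maps a pixel (anchor1, anchor2) to its raw read-pair count.
    Pixels that lose all their reads are dropped from the result.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    thinned = {}
    for pixel, n in counts.items():
        kept = sum(1 for _ in range(n) if rng.random() < fraction)
        if kept:
            thinned[pixel] = kept
    return thinned


# Toy full-depth contact counts.
full_depth = {(1, 5): 160, (1, 9): 48, (3, 7): 16, (4, 20): 2}

# Simulate 1/16 of the original depth.
low_depth = downsample_counts(full_depth, 1 / 16)
```

Because every read pair is kept or dropped at random, sparse pixels often vanish entirely at low depth, which is the kind of signal loss that enhancement pipelines are meant to recover.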
a, Similar to Fig. 3a,b, more heatmap examples at four loci. b, Size breakdown of micro-C HICCUPS loops recovered by 50M-depth HindIII or DpnII Hi-C after enhancement.
Applying LoopEnhance to low-depth Hi-C data from 14 human tissues. Contact heatmaps of three tissue-specifically expressed genes in all the tissues are shown. a, ALB, highly expressed in liver. b, MYOZ2, highly expressed in heart tissues. c, ADD2, highly expressed in brain tissues.
As in Fig. 4e,f, single cells from the same cell type are pooled and enhanced by DeepLoop. The tSNE plots show the identities of each cell population (left) and the methylation level at the locus of interest (right).
Extended Data Fig. 8 Large heterozygous deletions and inversions detected by allelic DeepLoop analysis.
a, The scatterplots highlight the loop pixels within the regions of all four SVs (two inversions and two deletions). b, The contact heatmaps of paternal deletion Del-chr14 and maternal deletion Del-chr22. c, The contact heatmaps of Inv-chr7. d, The genome track of Inv-chr7 shows the chromatin interactions and the CTCF and H3K27ac binding on the un-inverted allele and the ‘inversion-fix’ allele. In this region, the un-inverted paternal genome has A1-A4 and A5-A6 cross-boundary CTCF loops. The maternal inversion created new A1-A5 and A4-A6 cross-boundary loops due to the inverted orientation of the CTCF motifs. Note that in the paternal genome the A1-A4 loop encompasses multiple enhancers, whereas in the inverted maternal genome the A1-A5 loop lacks enhancers. e, The expression level of the gene CCZ1 in the two alleles. The height of the browser tracks indicates the raw ChIP-seq counts.
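The way an inversion can swap loop partners by flipping CTCF motif orientation can be shown with a toy model. The sketch assumes the widely reported convergent-motif rule (an upstream forward motif pairs with a downstream reverse motif) and uses four hypothetical sites, S1 to S4, at made-up coordinates; they are not the A1 to A6 anchors of the figure, whose orientations are not given in the caption.

```python
def invert_segment(sites, start, end):
    """Mirror the positions and flip the strands of sites inside [start, end]."""
    inverted = {}
    for name, (pos, strand) in sites.items():
        if start <= pos <= end:
            inverted[name] = (start + end - pos, "-" if strand == "+" else "+")
        else:
            inverted[name] = (pos, strand)
    return inverted


def convergent_pairs(sites):
    """All (upstream, downstream) site pairs with '+' then '-' orientation."""
    ordered = sorted(sites.items(), key=lambda kv: kv[1][0])
    pairs = set()
    for i, (left, (_, left_strand)) in enumerate(ordered):
        for right, (_, right_strand) in ordered[i + 1:]:
            if left_strand == "+" and right_strand == "-":
                pairs.add((left, right))
    return pairs


# Hypothetical CTCF sites: name -> (position, motif strand).
reference = {"S1": (100, "+"), "S2": (300, "-"), "S3": (500, "+"), "S4": (700, "-")}
inverted = invert_segment(reference, 250, 550)  # inversion spanning S2 and S3

lost = convergent_pairs(reference) - convergent_pairs(inverted)
gained = convergent_pairs(inverted) - convergent_pairs(reference)
```

In this toy case the S1-S2 and S3-S4 pairs are lost on the inverted allele and S1-S3 and S2-S4 pairs are gained, which mirrors the kind of partner swap described for Inv-chr7.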
Extended Data Fig. 9 The contact heatmaps and browser snapshots of 24 loci containing 27 SNPs associated with both allelic CTCF binding and allelic DNA looping.
For each SNP, the paternal (blue) and maternal (red) genotypes are included. The allelic loops are circled in the heatmaps. The CTCF motif orientations are indicated with triangles. The height of the browser tracks indicates the raw ChIP-seq counts.
a, 3C assays showing the loss of the chromatin loop between the SNP (highlighted in yellow) and the ACBD7 locus after displacing CTCF binding with either dCas9-KRAB or dCas9 protein. b,c, Bar plots showing the changes in allelic gene expression upon blocking CTCF loops with dCas9 or dCas9-KRAB. d–f, CTCF blocking experiments at the GPNMB locus. n = 2 biologically independent experiments. All data are presented as means ± SEM from 4 replicated experiments. **P < 0.01, ***P < 0.001. NS, no significant difference. Two-sided Wilcoxon test. The height of the browser tracks indicates the raw ChIP-seq counts.
About this article
Cite this article
Zhang, S., Plummer, D., Lu, L. et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat Genet 54, 1013–1025 (2022). https://doi.org/10.1038/s41588-022-01116-w