Genome-wide detection of enhancer-hijacking events from chromatin interaction data in rearranged genomes

Abstract

Recent efforts have shown that structural variations (SVs) can disrupt three-dimensional genome organization and induce enhancer hijacking, yet no computational tools exist to identify such events from chromatin interaction data. Here, we develop NeoLoopFinder, a computational framework to identify the chromatin interactions induced by SVs, including interchromosomal translocations, large deletions and inversions. Our framework can automatically resolve complex SVs, reconstruct local Hi-C maps surrounding the breakpoints, normalize copy number variation and allele effects and predict chromatin loops induced by SVs. We applied NeoLoopFinder to Hi-C data from 50 cancer cell lines and primary tumors and identified tens of recurrent genes associated with enhancer hijacking. To experimentally validate NeoLoopFinder, we deleted the hijacked enhancers in prostate adenocarcinoma cells using CRISPR–Cas9, which significantly reduced expression of the target oncogene. In summary, NeoLoopFinder enables identification of critical oncogenic regulatory elements that can potentially reveal therapeutic targets.

Fig. 1: Overall design of the NeoLoopFinder framework.
Fig. 2: Detection of neoloops in 50 cancer cell lines or patient samples.
Fig. 3: Cancer-type specificity of neoloop-involved genes.
Fig. 4: Analysis of expression of genes in neoloops.
Fig. 5: Deletion of hijacked enhancers reduced oncogene expression in prostate adenocarcinoma.

Data availability

The cancer Hi-C datasets analyzed in this study are summarized in Supplementary Table 1. Details of all the other datasets collected for the validation and downstream analysis are summarized in Supplementary Table 7. Data used for survival analysis in leukemia and gastric cancer were downloaded from the cBioPortal for Cancer Genomics (https://www.cbioportal.org; ref. 45). The list of cancer-related genes was obtained from the Bushman Lab (http://www.bushmanlab.org/assets/doc/allOnco_May2018.tsv). The 4C-seq data for the MYC gene promoter in SK-N-MC cells and the Hi-C data generated for wild-type and enhancer-deleted LNCaP cells have been uploaded to the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession code GSE161493. Source data are provided with this paper.

Code availability

The NeoLoopFinder source code is publicly available on GitHub at https://github.com/XiaoTaoWang/NeoLoopFinder. The code is also available on Code Ocean (https://doi.org/10.24433/CO.1323561.v1).

References

  1. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
  2. Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  3. Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
  4. Weischenfeldt, J. et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 49, 65–74 (2017).
  5. Spielmann, M., Lupianez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018).
  6. Groschel, S. et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 157, 369–381 (2014).
  7. Drier, Y. et al. An oncogenic MYB feedback loop drives alternate cell fates in adenoid cystic carcinoma. Nat. Genet. 48, 265–272 (2016).
  8. Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).
  9. Northcott, P. A. et al. The whole-genome landscape of medulloblastoma subtypes. Nature 547, 311–317 (2017).
  10. Yang, M. et al. 13q12.2 deletions in acute lymphoblastic leukemia lead to upregulation of FLT3 through enhancer hijacking. Blood https://doi.org/10.1182/blood.2019004684 (2020).
  11. Ooi, W. F. et al. Integrated paired-end enhancer profiling and whole-genome sequencing reveals recurrent CCNE1 and IGF2 enhancer hijacking in primary gastric adenocarcinoma. Gut 69, 1039–1052 (2020).
  12. Martin-Garcia, D. et al. CCND2 and CCND3 hijack immunoglobulin light-chain enhancers in cyclin D1(−) mantle cell lymphoma. Blood 133, 940–951 (2019).
  13. Haller, F. et al. Enhancer hijacking activates oncogenic transcription factor NR4A3 in acinic cell carcinomas of the salivary glands. Nat. Commun. 10, 368 (2019).
  14. Zimmerman, M. W. et al. MYC drives a subset of high-risk pediatric neuroblastomas and is activated through mechanisms including enhancer hijacking and focal enhancer amplification. Cancer Discov. 8, 320–335 (2018).
  15. Ryan, R. J. et al. Detection of enhancer-associated rearrangements reveals mechanisms of oncogene dysregulation in B-cell lymphoma. Cancer Discov. 5, 1058–1071 (2015).
  16. Northcott, P. A. et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature 511, 428–434 (2014).
  17. He, B. et al. Diverse noncoding mutations contribute to deregulation of cis-regulatory landscape in pediatric cancers. Sci. Adv. 6, eaba3064 (2020).
  18. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
  19. Wang, S. et al. HiNT: a computational method for detecting copy number variations and translocations from Hi-C data. Genome Biol. 21, 73 (2020).
  20. Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).
  21. Chakraborty, A. & Ay, F. Identification of copy number variations and translocations in cancer cells from Hi-C data. Bioinformatics 34, 338–345 (2018).
  22. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
  23. Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
  24. Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
  25. Wu, H. J. & Michor, F. A computational strategy to adjust for copy number in tumor Hi-C data. Bioinformatics 32, 3695–3701 (2016).
  26. Vidal, E. et al. OneD: increasing reproducibility of Hi-C samples with abnormal karyotypes. Nucleic Acids Res. 46, e49 (2018).
  27. Servant, N., Varoquaux, N., Heard, E., Barillot, E. & Vert, J. P. Effective normalization for copy number variation in Hi-C data. BMC Bioinformatics 19, 313 (2018).
  28. Salameh, T. J. et al. A supervised learning framework for chromatin loop detection in genome-wide contact maps. Nat. Commun. 11, 3428 (2020).
  29. Fudenberg, G. et al. Formation of chromosomal domains by loop extrusion. Cell Rep. 15, 2038–2049 (2016).
  30. Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015).
  31. Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
  32. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
  33. Derderian, C., Orunmuyi, A. T., Olapade-Olaopa, E. O. & Ogunwobi, O. O. PVT1 signaling is a mediator of cancer progression. Front. Oncol. 9, 502 (2019).
  34. Quereda, V. et al. Therapeutic targeting of CDK12/CDK13 in triple-negative breast cancer. Cancer Cell 36, 545–558 e547 (2019).
  35. Parolia, A. et al. Distinct structural classes of activating FOXA1 alterations in advanced prostate cancer. Nature 571, 413–418 (2019).
  36. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
  37. Spangle, J. M. et al. PI3K/AKT signaling regulates H3K4 methylation in breast cancer. Cell Rep. 15, 2692–2704 (2016).
  38. Baena, E. et al. ETV1 directs androgen metabolism and confers aggressive prostate cancer in targeted mice and patients. Genes Dev. 27, 683–698 (2013).
  39. Gasi, D. et al. Overexpression of full-length ETV1 transcripts in clinical prostate cancer due to gene translocation. PLoS ONE 6, e16332 (2011).
  40. Kragesteen, B. K. et al. Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis. Nat. Genet. 50, 1463–1473 (2018).
  41. Despang, A. et al. Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 51, 1263–1271 (2019).
  42. Ay, F., Bailey, T. L. & Noble, W. S. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 24, 999–1011 (2014).
  43. Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).
  44. Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).
  45. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416 e411 (2018).
  46. Wang, X. T., Cui, W. & Peng, C. HiTAD: detecting the structural and functional hierarchies of topologically associating domains from chromatin interactions. Nucleic Acids Res. 45, e163 (2017).
  47. Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
  48. Xu, W. et al. CoolBox: a interactive genomic data explorer for Jupyter Notebook. Preprint at bioRxiv https://doi.org/10.1101/614222 (2019).
  49. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
  50. Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat. Protoc. 15, 991–1012 (2020).

Acknowledgements

F.Y. is supported by NIH grants R35GM124820 and R01HG009906. We thank H. Yang and the rest of the Yue laboratory for discussion.

Author information

Contributions

F.Y. conceived, designed and supervised the project. X.W. implemented the algorithm and performed the data analysis. J.X. performed the 4C-seq experiment in SK-N-MC cells. B.Z. performed the CRISPR–Cas9 deletion experiments while working at Northwestern University. Y.H. performed Hi-C experiments in wild-type and M1-deleted LNCaP cells. F.S. processed the 4C-seq data. H.L. contributed to the proofreading and editing of the manuscript. X.W. and F.Y. wrote the manuscript with input from all the authors.

Corresponding author

Correspondence to Feng Yue.

Ethics declarations

Competing interests

F.Y. and X.W. are listed as inventors of a provisional patent titled ‘Detection of chromatin interactions in re-arranged genomes’. F.Y. is a cofounder of Sariant Therapeutics, Inc.

Additional information

Peer review information: Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. L. Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of the CNV segmentation and CNV normalization module in NeoLoopFinder.

a–g, We implemented the same generalized additive model (GAM) used by HiNT-CNV to estimate the copy number profile directly from Hi-C. For CNV segmentation, we applied a different algorithm based on a hidden Markov model (HMM). a, We compared the copy number profiles estimated by NeoLoopFinder with the CNV profiles computed by Control-FREEC from whole genome sequencing (WGS) data. Each dot represents a 25 kb bin. Bins with zero reads were excluded from the calculation. b, Similar to a, but for the comparison between NeoLoopFinder and HiNT-CNV. c, The number of CNV segments identified by NeoLoopFinder and HiNT-CNV, with or without WGS support. Only segments with a copy number ratio larger than 1.5 or smaller than 0.3 were considered in the calculation. d, The fraction of WGS-detected CNV segments that are recalled by NeoLoopFinder or HiNT-CNV. e, Comparison of F1 scores between NeoLoopFinder and HiNT-CNV in eight cancer cell lines. f, Comparison of CNV segments inferred by HiNT-CNV and NeoLoopFinder from Hi-C data in K562 cells. We compared their results with the CNV segments computed by Control-FREEC from WGS data. Results from both HiNT-CNV and NeoLoopFinder are similar to those from Control-FREEC. However, for more fragmented regions (green circles), NeoLoopFinder performs better. g, A similar example in SK-N-MC cells. h,i, Comparison of different Hi-C normalization methods in K562 (resolution: 10 kb). Hi-C contact heatmaps and copy number variation profiles are shown for two example regions: ‘chr22: 22,340,000 – 24,200,000’ (h) and ‘chr9: 130,000,000 – 131,280,000’ (i). The CAIC method is excluded from the analysis owing to a memory error (Supplementary Table 2).

Extended Data Fig. 2 Evaluating the performance of CNV normalization of Hi-C data by simulation.

The inputs in our simulation are Hi-C data in GM12878 (normal lymphoblastic cells) and K562 (chronic myeloid leukemia) cells, and the CNV profiles in K562 cells. a, Our algorithm learns the trans-/cis-scaling factors separately for all possible copy number pairs from K562 Hi-C data. The CNV effects of K562 are then imposed on GM12878 Hi-C by linearly transforming the signals with the factor of the corresponding copy number pair. The resulting simulated GM12878 Hi-C matrix with K562 CNV is highly similar to the original K562 Hi-C matrix. b, We applied ICE and the CNV normalization method newly designed in this project to the simulated matrix from a (GM12878 Hi-C matrix with CNVs in K562). By visual inspection, the CNV-normalized Hi-C is more similar to the original GM12878. c, We used HiCRep to calculate the stratum-adjusted correlation coefficients (SCCs) of the ICE-normalized and CNV-normalized matrices against the original GM12878 Hi-C data. The distributions of SCC scores are presented in box-and-whisker plots, where the box represents the interquartile range (IQR, Q3 − Q1), the horizontal thick line represents the median, the upper whisker extends to the last datum less than Q3 + 1.5×IQR, and the lower whisker extends to the first datum greater than Q1 − 1.5×IQR. Each dot represents an individual chromosome.
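
The linear transformation described in panel a can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeoLoopFinder implementation: the function name `impose_cnv`, the toy matrix and the scaling factors are all hypothetical, and only cis contacts on a single chromosome are handled, whereas the caption describes separately learned trans and cis factors.

```python
def impose_cnv(matrix, copy_numbers, factors):
    """Scale every contact by the factor assigned to its copy number pair.

    matrix       -- square contact matrix (list of lists) from a normal sample
    copy_numbers -- copy number of each bin in the cancer sample
    factors      -- dict mapping a sorted (cn_a, cn_b) pair to a scaling factor
                    (hypothetical values here; the caption says they are
                    learned from the cancer Hi-C data)
    """
    n = len(matrix)
    scaled = []
    for i in range(n):
        row = []
        for j in range(n):
            # The pair is order-independent, so sort it before the lookup.
            pair = tuple(sorted((copy_numbers[i], copy_numbers[j])))
            row.append(matrix[i][j] * factors[pair])
        scaled.append(row)
    return scaled


# Toy 4-bin "normal" matrix (hypothetical values).
normal = [
    [10.0, 4.0, 2.0, 1.0],
    [4.0, 10.0, 4.0, 2.0],
    [2.0, 4.0, 10.0, 4.0],
    [1.0, 2.0, 4.0, 10.0],
]
copy_numbers = [2, 2, 4, 4]  # bins 2 and 3 are amplified
factors = {(2, 2): 1.0, (2, 4): 1.5, (4, 4): 2.0}

simulated = impose_cnv(normal, copy_numbers, factors)
```

Because each entry is multiplied by a factor that depends only on the unordered copy number pair, a symmetric input matrix stays symmetric after the transformation.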

Extended Data Fig. 3 Complex SV detection based on Hi-C maps.

a, Illustration of how we reconstruct the Hi-C map surrounding breakpoints. There are four orientation types of inter-chromosomal translocations. b, Overall workflow of the complex SV assembling module in NeoLoopFinder. c, In the first step of the pipeline, we determine the whole rearranged fragments (green box) of the input SV breakpoints within the checking window (gray box, by default extended 5 Mb from the breakpoints). d, The algorithm for determining rearranged fragments. First, correlation matrices are calculated by rows (top) and columns (bottom) of the contact matrix separately within the checking window; then the rearranged fragment boundaries are determined by checking the first principal component profile (PC1) of the correlation matrices. e,f, Determination of the R² cutoff for SV filtering. e, The distribution of R² across all large SVs (intra-chromosomal rearrangements larger than 1 Mb and inter-chromosomal translocations). The numbers were summarized from all 50 cancer samples. f, The fraction of SVs remaining after filtering as a function of the R² cutoff. Data were merged from eight cancer cell lines: A549, Caki2, K562, LNCaP, NCI-H460, PANC-1, SK-N-MC, and T47D.
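
The idea in panel d can be illustrated with a small standard-library sketch. Several points are assumptions rather than details from the caption: PC1 is approximated by the leading eigenvector of the row correlation matrix (obtained by power iteration), a boundary is called wherever that profile changes sign, and all function names and the toy matrix are hypothetical. The column-wise case would apply the same steps to the transposed matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def row_correlation(matrix):
    """Correlation matrix between all pairs of rows of the contact matrix."""
    return [[pearson(r1, r2) for r2 in matrix] for r1 in matrix]


def leading_eigenvector(sym, n_iter=200):
    """Power iteration on a symmetric matrix; returns a unit-length vector."""
    n = len(sym)
    # Deterministic, non-uniform start to avoid orthogonality in simple cases.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def sign_changes(profile):
    """Indices where the profile switches sign (assumed boundary criterion)."""
    return [i for i in range(1, len(profile)) if profile[i - 1] * profile[i] < 0]


# Toy checking window: bins 0-2 and bins 3-5 have opposite contact patterns.
window = [
    [5, 5, 5, 1, 1, 1],
    [5, 5, 5, 1, 1, 1],
    [5, 5, 5, 1, 1, 1],
    [1, 1, 1, 5, 5, 5],
    [1, 1, 1, 5, 5, 5],
    [1, 1, 1, 5, 5, 5],
]
pc1 = leading_eigenvector(row_correlation(window))
boundaries = sign_changes(pc1)
```

In this toy window the profile flips sign between bins 2 and 3, so a single boundary is reported at index 3. The overall sign of an eigenvector is arbitrary, which is why the sketch looks at sign changes rather than at the sign itself.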

Extended Data Fig. 4 Examples of complex SVs assembled by NeoLoopFinder using Hi-C data.

All these complex SVs were also reported in our previous work, where the structure of each assembly was manually reconstructed by combining optical mapping, Hi-C and WGS. a–c, Assembly of a complex SV in K562 cells. a, The abnormal signals on the original Hi-C map indicate the complex SV structures between chromosome 13 and chromosome 9 in K562 cells. b, The correct lengths and orders of the rearranged regions B2 (chr13: 92.3M-92.66M), B3 (chr13: 93.2M-93.37M), C (chr13: 107.84M-108M) and D (chr9: 130.72M-131.28M) can be automatically identified and assembled by NeoLoopFinder based solely on Hi-C data. c, Linear regression of the global distance-averaged contact frequencies against the local distance-averaged contact frequencies within the indicated contact regions; for example, ‘D-C’ represents the contacts between region D and region C in b. d–f, Assembly of another complex SV in K562 cells. g–i, Reconstruction of a complex SV in T47D cells.
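
The regression in panel c can be sketched with ordinary least squares. This is an illustrative example only: the helper names, the toy matrix and the "global" expected values are hypothetical, and the sketch does not reproduce how NeoLoopFinder bins or filters the data.

```python
def distance_average(matrix, rows, cols):
    """Mean contact frequency at each genomic distance |i - j| inside a block."""
    sums, counts = {}, {}
    for i in rows:
        for j in cols:
            d = abs(i - j)
            sums[d] = sums.get(d, 0.0) + matrix[i][j]
            counts[d] = counts.get(d, 0) + 1
    return {d: sums[d] / counts[d] for d in sums}


def linear_fit(x, y):
    """Ordinary least squares y = slope * x + intercept, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2


n = 6
# Hypothetical genome-wide ("global") average contact frequency per distance.
expected = {d: 10.0 / (1 + d) for d in range(n)}
# Toy matrix: contacts between bins 0-2 and bins 3-5 are half as strong.
matrix = [
    [expected[abs(i - j)] * (0.5 if (i < 3) != (j < 3) else 1.0) for j in range(n)]
    for i in range(n)
]

local = distance_average(matrix, range(0, 3), range(3, 6))
distances = sorted(local)
slope, intercept, r2 = linear_fit(
    [expected[d] for d in distances],  # global averages
    [local[d] for d in distances],     # local averages in the block
)
```

In this toy case the fit recovers a slope of about 0.5 with R² close to 1, reflecting that the block follows the global distance decay at half the strength.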

Extended Data Fig. 5 Cancer-specific allele normalization and neo-TAD identification.

a–c, An example in SK-N-MC cells. As shown in a, there is a 1.5 Mb deletion event (chr8: 127.88M, +; chr8: 129.37M, -). Because this is a heterozygous deletion, the chromatin interactions in area A3, between loci (127.28M – 127.88M) and loci (129.37M – 129.97M), come only from the allele on which the deletion occurs. In contrast, chromatin interactions within area A1 (or A2) come from all alleles, so the overall intensities in area A3 are much weaker than in A1 or A2. The signals must therefore be normalized before neo-TADs and neo-loops can be predicted. a, Hi-C matrix without cancer-specific allele normalization. The neo-TAD is undetectable, as shown by the directionality index (DI) track. b, Reconstructed Hi-C map after cancer-specific allele normalization. The neo-TAD becomes detectable, and the number of detected neo-loops also increases. c, Linear regression of the local distance-averaged contact frequencies against the global distance-averaged contact frequencies in different regions (A1, A2 and A3) of the Hi-C map in a. d,e, Detection of neo-TADs in 50 cancer cell lines or patient samples. d, The number of neo-TADs detected in each sample. e, Aggregate analysis of neo-TADs and distribution of breakpoint locations. Hi-C signals were distance-normalized, averaged, and centered at neo-TAD midpoints.
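
For readers unfamiliar with the DI track mentioned in a, the sketch below computes a directionality index in the form introduced by Dixon et al. (ref. 32): for each bin, A is the sum of contacts with upstream bins and B the sum with downstream bins inside a fixed window, E = (A + B)/2, and DI = sign(B − A) × ((A − E)²/E + (B − E)²/E). The window size in bins, the function name and the toy matrix are assumptions for illustration; this is not NeoLoopFinder's own code.

```python
def directionality_index(matrix, i, window):
    """Directionality index of bin i using `window` bins on each side."""
    n = len(matrix)
    # A: contacts with upstream bins; B: contacts with downstream bins.
    a = sum(matrix[i][j] for j in range(max(0, i - window), i))
    b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
    if a == b:
        return 0.0  # no bias (also covers the case of no contacts at all)
    e = (a + b) / 2.0
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


# Toy map with two domains of three bins each: strong contacts inside a
# domain (10), weak contacts between domains (1).
toy = [[10.0 if (i < 3) == (j < 3) else 1.0 for j in range(6)] for i in range(6)]
di_track = [directionality_index(toy, i, window=2) for i in range(6)]
```

In the toy map the index is negative at the last bin of the first domain and positive at the first bin of the second, the pattern conventionally read as a domain boundary. A weakened block of contacts, such as area A3 before allele normalization, would flatten this contrast.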

Extended Data Fig. 6 Comparison of loops predicted by FitHiC2 and NeoLoopFinder in SV regions.

a–c, Neo-loops predicted by NeoLoopFinder and FitHiC2 in LNCaP cells. a, Global view of the Hi-C map of chromosomes 7 and 14 shows that there is an inter-chromosomal translocation (marked by arrow). b, High-resolution Hi-C map showing contact frequencies between ETV1 and its hijacked enhancers on chr14. Blue circles indicate neo-loops identified by NeoLoopFinder. c, Significant interactions (blue circles) identified by FitHiC2. d–g, For this analysis, we used all the loops from ten cancer cell lines with DNase-Seq data available in ENCODE data portal (MCF7, A549, LNCaP, T47D, HL-60, KBM7, RPMI 8226, SK-N-MC, SW480 and K562). d, Upset plot of chromatin loops detected by NeoLoopFinder and FitHiC2 within SV regions. e, Chromatin accessibility around anchors of NeoLoopFinder-unique loops and FitHiC2-unique loops. f-g, Aggregate Peak Analysis (APA) for NeoLoopFinder-unique loops (f) or FitHiC2-unique loops (g).

Extended Data Fig. 7 Reconstruction of complex SVs in 50 cancer samples.

a, Number of assembled complex SVs in each sample. Only samples with complex SVs detected are shown. b, Size distributions of complex SV fragments, simple inversions and simple deletions/duplications. Data were merged from eight cancer cell lines: A549, Caki2, K562, LNCaP, NCI-H460, PANC-1, SK-N-MC, and T47D. For the boxplot, the box represents the interquartile range (IQR, Q3-Q1), the white dot represents the median, the upper whisker extends to the last datum less than Q3 + 1.5×IQR, and the lower whisker extends to the first datum greater than Q1-1.5×IQR. c, Percentage of transition between different SV types in a complex SV assembly, averaged from our analysis in 50 cancer cell lines/tissues.

Extended Data Fig. 8 Examples of neo-loop-involved genes.

a, (left) Reconstructed Hi-C map for the translocation (chr19: 6.62M, -; chr11: 118.91M, -) in a T-ALL (T-cell acute lymphoblastic leukemia) patient. Blue circles indicate the predicted neo-loops. (middle) Kaplan-Meier survival analysis of TCGA leukemia patients with high (top 35%) and low (bottom 35%) expression of the UPK2 gene. The means and the 95% confidence intervals are shown for each group of patients. The p-value was calculated using a two-sided log-rank test. (right) Log2-converted copy number ratios of the UPK2 gene in patients with high (top 35%) or low (bottom 35%) expression. The horizontal bar in each violin plot represents the median. Higher expression levels are not accompanied by higher copy numbers. The p-value was computed using a two-sided Mann-Whitney U test. b,c, Similar examples in a gastric cancer patient, T2000877. The Kaplan-Meier survival and copy-number analyses were performed using TCGA stomach adenocarcinoma patient data.

Extended Data Fig. 9 Genes with hijacked enhancers are up-regulated in cancer.

Quantile-normalized gene expression signals (Transcripts Per Kilobase Million, TPM) are compared between cancer cells and the corresponding normal cells. Here, genes with hijacked enhancers are defined as expressed neo-loop-involved genes (TPM > 1 in cancer cells) for which there is at least one DNase-seq peak in the other anchor of the neo-loop. Each dot represents an individual gene. The p-values were computed using the two-sided Wilcoxon signed-rank test.

Extended Data Fig. 10 Application of the NeoLoopFinder framework to developmental diseases with genomic rearrangements.

a, CHi-C map reconstruction and neo-loop detection for an inversion event (inv1) in the mouse forelimb at embryonic day 11.5 (E11.5). Data were downloaded from Kragesteen et al. (ref. 40). (left) CHi-C map of the wild-type forelimb. The Pitx1 gene shows weak interactions with the Pen enhancer. Blue circles indicate the chromatin loops predicted by Peakachu. (middle) Original CHi-C map of the forelimb that contains a homozygous inversion of a 113-kb fragment containing Pen. (right) Reconstructed CHi-C map for the inversion. Note that the neo-loop between Pitx1 and the Pen enhancer was correctly detected by NeoLoopFinder. b, CHi-C map reconstruction and neo-loop detection for a duplication event (Dup-C) in the mouse limb buds at E12.5. Data were downloaded from Franke et al. (ref. 8). The duplicated region (blue and green arrows) contains both the Sox9 enhancers (marked by H3K27ac peaks) and the Kcnj2 gene. The rightmost panel shows the reconstructed chromatin interaction map near the duplication breakpoints. Yellow circles highlight the Kcnj2-involved neo-loops. There are also two more predicted neo-loops in this region (blue circles). c, CHi-C map reconstruction and neo-loop detection for an inversion event (InvC) in E12.5 limb buds. Data were downloaded from Despang et al. (ref. 41). The inverted region (green bar in the middle panel) contains Sox9 enhancers (marked by H3K27ac peaks) and the TAD boundary separating the Sox9 enhancers and the Kcnj2 gene. The rightmost panel shows the reconstructed map for the whole region. The blue circles indicate the detected neo-loops, and yellow circles highlight the Kcnj2-involved neo-loops.

Supplementary information

Supplementary Information

Supplementary Tables 2, 6, 9 and 10.

Reporting Summary

Supplementary Table 1

Details of the 50 cancer Hi-C datasets analyzed in this study.

Supplementary Table 3

List of large SVs detected in each sample.

Supplementary Table 4

Genomic coordinates of the detected neoloops in each sample.

Supplementary Table 5

List of neoloop-involved genes identified in each sample.

Supplementary Table 7

Details of the datasets collected for the validation and downstream analysis, including WGS, ChIP–seq, DNase-seq, RNA-seq and Capture Hi-C datasets in various cell lines or tissues.

Supplementary Table 8

List of the annotated enhancer-hijacking events in 11 cancer cell lines: A549, K562, LNCaP, MCF7, T47D, HepG2, SK-MEL-5, NCI-H460, PANC-1, HT-1080 and C4-2B.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Cite this article

Wang, X., Xu, J., Zhang, B. et al. Genome-wide detection of enhancer-hijacking events from chromatin interaction data in rearranged genomes. Nat Methods 18, 661–668 (2021). https://doi.org/10.1038/s41592-021-01164-w
